Commit 48c5b3c

Added staging
1 parent e788a35 commit 48c5b3c

2 files changed: 6 additions, 11 deletions
Gemfile.lock

Lines changed: 4 additions & 7 deletions
@@ -70,14 +70,10 @@ GEM
       rb-fsevent (~> 0.10, >= 0.10.3)
       rb-inotify (~> 0.9, >= 0.9.10)
     mercenary (0.4.0)
+    mini_portile2 (2.8.8)
     namae (1.1.1)
-    nokogiri (1.15.1-arm64-darwin)
-      racc (~> 1.4)
-    nokogiri (1.15.1-x64-mingw-ucrt)
-      racc (~> 1.4)
-    nokogiri (1.15.1-x86_64-darwin)
-      racc (~> 1.4)
-    nokogiri (1.15.1-x86_64-linux)
+    nokogiri (1.15.1)
+      mini_portile2 (~> 2.8.2)
       racc (~> 1.4)
     pathutil (0.16.2)
       forwardable-extended (~> 2.6)
@@ -105,6 +101,7 @@ GEM
 PLATFORMS
   arm64-darwin-21
   arm64-darwin-23
+  arm64-darwin-24
   x64-mingw-ucrt
   x86_64-darwin-19
   x86_64-darwin-20

_blogs/codemonkeys.md

Lines changed: 2 additions & 4 deletions
@@ -146,13 +146,11 @@ materials:

 <section id="results">
 <h2>Results by task</h2>
-<p>We evaluate each component of our system on SWE-bench Verified:</p>
-
 <img src="/imgs/blog/codemonkeys/results_by_stage.png" alt="" style="width: 100%; height: auto;">

 <div class="component-results">
 <h3>Context</h3>
-<p>With the 128k token limit that we use for later experiments, 92.6% of instances have the correct files in context. .</p>
+<p>With the 128k token limit, 92.6% of instances have the correct files in context.</p>

 <h3>Generation</h3>
 <p>By running multiple state machines in parallel and allowing each to iterate multiple times, we achieve 69.8% coverage. This means that for about 70% of problems, at least one of our candidate solutions is correct. Interestingly, we found that different ways of distributing compute between parallel scaling (more state machines) and serial scaling (more iterations per machine) often lead to similar coverage values.</p>
@@ -163,7 +161,7 @@ materials:
 </section>

 <section id="costs">
-<h3>Cost Analysis</h3>
+<h2>Cost Analysis</h2>

 <table class="cost-table">
 <thead>
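
Note on the "Generation" paragraph in _blogs/codemonkeys.md: it defines coverage as the share of problems for which at least one candidate solution is correct. A minimal sketch of that metric follows, assuming per-problem lists of pass/fail flags. The function name `coverage` and the sample data are hypothetical and are not taken from the post or its codebase.

```python
from typing import Sequence


def coverage(candidate_results: Sequence[Sequence[bool]]) -> float:
    """Fraction of problems with at least one correct candidate.

    candidate_results[i][j] is True when candidate j for problem i is correct.
    """
    if not candidate_results:
        raise ValueError("need at least one problem")
    # A problem counts as covered if any of its candidates is correct.
    solved = sum(1 for candidates in candidate_results if any(candidates))
    return solved / len(candidate_results)


# Illustrative data only: 4 problems, 3 candidates each.
results = [
    [False, True, False],
    [False, False, False],
    [True, True, False],
    [False, False, True],
]
print(coverage(results))  # 3 of 4 problems covered -> 0.75
```

Under this definition, adding more candidates per problem can only keep coverage the same or raise it, whether the candidates come from more parallel state machines or from more iterations per machine.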
