This page contains statistical tables and resources from our comprehensive survey on Issue Resolution in Software Engineering.
Evaluation & Training Datasets
A comprehensive survey and statistical overview of issue resolution datasets. We categorize these datasets based on programming language, modality support, source repositories, data scale (Amount), and the availability of reproducible execution environments.
A survey of trajectory datasets used for agent training or analysis. We list the programming language, number of source repositories, and total trajectories for each dataset.
| Dataset | Language | Repos | Amount |
|---|---|---|---|
| SWE-Fixer | Python | 856 | 69,752 |
| SWE-rebench | Python | 1,823 | 67,074 |
| SWE-rebench V2 | Python, JS, TS, Go, Rust, C, C++, Java, etc. | 3,600+ | 32,000+ |
| Multi-SWE-RL | Python | - | 4,723 |
| daVinci-Dev | Python | - | - |
| OpenSWE | Python | 12,800 | 45,320 |
| R2E-Gym | Python | 10 | 3,321 |
| SWE-Synth | Python | 11 | 3,018 |
| SWE-Factory | Python | 10 | 2,809 |
| SWE-Gym | Python | 11 | 491 |
| SWE-Lego | Python | 3,251 | 14.6k |
| Scale-SWE | Python | 5,200 | 100,000 |
| SWE-Next | Python | 3,971 | 2,308 |
| SWE-Universe | Python | - | 807,693 |
SFT-based Methods
Overview of SFT-based methods for issue resolution. The table lists each model's base model, size, architecture, and training scaffold, sorted by resolution rate (Res. %) in descending order.
| Model Name | Base Model | Size | Arch. | Training Scaffold | Res. (%) |
|---|---|---|---|---|---|
| SWE-rebench-openhands-Qwen3-235B-A22B | Qwen3-235B-A22B | 235B-A22B | MoE | OpenHands | 59.9 |
| SWE-Lego-Qwen3-32B | Qwen3-32B | 32B | Dense | OpenHands | 57.6 |
| CGM-SWE-PY | Qwen2.5-Coder-72B | 72B | Dense | Graph RAG | 50.4 |
| SWE-rebench-openhands-Qwen3-30B-A3B | Qwen3-30B-A3B | 30B-A3B | MoE | OpenHands | 49.7 |
| Devstral | Mistral Small 3 | 22B | Dense | OpenHands | 46.8 |
| Co-PatcheR | Qwen2.5-Coder-14B | 3×14B | Dense | PatchPilot-mini | 46.0 |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | Agentless | 45.0 |
| SWE-Lego-Qwen3-8B | Qwen3-8B | 8B | Dense | OpenHands | 44.4 |
| Lingma SWE-GPT | Qwen2.5-72B-Instruct | 72B | Dense | SWESynInfer | 30.2 |
| SWE-Gym-Qwen-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands, MoatlessTools | 20.6 |
| SWE-Gym-Qwen-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands, MoatlessTools | 16.4 |
| SWE-Gym-Qwen-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands, MoatlessTools | 10.6 |
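The Res. (%) column is read here, as is conventional for SWE-bench-style benchmarks, as the percentage of benchmark instances a model resolves; the tables do not name the benchmark split, so that reading is an assumption. A minimal sketch of how such a rate is computed and how rows end up "sorted by performance", using hypothetical model names and counts:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelResult:
    name: str
    resolved: int  # instances whose generated patch passed evaluation (assumed definition)
    total: int     # instances in the evaluated benchmark split (hypothetical)

    @property
    def res_pct(self) -> float:
        """Resolution rate in percent, rounded to one decimal place like the tables."""
        return round(100.0 * self.resolved / self.total, 1)


def sort_by_performance(results: list[ModelResult]) -> list[ModelResult]:
    """Order rows by resolution rate, highest first."""
    return sorted(results, key=lambda r: r.res_pct, reverse=True)


if __name__ == "__main__":
    # Hypothetical counts on a hypothetical 500-instance split.
    rows = [ModelResult("model-b", 222, 500), ModelResult("model-a", 288, 500)]
    for r in sort_by_performance(rows):
        print(f"{r.name}: {r.res_pct}%")
```

Because the rate depends on the denominator, figures are only comparable across rows when they were measured on the same benchmark split.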
RL-based Methods
An overview of RL-trained models specialized for issue resolution, grouped by parameter size. The table details each model's base model, architecture, the training scaffold used for rollouts, the type of reward signal (Outcome vs. Process), and its resolution rate (Res. %) on issue resolution benchmarks.
| Model Name | Base Model | Size | Arch. | Train. Scaffold | Reward | Res. (%) |
|---|---|---|---|---|---|---|
| **560B Models (MoE)** | | | | | | |
| LongCat-Flash-Think | LongCatFlash-Base | 560B-A27B | MoE | R2E-Gym | Outcome | 60.4 |
| **72B Models** | | | | | | |
| Kimi-Dev | Qwen2.5-72B-Base | 72B | Dense | BugFixer + TestWriter | Outcome | 60.4 |
| SWE-RL | Llama-3.3-70B-Instruct | 70B | Dense | Agentless-mini | Outcome | 41.0 |
| Multi-turn RL (Nebius) | Qwen2.5-72B-Instruct | 72B | Dense | SWE-agent | Outcome | 39.0 |
| Agent-RLVR-RM-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 27.8 |
| Agent-RLVR-72B | Qwen2.5-Coder-72B | 72B | Dense | Localization + Repair | Outcome | 22.4 |
| **32B Models** | | | | | | |
| OpenHands Critic | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | - | 66.4 |
| KAT-Dev-32B | Qwen3-32B | 32B | Dense | - | - | 62.4 |
| SWE-Swiss-32B | Qwen2.5-32B-Instruct | 32B | Dense | - | Outcome | 60.2 |
| FoldAgent | Seed-OSS-36B-Instruct | 36B | Dense | FoldAgent | Process | 58.0 |
| SeamlessFlow-32B | Qwen3-32B | 32B | Dense | SWE-agent | Outcome | 45.8 |
| DeepSWE | Qwen3-32B | 32B | Dense | R2E-Gym | Outcome | 42.2 |
| SA-SWE-32B | - | 32B | Dense | SkyRL-Agent | - | 39.4 |
| OpenHands LM v0.1 | Qwen2.5-Coder-32B | 32B | Dense | SWE-Gym | - | 37.2 |
| SWE-Dev-32B | Qwen2.5-Coder-32B | 32B | Dense | OpenHands | Outcome | 36.6 |
| Satori-SWE | Qwen2.5-Coder-32B | 32B | Dense | Retriever + Code editor | Outcome | 35.8 |
| SoRFT-32B | Qwen2.5-Coder-32B | 32B | Dense | Agentless | Outcome | 30.8 |
| Agent-RLVR-32B | Qwen2.5-Coder-32B | 32B | Dense | Localization + Repair | Outcome | 21.6 |
| **14B Models** | | | | | | |
| Agent-RLVR-14B | Qwen2.5-Coder-14B | 14B | Dense | Localization + Repair | Outcome | 18.0 |
| SEAlign-14B | Qwen2.5-Coder-14B | 14B | Dense | OpenHands | Process | 17.7 |
| **7-8B Models** | | | | | | |
| SeamlessFlow-8B | Qwen3-8B | 8B | Dense | SWE-agent | Outcome | 27.4 |
| SWE-Dev-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Outcome | 23.4 |
| SoRFT-7B | Qwen2.5-Coder-7B | 7B | Dense | Agentless | Outcome | 21.4 |
| SWE-Dev-8B | Llama-3.1-8B | 8B | Dense | OpenHands | Outcome | 18.0 |
| SEAlign-7B | Qwen2.5-Coder-7B | 7B | Dense | OpenHands | Process | 15.0 |
| SWE-Dev-9B | GLM-4-9B | 9B | Dense | OpenHands | Outcome | 13.6 |
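The Reward column separates outcome rewards, a single signal for the whole trajectory (often a binary pass/fail from the repository's tests, though some methods use other terminal signals such as patch similarity), from process rewards, which score intermediate steps. The sketch below is a generic illustration of that distinction, not the reward design of any listed method; it assumes an undiscounted episodic setting, and the per-step scores are hypothetical.

```python
from __future__ import annotations


def returns_from_outcome(num_steps: int, tests_passed: bool) -> list[float]:
    """Outcome reward: one terminal signal for the whole trajectory.

    With no discounting, every step's return-to-go equals that terminal reward,
    so all steps in the trajectory share the same credit.
    """
    terminal = 1.0 if tests_passed else 0.0
    return [terminal] * num_steps


def returns_from_process(step_rewards: list[float]) -> list[float]:
    """Process reward: one (hypothetical) score per intermediate step.

    The return-to-go at step t is the sum of rewards from t to the end,
    so different steps can receive different credit.
    """
    returns: list[float] = []
    acc = 0.0
    for r in reversed(step_rewards):
        acc += r
        returns.append(acc)
    returns.reverse()
    return returns


if __name__ == "__main__":
    print(returns_from_outcome(3, tests_passed=True))
    print(returns_from_process([0.5, 0.0, 1.0]))
```

Under an outcome reward, a trajectory ending in a failing patch gives zero credit to every step, including useful ones such as correct localization; denser per-step credit is one commonly cited motivation for process rewards.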
General Foundation Models
Overview of general foundation models evaluated on issue resolution. The table lists the inference scaffold (e.g., OpenHands, Agentless) used during evaluation to obtain each reported result.