It's finally out!!! @METR_Evals found that more than half of SWEBench results are unmergeable slop. FrontierCode represents 1000+ hours of maintainer-validated software engine

#24 · X / Builder · Experimental

Source: @swyx / X
