blog.can.ac
Can Bölük ran a single-afternoon experiment that moved coding benchmark scores dramatically (Grok Code Fast 1 jumped from 6.7% to 68.3%) without touching any model weights. The trick was replacing the edit tool itself: his "hashline" format tags each line with a short content hash, so models reference anchors instead of having to reproduce exact text. Patch-based diff formats, it turns out, were the worst performers for nearly every model tested. Fifteen different LLMs improved, some cutting output tokens by over 60%. The finding lands squarely in a growing conversation about how much of what we attribute to model capability is actually harness and scaffolding design. Vendors restricting open-source harness experimentation may be leaving the easiest gains on the table.
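To make the idea concrete, here is a minimal sketch of a hashline-style rendering in Python. The tag length, the `hash|line` layout, and the function name are illustrative assumptions, not the exact format from the post; the point is only that each line gets a short content-derived anchor a model can cite instead of quoting the line verbatim.

```python
import hashlib


def hashline(text: str, tag_len: int = 4) -> str:
    """Render text so each line is prefixed with a short content hash.

    An edit tool can then accept instructions like "replace line a3f1"
    rather than requiring the model to reproduce the line exactly.
    (tag_len=4 and the `hash|line` layout are assumptions for this sketch.)
    """
    out = []
    for line in text.splitlines():
        # Identical lines get identical tags; any edit changes the tag.
        tag = hashlib.sha256(line.encode("utf-8")).hexdigest()[:tag_len]
        out.append(f"{tag}|{line}")
    return "\n".join(out)


print(hashline("def add(a, b):\n    return a + b"))
```

A model referencing `a3f1` (or whatever tag a line hashes to) sidesteps the whitespace and copy-fidelity failures that plague exact-match patch formats.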
