Meridian partnered with Cornell, CMU, and Scale AI to launch Spreadsheet Arena, where users submit prompts and blindly compare LLM-generated spreadsheets head to head. The most revealing finding isn't about formulas: formatting and visual structure drive user preference more than lookup functions or conditionals do. Claude-based models scored higher on numerical accuracy but lost on polish, while weaker models failed basic prompt compliance entirely. Finance professionals and crowd voters agreed only about half the time, with color-coding conventions a frequent point of disagreement. The benchmark fills a gap that coding-focused evaluations like SWE-bench never touched: most knowledge workers interact with AI through spreadsheets, not terminals, and until now there was no structured way to measure that.
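
The article doesn't describe Spreadsheet Arena's scoring method, but arena-style benchmarks built on blind pairwise votes (Chatbot Arena being the best-known example) typically aggregate those votes into Elo-style ratings. Here is a minimal sketch of that approach; the model names, K-factor, and vote format are illustrative assumptions, not details from the source:

```python
from collections import defaultdict

K = 32  # update step size; an illustrative choice, not from the source

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(ratings: dict, model_a: str, model_b: str, winner: str) -> None:
    """Apply one blind pairwise vote; winner is 'a', 'b', or 'tie'."""
    ra, rb = ratings[model_a], ratings[model_b]
    ea = expected_score(ra, rb)          # A's expected score
    sa = {"a": 1.0, "b": 0.0, "tie": 0.5}[winner]  # A's actual score
    ratings[model_a] = ra + K * (sa - ea)
    ratings[model_b] = rb + K * ((1.0 - sa) - (1.0 - ea))

# Every model starts at 1000; each vote shifts the pair's ratings.
ratings = defaultdict(lambda: 1000.0)
votes = [("model_x", "model_y", "a"), ("model_x", "model_z", "tie")]
for a, b, w in votes:
    update_elo(ratings, a, b, w)
print(dict(ratings))
```

Because voters see the two spreadsheets without model labels, a rating computed this way reflects preference alone, which is how formatting can outrank numerical accuracy in the final standings.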
