I've seen this reply to Simon's benchmark for 2 years running now, and yet you s...

sarreph · 2026-06-09T17:26:08 1781025968

I had intended to caveat that: I'm sure I'm not the first person to ask about this!

> you still see improvements

This is expected if they are training their models on it, right?

> objectively-bad results

Keen to learn when this has been the case, i.e. across version increments in major models.

simonw · 2026-06-09T17:29:15 1781026155

I've written about this a couple of times, most notably here: https://simonwillison.net/2025/Nov/13/training-for-pelicans-...

I've been enjoying seeing how the quality of individual models differs based on the amount of reasoning effort you give them. If they were baking in a good pelican you wouldn't expect them to differ so much.

(Google Gemini are the only lab that have very clearly paid attention to the quality of SVG animals-riding-vehicles, see their announcement for Gemini 3.1: https://twitter.com/JeffDean/status/2024525132266688757 )

sarreph · 2026-06-09T17:31:18 1781026278

Amazing, thank you Simon! Look forward to reading.

mrandish · 2026-06-10T06:32:36 1781073156

Hence it has become a meta-benchmark of relative progress in SVG image generation of a known target, one which has leaked into the training data and for which "every frontier AI team has/had a person at least partially dedicated to" at least checking, if not optimizing.

llm_nerd · 2026-06-09T17:32:26 1781026346

I honestly assumed their comment was tongue in cheek humour, because positively no one actually cares how these models generate an SVG pelican riding a bicycle. It's some meme thing that this stuff always appears here.

BrokenCogs · 2026-06-09T17:38:54 1781026734

Yeah, this is not a real benchmark, it's just a fun tradition every time a new model is released.

pelipost123 · 2026-06-09T17:47:39 1781027259

"fun" / boringly predictable meme thread with 30+ replies already

brazukadev · 2026-06-09T19:43:26 1781034206

It is telling that people need to create throwaway accounts to criticize simonw's behavior on this website.