vorticalbox · 2026-06-26T19:31:34 1782502294

You want to compact early though, as sending the whole chat means you end up with a lot of tokens not in the cache, which 1. costs way more and 2. slows the request down, as it has to process it all.
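A rough sketch of the arithmetic behind that point. The per-token prices and token counts below are made-up placeholders, not any provider's real rates; only the shape of the comparison matters.

```python
# All prices are hypothetical placeholders, not real provider rates.
PRICE_UNCACHED = 3.00 / 1_000_000  # assumed $ per uncached input token
PRICE_CACHED = 0.30 / 1_000_000    # assumed $ per cached input token


def input_cost(cached_tokens: int, uncached_tokens: int) -> float:
    """Input cost of one request, split by cache hit or miss."""
    return cached_tokens * PRICE_CACHED + uncached_tokens * PRICE_UNCACHED


# A 150k-token chat history resent with no cache hit at all...
full_history = input_cost(cached_tokens=0, uncached_tokens=150_000)
# ...versus a 10k-token compacted summary, also uncached.
compacted = input_cost(cached_tokens=0, uncached_tokens=10_000)

print(f"full history: ${full_history:.3f}")
print(f"compacted:    ${compacted:.3f}")
```

The same split also hints at latency: uncached tokens have to be processed from scratch, so the fewer of them you send, the less work the request has to do up front.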

vorticalbox · 2026-06-24T13:34:33 1782308073

i jump about a lot, for coding gemini and grok are definitely not as strong as gpt 5.5/opus/sonnet/composer.

composer 2.5 is actually very good and I use it for a good chunk of tasks.

vorticalbox · 2026-06-22T16:32:46 1782145966

There are actually fine tunes of qwen on opus “thinking” tokens that teach it to think like opus does.

https://huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-...

ACCount37 · 2026-06-22T18:18:27 1782152307

And those are "amateur hour" distillations that don't have the scale of actual Chinese labs.

vorticalbox · 2026-06-20T12:19:55 1781957995

Even if they did it, it wouldn't be of much use because, correct or not, the output was the likely output 100% of the time.

vorticalbox · 2026-06-19T21:38:00 1781905080

Well one can no longer search for information in the big search engines without it just giving you the answer.

This ruins “search a topic and write about it”.

setopt · 2026-06-19T22:11:05 1781907065

I guess Norwegian schools will have to use smaller / alternative search engines now?

throwaway2037 · 2026-06-20T03:47:58 1781927278

Or they can disable "AI" from the search results to only get "traditional" search results.

pesus · 2026-06-19T23:20:53 1781911253

Or books.

estetlinus · 2026-06-20T05:09:14 1781932154

Yes, please.

vorticalbox · 2026-06-19T17:26:36 1781889996

> This happened after he received CPR for several hours at a cabin on the Hardangervidda plateau before rescue arrived.

This means although his heart wasn’t technically beating, he did have blood being circulated via cpr.

When I read the title I assumed he was alone before the rescue.

vorticalbox · 2026-06-18T08:12:46 1781770366

I find opus best for planning, sonnet for coding, and codex for code review.

vorticalbox · 2026-06-17T21:29:22 1781731762

Security by obscurity alone is bad. Having both is better.

For example, say I have a hollowed-out wall that is hidden behind a painting.

Just putting my money in the hole is bad: once it’s found, it’s gone. But if I put my money in a safe in the hole, now you need to find it and break the safe. A hidden safe is objectively better than a safe sitting on the floor, because you need to find it first.

dessimus · 2026-06-18T06:42:51 1781764971

Sure, if there's many paintings scattered around the house of various sizes, but if there's only one painting, in your office, behind the desk, mounted to cover a safe at standing height, then you might as well hang a neon sign saying "Look Here!" next to it.

vorticalbox · 2026-06-17T11:00:32 1781694032

This is a problem I find with opus: it will spend so long thinking, then go “but wait, what if”.

To the point where I stop it and simply tell it to “start writing code, you can work it out as you go along”.

Seems writer's block also affects LLMs.

robertkarl · 2026-06-17T14:11:35 1781705495

https://arxiv.org/abs/2606.00206

In this paper they nerf an LLM's ability to emit waffling thinking tokens like "wait", "but", "alternatively", and the models (they're old, small models in the paper) terminate reasoning faster and perform better. I bet Anthropic is tuning this on their backend.

addandsubtract · 2026-06-18T02:33:16 1781749996

Didn't they originally introduce those tokens to make the models smarter by second guessing their "thoughts"?

meatmanek · 2026-06-17T17:24:16 1781717056

This is super cool. Do you know if any of the inference backends (llama.cpp, vllm, etc) support this technique?

iaw · 2026-06-17T22:12:25 1781734345

vLLM supports "banning" certain tokens but I don't know if it can dynamically reduce them.

To my knowledge you can also "ban" with llama.cpp but it is passed in the API call rather than to the server at initialization.
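To make the "ban" versus "reduce" distinction concrete, here is a toy sketch of a per-token logit bias applied before sampling. It is not vLLM's or llama.cpp's API, and it is not the method from the paper linked above; the four-token vocabulary and the logit values are invented for illustration.

```python
import math


def sample_probs(logits: dict[str, float], bias: dict[str, float]) -> dict[str, float]:
    """Softmax over logits after adding a per-token bias.

    A bias of -inf bans a token outright; a finite negative bias only
    makes it less likely. Assumes at least one token is left unbanned.
    """
    adjusted = {tok: logit + bias.get(tok, 0.0) for tok, logit in logits.items()}
    peak = max(adjusted.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in adjusted.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical next-token logits during a reasoning trace.
logits = {"wait": 2.0, "but": 1.5, "so": 1.0, "return": 0.5}

baseline = sample_probs(logits, {})
banned = sample_probs(logits, {"wait": -math.inf, "but": -math.inf})
damped = sample_probs(logits, {"wait": -2.0, "but": -2.0})

for name, probs in [("baseline", baseline), ("banned", banned), ("damped", damped)]:
    print(name, {tok: round(p, 3) for tok, p in probs.items()})
```

A hard ban removes the token entirely, while a finite bias leaves it available when the model strongly prefers it. Reducing the bias "dynamically" would mean changing those values between decoding steps, which this sketch does not attempt.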

orbital-decay · 2026-06-17T22:11:50 1781734310

I imagine Anthropic would rather train a small control model instead of resorting to sampling hacks

giancarlostoro · 2026-06-17T12:26:00 1781699160

I usually have Claude build a plan first, then I put it into an XML file it updates with phases. Usually we talk about some of those tasks, and then once it's good and I like it, I have Claude implement the plan.

Another thing I tell Claude to do is to not guess but look at the documentation. It messes up a lot less. It might use some tokens reading docs, but at least it has a higher success rate code-wise.
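For readers wondering what a plan file with phases might look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element and attribute names (`plan`, `phase`, `task`, `status`) and the example tasks are all hypothetical; the comment above does not say what schema is actually used.

```python
import xml.etree.ElementTree as ET

# Hypothetical phases and tasks, purely for illustration.
phases = [
    ("Read the docs", ["Find the relevant API pages", "Note version constraints"]),
    ("Implement", ["Write the handler", "Add tests"]),
]

plan = ET.Element("plan", name="example-feature")
for number, (title, tasks) in enumerate(phases, start=1):
    phase = ET.SubElement(plan, "phase", id=str(number), status="pending")
    ET.SubElement(phase, "title").text = title
    for task in tasks:
        ET.SubElement(phase, "task").text = task

# Mark the first phase done, as the model might when it updates the file.
plan.find("phase[@id='1']").set("status", "done")

ET.indent(plan)  # pretty-print; requires Python 3.9+
xml_text = ET.tostring(plan, encoding="unicode")
print(xml_text)
```

The explicit `status` attribute is one way to let the plan double as a progress log between sessions.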

xstas1 · 2026-06-17T12:56:15 1781700975

XML??

giancarlostoro · 2026-06-17T13:08:10 1781701690

Apparently, because of how Claude is trained (even the system-level prompts go through as XML), it works better with XML "prompting", so I figured I could have it write plans in XML. Maybe I need to update my ticketing tool to output XML by default.

https://www.reddit.com/r/ClaudeAI/comments/1psxuv7/anthropic...

saltsucker · 2026-06-17T13:48:49 1781704129

Comments later in the thread say markdown works just as well and that it’s more important to organize your plan into sections.

Also just think about it: why would a model trained on the world’s corpus of text (that isn't formatted in XML) perform better with XML? It would be a better study if that post tested markdown, org, xml, json, etc. 10 times to see if there is a difference.

swingboy · 2026-06-17T15:53:09 1781711589

Anthropic’s best practices still include the use of XML: https://platform.claude.com/docs/en/build-with-claude/prompt...

adastra22 · 2026-06-17T15:08:56 1781708936

A year or so ago XML worked more reliably for long-lived prompt instructions. Now it is cargo culting.

orbital-decay · 2026-06-17T22:15:03 1781734503

XML consistently performed better than markdown and JSON in all evals I've ever seen on any model, except for a couple very specific ones.

root-parent · 2026-06-17T13:33:47 1781703227

XML stands for Xtra ML....

noworriesnate · 2026-06-17T15:23:07 1781709787

I'd like to switch to a sales career--can you give me any pointers?

aesthesia · 2026-06-17T20:59:57 1781729997

One reason to use XML-like formatting is that it makes the beginning and end of sections explicit. This is less of an issue when the model is generating text but can still be helpful when using templated prompts.
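A small sketch of that templated-prompt case. The tag names and the helper functions `wrap` and `build_prompt` are illustrative assumptions, not something any model requires.

```python
def wrap(tag: str, body: str) -> str:
    """Wrap body in an explicit opening and closing tag."""
    return f"<{tag}>\n{body.strip()}\n</{tag}>"


def build_prompt(instructions: str, document: str, question: str) -> str:
    # Tag names are illustrative; nothing here is mandated by any model.
    return "\n\n".join([
        wrap("instructions", instructions),
        wrap("document", document),
        wrap("question", question),
    ])


prompt = build_prompt(
    instructions="Answer using only the document.",
    # The pasted document brings its own markdown heading along.
    document="## Notes\nThe cache expires after five minutes.",
    question="When does the cache expire?",
)
print(prompt)
```

Because the pasted document carries its own markdown heading, a markdown-only template would blur where the document stops and the question starts; the closing `</document>` tag keeps that boundary unambiguous.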

mikeocool · 2026-06-17T11:51:57 1781697117

Seriously. Whenever I read the thinking output I get mad and turn down effort to medium or low.

Just output the code and we’ll work through it!

I feel similarly about having codex review claude’s plans. I don’t think I’ve ever seen it catch a major issue. It just points out things that would have inevitably been addressed during implementation anyway.

SubiculumCode · 2026-06-17T16:18:59 1781713139

A lot of times this is how humans work. Just start 'putting words on paper', 'think by doing', etc. Sometimes it's more efficient to see why something won't work after writing a bit of it, and sometimes you get lucky and it works right off the bat.

epolanski · 2026-06-17T11:14:23 1781694863

Fable was 20 times worse on that.

It's clear it was the vibe coding model: like no other model before, it fully turned you into its assistant instead of the other way around.

RyanHamilton · 2026-06-17T11:56:54 1781697414

Could it be that these firms are optimizing for two things: a) better performance, and b) gathering data from you to further improve performance later? I've also found the huge amount of planning rather than iteration frustrating. I've felt like I'm teaching a junior!

epolanski · 2026-06-17T12:03:09 1781697789

I think they simply optimize around E2E benchmarks. None of those benchmarks is designed around multi-turn assistance to the user; they go from a prompt straight to the final solution.

celrod · 2026-06-17T18:11:26 1781719886

Exactly. How can "we" develop and encourage benchmarks for multi-turn user assistance? That is what I want. I feel like the models and harnesses push much too hard against this workflow -- that they push you towards letting go and vibe coding, with only your discipline (and desire for a quality and maintainable product) holding it back.

happyPersonR · 2026-06-17T13:54:51 1781704491

more thinking == more tokens === more money LOLL

overfeed · 2026-06-17T15:53:49 1781711629

Is there a cost benchmark out there? I wonder how frontier models are doing over time for cost per problem solved.

drob518 · 2026-06-17T16:20:08 1781713208

I think they are optimizing for one-shot performance because that will drive usage. They can’t afford to look bad in the benchmarks. And if that means consuming an order of magnitude more tokens, well, that’s good for business, too.

drob518 · 2026-06-17T16:16:51 1781713011

Qwen is notorious for this, too. It’ll sometimes spin in a long loop of “But wait…” paragraphs.

thinkingtoilet · 2026-06-17T11:56:57 1781697417

I've been having success with Opus but you REALLY have to tame it. Long prompts that list what files to look at, relationships between entities, etc... I went from regularly hitting my daily limit to almost never hitting it. Oh, and also I was being lazy with small changes and stopping that helped a lot too. As you said, it gets in these loops where it's just churning and if you don't stop it it can go on for way too long.

vorticalbox · 2026-06-16T12:58:03 1781614683

Anthropic are already paying $15B to SpaceX for compute.

CodesInChaos · 2026-06-16T13:05:27 1781615127

Buying depreciating nvidia hardware and renting it out to competitors isn't why SpaceX has a trillion dollar valuation.

vorticalbox · 2026-06-16T13:11:48 1781615508

true, can't hurt though.