lumost · 2026-06-16T22:03:26 1781647406

I don't think this is really true; there are plenty of engineers/managers who rotate through major tech companies. Many Meta folks will head off to new companies that pay at similar levels.

lumost · 2026-06-16T13:25:33 1781616333

This is true if you take the ai market as equal to the market for labor discounted to 5-10% penetration.

It’s not a totally unreasonable assertion; it’s the implication of the assertion that we are uncomfortable with. There is no reason for the models to stop their improvements in the near future.

ben_w · 2026-06-16T15:29:59 1781623799

> There is no reason for the models to stop their improvements in the near future.

Sure there is.

1. The cost of each new generation of training runs appears to be rapidly rising

2. The Trump admin just told the maker of the leading model to stop making it available to non-Americans, which in practice meant to stop providing it at all

3. The factories to make the hardware are hitting bottlenecks, and while they've currently been navigated around, there's never a guarantee the next one will be

Currently I'm wondering at what point the direct impact on the US energy supply gives the US a taste of Baumol's cost disease as AI companies continue to outbid everyone else for electricity.

pixl97 · 2026-06-16T16:31:42 1781627502

There are some counters to this, especially in electricity. We'll see massive expansions of wind and solar in the US because of this. Both the speed of install and low costs will guarantee it.

ben_w · 2026-06-16T17:02:57 1781629377

> We'll see massive expansions of wind and solar in the US because of this. Both the speed of install and low costs will guarantee it.

Implausible while Trump remains in office. He hates renewables, shuts them down even when doing so actively costs money.

Between AI hallucinated content and the politicisation of the numbers, I'm not sure how much AI compute capacity is being planned right now; would you accept a claim of 300 GW? It's a number I heard recently.

Given the capacity factor of PV, even China would have to think carefully before supplying that much PV over the next few years (300 GW avg ~= 3TW nameplate).

(Not sure about wind, wind's CF seems to vary between years).

pixl97 · 2026-06-17T02:46:05 1781664365

Unless it's something on federal land businesses are pretty much ignoring Trump on renewables.

And 300GW power planned doesn't seem too far out of bounds, there are a huge number of 'planned' data centers all over the US.

World wide over 800GW of solar and wind was installed in 2025 and 2026 numbers should be over 1TW of renewables. How much of that will the US install itself is a much smaller percent, but as power prices increase the pace to profit off of it will quicken. I know China installed over 300GW themselves last year.

ben_w · 2026-06-17T07:18:09 1781680689

> Unless it's something on federal land businesses are pretty much ignoring Trump on renewables.

Except for all of the tariffs etc.

And that's without Trump seeming to be actively choosing winners based on favour to him, as with supporting Grok despite the data centre pollution in another thread and *possibly* (I don't wish to overstate my case) the ban on Fable.

> World wide over 800GW of solar and wind was installed in 2025 and 2026 numbers should be over 1TW of renewables. How much of that will the US install itself is a much smaller percent, but as power prices increase the pace to profit off of it will quicken. I know China installed over 300GW themselves last year.

This is why I wrote:

> Given the capacity factor of PV, even China would have to think carefully before supplying that much PV over the next few years (300 GW avg ~= 3TW nameplate).

When the recent good news is that "the world installs 800 GW of PV" (TBH, I thought this was closer to the last 12 months of just PV than the sum PV+wind), that's the nameplate capacity, not the actual year-long-average output, which is about a tenth of that.

The most recent PV capacity factor number on Wikipedia was 13%, which would make "800 GW" only 104 GW in real output; the figures I see for wind are that the CF is 25% (with much higher variability) but the nameplate capacity is lower, so they're pretty close as totals in real total currently installed output.
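The arithmetic in this comment can be sketched in a few lines. This is only an illustration that assumes the capacity factors quoted in the thread (13% for PV, and the rough 10% behind the "3 TW" figure); the function names are made up for the example.

```python
def average_output_gw(nameplate_gw: float, capacity_factor: float) -> float:
    """Year-long average output implied by a nameplate capacity."""
    return nameplate_gw * capacity_factor


def nameplate_needed_gw(average_gw: float, capacity_factor: float) -> float:
    """Nameplate capacity needed to deliver a given average output."""
    return average_gw / capacity_factor


# 800 GW of nameplate PV at the 13% capacity factor quoted above.
pv_real_output = average_output_gw(800, 0.13)

# 300 GW of average demand served by PV at a rough 10% capacity factor.
pv_nameplate_needed = nameplate_needed_gw(300, 0.10)

print(round(pv_real_output))       # 104 (GW average)
print(round(pv_nameplate_needed))  # 3000 (GW nameplate, i.e. 3 TW)
```

With the 13% figure instead of 10%, the same 300 GW average would need roughly 2.3 TW of nameplate PV, so the order of magnitude does not change.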

TrackerFF · 2026-06-16T22:26:54 1781648814

As long as Chinese companies keep pushing on, so must US companies too.

It would not surprise me at all if we suddenly start seeing top US AI companies lobby against Chinese models, or even the gov. making it illegal to use Chinese AI models.

But in this day and age, I just don't think it is possible. A distant third option would be that the big AI companies try to make hardware so expensive that people simply can't run their own models, while blocking access to foreign models.

sph · 2026-06-16T15:48:36 1781624916

> There is no reason for the models to stop their improvements in the near future.

You speak as if "improvements to models" is just a function of time, and resources are infinite.

Models keep improving as long as there are resources to allow for larger and larger datacenters, if we hit a scientific breakthrough once LLM technology becomes the bottleneck, if the economy is infinite to allow infinite growth, and (geo)politics is not a thing to worry about. Or we discover ASI, machines improve themselves and we reach the technological singularity.

I know everybody is drinking the kool aid by the gallon, but can we maintain a little bit of objectivity?

AceJohnny2 · 2026-06-16T16:06:42 1781626002

yeah, it's funny how so many think the beginning of the S-curve is an exponential.

Granted, we don't know when the S-curve will inflect, but predicting too great an outcome is just as silly as discounting it altogether.

lumost · 2026-06-16T22:59:02 1781650742

The s curve won’t inflect until it becomes difficult to allocate additional resources due to economic limitations. There is no sign that training a model on 10x the compute won’t lead to an improvement at least equivalent to that from the last order-of-magnitude increase.

If we define the Pareto frontier’s input in terms of a magic “compute equivalent unit”, we get a free order of magnitude from Nvidia hardware improvements every 2-3 years. We get another order of magnitude from capital expenditure every 6-12 months. Kernel improvements to the models themselves likely yield an order of magnitude gain at some periodicity.
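A rough way to put numbers on that, as a sketch only: it assumes the hardware and capex gains multiply independently (so their orders of magnitude add), and it leaves out kernel improvements because no period is given for them. The function name is hypothetical.

```python
def combined_ooms(years: float, *years_per_oom: float) -> float:
    """Orders of magnitude of "compute equivalent units" gained over `years`,
    assuming each source contributes one order of magnitude per period and
    the sources multiply independently."""
    return years * sum(1.0 / period for period in years_per_oom)


# Slow end of the quoted ranges: hardware every 3 years, capex every 12 months.
slow = combined_ooms(1, 3.0, 1.0)

# Fast end of the quoted ranges: hardware every 2 years, capex every 6 months.
fast = combined_ooms(1, 2.0, 0.5)

print(f"{slow:.2f} to {fast:.2f} orders of magnitude per year")
```

Under those assumptions the capex term dominates, which is why the economic limit matters more than the hardware cadence.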

lumost · 2026-06-16T02:20:48 1781576448

The abstraction of capital and money gets a bit funny when wealth is sufficiently concentrated. If there is a monopsony (one buyer), then they can largely dictate the price of anything. If they also control violent coercion via a captured state or other means, then they can compel production at that price point.

The idea of capitalism only really makes sense when wealth is reasonably distributed such that there is still reasonable competition in both the marketplace and control of the state.

lumost · 2026-06-16T01:36:46 1781573806

It's worse at code compared to qwen 3.6 coder.

stymaar · 2026-06-16T06:00:22 1781589622

How can it be worse than something that doesn't exist?

amunozo · 2026-06-16T11:29:56 1781609396

Sometimes non-existing is better than existing for unnecessary or harmful things. I know that is not what you mean, but I just found it relevant in an age in which making new stuff is so fast and easy due to LLMs. The main enshittification would come, imo, not from bad things but from unnecessary things that nobody asked for.

lumost · 2026-06-15T21:36:25 1781559385

This just looks like a capex problem. There is no evidence that Anthropic has secret sauce above and beyond access to capital. If there is secret sauce, it's unclear that it changes the required amount of capital by all that much.

China will spend all of the money required to catch up, Google and OpenAI will both spend money to catch up as well. NVidia and others will not allow a frontier lab to become the AI bottleneck.

lumost · 2026-06-14T21:32:26 1781472746

Bear in mind, Elon Musk now has the wealth of 10k "100 millionaires" - we truly lack comprehension of how wealthy the wealthy have gotten.

lumost · 2026-06-14T19:15:18 1781464518

I really think the environmental movements were a red herring. It was always impossible to make a meaningful dent in your personal emissions while still existing in your location. There was never any reduction proposal which could mitigate this.

Government mandates for e.g. large nuclear construction, geo-engineering, BEV adoption, or other similar proposals would have had an impact. These all exposed the real tradeoffs that would need to be accepted: cost, hardship, or whatever the opposition to nuclear was.

The environmental movements of the last 60 years focused on impossible goals which were easy to rally behind.

JeremyNT · 2026-06-14T19:23:41 1781465021

> The environmental movements of the last 60 years focused on impossible goals which were easy to rally behind.

Is this true? Americans elect leaders who won't even acknowledge the issue is real. We have at times managed to gather some momentum towards using government to address the issues through incentives and regulation - even as recently as the Biden administration - then reactionaries gain power and dismantle the efforts.

The "environmental movement" was not focused on the personal responsibility angle; it's just that the American political system rejected proposals for any meaningful government intervention because long term thinking is never rewarded.

lumost · 2026-06-13T22:08:37 1781388517

This is either a complete own goal by Amazon… or a play to consolidate compute/model access.

Will Chinese models be allowed on the market… at all? Will startups be banned from training models of equivalent capacity?

gopher_space · 2026-06-13T23:43:42 1781394222

At this point would I be outsourcing my knowledge work or would I be entering self-exile?

lumost · 2026-06-12T21:18:43 1781299123

The KV cache is order dependent and dependent on the context of tokens which exist before the KV cache.

There are some transformation approaches to re-use the kv cache across inferences, but none are in wide use due to accuracy concerns following the transformation.
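To make the order dependence concrete, here is a toy two-layer, single-head causal attention in plain Python. It is a sketch under heavy simplifications (made-up 2x2 weights shared across projections, no positional encoding, no MLP or normalization), not any real model. It shows that the layer-1 key of a token depends only on the token, while its layer-2 key changes with whatever came before it.

```python
import math


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def causal_attention_layer(xs, wq, wk, wv):
    """Single-head causal attention with a residual connection.
    Returns (outputs, keys, values); keys and values are what a KV cache holds."""
    qs = [matvec(wq, x) for x in xs]
    ks = [matvec(wk, x) for x in xs]
    vs = [matvec(wv, x) for x in xs]
    outs = []
    for i, q in enumerate(qs):
        scores = [dot(q, ks[j]) for j in range(i + 1)]  # causal: only j <= i
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        mixed = [sum(e / total * vs[j][d] for j, e in enumerate(exps))
                 for d in range(len(vs[0]))]
        outs.append([a + b for a, b in zip(xs[i], mixed)])
    return outs, ks, vs


# Hypothetical weights and embeddings, chosen only for illustration.
W = [[1.0, 0.5], [-0.5, 1.0]]
EMB = {"a": [1.0, 0.0], "b": [0.0, 1.0], "doc": [1.0, 1.0]}


def keys_of_last_token(tokens):
    xs = [EMB[t] for t in tokens]
    h1, k1, _ = causal_attention_layer(xs, W, W, W)
    _, k2, _ = causal_attention_layer(h1, W, W, W)
    return k1[-1], k2[-1]


k1_after_a, k2_after_a = keys_of_last_token(["a", "doc"])
k1_after_b, k2_after_b = keys_of_last_token(["b", "doc"])

print(k1_after_a == k1_after_b)  # True: layer-1 key depends only on the token
print(k2_after_a == k2_after_b)  # False: layer-2 key depends on the prefix
```

In a real model the positional encoding makes even the first layer position dependent, which is part of why cached keys and values cannot simply be moved to a different context.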

Eridrus · 2026-06-12T21:33:43 1781300023

The paper has a section on "Reusing precomputed KV across queries" which talks about how other papers have tried to address this problem, but yeah, this paper adds nothing on its own besides a catchy title.

xg15 · 2026-06-12T23:18:18 1781306298

Isn't it also, most fundamentally, dependent on the model weights?

My understanding was that what the KV cache stores is nothing other than the "activations" of the W_k and W_v matrices of an attention module for a given input sequence.

So I don't quite understand how this is supposed to work:

> Let a publisher precompute a document's KV cache, and let every other agent buy the right to load it and skip prefill.

Should a publisher precompute the cache for every popular model that is out there?

xg15 · 2026-06-13T10:04:24 1781345064

...not to mention, which KV cache? Every attention module has its own, and how many attention modules there are, what inputs they get, how many internal features and attention heads they have, etc, all depends on the architecture of the specific model.
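A small sketch of why a cache computed for one model cannot be loaded into another: the shape of the cache follows directly from the architecture. The two configurations below are hypothetical and do not describe any real model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    n_layers: int
    n_kv_heads: int
    head_dim: int
    bytes_per_value: int = 2  # assumes 16-bit storage


def kv_cache_shape(cfg: AttentionConfig, seq_len: int) -> tuple:
    # (K and V, layers, KV heads, tokens, head dimension)
    return (2, cfg.n_layers, cfg.n_kv_heads, seq_len, cfg.head_dim)


def kv_cache_bytes(cfg: AttentionConfig, seq_len: int) -> int:
    count = 1
    for dim in kv_cache_shape(cfg, seq_len):
        count *= dim
    return count * cfg.bytes_per_value


model_a = AttentionConfig(n_layers=32, n_kv_heads=8, head_dim=128)  # hypothetical
model_b = AttentionConfig(n_layers=48, n_kv_heads=4, head_dim=64)   # hypothetical

for name, cfg in (("model_a", model_a), ("model_b", model_b)):
    print(name, kv_cache_shape(cfg, 1000), kv_cache_bytes(cfg, 1000), "bytes")
```

Even when two shapes happen to match, the numbers inside are projections through different weights, so the cache is still specific to one model.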

TZubiri · 2026-06-12T22:47:34 1781304454

Absolute slop paper. Replace document with text and you'll get it.

"People are asking the same questions and an answer is generated every time, what if we could like cache the questions and their answers..."

Sounds like someone was using chatgpt to understand how chatgpt works and then asked it to generate a paper based on his proposal to improve it.

amelius · 2026-06-12T23:11:14 1781305874

At least it wasn't a patent.

dgellow · 2026-06-12T21:27:03 1781299623

Just curious, do you have links to read more about transformations or other techniques for KV cache reuse?

evrydayhustling · 2026-06-12T21:33:39 1781300019

All major model providers offer prefix caching, which is this.

lumost · 2026-06-12T22:06:24 1781301984

No, reusing segments of the kv cache for different purposes in an order independent manner is an active research area.
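The difference can be sketched with a toy exact-prefix cache. This is an illustration of the idea only, not how any provider implements it; the class and token names are made up, and the cached state is an opaque placeholder.

```python
class PrefixCache:
    """Toy cache keyed by exact token prefixes."""

    def __init__(self):
        self._entries = {}  # tuple of tokens -> opaque cached state

    def store(self, tokens, state):
        self._entries[tuple(tokens)] = state

    def longest_prefix(self, tokens):
        """Return (tokens_matched, state) for the longest stored prefix."""
        for n in range(len(tokens), 0, -1):
            state = self._entries.get(tuple(tokens[:n]))
            if state is not None:
                return n, state
        return 0, None


cache = PrefixCache()
cache.store(["sys", "doc_1", "doc_2"], "kv-state-placeholder")

# Same prefix, new question appended: the cached state is reusable.
hit = cache.longest_prefix(["sys", "doc_1", "doc_2", "question"])

# Same document tokens, but after a different system prompt: no reuse,
# because the stored state is only valid for that exact preceding context.
miss = cache.longest_prefix(["other_sys", "doc_1", "doc_2", "question"])

print(hit)   # (3, 'kv-state-placeholder')
print(miss)  # (0, None)
```

Order-independent reuse would mean getting a hit in the second case as well, which is the part that remains a research problem.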

dgellow · 2026-06-12T22:16:37 1781302597

Any keyword or paper I can search for?

dvmazur · 2026-06-12T22:44:48 1781304288

AsyncReasoning[1] does a trick of that sort to give agents concurrent cache views.

You basically have two agents look at the same cache under different views. Say agent_0 gets [a_1, a_0] and agent_1 gets [a_0, a_1]. They also write to this cache concurrently while decoding. To solve positional embedding inconsistencies they rotate the query projections for each block (a_0 and a_1) separately.

The computations you get that way do not exactly match the setup where you would naively prefill on every step, but are close enough.

Same trick could be used for the setup discussed here, I guess: prefill the document cache separately (p), prepend the system prompt (s) and get a cache view [s, p] from which you can then decode.

1. https://arxiv.org/abs/2512.10931
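For intuition on why rotating the query can stand in for moving a cached block, here is a two-dimensional sketch of the relative-position property of rotary position embeddings. It assumes the model uses rotary embeddings, and it is not the paper's implementation; the vectors, the `theta` value, and the function names are made up.

```python
import math


def rotate(vec, position, theta=0.1):
    """Rotate a 2-D vector by position * theta, as rotary embeddings do per pair."""
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    x, y = vec
    return (x * c - y * s, x * s + y * c)


def score(q, q_pos, k, k_pos):
    """Attention logit between a query and a key rotated to their positions."""
    rq, rk = rotate(q, q_pos), rotate(k, k_pos)
    return rq[0] * rk[0] + rq[1] * rk[1]


q, k = (0.3, -1.2), (0.7, 0.4)

# What we want: the key viewed at position 105, the query at position 109.
wanted = score(q, 109, k, 105)

# What we have: the key was cached at position 5. Leaving the cache untouched
# and rotating the query to position 9 keeps the same relative offset of 4.
from_cache = score(q, 9, k, 5)

print(math.isclose(wanted, from_cache, rel_tol=1e-9, abs_tol=1e-12))  # True
```

The logit depends only on the offset between the two positions, so adjusting the query per block keeps the logits against cached keys consistent. The cached values themselves still come from the context they were computed in, which is why the result only approximates a fresh prefill.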

kolinko · 2026-06-13T05:25:02 1781328302

But this would work only for the first layer, or am I missing something?

lumost · 2026-06-09T13:53:12 1781013192

Is this the inevitable outcome of frontier labs who own their hardware? The GPUs and datacenters are the major cost. Inference and training are a higher-tier value proposition; if the company gets nervous that the investment in hardware won't pay off, renting it becomes a major topic of conversation.

A frontier model team having to fight their board on whether to monetize the datacenters directly or continue to invest in AI work is going to have a hard time.