Language models will get better in these ways
Simon Willison recently wrote a fantastic post titled Things we learned about LLMs in 2024. If you haven’t read it yet, I highly recommend diving in. It’s obvious that the latest developments in language models will continue to be a major story in technology and society moving forward.
Here are some assumptions I have about where language models are headed in 2025. This is a foundation for thinking through how to build products in the near future.
Models provide better results
Reasoning is all the rage lately, especially because OpenAI ended their 12 Days of OpenAI by previewing o3 and sparking debate about whether we’ve reached AGI. Maybe we should define AGI as the point at which AI eats us, so we can stop debating it.
The AI community is at odds over whether we’ve already extracted the most significant gains in result quality from pre-training, and whether further gains must come from post-training. Regardless of where the improvements come from, I think smarter models are inevitable.
Context windows will grow
Google’s Gemini 1.5 Pro model has a 2-million-token context window, the largest of all the major model providers. OpenAI offers 128k on their $200/mo Pro plan and Enterprise plan. Anthropic offers 200k for Claude Pro and 500k for Claude Enterprise.
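To get a feel for what these numbers mean in practice, here’s a minimal sketch that uses OpenAI’s tiktoken tokenizer to check whether a document fits in a given window. The window sizes are the ones cited above, the model labels and file path are hypothetical placeholders, and the cl100k_base encoding is only an approximation for non-OpenAI models:

```python
# pip install tiktoken
import tiktoken

# Context window sizes (in tokens) cited above. These change often,
# so treat this as a snapshot, not a reliable lookup table.
CONTEXT_WINDOWS = {
    "gemini-1.5-pro": 2_000_000,
    "gpt-4o": 128_000,
    "claude-3.5-sonnet": 200_000,
}

def fits_in_window(text: str, model: str) -> bool:
    """Roughly check whether `text` fits a model's context window.

    Uses OpenAI's cl100k_base encoding as an approximation; Gemini
    and Claude use their own tokenizers, so counts will differ a bit.
    """
    enc = tiktoken.get_encoding("cl100k_base")
    n_tokens = len(enc.encode(text))
    return n_tokens <= CONTEXT_WINDOWS[model]

# Hypothetical usage: a long transcript you want to stuff into context.
with open("transcript.txt") as f:
    doc = f.read()
print({m: fits_in_window(doc, m) for m in CONTEXT_WINDOWS})
```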
As developers and users discover more use cases that require larger and larger context windows, the foundation model providers will be incentivized to further increase them. Google seems hell-bent on catching up to OpenAI in overall usage, so I expect them to pressure both OpenAI and Anthropic to increase their context windows, given Google’s already sizable lead on this dimension.
LLM costs keep coming down
Falling inference costs have been a sure thing in the LLM ecosystem for the past few years.
A16Z calls this LLMflation:
…the price decline in LLMs is even faster than that of compute cost during the PC revolution or bandwidth during the dotcom boom: For an LLM of equivalent performance, the cost is decreasing by 10x every year.
That post goes on to ask whether that rate of decline will slow, and that remains to be seen. But if you’re building something that is expensive today because of per-token prices, it will be cheaper tomorrow. That includes use cases that require a reasoning model, which is much more expensive than a non-reasoning one.
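To make that arithmetic concrete, here’s a back-of-the-envelope sketch of what a 10x-per-year decline implies for a fixed workload. The starting price and token volume are made-up numbers for illustration; only the 10x rate comes from the a16z post:

```python
# Back-of-the-envelope projection of the "LLMflation" trend: for
# equivalent performance, price per token falls ~10x per year.
PRICE_PER_MILLION_TOKENS = 10.00  # hypothetical starting price, USD
TOKENS_PER_MONTH = 500_000_000    # hypothetical workload: 500M tokens/mo
ANNUAL_DECLINE = 10               # 10x cheaper every year, per a16z

for year in range(4):
    price = PRICE_PER_MILLION_TOKENS / (ANNUAL_DECLINE ** year)
    monthly_cost = (TOKENS_PER_MONTH / 1_000_000) * price
    print(f"Year {year}: ${price:.4f}/M tokens -> ${monthly_cost:,.2f}/month")

# Year 0: $10.0000/M tokens -> $5,000.00/month
# Year 1: $1.0000/M tokens  -> $500.00/month
# Year 2: $0.1000/M tokens  -> $50.00/month
# Year 3: $0.0100/M tokens  -> $5.00/month
```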
Smaller models become more useful
As gains from pre-training slow down and frontier model providers can’t justify the cost of pre-training bigger models (and are constrained by available data), I assume more resources will be devoted to delivering higher-quality small models.
Small models targeted towards specific use cases are more efficient than using an LLM for everything. They can also run locally on devices, which is appealing to folks interested in building hardware that harnesses the power of language models. Meta specifically has 1B and 3B versions of its Llama 3.2 models that it advertises “for on device applications such as summarizing a discussion from your phone or calling on-device tools like calendar.”
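As one sketch of what that looks like in practice, here’s a minimal example of running the Llama 3.2 1B instruct model locally with Hugging Face’s transformers pipeline. This assumes a recent transformers version (which accepts chat-style message lists) and that you’ve accepted Meta’s Llama 3.2 license on the Hub; exact arguments will vary by hardware:

```python
# pip install transformers accelerate torch
# Requires accepting Meta's Llama 3.2 license on the Hugging Face Hub.
from transformers import pipeline

# A 1B-parameter instruct model is small enough for laptop-class
# hardware, which is the point of these on-device releases.
pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    device_map="auto",  # falls back to CPU if no GPU is available
)

messages = [
    {"role": "user", "content": "Summarize this discussion in two sentences: ..."},
]
out = pipe(messages, max_new_tokens=100)
# Recent transformers returns the full conversation; the assistant's
# reply is the last message in it.
print(out[0]["generated_text"][-1]["content"])
```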
To recap: smarter models, bigger context windows, falling costs, and more capable small models. We have a lot to look forward to.