On the Latest Batch of AI Models

In Q4 2025, Anthropic released Opus 4.5, a generative AI model that was relatively expensive to use but noticeably better than anything that came before it. It’s so good that it’s changing the way people think about programming. In this post I want to write about how the current crop of AI models is changing the software landscape, offer concrete comparisons to past models, and, lastly, describe the personal impact it has had on me.

Today, Code is Cheap

Before generative AI, creating any kind of software had a sizable barrier to entry. The closest alternative to learning to code was a “no-code” or “low-code” platform, which lets you assemble applications from predefined building blocks. Often, these solutions would still require programming knowledge to fill in the gaps due to their inherent limitations, and the good ones weren’t free.

Screenshot of Retool
Retool in 2019

Low-code solutions like Retool were primarily targeting business users who needed to visualize data or to create simple forms.

When generative AI came onto the scene, it immediately eroded the barrier to creating software. Among the first products impacted were these low-code platforms, which were forced to adapt or die. Many of the early products built around generative AI were themselves low-code platforms that allowed users to create (typically) web applications from nothing more than a single prompt.

In the early days of generative AI, it was apparent that these solutions were far from perfect and still required a lot of hand-holding, but in hindsight, the writing was on the wall…

“Spotify says its best developers haven’t written a line of code since December, thanks to AI”

“The creator of Clawd: ‘I ship code I don’t read’”

“Top engineers at Anthropic, OpenAI say AI now writes 100% of their code”

Claude Code and Codex

Surely you’ve heard of Claude Code or OpenAI Codex by now. They can arguably do everything that Retool did and far more. Google Trends data shows a decline in interest in Retool during the same period that Claude Code and Codex experienced parabolic growth. These tools free you from the constraints of low-code platforms to build just about anything.

Retool on Google Trends
Retool compared to Claude Code and OpenAI Codex on Google Trends
Retool: blue, Claude Code: red, OpenAI Codex: yellow

I’m not sharing this to pick on Retool or to convince you that Retool needs saving; I’m sharing it because it points to a general market trend: generative AI is the new way to code.

The Next Generation of Software

The aforementioned Clawd[bot] (now OpenClaw) is the poster child for what’s to come: a generative AI product built by generative AI, documented by generative AI, and marketed by generative AI. It is now the fastest-growing project in GitHub’s history, sitting at over 200,000 stars, and, as mentioned earlier, the developer didn’t read the code. The cost of building software is steadily declining while the quality of AI output only increases.

Not long after OpenClaw’s success, clones (e.g. ZeroClaw) were being built overnight.

The current consensus (in my opinion) is that writing code by hand is not nearly as valuable as it once was. What is valuable is prompting the AI to create something that people want to use, like a replacement for Slack. I’ve seen some people say that “understanding systems and how it all connects” is valuable, but I’m not convinced we won’t be delegating that to the AI very soon. The opinion that seems most accurate to me is that the value of a software developer is shifting from “how to build” toward “what to build.”

How Good is Good?

We are not at the point where AI is generating an entire product from a single prompt… yet.

However, we may have just hit an inflection point. In the early days of “AI assisted programming”, I would find myself needing to provide ample context to have any chance of getting a decent response. Nowadays, the AI seems to have all the context built-in and can fetch additional context without me having to lift a finger.

To give you a baseline idea of how far we’ve come, I’d point you to simonw’s 2026 article about Sonnet 4.6. He has been running the “pelican riding a bicycle as an SVG” test since 2024, and while it is certainly not a measure of intelligence, it shows that AI companies have been filling in the cracks. Compare his 2024 results with his 2026 results; the difference is impressive.

Pelican riding a bike. 2024 vs 2026
Oct. 2024 vs. Feb. 2026

Again, this is not a measure of intelligence – maybe the AI companies specifically trained their latest models to draw pelicans on bicycles to make them appear smarter than they are. Luckily, there are other benchmarks to rely on, and their results are impressive and slightly terrifying.

Time Horizon 1.1

In this chart, the Y-axis shows the number of hours it takes a human expert to complete various tasks, given the same information as the AI models. Each model is positioned on the Y-axis at the longest such task it is able to complete 50% of the time. Unsurprisingly, the AI models are faster than the experts at the tasks they do complete, often finishing in less than half the time. You can see how long each model takes by hovering over its data point in the original article from METR.

“At current rates of compute growth and algorithmic progress, there will be >99% automation of AI R&D […] and 300x-3000x research output by 2035” - METR

Sounds to me like the singularity is on its way.

It’s not just programming that will be disrupted. The chart below shows that each generation of AI models is getting better at chemistry, math, physics, biology, robotics, general computer-related tasks (like editing a photo in Photoshop), self driving and likely many other domains.

Time Horizon in various domains

You can read the full article about how AI models are improving in other domains here. The most interesting part of that article, to me, is the finding that the “doubling time” – how long it takes a model’s time horizon to double – may have shortened from every 7 months to every 4 months in 2024.
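To make the difference between those two doubling times concrete, here is a minimal sketch of the compounding involved. The 7-month and 4-month figures come from the discussion above; the 1-hour starting horizon is a made-up example value, not METR data.

```python
# Illustrative only: project an AI "time horizon" (hours of expert work
# completed at a 50% success rate) under exponential growth with a fixed
# doubling time. The starting horizon of 1 hour is a hypothetical value.

def projected_horizon(start_hours: float, months: float, doubling_months: float) -> float:
    """Horizon after `months` have passed, if it doubles every `doubling_months`."""
    return start_hours * 2 ** (months / doubling_months)

# Two years out, a 4-month doubling time compounds far faster than 7 months:
print(round(projected_horizon(1.0, 24, 7), 1))  # ~10.8 hours
print(round(projected_horizon(1.0, 24, 4), 1))  # 64.0 hours
```

The gap widens dramatically with time: the shorter doubling time yields roughly six doublings in two years instead of three and a half, which is why a seemingly small change in the trend line matters so much for forecasts like METR’s.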

GenAI’s Impact on Me

I used to take comfort in the fact that products built primarily by AI were easy to spot. They were so easy to spot, in fact, that we had a word for them: “slop.” But year after year, the gap between slop and human creativity is closing – so much so that we might need a new word, unless we’re willing to call our own creations slop too.

I’ve been conflicted these past few years as to where I fit into this equation. My main hobby has always been programming, and I genuinely enjoy writing code. The fact that there is a well-paying industry that coincides with my hobby is lucky, to say the least. However, the entire industry is being disrupted by generative AI, and the activity I enjoy most has largely been delegated to our new AI overlords. An article titled “We mourn our craft” surfaced on HN a couple of weeks back, and it eloquently captures how I feel about the situation: like I’m mourning the craft of writing code.

For now, I’m going to continue to read and write code in the domains that interest me most. I will use generative AI, but only after I’ve done the hard work to deeply understand the tasks I delegate. I’m also going to spend more time thinking about product design before starting my next full-scale project, since that’s where the real value is.