<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[abstractions]]></title><description><![CDATA[perspectives on ai, tech and business]]></description><link>https://abstractions.rakshitranjan.com</link><image><url>https://substackcdn.com/image/fetch/$s_!gUW3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9670a20-4b7b-4879-b018-dcb679a5b59e_608x608.png</url><title>abstractions</title><link>https://abstractions.rakshitranjan.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 09 May 2026 01:57:01 GMT</lastBuildDate><atom:link href="https://abstractions.rakshitranjan.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Rakshit Ranjan]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[rakshran@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[rakshran@substack.com]]></itunes:email><itunes:name><![CDATA[Rakshit Ranjan]]></itunes:name></itunes:owner><itunes:author><![CDATA[Rakshit Ranjan]]></itunes:author><googleplay:owner><![CDATA[rakshran@substack.com]]></googleplay:owner><googleplay:email><![CDATA[rakshran@substack.com]]></googleplay:email><googleplay:author><![CDATA[Rakshit Ranjan]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Why Google might win? 
(5/5)]]></title><description><![CDATA[On distribution]]></description><link>https://abstractions.rakshitranjan.com/p/why-google-might-win-55</link><guid isPermaLink="false">https://abstractions.rakshitranjan.com/p/why-google-might-win-55</guid><dc:creator><![CDATA[Rakshit Ranjan]]></dc:creator><pubDate>Mon, 27 Apr 2026 16:40:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gUW3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9670a20-4b7b-4879-b018-dcb679a5b59e_608x608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So far in this series, I have argued that Google has four supply-side moats.</p><ol><li><p><a href="https://abstractions.rakshitranjan.com/p/why-google-might-win">Custom TPUs</a> give it a cost edge on every token it serves.</p></li><li><p><a href="https://abstractions.rakshitranjan.com/p/why-google-might-win-25">Private infrastructure</a>&#8211;efficient, integrated, end-to-end owned&#8211;eliminates the margin leakage every other lab pays.</p></li><li><p><a href="https://abstractions.rakshitranjan.com/p/why-google-might-win-35">Proprietary data</a> that is complete, multimodal, and compounding as the public internet runs dry.</p></li><li><p><a href="https://abstractions.rakshitranjan.com/p/why-google-might-win-45">A frontier model</a>, Gemini, trained on top of all three and fairly competitive with the other players.</p></li></ol><p>If the argument holds at four layers, the question becomes whether any of it actually reaches users. Cost advantages are inert without distribution. Better models are invisible without surfaces. 
The fifth layer is where structural advantage either compounds into a business or stays trapped as an interesting lab result.</p><p>In software, where distribution and transaction costs are zero, Ben Thompson&#8217;s <a href="https://stratechery.com/2015/aggregation-theory/">Aggregation Theory</a>, now a decade old, is the seminal thesis on who wins:</p><blockquote><p><em>The most important factor determining success is the user experience: the best distributors/aggregators/market-makers win by providing the best experience, which earns them the most consumers/users, which attracts the most suppliers, which enhances the user experience in a virtuous cycle.</em></p></blockquote><p>The question I had when I started thinking about it was whether it would still hold in the AI era. I think it will.</p><p>I strongly believe that we are in an interim, transient state in the development of AI as a technology.</p><p>When ChatGPT broke out in late 2022, the chat interface was essentially an accident. OpenAI was sitting on GPT-3.5 and needed a way to demo it. A dialogue box was the obvious shell. 
What no one expected was that the shell itself would go viral, with one of the fastest consumer product adoption rates in history. Every other lab scrambled to ship a chat interface in response. That is how chat became the default.</p><p>But chat is not the end state. We are already seeing three stages unfolding in parallel.</p><ol><li><p><strong>Standalone chat:</strong> ChatGPT, Claude, Gemini as destinations. You go to the product, type, read, leave. This is where most attention currently sits.</p></li><li><p><strong>Embedded Copilot:</strong> Gemini in Docs, Copilot in Excel, Cursor in the IDE, Wispr Flow in the keyboard. You don&#8217;t go anywhere; the intelligence meets you inside the workflow.</p></li><li><p><strong>Autonomous action:</strong> Agents that take a goal and execute across products&#8211;researching, drafting, scheduling, purchasing. The user specifies outcomes; the software handles the intermediate steps. e.g., Claude Code and Cowork.</p></li></ol><p>As new long-run paradigms emerge, the constant underlying them is that users adopt technology to get a job done. Each major platform shift&#8211;the PC, the web, the smartphone&#8211;succeeded because it let people accomplish more with less friction. AI is the next layer in that sequence, and the one most likely to dissolve the boundary between &#8220;using an application&#8221; and &#8220;getting a thing done&#8221;.</p><p>Google has been laying the groundwork for this longer than any company competing in AI today. 
Cutting-edge machine learning has been quietly embedded across its products for over a decade, improving user experience in ways most people don&#8217;t recognise as AI at all.</p><ul><li><p>Autocomplete and spelling correction in Search.</p></li><li><p>Magic Eraser and visual search in Google Photos.</p></li><li><p>Near-native translation in Google Translate.</p></li><li><p>Smart Compose finishing sentences in Gmail.</p></li><li><p>Live traffic and rerouting in Google Maps.</p></li><li><p>Call Screening and Circle to Search on Pixel.</p></li><li><p>Autonomous driving at Waymo.</p></li></ul><p>All of it is downstream of foundational research by the Google Brain team, which has always focused on commercialisation&#8211;on embedding AI into products billions of people already use every day.</p><p>That is the point about distribution.</p><p>Getting a thing done entails three steps:</p><ol><li><p>Capture</p></li><li><p>Understand</p></li><li><p>Serve</p></li></ol><p><strong>On capture</strong>, Google operates seven products with over two billion users each&#8211;Search, Android, Chrome, YouTube, Gmail, Maps, Photos. The nearest peer is Apple, only because the iPhone is itself a universal intent-capture device.</p><p>Muscle memory already routes users: Search for facts, Maps for places, YouTube for how-tos, Gmail for correspondence. As AI advances, the range of inputs will expand&#8211;voice, text, vision, ambient sensors, eventually neural. Google already controls most of the clients through which those inputs will flow.</p><p><strong>On understanding</strong>, the proprietary data across services compounds. Location from Maps, queries from Search, calendar from Gmail, photos from Photos, viewing history from YouTube&#8211;all resolved to the same user account. 
That level of context is what turns a vague ask into a specific task.</p><p><strong>On serving</strong>, the products needed to act on intent&#8211;calendar, email, maps, documents, browser, storage&#8211;are ones Google already ships. Every step of an agentic workflow is likely to reduce to an existing Google surface.</p><p>While Google is broadly dominant, advances in each of these three layers are coming, and will continue to come, from other companies. Wispr Flow shows what voice intent capture looks like when done natively across devices. Perplexity shows what retrieval-first search can be when built AI-first. And Claude Code shows what agentic reasoning looks like when embedded in a developer workflow.</p><p>But subsuming these layers, either through acquisition or by shipping a native feature, is not a stretch for Google. A platform advantage of this breadth can be deployed anytime.</p><p>However, caveats do exist. There are at least three places where this thesis could break.</p><p>The first is that an incumbent subsuming a challenger is not automatic. Microsoft spent years trying to absorb Slack with Teams and largely succeeded, but only because enterprise buyers chose bundling. Google spent similar years trying to subsume WhatsApp with a succession of products&#8211;Allo, Duo, Chat, Meet&#8211;and mostly failed, because consumer network effects sit with the incumbent. The bet in this essay is that AI belongs to the first pattern, not the second. The case: unlike social, where the network is the product, AI capability is increasingly a commoditised input that integrates most cleanly where the user already is. Distribution compounds.</p><p>The second is form factor. If the dominant interface of the next decade is hardware Google does not make&#8211;OpenAI&#8217;s collaboration with Jony Ive, Meta&#8217;s Ray-Bans, Apple&#8217;s on-device intelligence&#8211;then Google&#8217;s surface area stops being universal. Android is a hedge, not a guarantee. 
The ambient-compute assumption in this essay depends on Google retaining at least parity at the device layer.</p><p>The third is speed. Google is a public company with a billion-user Search business to defend and active antitrust cases in multiple jurisdictions. It will never move like a startup. It compensates with scale, bundling, and the ability to absorb a category through acquisition or feature launch. That trade&#8211;slower but broader&#8211;only holds if the next two or three years don&#8217;t produce a challenger large enough to refuse acquisition.</p><p>None of these is fatal on its own. But &#8220;Google might win&#8221; was always a probabilistic claim, not a foregone one.</p>]]></content:encoded></item><item><title><![CDATA[Why Google might win? 
(4/5)]]></title><description><![CDATA[On the model]]></description><link>https://abstractions.rakshitranjan.com/p/why-google-might-win-45</link><guid isPermaLink="false">https://abstractions.rakshitranjan.com/p/why-google-might-win-45</guid><dc:creator><![CDATA[Rakshit Ranjan]]></dc:creator><pubDate>Sun, 05 Apr 2026 18:26:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Es8S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>So far in this series, I have argued that Google has three underlying moats.</p><ol><li><p><a href="https://abstractions.rakshitranjan.com/p/why-google-might-win">Custom TPUs</a> give it a cost edge on every token it serves.</p></li><li><p>Private <a href="https://abstractions.rakshitranjan.com/p/why-google-might-win-25">infrastructure</a> that was built to run Search and YouTube eliminates the margin leakage every other lab pays.</p></li><li><p>Proprietary <a href="https://abstractions.rakshitranjan.com/p/why-google-might-win-35">data</a>&#8211;complete, multimodal, and compounding as the public internet runs dry&#8211;is the layer that capital alone cannot replicate.</p></li></ol><p>None of this matters without a frontier model on top. Thankfully, Google has one.</p><p>But in early 2023, that was not obvious. ChatGPT had reached 100 million users within two months of launch, Google had declared code red internally, and its first response, &#8220;Bard&#8221;, gave an erroneous answer in its launch demo and wiped $100 billion in market cap in a single day.</p><p>In April 2023, Sundar Pichai, acting like a wartime CEO, took a bold step. He merged<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> Google Brain and Google DeepMind&#8211;the two AI research labs within the company.</p><p>Google Brain was the applied research arm: obsessed with getting AI into products, optimising for what could be trained, deployed, and scaled across Search, Ads, and Translate. Its theory of intelligence was statistical: expose a model to enough data, and useful patterns emerge. Scale is the mechanism. The <a href="https://rakshitranjan.com/attention">Transformer</a> paper that every major model is built on came out of Brain.</p><p>Google DeepMind was solving a different problem: artificial general intelligence, a system that could reason through problems it had never seen. Its method was reinforcement learning: define a precise feedback signal, then train the model to maximise it. Think of it like learning a video game by trial and error. 
You try something, see whether the score went up, adjust, try again (except across millions of attempts).</p><p>Over the past two years, this collaboration, combined with the structural moats discussed earlier, has given Gemini distinctive characteristics.</p><ol><li><p>Gemini&#8217;s architecture supports context windows of up to 1-2 million tokens (among the largest in commercial models).</p></li><li><p>Gemini serving costs have fallen 78% year-over-year, and API pricing runs 5-10x cheaper than OpenAI&#8217;s comparable models.</p></li><li><p>Gemini is natively multimodal (a single neural network trained on text, images, video, and audio together from the start). When competing models process an image, a separate encoder translates pixels into a representation the text model can read (like handing a language-only brain a written description of a photograph). In practice, this gives Gemini an edge in tasks that require reasoning across modalities simultaneously&#8211;analysing a video&#8217;s visual content alongside its audio, or interpreting a chart embedded in a document in relation to surrounding text.</p></li></ol><p>The question, then, is what this translates to at the inference layer&#8211;in utility and applications. To evaluate the models holistically, I believe three parameters matter:</p><p>The first is whether people prefer using them.</p><p>On Arena (formerly LMArena), where real users vote on two anonymous models across whatever tasks they actually care about, Gemini has two models in the top five of the Text Arena&#8211;more than any other company. 
Claude leads the leaderboard; Gemini trails narrowly.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Es8S!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Es8S!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png 424w, https://substackcdn.com/image/fetch/$s_!Es8S!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png 848w, https://substackcdn.com/image/fetch/$s_!Es8S!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png 1272w, https://substackcdn.com/image/fetch/$s_!Es8S!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Es8S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png" width="1456" height="892" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:892,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Es8S!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png 424w, https://substackcdn.com/image/fetch/$s_!Es8S!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png 848w, https://substackcdn.com/image/fetch/$s_!Es8S!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png 1272w, https://substackcdn.com/image/fetch/$s_!Es8S!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75eaa582-df89-4cca-b669-c237230f7c04_1844x1130.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>On Text-to-Image and Text-to-Video Arena, Google is dominant with Gemini and Veo taking most of the positions.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!x4E-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!x4E-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png 424w, 
https://substackcdn.com/image/fetch/$s_!x4E-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png 848w, https://substackcdn.com/image/fetch/$s_!x4E-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!x4E-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!x4E-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png" width="1456" height="867" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:867,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!x4E-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png 424w, 
https://substackcdn.com/image/fetch/$s_!x4E-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png 848w, https://substackcdn.com/image/fetch/$s_!x4E-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!x4E-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F48286a24-90cf-4aa2-ac93-8e00c7ee5a6a_1844x1098.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!f0I5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!f0I5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png 424w, https://substackcdn.com/image/fetch/$s_!f0I5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png 848w, https://substackcdn.com/image/fetch/$s_!f0I5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!f0I5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!f0I5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png" width="1456" height="867" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:867,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!f0I5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png 424w, https://substackcdn.com/image/fetch/$s_!f0I5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png 848w, https://substackcdn.com/image/fetch/$s_!f0I5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!f0I5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F255bade0-4608-4048-b6e2-dd579787567d_1844x1098.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>However, in the Code Arena, Claude is the undisputed king with Gemini coming in at seventh.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!kW-1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kW-1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png 424w, https://substackcdn.com/image/fetch/$s_!kW-1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png 848w, 
https://substackcdn.com/image/fetch/$s_!kW-1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!kW-1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kW-1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png" width="1456" height="867" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:867,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kW-1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png 424w, https://substackcdn.com/image/fetch/$s_!kW-1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png 848w, 
https://substackcdn.com/image/fetch/$s_!kW-1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!kW-1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1af0095b-6cce-415a-8d2a-f040612a969f_1844x1098.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The second is performance on the hardest problems with canonical right answers&#8211;competition mathematics, formal science, and 
competitive programming. Gemini Deep Think performs at gold-medal level across multiple Olympiads. On Humanity&#8217;s Last Exam<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a>, which tests frontier expert knowledge across all domains, it holds the highest published score.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nrti!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nrti!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png 424w, https://substackcdn.com/image/fetch/$s_!Nrti!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png 848w, https://substackcdn.com/image/fetch/$s_!Nrti!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png 1272w, https://substackcdn.com/image/fetch/$s_!Nrti!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nrti!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png" width="1456" height="537" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:537,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nrti!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png 424w, https://substackcdn.com/image/fetch/$s_!Nrti!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png 848w, https://substackcdn.com/image/fetch/$s_!Nrti!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png 1272w, https://substackcdn.com/image/fetch/$s_!Nrti!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff6433f27-4cd4-4d56-9e03-325385b14790_2256x832.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path 
d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The third is genuinely novel reasoning, measured by ARC-AGI-2. This benchmark was specifically designed to resist the normal tricks: it cannot be gamed by training on similar problems, because its visual reasoning puzzles are genuinely novel and have never appeared in any training corpus. At launch in early 2025, every major model scored near zero. Chain-of-thought reasoning, extended compute, and even brute force failed to make a mark.</p><p>Gemini Deep Think, however, scored 84.6%, verified independently. Gemini 3.1 Pro scored 77.1%. 
Going from near zero to above 80% was made possible through a reinforcement learning-based reasoning approach, applied to a problem class that scaling-only approaches cannot crack.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-GqF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-GqF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png 424w, https://substackcdn.com/image/fetch/$s_!-GqF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png 848w, https://substackcdn.com/image/fetch/$s_!-GqF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png 1272w, https://substackcdn.com/image/fetch/$s_!-GqF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-GqF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png" width="1456" height="850" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:850,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-GqF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png 424w, https://substackcdn.com/image/fetch/$s_!-GqF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png 848w, https://substackcdn.com/image/fetch/$s_!-GqF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png 1272w, https://substackcdn.com/image/fetch/$s_!-GqF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F844679e9-8ae6-4971-bd7d-18d64cdc1c5c_1928x1126.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>These artefacts show that Google has achieved best-in-class outcomes with Gemini. Doing that within two years of the Bard fiasco is a truly commendable feat.</p><p>These performances are not set in stone. As the leaderboards above make evident, other frontier labs are not far behind. Talent is fungible, techniques are being replicated, and access to capital is largely democratised. Hence, the only thing that matters at this layer is sustained competitiveness. 
Combined with the other structural advantages and with distribution (the subject of the next post), that competitiveness is what could secure Google&#8217;s dominance in the years to come.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Under its 2014 acquisition terms, DeepMind was guaranteed operational autonomy and an independent ethics board, allowing the London-based lab to operate in a silo that actively resisted integration with Google Brain until the competitive threat of OpenAI forced their 2023 merger.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Humanity&#8217;s Last Exam is a multi-modal benchmark at the frontier of human knowledge, designed to be the final closed-ended academic benchmark of its kind, with broad subject coverage. The dataset consists of 2,500 challenging questions across over a hundred subjects. The authors publicly release these questions, while maintaining a private test set of held-out questions to assess model overfitting.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Why Google might win? 
(3/5)]]></title><description><![CDATA[On data]]></description><link>https://abstractions.rakshitranjan.com/p/why-google-might-win-35</link><guid isPermaLink="false">https://abstractions.rakshitranjan.com/p/why-google-might-win-35</guid><dc:creator><![CDATA[Rakshit Ranjan]]></dc:creator><pubDate>Wed, 25 Mar 2026 17:47:54 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gUW3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9670a20-4b7b-4879-b018-dcb679a5b59e_608x608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Scaling laws have shown, consistently, that model performance improves with data. The frontier labs have been scaling models and data together, and the public internet is the shared corpus everyone has been drawing from.</p><p>A data moat should largely come from having access to high volumes of proprietary data to train on. Google does sit on a huge stockpile of private data across its various services, but so does every other major tech company.</p><p>Meta has the social graph. Microsoft has two decades of enterprise productivity data. Apple has billions of devices generating behavioural signals that no one else can access and that, by design, Apple itself cannot aggregate centrally for training.</p><p>I believe Google stands out because of three unique properties of its data: completeness, multimodality, and compounding.</p><h3>1. Completeness</h3><p>Most publicly available AI training data captures a fragment of human cognition. Wikipedia captures knowledge. Reddit captures opinion. GitHub captures code. 
These are useful snapshots of thought but not behaviour.</p><p>Google can trace the full arc of a human decision and map it accurately to individual users.</p><ul><li><p>Search captures real-time intent in natural language.</p></li><li><p>YouTube captures what people watch after they search, revealing the interest and learning signals that sit between intent and action.</p></li><li><p>Chrome records browsing behaviour across the entire web, not just Google properties.</p></li><li><p>Google Analytics, installed on roughly 85% of the web, captures what those same people do next: what they buy, where they click, how they behave commercially.</p></li><li><p>Google Pay captures direct transaction data, closing the loop between intent and purchase.</p></li><li><p>Gmail captures the downstream evidence of decisions: order confirmations, travel bookings, subscription receipts. It sees what people did after they decided.</p></li><li><p>Nest and Google Home record domestic behaviour: when people are present, how they live, what their environment looks like.</p></li><li><p>Maps and Waymo record where they end up in the physical world.</p></li><li><p>Fitbit captures biological response.</p></li></ul><p>No other competitor comes close to this breadth.</p><p>A model trained on intent alone becomes better at predicting what people want. A model trained on intent and outcome becomes better at helping people get there.</p><h3>2. Multimodality</h3><p>Text has historically been the dominant modality of AI training and all frontier labs excel at it.</p><p>Google&#8217;s advantage is structurally significant in the other modalities.</p><ul><li><p>YouTube has billions of videos where audio, visual, and text describe the same moment simultaneously. Videos carry a spoken track, a visual frame, and an auto-generated transcript timed to the millisecond. </p></li><li><p>Street View and Waymo add spatial and physical understanding at a scale no competitor has approached. 
Street View is millions of miles of continuous visual imagery, each frame tagged with precise geographic coordinates. Waymo layers in LiDAR and real-world sensor data from live driving conditions. Together they give a model something close to a physical map of the world&#8211;as it actually appears.</p></li><li><p>Google Photos holds trillions of images, each tagged with time, location, and the objects and scenes they contain.</p></li><li><p>Google Translate has produced the most comprehensive multilingual parallel corpus in existence across 100+ languages, where the same meaning is expressed in structurally different linguistic forms simultaneously.</p></li></ul><p>The unique characteristic here is how the modalities align&#8211;audio, visual, and text describing the same event. That alignment is what lets a model learn the relationship between modalities.</p><p>This can&#8217;t be assembled on demand. It accumulated over decades because the products required it. That is why it is structurally different from data assets built with AI in mind.</p><h3>3. Compounding</h3><p>Google&#8217;s data grows because it is generated by use. The behavioural signals embedded in that use provide a form of continuous feedback at a scale no lab can replicate through paid annotation. Every interaction is grounded in what a real person actually did, not what a model predicted they would do.</p><p>That distinction matters because of what scaling laws tell us about data. As frontier models scale, the data requirement scales with them. The public internet, which every major lab has been drawing from, cannot keep pace. Common Crawl, Wikipedia, arXiv, GitHub are finite reservoirs, and researchers estimate they will be effectively exhausted for training purposes by 2026.</p><p>The obvious response is synthetic data. If you run out of human-generated text, you generate more using existing models. This works up to a point. 
Models trained on successive generations of synthetic data tend to drift from reality, a phenomenon researchers have started calling model collapse. Google&#8217;s live behavioural feed reduces this risk because a significant portion of its training signal is anchored to real-world actions. It is not immune to the problem, but it is structurally less exposed than labs whose pipelines depend primarily on synthetic generation and licensed text.</p><div><hr></div><p>The first generation of AI products (chatbots, copilots, summarisation tools, and the like) required models that understand language well. The next generation requires something harder. Agents that book travel, execute multi-step research, navigate physical environments, and handle purchases need models that understand how the world actually works: spatial relationships, causal chains, how decisions play out over time.</p><p>OpenAI and Anthropic are approaching this from text, trying to infer physical and causal structure from written descriptions. Google has that structure already embedded in its data, accumulated over decades of building products that operated in the physical world. These are different starting points with different ceilings.</p><p>Here is where the argument needs to be tested.</p><p>Google is not moving fast enough on agents.</p><p>Anthropic has built Claude Code and Cowork. Just this week, they announced that Claude can now remotely trigger tasks on a computer.</p><p>OpenClaw took the internet by storm and got acquired by OpenAI.</p><p>Perplexity launched Perplexity Computer.</p><p>These are small, fast teams shipping genuine agent products at a pace that Google has not matched. Gemini is technically capable, but Google has not yet translated its structural advantages into a category-defining agent product.</p><p>Chat has low switching costs; for agents, given the complexity of the workflows they handle, switching costs could be significantly higher. 
Google needs to translate this structural advantage into agentic products and workflows that embed users deeply. And it needs to do it quickly.</p>]]></content:encoded></item><item><title><![CDATA[Why Google might win? (2/5)]]></title><description><![CDATA[On infrastructure]]></description><link>https://abstractions.rakshitranjan.com/p/why-google-might-win-25</link><guid isPermaLink="false">https://abstractions.rakshitranjan.com/p/why-google-might-win-25</guid><dc:creator><![CDATA[Rakshit Ranjan]]></dc:creator><pubDate>Wed, 04 Mar 2026 18:21:36 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gUW3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9670a20-4b7b-4879-b018-dcb679a5b59e_608x608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a href="https://abstractions.rakshitranjan.com/p/why-google-might-win">part 1</a>, I wrote about how Google makes its own AI chips cheaper than anyone can buy Nvidia&#8217;s. That&#8217;s a real advantage, but chips are only part of the story. A chip sitting in a poorly-cooled data centre, drawing expensive power, transmitting data across rented pipes, is not cheap. The TPU advantage only compounds if the infrastructure around it is efficient.</p><p>I&#8217;m calling this the infrastructure layer. It has five components: grid access, power reliability, power efficiency, intra-data centre networking, and inter-data centre networking. On most of these, Google is strong&#8211;but not uniquely so. Other hyperscalers secured grid connections decades ago. Everyone is diversifying into nuclear. Meta is building subsea cables at a pace that rivals Google&#8217;s.</p><p>One metric, though, tells a different story.</p><p>When a data centre buys $100 of electricity, not all of it reaches the processors. A large chunk gets burned on cooling, power conversion, and overhead. 
The ratio of total power consumed to power that actually does useful computation is called PUE (power usage effectiveness). Lower is better. A perfect score is 1.0.</p><p>The industry average is 1.56. For every $100 of electricity, $36 is wasted.</p><p>Google&#8217;s number is 1.09. Microsoft is 1.16. Amazon is 1.15.</p><p>And then there&#8217;s Meta, at 1.08&#8211;slightly better than Google.</p><p>That last number is important because it tells you why this efficiency exists. It isn&#8217;t a Google-specific secret. Meta got there the same way Google did. Both companies built their data centres to run their own products&#8211;Google for Search and YouTube, Meta for Facebook and Instagram. When you own both the hardware and the workload, you can design them as one integrated system. Cooling, power delivery, and server layout are co-optimised for a specific job. </p><p>Microsoft and Amazon built for a different purpose: selling cloud infrastructure to thousands of customers with wildly different workloads. That flexibility comes at a thermodynamic cost. Generalised infrastructure is inherently less efficient than purpose-built infrastructure.</p><p>So if Meta matches Google on efficiency, why does Google still win this layer?</p><p>Because efficiency without distribution is just a cost saving. Meta runs some of the most efficient data centres in the world, and cannot sell a single dollar of that advantage to an external customer. It has no cloud business. The efficiency stays locked inside Facebook and Instagram, unable to compound into a higher-margin revenue stream.</p><p>Google runs the same efficient infrastructure and sells it. Every marginal efficiency gain flows through to Google Cloud&#8217;s pricing and then to Google&#8217;s margins. Stack this on the chip argument from Part 1: Google pays less per chip, then pays less to power each chip. 
At roughly 50% gross margins for AI, considerably thinner than Search&#8217;s 89%, being structurally cheaper at both layers is the difference between a sustainable business and a money-losing one. It is also why <a href="https://docsbot.ai/models/compare/gpt-4o/gemini-2-0-flash">Gemini&#8217;s API pricing is cheaper than OpenAI&#8217;s</a> for comparable models.</p><p>No single metric gives Google a decisive moat. But completeness does.</p><p>Google designs its own chips. It runs tier-one efficient data centres. It operates a private global network. It builds a frontier model. And it sells all of this as a cloud service.</p><p>Now look at everyone else.</p><ul><li><p>Meta matches Google on efficiency and networking but doesn&#8217;t sell cloud, and its models are open-sourced rather than commercially hosted.</p></li><li><p>Microsoft sells cloud at enormous scale but doesn&#8217;t make its own chips, runs less efficient data centres, and licenses its model from OpenAI without controlling the research roadmap.</p></li><li><p>Amazon sells cloud and is investing in chips (Trainium) and cables (Fastnet), but both are early, and it has no frontier model.</p></li><li><p>OpenAI and Anthropic build the models everyone talks about. But they own nothing underneath. No data centres, no chips, no cables. Every token they serve includes a landlord&#8217;s tax to whichever hyperscaler hosts them. OpenAI pays Microsoft. Anthropic pays Google and Amazon.</p></li></ul><p>Dario Amodei made this fragility vivid in a <a href="https://www.dwarkesh.com/p/dario-amodei-2">recent conversation with Dwarkesh Patel</a>. Dwarkesh pushed him hard: if you really believe we&#8217;re a few years from having &#8220;a country of geniuses in a data centre,&#8221; why aren&#8217;t you buying far more compute?</p><p>Dario&#8217;s answer was striking. 
He explained that Anthropic&#8217;s revenue has been growing roughly 10x per year&#8211;from zero to $100 million in 2023, to $1 billion in 2024, to $9&#8211;10 billion in 2025. So he sat down and did the math. If that 10x rate continues, revenue would be $100 billion by end of 2026 and $1 trillion by end of 2027. He could, in theory, buy $1 trillion of compute starting in 2027 to keep pace with that demand.</p><p>But data centre commitments are made years in advance. If the revenue curve is even slightly off (growth of 5x a year instead of 10x, or the &#8220;country of geniuses&#8221; arriving in mid-2028 instead of mid-2027), Anthropic is stuck paying for compute it can&#8217;t monetise. As Dario put it: &#8220;if you&#8217;re off by only a year, you destroy yourselves.&#8221;</p><p>This is the existential arithmetic of being a tenant. The numbers confirm it. Anthropic projected roughly 40% gross margins in 2025&#8211;sixty cents of every revenue dollar going straight to compute costs. It burned $5.6 billion in cash in 2024, expects to burn another $3 billion in 2025, and plans to spend roughly $80 billion on cloud infrastructure through 2029. Break-even isn&#8217;t projected until 2028. The revenue is real. The product is excellent. But the company is running on investor confidence that the revenue curve outpaces the cost curve before the money runs out.</p><p>Google&#8217;s position is structurally different&#8211;not risk-free, but categorically different. It plans to spend $175&#8211;185 billion on infrastructure in 2026, nearly double the $91 billion it spent in 2025. If AI demand doesn&#8217;t materialise as expected, its margins compress and shareholders are unhappy. What doesn&#8217;t happen is bankruptcy. </p><p>Alphabet earned over $130 billion in net income in 2025 on $400 billion in revenue. Search, YouTube, and Cloud keep generating cash regardless of whether frontier AI justifies its training costs on schedule. 
Google can be wrong by a year or two and come out bruised but alive. Anthropic cannot.</p><p>If scaling laws hold (if more compute continues to produce better models), then the company that trains cheapest trains most. Owning every layer means every efficiency gain compounds with no intermediary taking a cut. There&#8217;s no landlord. There&#8217;s no margin leakage. That&#8217;s a structural advantage that can&#8217;t be closed by a funding round.</p><p>And right now, only Google&#8217;s stack has no gaps in it.</p>]]></content:encoded></item><item><title><![CDATA[Why Google might win? (1/5)]]></title><description><![CDATA[On cheap chips]]></description><link>https://abstractions.rakshitranjan.com/p/why-google-might-win</link><guid isPermaLink="false">https://abstractions.rakshitranjan.com/p/why-google-might-win</guid><dc:creator><![CDATA[Rakshit Ranjan]]></dc:creator><pubDate>Tue, 17 Feb 2026 19:11:55 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IyEO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <a href="https://abstractions.rakshitranjan.com/p/the-capital-theory-of-ai">The Capital Theory of AI</a>, I argued that AI progress is a capital allocation problem. Scale purchases capability. 
The winners will be those with massive capital to meet the compute scaling demands, or companies that sit on specific data moats.</p><p>While listening to <a href="https://www.acquired.fm/episodes/google-the-ai-company">this</a> amazing Acquired episode on &#8220;Google: The AI Company&#8221;, I realised that my framing was a slight oversimplification of what the AI value chain looks like.</p><p>My revised hypothesis is that it is a 5-layer structure:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IyEO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IyEO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png 424w, https://substackcdn.com/image/fetch/$s_!IyEO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png 848w, https://substackcdn.com/image/fetch/$s_!IyEO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png 1272w, https://substackcdn.com/image/fetch/$s_!IyEO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!IyEO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png" width="468" height="318.87224669603523" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:928,&quot;width&quot;:1362,&quot;resizeWidth&quot;:468,&quot;bytes&quot;:214192,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://abstractions.rakshitranjan.com/i/188293768?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IyEO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png 424w, https://substackcdn.com/image/fetch/$s_!IyEO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png 848w, https://substackcdn.com/image/fetch/$s_!IyEO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png 1272w, 
https://substackcdn.com/image/fetch/$s_!IyEO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F77bfbd9d-de77-494a-a687-c178588ef5e2_1362x928.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Google has strengths across all 5 layers, and that might be why it wins the AI race in the short/medium-term<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2"
target="_self">2</a>.</p><ol><li><p><strong>Silicon</strong>: custom TPU (tensor processing unit) chips</p></li><li><p><strong>Infrastructure</strong>: Private fibre and cloud connecting those chips</p></li><li><p><strong>Data</strong>: YouTube, Search, Maps, Scholar</p></li><li><p><strong>Model</strong>: Gemini, built on cost-effective self-owned hardware</p></li><li><p><strong>Distribution</strong>: Consumer apps spanning almost all touchpoints of human experience</p></li></ol><p>The strategic logic is different at every level. This series examines each layer. I&#8217;m starting with silicon because it sits at the bottom of the stack, and because it&#8217;s where the most underappreciated battle is playing out.</p><div><hr></div><p>For decades, AI ran on CPUs&#8211;general-purpose processors that handle instructions one at a time.</p><p>It was only in 2012 that Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton showed that GPUs (traditionally used for rendering video game graphics) could be repurposed for training image recognition models, because both workloads reduce to the same core operation: matrix multiplication.</p><p>Today, every major AI lab runs on Nvidia GPUs. OpenAI trains on Microsoft Azure&#8217;s Nvidia clusters. Meta assembled over 350,000 H100 GPUs to train Llama 4. xAI built the Colossus cluster with 200,000+ H100 and H200 GPUs.</p><p>Google took a different path. As early as 2013, it realised that, given the scale of its consumer apps, relying solely on Nvidia could eventually mean buying Nvidia&#8217;s entire production. So it started building its own processors&#8211;TPUs.</p><p>TPUs are purpose-built for AI math, which makes them more efficient but less flexible than Nvidia&#8217;s GPUs. This specialisation also enables a unique margin structure.</p><p>Nvidia designs and sells complete chips.
It controls the full stack from architecture to software ecosystem (CUDA), and prices accordingly (leading to roughly <a href="https://nvidianews.nvidia.com/news/nvidia-announces-financial-results-for-fourth-quarter-and-fiscal-2025">80% gross margins</a>).</p><p>Google&#8217;s arrangement is different. It designs the chip logic&#8211;what the chip should do, how data flows through it. Broadcom then takes that design and turns it into something that can actually be fabricated&#8211;physical transistor layouts, memory interfaces, power delivery. TSMC manufactures the final product.</p><p>Google pays Broadcom a ~50% margin for this value-add. Still significant, but well below what Nvidia charges for the whole package.</p><p>When chips represent over half of data centre cost, that margin gap is a structural cost advantage on every token that Google serves. Consequently, <a href="https://intuitionlabs.ai/articles/llm-api-pricing-comparison-2025">Gemini&#8217;s API pricing is lower than that of OpenAI&#8217;s equivalent models</a>.</p><p>The clearest validation of this cost advantage comes from an unlikely source. Last October, Anthropic&#8211;arguably Google&#8217;s most direct competitor in frontier models&#8211;signed a <a href="https://www.anthropic.com/news/expanding-our-use-of-google-cloud-tpus-and-services">cloud deal</a> for up to one million TPU chips worth tens of billions of dollars. This is a commercial agreement: Anthropic pays for TPU compute through Google Cloud. It is separate from Google&#8217;s $2 billion equity investment in the company.</p><p>This is why silicon is the foundation of Google&#8217;s AI stack. AI runs at roughly 50% gross margins, considerably lower than Search&#8217;s 89%. At those margins, having low-cost compute is almost existential.
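</p><p>The margin arithmetic is simple to sketch (hypothetical round numbers, not actual chip prices): a vendor earning an 80% gross margin charges $5 for every $1 of underlying cost, while a ~50%-margin partner charges $2.</p>

```python
# Gross margin is (price - cost) / price, so price = cost / (1 - margin).
def price_at_margin(cost, gross_margin):
    return cost / (1 - gross_margin)

cost = 1.0  # normalised silicon cost

nvidia_style = price_at_margin(cost, 0.80)    # 5.0 -- full-package vendor
broadcom_style = price_at_margin(cost, 0.50)  # 2.0 -- design-partner model

# Same silicon, ~2.5x difference in what the buyer pays.
print(nvidia_style / broadcom_style)  # 2.5
```

<p>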
And right now, no one produces it cheaper than Google.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Winning the AI race in itself doesn&#8217;t guarantee profits or long-term market dominance. The current ways of experiencing AI are at odds with Google&#8217;s dominant search business&#8211;it remains to be seen how this tension plays out, and how Google adapts or compensates.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Structural advantage doesn&#8217;t guarantee dominance. The biggest counterforces that Google faces are organisational drag, the innovator&#8217;s dilemma, and the risk that AI never monetises as effectively as Search. But Google has navigated these well over the last couple of years. Pichai merged DeepMind and Google Brain&#8211;a seemingly difficult task given the circumstances of the DeepMind acquisition.
And Sergey Brin returned to work on Gemini.</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Vibe coding a visualisation for historical AQI trends]]></title><description><![CDATA[A research experiment with Claude Code and Codex]]></description><link>https://abstractions.rakshitranjan.com/p/vibe-coding-a-visualisation-for-historical</link><guid isPermaLink="false">https://abstractions.rakshitranjan.com/p/vibe-coding-a-visualisation-for-historical</guid><dc:creator><![CDATA[Rakshit Ranjan]]></dc:creator><pubDate>Sun, 08 Feb 2026 17:44:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gUW3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9670a20-4b7b-4879-b018-dcb679a5b59e_608x608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2><strong>The premise</strong></h2><p>I have been a long-time admirer of <a href="https://ourworldindata.org/">Our World in Data</a> and the book <a href="https://www.amazon.in/FACTFULNESS-REASONS-WRONG-THINGS-BETTER/dp/1473637465">Factfulness</a>. While mainstream media optimises for clickbait that showcases the direness of our existence, these two make a data-driven case for how this is the best time to be alive.</p><p>I love Delhi but am disappointed every year when we don&#8217;t take any measurable steps to curb the menace of air pollution.</p><p>As a pragmatic optimist, I wanted to study how air pollution has trended historically across major cities of the world. And more importantly, what they did to fix it.</p><p>So I vibe coded a <a href="https://aqi-tracker-lilac.vercel.app/">historical AQI tracker</a> for 8 cities (across 6 pollutants) that have successfully combated air pollution. 
It shows how the concentration of pollutants has trended over the last several years and also plots key policy interventions that potentially drove the improvement.</p><p>This research served two purposes:</p><ol><li><p>It gave me hope that things will get better</p></li><li><p>The entire building experience was super-fun</p></li></ol><div><hr></div><h2><strong>Notes on building</strong></h2><p>Link to <a href="https://github.com/rakshran/aqi-tracker">GitHub</a></p><ol><li><p>I wanted to build a data journalism-style site that tracks historical air quality across 8 cities, 6 pollutants, and up to 70 years of data for some cities.</p></li><li><p>I vibe coded the entire thing in ~4 days with Claude Code and Codex. This was my first time using agentic AI tools to build and deploy a site end-to-end.</p></li><li><p>I have always been an amateur coder and my side projects in the pre-LLM era relied heavily on cloning GitHub repos, copying and editing source code from inspiration websites and scouring through Stack Overflow for solving maddening bugs. Most of the issues that I faced then were more to do with plumbing than the actual coding logic. Installing the right packages, structuring files correctly, and deployment overheads.</p></li><li><p>Claude Code and Codex, with their repo-level access, make this a breeze. They helped me focus more on the problem solving and the experience.</p></li><li><p>Claude Code is the better architect. It is great at taking a vision and fleshing it out. It works amazingly well in plan mode to create an elaborate context document, chart out the steps that it will take, and makes great architecture and tooling decisions.</p></li><li><p>The only constraint that I have faced with Claude Code is that it has a fairly small limit on the Pro plan, which makes it hard to work in long sessions on a single problem.</p></li><li><p>Codex came in handy when Claude Code hit its limit. Overall, I found it to be the better machinist. 
Precise, fast, and reliable for isolated fixes like tooltip clipping or restructuring a component without breaking six other things.</p></li><li><p>The stack is minimal. React 18, Vite, Tailwind, Recharts. Deployed on Vercel.</p></li><li><p>The design language is editorial, not dashboard. I took inspiration from Our World in Data. Everything fits in one viewport. Seeing the chart and the interventions at the same time helps establish the relationship between policy and possible outcome.</p></li><li><p>The project has 96 commits. I didn&#8217;t write a single line of code myself.</p><ul><li><p>33 of those commits were authored by Claude Code. The rest came through Codex.</p></li><li><p>I was writing prompts, reviewing pull requests, pushing back on implementation choices, and deciding what &#8220;done&#8221; looked like. Claude and Codex were writing React components, building data transparency features, and fixing mobile bugs.</p></li></ul></li><li><p>The most useful thing I did was ask Claude to write a brutal critique of my own site. What would a researcher say? What would a data journalist say? What would Twitter say? The critique helped improve the site significantly. Some examples:</p><ul><li><p>Inadequate citations.</p></li><li><p>Implied causation. Markers on intervention years strongly suggest those policies caused the improvements. But air quality is multi-factorial.</p></li><li><p>Survivorship bias. I&#8217;d only picked cities where things got better.</p></li><li><p>Geographic bias. Zero cities from Africa, South America, or the Middle East.</p></li><li><p>Interpolated data displayed identically to measured data.</p></li></ul></li><li><p>Claude then built every fix into the site.</p><ul><li><p>Proper source attribution so every number is traceable.</p></li><li><p>Causation-vs-correlation disclaimers in the sidebar.</p></li><li><p>A selection bias disclosure that tells you upfront: this dataset is not representative.
It lists what&#8217;s missing and warns about success-oriented bias.</p></li><li><p>Interpolated data now renders as hollow circles with dashed lines and an &#8220;Est.&#8221; badge in tooltips.</p></li></ul></li><li><p>Desktop site was fairly straightforward. But perfecting the mobile experience took a fair bit of work.</p><ul><li><p>I wrote a full CLAUDE.md spec with breakpoint architecture, desktop protection rules, and a regression checklist. I had to explicitly instruct it to consider the desktop to be frozen and focus on mobile-only.</p></li><li><p>31 pull requests later, including multiple reverts and redos, the mobile experience works.</p></li></ul></li><li><p>The experience reinforced what I wrote in <a href="https://abstractions.rakshitranjan.com/p/how-to-ask">How to ask?</a>. Now the constraint is specification and taste. Can you describe what you want with enough precision? Can you tell when the output is wrong, not syntactically, but conceptually? These are fundamentally different skills from writing code, and they draw on a different kind of expertise.</p></li></ol>]]></content:encoded></item><item><title><![CDATA[The capital theory of AI]]></title><description><![CDATA[On scaling laws and its implications]]></description><link>https://abstractions.rakshitranjan.com/p/the-capital-theory-of-ai</link><guid isPermaLink="false">https://abstractions.rakshitranjan.com/p/the-capital-theory-of-ai</guid><dc:creator><![CDATA[Rakshit Ranjan]]></dc:creator><pubDate>Sun, 01 Feb 2026 17:41:24 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gUW3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9670a20-4b7b-4879-b018-dcb679a5b59e_608x608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Computer scientist Richard Sutton hypothesised the &#8220;<strong><a href="https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf">bitter 
lesson</a></strong>&#8221; in his 2019 essay:</p><blockquote><p><em>The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.</em></p></blockquote><p>This was a hard-to-accept conclusion based on years of empirical evidence. Researchers have been trying to invent novel ways of embedding intelligence into machines, but these efforts have been consistently surpassed by advancements in compute tech (lower costs &amp; enhanced capabilities).</p><p>In the following year (2020), OpenAI published a paper titled &#8220;<strong><a href="https://arxiv.org/abs/2001.08361">Scaling Laws for Neural Language Models</a></strong>&#8221;, which further crystallised Sutton&#8217;s hypothesis.</p><p>The core insight was that AI performance improves with mathematical predictability, following power laws, as you scale <strong>compute, data, and model parameters</strong>. This suggested that raw scale is a far more powerful driver of intelligence than specific architectural design.</p><p>Why does this matter?</p><p>It matters because technological progress is seldom predictable. The evolution of the internet, the advent of mobile, and the shift to cloud were all characterised by huge unpredictability.</p><p>Scaling laws, however, change this equation.
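</p><p>Concretely, the scaling-law papers fit loss curves of the form L(C) = a&#183;C<sup>&#8722;&#945;</sup>, where C is compute. A toy sketch (the constants here are illustrative, not the paper&#8217;s fitted values):</p>

```python
# Power-law scaling: every 10x of compute cuts loss by the same factor.
a, alpha = 10.0, 0.05  # illustrative constants, not fitted values

def loss(compute):
    return a * compute ** (-alpha)

# The improvement per decade of compute is constant...
factor = loss(1e21) / loss(1e22)
assert abs(loss(1e22) / loss(1e23) - factor) < 1e-9

# ...so you can pick a target loss and read off the compute (and
# therefore the capital) required to get there.
print(round(factor, 4))  # 10**0.05, i.e. ~1.122x better per decade
```

<p>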
If you know that a 10x increase in compute yields a reliable improvement in capability, then the frontier of AI is not a research problem; it&#8217;s a capital allocation problem.</p><p>The implications ripple outward from this core insight.</p><p>Consider the competitive dynamics among AI labs. If capability is purchased with compute, then the labs that can raise the most capital and secure the most chips will produce the best models. This is not a cottage industry for brilliant researchers in garages; it is a capital-intensive manufacturing process with massive economies of scale.</p><p>This is a major shift. For the past two decades, the defining logic of the technology industry has been software-centric. Software scales infinitely at zero marginal cost, which is why the aggregators&#8211;Google, Meta, Amazon&#8211;could build global monopolies with relatively modest infrastructure investments. The valuable asset was the code, the algorithm, the network effect.</p><div><hr></div><p>This doesn&#8217;t mean that there is no value to be captured for other players. There are, in fact, several layers in this emerging stack:</p><p><strong>At the bottom: compute infrastructure.</strong> Nvidia (and by extension TSMC, ASML Holdings, etc.) is the obvious winner here. Their GPUs are the de facto platform for AI training. But beyond Nvidia, the hyperscalers (Microsoft, Google, Amazon) are investing tens of billions of dollars annually in data centers.</p><p><strong>In the middle: foundation model providers.</strong> OpenAI, Anthropic, Google DeepMind, Meta&#8217;s AI labs&#8211;these are the entities actually training frontier models. Their advantage is the combination of talent, capital, and proprietary training techniques. 
But here&#8217;s the uncomfortable reality: if scaling laws continue to hold, the moat is not the model itself (which can be approximated by anyone with enough compute) but the rate of improvement.</p><p>There are no switching or multi-homing costs, and hence it is very hard for these model providers to build a defensible moat beyond being ahead of the curve on capabilities and perhaps experimenting with interesting subscription models (e.g., ChatGPT Go, Google&#8217;s bundling &amp; Perplexity&#8217;s Airtel Partnership).</p><p><strong>At the top: applications and integrations.</strong> This is where most businesses will play, and owning proprietary data, specific workflows, or customer relationships can be the key to differentiation.</p><p>The a16z podcast &#8220;<strong><a href="https://youtu.be/3XVDtPU8xKE?si=HqXHQZh1z9VcfI_4">The AI opportunity that goes beyond models</a></strong>&#8221; highlights three amazing examples of startups doing this:</p><ol><li><p><strong>Slingshot AI</strong>: They build an AI mental health companion. They started by taking notes for human therapists&#8212;a classic way to &#8220;do things that don&#8217;t scale.&#8221; By helping with the paperwork, they saw thousands of private sessions that aren&#8217;t on the internet. This gave them the &#8220;shadow data&#8221; needed to train a model that actually understands clinical empathy, which a general AI simply can&#8217;t learn from Reddit.</p></li><li><p><strong>FamilySearch:</strong> They run the world&#8217;s largest genealogy database. Their moat is physical friction: for decades, they sent people to remote villages to microfilm paper birth and death records. You can&#8217;t scrape what isn&#8217;t digital. They own the data because they were willing to do the tedious, manual work that Silicon Valley usually tries to avoid.</p></li><li><p><strong>vLex:</strong> This is a legal research platform that digitised fragmented, analog law records in Spain and beyond.
By turning messy, offline paper trails into a &#8220;system of record,&#8221; they built a walled garden. Even the most powerful AI is useless if it can&#8217;t access the source of truth, and in this niche, vLex owns the keys to the library.</p></li></ol><div><hr></div><p>It would be a mistake to treat scaling laws as guaranteed to continue indefinitely. There are legitimate reasons for caution.</p><p>First, there is the data wall. As we exhaust the &#8220;low-hanging fruit&#8221; of high-quality human-generated internet text, the industry is forced into the expensive, artisanal world of synthetic data and human-in-the-loop labelling. There are early indications of performance degradation for models trained on outputs of other AIs.</p><p>Second, there is the distinction between loss (the statistical measure that scaling laws predict) and capability (the practical abilities we actually care about). Loss decreases smoothly, but capabilities often emerge abruptly and unpredictably. A model might go from 0% to 90% accuracy on a task with a single increment of scale. We cannot yet predict which capabilities will emerge when.</p><p>Third, AI follows power laws, where linear gains require exponential costs. We&#8217;ve reached a point where the next 1% of intelligence requires billions in hardware and massive energy, making &#8220;brute force&#8221; economically unsustainable for most.</p><div><hr></div><p>That being said, we are still far off from reaching that scaling wall. As Tomasz Tunguz writes in &#8220;<strong><a href="https://tomtunguz.com/gemini-3-proves-pretraining-scaling-laws-intact/">The scaling wall was a mirage</a></strong>&#8221;:</p><blockquote><p><em>Then Gemini 3 launched. The model has the same parameter count as Gemini 2.5, one trillion parameters, yet achieved massive performance improvements.
It&#8217;s the first model to break 1500 Elo on LMArena &amp; beat GPT-5.1 on 19 of 20 benchmarks.</em></p><p><em><a href="https://x.com/OriolVinyalsML/status/1990854455802343680">Oriol Vinyals</a>, VP of Research at Google DeepMind, credited improving pre-training &amp; post-training for the gains. He continued that the delta between 2.5 &amp; 3.0 is as big as Google has ever seen with no walls in sight.</em></p><p><em>This is the strongest evidence since o1 that pre-training scaling still works when algorithmic improvements meet better compute.</em></p></blockquote><p></p><blockquote><p><em>Second, Nvidia&#8217;s earnings call reinforced the demand.</em></p><p><em>&#8220;We currently have visibility to $0.5 trillion in Blackwell and Rubin revenue from the start of this year through the end of calendar year 2026. By executing our annual product cadence and extending our performance leadership through full stack design, we believe NVIDIA will be the superior choice for the $3 trillion to $4 trillion in annual AI infrastructure build we estimate by the end of the decade.&#8221;</em></p><p><em>&#8220;The clouds are sold out and our GPU installed base, both new and previous generations, including Blackwell, Hopper and Ampere is fully utilized. Record Q3 data center revenue of $51 billion increased 66% year-over-year, a significant feat at our scale.&#8221;</em></p></blockquote><div><hr></div><p>This brings us back to Sutton&#8217;s bitter lesson. The evidence suggests we haven&#8217;t yet exhausted what scale can buy. But &#8220;not yet&#8221; is doing a lot of work in that sentence. The economics are getting harder. The next order of magnitude in compute will cost tens of billions of dollars. At some point, the capital allocation problem becomes a capital availability problem. Until then, the race continues. 
The winners will be those who can write the biggest checks, or those clever enough to build moats that don&#8217;t depend on writing checks at all.</p>]]></content:encoded></item><item><title><![CDATA[How to feature?]]></title><description><![CDATA[On generative engine optimisation]]></description><link>https://abstractions.rakshitranjan.com/p/how-to-feature</link><guid isPermaLink="false">https://abstractions.rakshitranjan.com/p/how-to-feature</guid><dc:creator><![CDATA[Rakshit Ranjan]]></dc:creator><pubDate>Mon, 26 Jan 2026 18:38:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gUW3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9670a20-4b7b-4879-b018-dcb679a5b59e_608x608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last week, I watched the <a href="https://vryse.co/">Vryse</a> pitch on Shark Tank. It is a search engine optimisation (SEO) and generative engine optimisation (GEO) platform.</p><p>SEO concerns itself with reverse engineering Google&#8217;s search algorithm&#8211;optimising for keywords, backlinks, page structure, etc. to rank higher in search results. GEO is the natural corollary&#8211;optimising to be surfaced in LLM responses.</p><p>Fundamentally, LLM-search has two layers:</p><p>The first is pre-trained knowledge, which refers to what the model learned during training. This is a snapshot of the internet at a fixed point in time, baked into the model&#8217;s weights. If you&#8217;re a brand trying to influence this, you&#8217;re essentially trying to retroactively change history. Short of becoming Wikipedia-level ubiquitous, there&#8217;s not much you can do here.</p><p>The second is RAG&#8211;Retrieval Augmented Generation. When you ask Claude or ChatGPT a question that requires current information, the model calls a search API (Google, Bing), retrieves the top results, reads them, and synthesizes an answer. 
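</p><p>Schematically, that retrieval loop looks like this (<code>search</code> and <code>llm</code> are hypothetical stand-ins for a real search API and model call, not any specific product&#8217;s interface):</p>

```python
# Minimal sketch of the RAG layer described above.
def answer_with_rag(question, search, llm, top_k=10):
    # 1. Issue a search query and keep only the top-ranked results --
    #    pages outside this cut-off never enter the context window.
    results = search(question)[:top_k]
    # 2. Concatenate the retrieved snippets into the model's context.
    context = "\n\n".join(r["snippet"] for r in results)
    # 3. Ask the model to synthesise an answer grounded in that context.
    return llm(f"Context:\n{context}\n\nQuestion: {question}")
```

<p>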
This is where optimization is actually possible.</p><p>The RAG layer still runs on the existing search infrastructure. The OpenAI web search API documentation shows that models rely on the top 10 or 20 results from traditional search engines. If you aren&#8217;t on that first page, you don&#8217;t exist in the model&#8217;s context window.</p><p>This got me thinking that if GEO for pre-trained knowledge is a black box, and GEO for RAG mostly reduces to &#8220;do good SEO,&#8221; then what&#8217;s the extent of value-add of a dedicated GEO platform?</p><p>There is not a lot of research on GEO, but a few emerging hypotheses are as follows:</p><ol><li><p>Reasoning models can go deeper into fewer pages, not wider across more pages. If your website ranks among the top few search results, the quality and structure of your content matter more than ever.</p></li><li><p>GEO rewards meaning density and semantic proximity (related concepts clustered together).</p></li><li><p>While average Google search queries are 4 words<sup>[1]</sup>, an average generative query is 23 words. There is a behavioural shift from searching to consulting in the context of LLMs. GEO implies optimising to answer complex, conversational questions.</p></li><li><p>Authority is verified through presence across key signals&#8211;communities (Reddit, Quora), YouTube transcripts, and citation-worthy journals/publications. Absence from the participatory web is treated as low-trust.</p></li><li><p>Attribution is hard. Google&#8217;s ad-supported model incentivised moving users to third-party sites to sell more ads (measurable through click-through rate). LLMs are subscription-driven. They have no financial incentive to send users away (at the time of writing this); their value lies in providing a frictionless, all-in-one answer.</p></li></ol><p>With GEO, the technical requirements of search are merging with the structural requirements of good writing. 
It would be interesting to see how this evolves and if we are able to deterministically optimise LLM responses just as we did with search.</p><div><hr></div><ol><li><p><a href="https://www.semrush.com/blog/chatgpt-search-insights/">Investigating ChatGPT Search: Insights from 80 Million Clickstream Records</a>&#8617;&#65038;</p></li></ol>]]></content:encoded></item><item><title><![CDATA[Attention is all you need]]></title><description><![CDATA[On the paper that started it all]]></description><link>https://abstractions.rakshitranjan.com/p/attention-is-all-you-need</link><guid isPermaLink="false">https://abstractions.rakshitranjan.com/p/attention-is-all-you-need</guid><dc:creator><![CDATA[Rakshit Ranjan]]></dc:creator><pubDate>Sun, 18 Jan 2026 18:32:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gUW3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9670a20-4b7b-4879-b018-dcb679a5b59e_608x608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This weekend, I spent time reading Google&#8217;s now-seminal paper, <em><a href="https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf">Attention Is All You Need</a></em>, which introduced the transformer architecture that forms the foundation of large language models like ChatGPT, Claude, and Gemini.</p><p>I&#8217;ve forgotten most of the math I studied during my engineering, which made it difficult to fully understand the paper. However, I was able to build some intuition. Here are my abstracted notes:</p><ol><li><p>Prior to transformers, techniques like RNNs (Recurrent Neural Networks) were used to train models:</p><ul><li><p>RNNs process sequences one step at a time. If you&#8217;re reading a sentence, you read word 1, then word 2, then word 3. 
Each step depends on the previous one.</p></li><li><p>This made the process computationally slow&#8212;computing step 5 requires completing step 4, which requires step 3, and so on.</p></li><li><p>Since information has to travel far, RNNs start to fail on large contexts&#8212;just as information gets lost or modified in a long chain of Chinese whispers.</p></li></ul></li><li><p>The researchers proposed a radical new approach: instead of processing words sequentially, compute the relationships between all words in a sequence simultaneously.</p></li><li><p>There&#8217;s a bunch of math behind this. The paper formalises it using three concepts:</p><ul><li><p><strong>Query</strong>: what I&#8217;m currently looking for</p></li><li><p><strong>Key</strong>: what each word offers</p></li><li><p><strong>Value</strong>: the information each word contains</p></li></ul></li><li><p>These are used to calculate &#8220;attention.&#8221; Attention is an elegant way of assigning weights to words based on their relationship with other words. For example, in the sentence <em>&#8220;The dog sat on the chair as it was tired&#8221;</em>, the calculation will enrich the word &#8220;it&#8221; with context such that its relationship with &#8220;dog&#8221; has a higher weight than its relationship with &#8220;chair.&#8221;</p></li><li><p>Attention is content-aware but position-blind, which can be problematic. Without positional information, a transformer wouldn&#8217;t know which noun is the subject and which is the object&#8212;&#8220;man eats fish&#8221; and &#8220;fish eats man&#8221; would be indistinguishable. To solve this, the researchers added a positional encoding to each word before feeding it into the transformer. This preserves word order and meaning.</p></li><li><p>The paper had several implications:</p><ul><li><p>Training time compressed dramatically. 
The researchers achieved state-of-the-art results on a translation task in 3.5 days using 8 GPUs&#8212;at a fraction of the cost of the best models at that time.</p></li><li><p>Due to the parallel nature of computation, throwing more compute at the problem works well. Capital became a moat, which is why we&#8217;re seeing massive investments in GPU acquisition for training.</p></li><li><p>Transformers, due to their architecture, are extensible to multimodal inputs (voice, text, images) and diverse tasks (summarisation, text generation, translation).</p></li></ul></li></ol><p>Transformers made intelligence a scaling problem. Everything since has been a consequence of that.</p><div><hr></div>]]></content:encoded></item><item><title><![CDATA[How to ask?]]></title><description><![CDATA[On LLM prompting and getting the best out of agentic platforms]]></description><link>https://abstractions.rakshitranjan.com/p/how-to-ask</link><guid isPermaLink="false">https://abstractions.rakshitranjan.com/p/how-to-ask</guid><dc:creator><![CDATA[Rakshit Ranjan]]></dc:creator><pubDate>Sun, 11 Jan 2026 18:17:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!gUW3!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc9670a20-4b7b-4879-b018-dcb679a5b59e_608x608.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I spent the weekend using Claude Code to reorganise my notes and update my website. Through the process, I realised that working with agentic AI tools (and, by extension, chat-based LLMs) is akin to working with engineers in a product org.</p><p>I mean the actual mechanics: writing up what I wanted before asking for it, being specific about edge cases, reviewing the output, and pushing back. The stuff that separates good product managers from bad ones.</p><p>This surprised me.</p><p>Take planning. Most people just start typing what they want. Make me a website. 
Write me a story. Analyse this data. And then they&#8217;re disappointed when what comes back isn&#8217;t quite right.</p><p>Good PMs don&#8217;t work this way. Before they ask anyone to build anything, they&#8217;ve already written a document explaining what they want and why. They&#8217;ve thought through the edge cases. They&#8217;ve gathered examples of what good looks like. That is how you get what you actually want instead of &#8220;what you literally asked for&#8221;.</p><p>The same thing works with LLMs. Before I ask Claude to do something complex, I dump my raw thoughts into it and ask it to write a plan. This surfaces how it&#8217;s thinking about the problem. Often, it&#8217;s different from how I thought about it. Sometimes it&#8217;s better. Sometimes it&#8217;s worse. Either way, I can correct course before any real work happens.</p><p>Then there&#8217;s precision. Bad PMs say things like &#8220;use the customer info&#8221; and &#8220;make sure it works&#8221;. Good PMs reference specific database fields and define exact test cases. The gap between vague and specific is where most miscommunication lives.</p><p>LLMs are, if anything, more sensitive to this than humans. A person will ask clarifying questions. An LLM will just guess. And its guesses, while often plausible, are frequently wrong and misaligned with your requirements.</p><p>I&#8217;ve started breaking complex tasks into steps for this reason. When I was updating my personal website, I didn&#8217;t ask Claude Code to redesign the whole thing. I asked it to audit the current structure first. Then propose changes. Then update the navigation, and so on.</p><p>The last piece is review. Code gets reviewed before it ships. Product specs get reviewed before anyone builds them. But most people treat LLM output as final the moment it appears.</p><p>One trick that works is role switching. 
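</p><p>Concretely, it means making two passes over the same task: generate the work in one persona, then critique that output in another. A toy sketch, where <code>ask</code> is a hypothetical stand-in for any chat-completion call; nothing here is a real API:</p>

```python
# Toy sketch of the role-switching pattern: do the work in one persona,
# then review that same output in another. `ask` is a hypothetical
# stand-in for a chat-completion call (system prompt = role).

def ask(role: str, task: str) -> str:
    # A real implementation would call an LLM API here, e.g. with a
    # system message like f"You are a {role}." Here we just tag the
    # text so the two-pass flow is visible.
    return f"[{role}] {task}"

# Pass 1: produce the work.
draft = ask("developer", "Write a function that deduplicates a list.")

# Pass 2: evaluate the same output from a different perspective.
critique = ask("senior engineer", f"Review this critically:\n{draft}")
```

<p>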
I have the LLM do the work as a &#8220;developer&#8221; or &#8220;analyst,&#8221; then evaluate its own output as a &#8220;senior engineer&#8221; or &#8220;manager.&#8221; This forces a change in perspective and catches real problems.</p><p>As things stand, I believe being AI-native has nothing to do with being technical. It is about developing the ability to figure out what you want, communicate it precisely, and verify that you got it.</p><p>It can be done by anyone who has spent years translating fuzzy information into concrete instructions. Engineers. Product managers. Editors. Consultants. Anyone.</p><p>I don&#8217;t know if this will last. Maybe LLMs will get good enough at reading intent that precision won&#8217;t matter. But right now, the bottleneck isn&#8217;t the model; it&#8217;s our ability to say what we mean.</p>]]></content:encoded></item></channel></rss>