Why Google might win (3/5)
On data
Scaling laws have shown, consistently, that model performance improves with data. The frontier labs have been scaling models and data together, and the public internet is the shared corpus everyone has been drawing from.
A data moat comes largely from access to high volumes of proprietary data to train on. Google sits on a huge stockpile of private data across its services, but so does every other major tech company.
Meta has the social graph. Microsoft has two decades of enterprise productivity data. Apple has billions of devices generating behavioural signals that no one else can access, and that, by design, Apple itself cannot aggregate centrally for training.
I believe Google stands out because of three data properties it uniquely possesses: completeness, multimodality, and compounding.
1. Completeness
Most publicly available AI training data captures a fragment of human cognition. Wikipedia captures knowledge. Reddit captures opinion. GitHub captures code. These are useful snapshots of thought but not behaviour.
Google can trace the full arc of a human decision and map it accurately to individual users.
Search captures real-time intent in natural language.
YouTube captures what people watch after they search, revealing the interest and learning signals that sit between intent and action.
Chrome records browsing behaviour across the entire web, not just Google properties.
Google Analytics, installed on roughly 85% of the web, captures what those same people do next: what they buy, where they click, how they behave commercially.
Google Pay captures direct transaction data, closing the loop between intent and purchase.
Gmail captures the downstream evidence of decisions: order confirmations, travel bookings, subscription receipts. It sees what people did after they decided.
Nest and Google Home record domestic behaviour: when people are present, how they live, what their environment looks like.
Maps and Waymo record where they end up in the physical world.
Fitbit captures biological response.
No other competitor comes close to this breadth.
A model trained on intent alone becomes better at predicting what people want. A model trained on intent and outcome becomes better at helping people get there.
2. Multimodality
Text has historically been the dominant modality of AI training, and all frontier labs excel at it.
Google’s advantage is structurally significant in the other modalities.
YouTube has billions of videos where audio, visual, and text describe the same moment simultaneously. Videos carry a spoken track, a visual frame, and an auto-generated transcript timed to the millisecond.
Street View and Waymo add spatial and physical understanding at a scale no competitor has approached. Street View is millions of miles of continuous visual imagery, each frame tagged with precise geographic coordinates. Waymo layers in LiDAR and real-world sensor data from live driving conditions. Together they give a model something close to a physical map of the world as it actually appears.
Google Photos holds trillions of images, each tagged with time, location, and the objects and scenes they contain.
Google Translate has produced the most comprehensive multilingual parallel corpus in existence across 100+ languages, where the same meaning is expressed in structurally different linguistic forms simultaneously.
The unique characteristic here is how the modalities align: audio, visual, and text describing the same event. That alignment is what lets a model learn the relationship between modalities.
This can’t be assembled on demand. It accumulated over decades because the products required it. That is why it is structurally different from data assets built with AI in mind.
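To make the alignment point concrete, here is a minimal sketch of what one aligned training record might look like. The field names and paths are hypothetical, invented for illustration; this is not any actual Google schema, just the shape of data where audio, frames, and transcript are indexed to the same timestamps.

```python
from dataclasses import dataclass

@dataclass
class AlignedSegment:
    """One hypothetical time-aligned slice of a video used for training."""
    start_ms: int          # segment start within the video
    end_ms: int            # segment end within the video
    transcript: str        # auto-generated caption text for this span
    audio_path: str        # pointer to the matching audio slice
    frame_paths: list[str] # sampled video frames within the same span

# A single record ties all three modalities to one 3-second moment.
seg = AlignedSegment(
    start_ms=12_000,
    end_ms=15_000,
    transcript="now fold the egg whites into the batter",
    audio_path="clip_0001/audio/12000_15000.wav",
    frame_paths=["clip_0001/frames/12500.jpg", "clip_0001/frames/14000.jpg"],
)
```

A model trained on records like this can learn cross-modal correspondences (which sounds and images co-occur with which words) rather than treating each modality in isolation.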
3. Compounding
Google’s data grows because it is generated by use. The behavioural signals embedded in that use provide a form of continuous feedback at a scale no lab can replicate through paid annotation. Every interaction is grounded in what a real person actually did, not what a model predicted they would do.
That distinction matters because of what scaling laws tell us about data. As frontier models scale, the data requirement scales with them. The public internet, which every major lab has been drawing from, cannot keep pace. Common Crawl, Wikipedia, arXiv, and GitHub are finite reservoirs, and researchers estimate they will be effectively exhausted for training purposes by 2026.
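The back-of-the-envelope version of this pressure can be sketched with the Chinchilla heuristic of roughly 20 training tokens per model parameter. The ratio is an approximation from published scaling-law results, not a law, and real training mixes vary widely; the parameter counts below are illustrative, not any lab's actual figures.

```python
# Chinchilla-style heuristic: compute-optimal training uses roughly
# 20 tokens per parameter. An approximation, not a law.
TOKENS_PER_PARAM = 20

def compute_optimal_tokens(params: float) -> float:
    """Approximate compute-optimal token count for a given parameter count."""
    return TOKENS_PER_PARAM * params

for params_b in (70, 400, 2000):  # hypothetical model sizes, in billions
    tokens = compute_optimal_tokens(params_b * 1e9)
    print(f"{params_b:>5}B params -> ~{tokens / 1e12:.1f}T tokens")
```

Even at these rough numbers, a trillion-parameter-class model wants tens of trillions of tokens, which is why a corpus that regenerates through product use matters more than a static one.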
The obvious response is synthetic data: if you run out of human-generated text, you generate more using existing models. This works up to a point. Models trained on successive generations of synthetic data tend to drift from reality, a phenomenon researchers call model collapse. Google's live behavioural feed reduces this risk because a significant portion of its training signal is anchored to real-world actions. It is not immune to the problem, but it is structurally less exposed than labs whose pipelines depend primarily on synthetic generation and licensed text.
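The collapse dynamic can be shown with a toy simulation, not any lab's actual pipeline: each "generation" is trained only on samples drawn from the previous generation's output, so anything that fails to be sampled even once disappears forever, and diversity can only shrink.

```python
import random
from collections import Counter

random.seed(0)

# A Zipf-like "real" distribution over a vocabulary of 100 words.
vocab = list(range(100))
weights = [1.0 / (k + 1) for k in vocab]

# Generation 0 is drawn from reality; every later generation is drawn
# only from the previous generation's empirical distribution.
corpus = random.choices(vocab, weights=weights, k=500)
diversity = [len(set(corpus))]
for _ in range(30):
    counts = Counter(corpus)
    corpus = random.choices(list(counts), weights=counts.values(), k=500)
    diversity.append(len(set(corpus)))

print("unique words, first vs last generation:", diversity[0], "->", diversity[-1])
```

Because each generation can only resample what the previous one produced, the tail of the distribution erodes irreversibly; a pipeline continuously refreshed with real user behaviour keeps reintroducing that tail.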
The first generation of AI products (chatbots, copilots, summarisation tools) required models that understand language well. The next generation requires something harder. Agents that book travel, execute multi-step research, navigate physical environments, and handle purchases need models that understand how the world actually works: spatial relationships, causal chains, and how decisions play out over time.
OpenAI and Anthropic are approaching this from text, trying to infer physical and causal structure from written descriptions. Google has that structure already embedded in its data, accumulated over decades of building products that operated in the physical world. These are different starting points with different ceilings.
Here is where the argument needs to be tested.
Google is not moving fast enough on agents.
Anthropic has built Claude Code and Cowork. Just this week, it announced that Claude can remotely trigger tasks on a computer.
OpenClaw took the internet by storm and got acquired by OpenAI.
Perplexity launched Perplexity Computer.
These are small, fast teams shipping genuine agent products at a pace that Google has not matched. Gemini is technically capable, but Google has not yet translated its structural advantages into a category-defining agent product.
Chat has low switching costs; for agents, switching costs could be significantly higher because of the complexity they handle. Google needs to translate this structural advantage into embedding users deeply in agentic products and workflows, and it needs to do so quickly.

