Saturday Links: No Elephants, $40B, and bot firewalls

OpenAI raises $40B, image generation gets better, and bots spike site traffic.

Here are this week's most interesting links:

  • OpenAI raises $40bn in deal with SoftBank that values it at $300bn. A GDP comparison is not strictly fair (this is a one-time raise, while GDP is annual), but it still puts the size of this fundraise in perspective. $40B is roughly the annual GDP of the 100th-largest national economy in the world; currently, it would put OpenAI between Estonia and Honduras. The deal gives OpenAI $10B now and another $30B if certain conditions are met, so it does create some potential pressure and challenges down the line.
  • No elephants: Breakthroughs in image generation. Ethan Mollick has a nice explainer of why the emergence of autoregressive image generation is so important. TL;DR: much more control, which makes it possible to build a real workflow.
  • LLM Embeddings Explained: A Visual and Intuitive Guide. This is a great technical but accessible post by Hesam Sheikh Hassani on how embeddings in LLMs work. Embeddings are the foundation for fine-tuning LLMs, which one often hears bandied about.
  • MCP: The Ultimate API Consumer (Not the API Killer). Kevin Swiber has a nice summary of how APIs and MCP actually play together rather than conflict. There is a lot to unpack in an MCP world: are we really going to have arbitrary input strings sent between all our public-facing systems? The answer for a long time is going to be no, not without a lot of extra scaffolding and trust! We'll have a lot of API-MCP interplay to learn from, since MCP is on fire these days, with almost every player launching MCP server solutions.
  • AI Bots and the Labyrinth. AI bots gather web data from sites (for learning or to use sites intended for humans). Unfortunately, the data-gathering bots often do not respect the do-not-crawl instructions in robots.txt. This turns out to be great business for providers like Cloudflare, which offer cloud firewalls that can intercept the bots before they get through to a site. Cloudflare has leaned in on this and added security layers designed specifically to frustrate bots. It seems likely to me that down the line this will need a protocol to identify "permitted" bots. At the moment we're building walls; later we'll want to put in side doors.
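
For the curious, here is roughly what "respecting robots.txt" looks like from the bot's side. This is only a minimal sketch using Python's standard-library `urllib.robotparser`; the robots.txt content, the bot name `ExampleAIBot`, and the `example.com` URLs are made up for illustration and do not come from any of the linked posts.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only: one named bot is blocked
# from the whole site, everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("ExampleAIBot", "https://example.com/blog/post"),
        ("SomeBrowser", "https://example.com/blog/post"),
        ("SomeBrowser", "https://example.com/private/data"),
    ]:
        print(agent, url, is_allowed(agent, url))
```

The catch is that this check is entirely voluntary: it only works if the bot chooses to run it and to honor the result, which is why sites end up reaching for firewalls instead.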

Wishing you a great weekend!