Sunday Links: AI marriages, voice, and tools for AI agents
Dystopian robot supervision, Amazon Alexa, and tools for agents.

After a side-quest last week with two posts about the AI Engineer Summit (see a third post on the Safe Intelligence blog this week), we're back to normal service. Here are the AI stories I found most fascinating this week:
- Claude and Alexa+ tie the knot. There's not much detail on exactly which parts of the new Alexa+ experience Claude is powering (Amazon's demo stream post mentions both Amazon's own Nova model and Anthropic). No doubt it powers at least part of the more interactive voice interface. Amazon's Alexa and Apple's Siri have been the forgotten voice assistants of the past two years while new models grabbed the limelight. It will be interesting to see whether, given Amazon's huge hardware footprint, it can win back user share and stop people from turning to ChatGPT for conversations. This will be a distribution vs. innovation battle. It seems unlikely that Amazon will win, however, unless it makes Alexa compelling enough for people to use on all their devices, not just Alexa hardware.
- Crossing the uncanny valley of voice. As a fun follow-on to the Alexa story, it's worth trying out Sesame Labs' latest demo. Their voice model tries hard to bring emotion into the conversation. If you think AI will always be represented by a flat, unemotional, droning voice, think again.
- Schiphol tests self-driving baggage vehicle. Another echo from the future. Airport airside environments are controlled and highly marked up, so they're a perfect place for self-driving autonomous vehicles (as long as they are good at obstacle avoidance!). This week, Schiphol Airport in the Netherlands started testing an autonomous luggage delivery vehicle. How long until no human touches a plane during its airport turnaround?
- Y Combinator deletes posts after a startup’s demo goes viral. Optifye.ai's demo showed off AI vision-based systems that detect human workers not engaging in work in a factory setting. AI supervision of humans clearly strikes a deep dystopian chord with many people. There are already many companies and systems that do this, however - not just in factory settings but also for desktop activity monitoring. It seems to me that the negative reaction has less to do with the AI in the loop and more to do with the obvious conclusion that people are being treated as subhuman machines in these work settings. Perhaps with better automation we can enable humans to take on more fulfilling tasks that don't have to involve robot supervision.
- ElevenLabs now lets authors create and publish audiobooks on its own platform. It seems inevitable that voice AI will encroach on the territory of human voice actors. ElevenLabs now offers the ability to voice an audiobook and publish it to Spotify for distribution. It's sad to see another valuable performance niche come under attack, but it seems inevitable that AI will take over the bottom 70-80% of such markets as a matter of course, leaving the most skilled voice actors able to claim a premium for a while, while many others may no longer be able to participate in the market at all. It's also interesting that ElevenLabs is choosing to compete directly with some of its customers by offering this service.
There are also two links that I think illustrate a powerful new type of infrastructure (and, I think, a new type of player in the Internet stack):
- Exa.AI. Exa is a search engine designed to be used by AI (LLMs today). Its search is semantic in nature and handles prompts with lots of context well; results come back as well-structured lists and tables.
- Browserbase.com. Browserbase provides highly efficient headless web browsing on demand via API. Why? Because AI needs to browse the web too, and it makes no sense to run a fully fledged copy of an operating system and Chrome just to do that.
Both of these companies represent a new class of tools that AI systems will need as they interact with our online digital world. The first LLMs ingested web information from a training-time snapshot, and it took several iterations before browsing was added to fetch live web data. Tools like these for AI agents will very likely multiply quickly and will stretch the infrastructure of the web. For a nerdy deep dive into the topic, listen to the latest Latent Space podcast episode, which features an interview with Browserbase founder Paul Klein.
Finally, in non-AI news: Microsoft announced this week that it would be shutting down Skype in May of this year. When Skype launched in 2003, I remember being shocked that people I knew in the telecoms industry did not understand how Skype was creating an amazing new layer above their products. Maybe its founders can buy it a third time and revive it as a human / AI communications network.
Wishing you a wonderful weekend.