Sunday Links: Harvard's book data set, RIP Cruise, and did Sora train on Twitch streams?

Open source book collections, autonomous driving takes a step back and OpenAI SORA

Steven Willmott

15 Dec 2024 • 3 min read

Thank you to everyone for your patience with these arriving on Sundays these days. Hopefully, when things calm down a bit, I can get back to sending them earlier in the weekend. Anyway, another busy week:

Amid lawsuits and criticism, Character AI unveils new safety tools for teens. Character.AI is one of the biggest AI chatbot services available at the moment (20M+ monthly users) and is subject to a number of lawsuits centered on inappropriate and potentially harmful content. The new tools include a model that tones down responses on certain topics. The problem here seems deep, however: Character.AI wins by increasing user engagement (90+ minutes daily for the most active users). If the user has certain hopes, fears, problems, fantasies, etc., it is in the interest of the company to feed these in order to build a strong bond and retain the user. Yet that seems likely to lead to addiction and highly nuanced problematic content. It seems very unlikely that a few firewalls and filters will fix this. Worse still, the richer and more complex the model, the harder it is to control.
NotebookLM gets a new look, audio interactivity, and a premium version. NotebookLM is still one of the standout uses of AI this year. The ability to turn information into an interactive source and then generate a podcast is extremely powerful. Google says that so far 350 million hours of audio content have been created. The new premium version expands usage limits and seems to have one headline feature: you can join the audio conversation with the hosts to ask questions or add explanations. This seems quite interesting and powerful, but it wasn't necessarily on my wish list. More powerful would be controlling the tone and voices of the AI actors, which have become very recognizable by now. Also, when I think about joining a conversation, it seems like you'd want a way to control people's roles. Who is the host, who is an expert, etc.? I'll definitely be trying this in the new year.
Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft. To fuel your NotebookLM and LLM dreams, Harvard will be opening a large collection of high-quality text from out-of-copyright books as open source. Microsoft and OpenAI are supporting the effort and argue that this will help build better models. At the same time, many people, including Seth Godin, worry that more immersive AI experiences might take people further and further from reading books.
OpenAI Sora launches and generates wows, but are they using video game footage as training data? The new video generation tool from OpenAI is producing some amazingly fluid, detailed videos from prompts, even though there are still obvious flaws on show. This illustrates just how hard video generation is. A really interesting observation from TechCrunch's Kyle Wiggers is that it is uncannily good at creating video game footage and Twitch-like streams (right down to individual well-known Twitch streamers). This seems to indicate that gameplay Twitch streams were likely part of the training data. This risks a New York Times-style lawsuit from any number of parties (the streamers, the games companies, or Twitch itself). It makes sense to use video games in training since there are some amazing games, and the physics is clearly on show. One might even expect training data to come from playing the game and recording the results. Copyright might end up being a very big issue here.
RIP, Cruise Robotaxi. Sadly, GM has decided to fold its Cruise Robotaxi service. Experiencing it was truly amazing (I did so several times in San Francisco and Austin). The future was definitely visiting. The unit is being folded into the car divisions to further self-driving there, but that will no doubt be very different from a fully operational service. There are still others carrying the torch (Waymo, Tesla, ...), but it's a shame to lose a pioneer even if it did cost GM $9B.

In other news, Google also announced a big breakthrough in quantum chip design. Maybe AI hype will be subsumed by quantum hype very soon? (Unlikely; we'll just get Quantum AI.)

Wishing you a great week!