Friday Links: Multi-lingual podcasts, OpenAI's PR department and LLMs vs. RAG
Another batch of reading links for the weekend
No long post this week due to a few days of holiday, but no skipping Friday Links. Here they are:
- Spotify pilots voice translation for podcasts. This is interesting not because Spotify figured out voice translation (others have, and all are working to make it better) but because of value chains. If you’re a startup working on en-masse (or even real-time) voice translation, an obvious part of the market would be podcasters who want to grow international audiences. What if the distribution platform already has that embedded, for free or a small fee? That potentially collapses a significant chunk of the market. The platforms that control distribution still have a lot of leverage if they care to use it.
- DeepMind: AI large language models can optimize their own prompts. This is more evidence that we’ll increasingly be able to build more effective models with smaller datasets (see last week’s post). There’s a limit, and models remain fragile in many ways, but there’s plenty of squeezing left to do. (A rough sketch of the prompt-optimization loop follows this list.)
- Fine-tuning an LLM vs. RAG. Daren Cox has a great in-depth post that dives into the two main ways of deeply customizing a chatbot experience. “A bit of both” is likely the best answer for any long-term application you plan to run. (A minimal RAG sketch also follows this list, for contrast.)
- ChatGPT can now see, hear, and speak. This is OpenAI’s announcement headline for ChatGPT’s new voice and image modalities. Deliberately anthropomorphizing the product doesn’t do anyone any favors. We should not consider LLMs “agents”; they are far from that. Maybe the team likes being invited to Senate hearings on AI risk.
- Getty Images launches Generative AI based on its image library. This is definitely one answer to the copyright question: players with large libraries of something can launch AI trained on their collections (Adobe has done the same). The question now is how such collections get refreshed; no doubt that will be more expensive going forward. Worse, if we end up with ten image generators, each trained on some small subset of images, they’ll likely all be relatively poor. What’s needed is an open platform that artists can opt into and be paid through.
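
Since a couple of this week’s links are about techniques rather than news, two small sketches. First, the DeepMind prompt-optimization idea in rough form: an LLM proposes a revised prompt, each candidate is scored on a small eval set, and the best one seeds the next round. `call_llm` and `score` here are hypothetical stand-ins, not DeepMind’s actual setup.

```python
from typing import Callable

def optimize_prompt(
    seed_prompt: str,
    call_llm: Callable[[str], str],    # hypothetical: returns a proposed prompt
    score: Callable[[str], float],     # hypothetical: accuracy of a prompt on a small eval set
    rounds: int = 5,
) -> str:
    """Keep asking the LLM for a better prompt; keep whichever scores highest."""
    best_prompt, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        meta_prompt = (
            "Here is a prompt and the score it achieved on a task.\n"
            f"Prompt: {best_prompt}\n"
            f"Score: {best_score:.2f}\n"
            "Write an improved prompt that would score higher."
        )
        candidate = call_llm(meta_prompt)
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best_prompt, best_score = candidate, candidate_score
    return best_prompt
```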
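
Second, the retrieval half of the fine-tuning vs. RAG question: instead of baking knowledge into the model’s weights, relevant documents are retrieved at query time and placed in the prompt. Again, `embed` and `call_llm` are hypothetical stand-ins for whatever embedding model and LLM you use; this is a sketch, not Daren Cox’s implementation.

```python
from typing import Callable, Sequence

def rag_answer(
    question: str,
    documents: Sequence[str],
    embed: Callable[[str], Sequence[float]],   # hypothetical embedding model
    call_llm: Callable[[str], str],            # hypothetical LLM call
    top_k: int = 3,
) -> str:
    """Retrieve the most relevant documents and answer from them."""
    def cosine(a: Sequence[float], b: Sequence[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    q_vec = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(embed(d), q_vec), reverse=True)
    context = "\n\n".join(ranked[:top_k])
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return call_llm(prompt)
```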
Have a great weekend!