Saturday Links: Claude prompt caching, ML as pachinko, and Google Assistant

Links covering everything from caching to Google's new assistant.

Here are this week's links:

  • Claude releases prompt caching, aka built-in RAG? Using cloud-based AI services such as OpenAI and Claude means sending prompts across the Internet whenever a new session is started. The new cache controls allow developers to store selected prompt elements on the server side and reuse them in subsequent requests. This can reduce call latencies and costs significantly (though Claude still charges for the use of cached fragments, at a lower rate). It's a little like having a built-in RAG feature. Baby steps to full memory agents? (There's a minimal sketch of what a cached call looks like after this list.)
  • Current LLM Pre-training and Post-training Paradigms. On the more technical side, this is a useful summary by Sebastian Raschka on his Substack that runs down how the current generation of open LLMs is trained. There's nothing extremely surprising, but it's interesting to see how uniform the approaches have become, plus some of the optimizations creeping in (distillation, synthetic data, and dedicated long-context training, for example).
  • A Fresh Look at Nonlinearity in Deep Learning (Harys Dalvi). Staying on the technical side, this post gives a nice, intuitive description of what actually happens in a neural network with non-linear activation functions (which essentially all interesting networks use today). There's a tiny illustration of why the nonlinearity matters below this list.
  • The Transformer Explainer (from IBM). Sticking with the theme, this is a very cool visualization of how transformers (the key innovation that makes LLMs work so well) actually operate. The playground downloads and runs a GPT-2 model in your browser, and you can follow your prompts through the model step by step. (There are examples to play with while the download completes.) A toy version of the attention computation at the heart of the visualization appears after this list.
  • Google announces a new Android Assistant. At Google's Pixel 9 launch event, the company announced that Google Gemini AI would now power its on-device assistant. This will involve calls to the cloud for complex queries and purely private, on-device processing based on Gemini Nano for high-security queries. The company also emphasized that it plans to ship this with Android so that other device manufacturers can use the same setup. This is a salvo in the brewing personal-assistant war: Google wants to make sure you make friends with its assistant wherever you are working and don't get tempted to spend too much time with ChatGPT or other challengers. The "add-me" feature that allows you to add yourself into a photo is just downright weird.
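
To make the caching item concrete, here's a minimal sketch of what a cached call might look like with Anthropic's Python SDK. The beta header and the `cache_control` field reflect my reading of the announced API, so treat the exact names as assumptions and check the current docs before copying.

```python
# Minimal sketch (not the official example): reuse a large, static system prompt
# across requests via Anthropic's prompt caching beta.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical long document you want the model to keep "in memory".
long_reference_text = open("product_manual.txt").read()

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=512,
    # The feature was gated behind a beta header at launch (assumed value).
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},
    system=[
        {
            "type": "text",
            "text": "Answer questions using the manual below.\n\n" + long_reference_text,
            # Marks this block as cacheable; repeat calls with the same prefix
            # should hit the cache and be billed at the lower cache-read rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "How do I reset the device?"}],
)
print(response.content[0].text)
```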
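
Dalvi's point about nonlinearity is easy to verify yourself: without an activation function, stacked linear layers collapse into a single linear map, so depth adds nothing. A small NumPy sketch (mine, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two linear layers with no activation are one linear layer in disguise:
# W2 @ (W1 @ x) equals (W2 @ W1) @ x for every input x.
assert np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x)

# Put a ReLU between them and the composition is no longer linear, which is
# what lets deep networks represent curved decision boundaries.
relu = lambda z: np.maximum(z, 0.0)
print(W2 @ relu(W1 @ x))
```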
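
And for the Transformer Explainer, the computation the visualization keeps returning to is scaled dot-product attention. Here is a toy single-head version in NumPy (shapes and names are mine, not the playground's):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each query produces a weighted mix of the values,
    with weights given by how well it matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # query/key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ V                               # blend of the values

rng = np.random.default_rng(0)
seq_len, d_model = 5, 8            # e.g. 5 tokens with 8-dimensional embeddings
Q, K, V = (rng.normal(size=(seq_len, d_model)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)   # -> (5, 8)
```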

Wishing you a wonderful weekend!