Saturday Links: Agent Wars, Copyright Fightbacks, and Abstract Reasoning for LLMs
One day late again, apologies! Travel has been heavy these past couple of weeks, so it's been tough to get ahead of the curve enough to post on time. It's been an interesting week, with big shifts that could have long-term implications but have drawn muted commentary.
A few things that stood out:
- ChatGPT accelerates toward task automation. OpenAI is apparently working on a layer within ChatGPT that will allow complex tasks to be automated over backend services. I have already commented on this task-centric view of AI chat interfaces in previous posts: chat interfaces will become an important vector for many personal and work tasks, and connecting them to backend services grounds interactive chat/voice commands in concrete actions. (The Rabbit R1 approach is a great example of this.) The first sketch after this list shows what that grounding looks like in practice.
- OpenAI ChatGPT @mentions. On a very related point, OpenAI released the ability to @mention a custom GPT from its app store in any ChatGPT message. This means you can invoke (for example) a Kayak app/GPT inside a ChatGPT prompt: "Please use @kayak to find all the flights between New York JFK and London this weekend." Right now there are some limitations (only one @mention per query), but one can imagine it quickly allowing things like defaults: "Set @kayak as the dedicated GPT for all travel queries." This is super convenient and part of a trend to make the single chat interface more and more a unified window on the digital world.
- Hugging Face and the rebel resistance to ChatGPT. Hugging Face's new offering is similar to the ChatGPT app store, but it allows GPTs (agents) to be built against any LLM rather than a single vendor's models. This is a powerful and interesting play: if Hugging Face can standardize easy GPT creation, it may be able to put up some resistance to the dominance of big vendors building the largest potential ecosystems around their proprietary models. The big challenge, however, is that Hugging Face does not have a sizable end-user audience, and this will probably turn out to be crucial. For these GPTs to see a lot of usage, they'll need to be easily accessible to end users on their phones, desktops, and elsewhere. Without that user pull, it will be hard to compete with ChatGPT, Google, and Apple personal assistants. Hugging Face may be more successful for business/technical use cases, though.
- EPFL and Meta AI Propose Chain-of-Abstraction (CoA). LLMs can sometimes get lost in the details of what they are reasoning about. I've mentioned before that we'll ultimately want to combine world models with LLMs to keep answers within the known bounds of whatever domain an LLM is operating in. A good way to do that is to add layers of abstraction to query handling, which is what this paper by EPFL and Meta researchers does. By interleaving LLM reasoning with system functions such as performing math, they show answers can be significantly more accurate (the second sketch after this list illustrates the basic idea). A word of warning: a big challenge will be knowing when to invoke the different layers of processing.
- More legal submissions in AI copyright cases. AI image generator firms hit back with their own arguments in the large AI copyright case brought by artists in the US; VentureBeat's article provides a good review. It seems to me that where this will very likely end up is that such image generators enable copyright infringement but don't themselves constitute it (they don't actually store copyrighted images). As such, it is users who will violate copyright or IP protections if they produce certain types of images and use them commercially. Image generation services such as Midjourney may be required to add additional prompt filters and controls, but not to shut down. Further, I think they'll likely end up letting artists monetize prompts that use their names rather than blocking them.
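To make the task-automation point concrete, here is a minimal sketch of grounding a chat command in a backend action via function calling, the mechanism OpenAI already exposes today (a dedicated automation layer would presumably build on something similar). The `search_flights` function and its parameters are hypothetical stand-ins for a real backend service.

```python
# Minimal sketch: grounding a chat command in a concrete backend action via
# function calling. The search_flights backend and its parameters are
# hypothetical; the OpenAI tools API shape is as it exists today.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search_flights",  # hypothetical backend service
        "description": "Search flights between two airports on a given date.",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string", "description": "IATA code, e.g. JFK"},
                "destination": {"type": "string", "description": "IATA code, e.g. LHR"},
                "date": {"type": "string", "description": "ISO date, e.g. 2024-02-03"},
            },
            "required": ["origin", "destination", "date"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[{"role": "user", "content": "Find flights from JFK to London this weekend."}],
    tools=tools,
)

# The model replies with a structured tool call rather than free text; the
# application executes it against the real backend and returns the result.
for call in response.choices[0].message.tool_calls or []:
    args = json.loads(call.function.arguments)
    print(call.function.name, args)  # e.g. search_flights {'origin': 'JFK', ...}
```

The key point is that the model's reply is a structured call the application can execute against a concrete service, not free text the user has to interpret.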
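And here is a rough sketch of the chain-of-abstraction idea from the EPFL/Meta item: the model drafts a reasoning chain with abstract placeholders, and deterministic tools fill them in afterwards. The `abstract_chain` string and `fill_placeholders` helper are illustrative only; in the paper the chain is produced by a fine-tuned LLM and the placeholders are resolved by domain tools.

```python
# Minimal sketch of the chain-of-abstraction idea: the LLM drafts a reasoning
# chain with abstract placeholders, and deterministic tools (here, arithmetic)
# fill them in afterwards. The chain below is hard-coded for illustration.
import re

# Abstract chain as an LLM might emit it: placeholders y1, y2 stand for values
# the model should not try to compute itself.
abstract_chain = (
    "The train covers 120 km in 1.5 h, so its speed is [120 / 1.5 = y1] km/h. "
    "Over 4 h it therefore travels [y1 * 4 = y2] km."
)

def fill_placeholders(chain: str) -> str:
    """Resolve [expr = yN] spans with a math tool and substitute the results."""
    bindings: dict[str, float] = {}

    def solve(match: re.Match) -> str:
        expr, name = match.group(1), match.group(2)
        for var, val in bindings.items():  # substitute earlier results
            expr = expr.replace(var, str(val))
        value = eval(expr)                 # stands in for a real math tool
        bindings[name] = value
        return str(value)

    return re.sub(r"\[([^=\]]+)=\s*(y\d+)\]", solve, chain)

print(fill_placeholders(abstract_chain))
# -> "The train covers 120 km in 1.5 h, so its speed is 80.0 km/h. ..."
```

The separation matters: the model only has to get the abstract structure right, while the exact numbers come from tools that don't hallucinate.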
Wishing you a wonderful weekend.