Saturday Links: Multi-Agents, Reasoning, and Fairness

Multi-agent systems are coming, but there are some lessons to learn from the past.


After last week's AI Nobel prizes, we have nothing quite as groundbreaking this week. Nevertheless, AI is still moving forward fast:

  • NVIDIA releases a new fine-tuned LLM based on Llama 3.1. NVIDIA has been doing this for a while (presumably, they need to test their GPUs :-). What's interesting, though, is that the model is based on Llama. That's the logical choice if you are making fine-tunes, but I wonder if pushing one model too hard might upset other customers. Secondly, newer versions of Llama aren't available in the EU, so NVIDIA's EU demos might get stuck in the dark ages. You can talk to the new model here.
  • Sequoia Capital: Generative AI’s Act o1. Good post, terrible subtitle (and nomenclature in general). This post from Sequoia Capital details some of the reasoning advances in ChatGPT's o1 Strawberry model and how they will improve inference, then segues into using LLM-powered agents to take actions rather than generate text. Both things are important, but it's disappointing to see a subtitle like "The Agentic Reasoning Era Begins" or the suggestion that these things are necessarily linked. "Agentic behavior" is just taking action on behalf of another - your Ring doorbell is an "agent" that tells you when someone is at the door. Many extremely small, simple automated AI systems already exist in the world. OpenAI's o1 iterative reasoning is powerful, but it isn't required for many agent-like functions to come, and neither is it the only way to improve LLM reasoning. Hype and buzzwords got a bit ahead of good content on this one.
  • As a continuation, the number of new models being released each month is wild (last week alone there were releases from Mistral, MIT, Apple, and others - all pushing the edges of LLM architecture to improve various aspects of efficiency and performance).
  • Evaluating fairness in ChatGPT. This report from the OpenAI research team finds that in a small number of cases (1 in 1,000 for some prompts), there are variations in how ChatGPT answers questions depending on the identity of the user sending the query. In particular, responses were sometimes kinder and friendlier for users with female-sounding names, and more technical for male-sounding names. This is especially true for older models (so the behavior seems to be diminishing). No doubt this will have some people jumping up and down with outrage, and it is probably something that should be fixed or, at least, be correctable. Having said that, it's good that OpenAI releases studies like this, and it actually surprises me that the bias isn't stronger. The bias seems inevitable because the models are trained on the existing corpus of human stories and news. The long-term fix here seems likely to be models adapting to individual users over time, so everyone can tune the tone to their liking.
  • Lastly, a topic close to my heart: the "reinvention of Multi-Agent Systems." Just two sample announcements from this week: Langchain Multi-Agent Systems and OpenAI's Swarm. Both are frameworks for connecting up multiple LLM-powered "agents," which can work independently but also communicate with each other. Both have a simple communication mechanism, and both touch on the problem of the content of those communications (what if the "schemas" are different?). It's obvious that using AI in multi-agent settings will be important and powerful, so it's great to see the beginnings of frameworks. However, there is an entire research field dedicated to building these systems well, from the coordination problems that arise to how to structure communication between agents. Not all of it is critical at this juncture, but hopefully we can learn from that work and not just re-hash it. A good example is communication. The problem that needs to be resolved is how agents that are not of the same type (and, more generally, not using the same underlying LLM) talk to each other. There is a large body of work on specific uses of language that help structure intent in communications - see the sketch after this list. We're likely to need something like it to stop agent-to-agent communications from going completely off the rails.
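
To make that last point concrete, here is a minimal sketch of what a structured inter-agent message with an explicit intent label might look like, borrowing the "performative" idea from classic agent communication languages such as KQML and FIPA ACL. This is not LangChain's or Swarm's actual API; all names and fields below are illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any
import uuid

class Performative(Enum):
    # A small subset of the speech-act labels used in classic agent
    # communication languages (e.g. KQML / FIPA ACL).
    INFORM = "inform"      # share a fact or result
    REQUEST = "request"    # ask another agent to act
    QUERY = "query"        # ask for information
    REFUSE = "refuse"      # decline a request
    FAILURE = "failure"    # report that an attempted action failed

@dataclass
class AgentMessage:
    sender: str
    receiver: str
    performative: Performative
    content: dict[str, Any]  # free-form payload, e.g. an LLM's output
    conversation_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_prompt(self) -> str:
        # Render the message for an LLM-backed receiver, keeping the
        # intent explicit so the model doesn't have to guess it.
        return f"[{self.performative.value.upper()}] from {self.sender}: {self.content}"

# Example: one agent asks another to do a task; the reply carries an
# explicit INFORM intent rather than unlabeled free text.
ask = AgentMessage("planner", "coder", Performative.REQUEST,
                   {"task": "write unit tests for parse_date()"})
reply = AgentMessage("coder", "planner", Performative.INFORM,
                     {"result": "3 tests written, all passing"},
                     conversation_id=ask.conversation_id)
print(ask.to_prompt())
print(reply.to_prompt())
```

The point of the explicit performative field is that two agents built on different underlying LLMs (or different prompt schemas) can still agree on what kind of speech act a message is, even if they disagree on everything else about its content.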

Wishing you a great weekend!