Friday Links: JSON, Agents, and the Future AI Maintainers

Here are this week's links.

  • GitHub launches the ability to run and test AI models. GitHub has retained its developer-friendly credentials despite having been owned by Microsoft for a long time now. This week saw the launch of the ability to run a wide range of AI models directly from your GitHub account (including Llama, Mistral, OpenAI, and Microsoft's own Phi models). Once you've experimented, you can move the working application directly over to Azure to run it in production. This is a pretty clever onboarding funnel into Azure for AI inference.
  • OpenAI adds better handling of structured outputs in the API. The OpenAI API has supported JSON-formatted outputs for a while (which makes them parsable and hence easier to consume automatically in applications), but it often didn't adhere to any JSON schema submitted with the query. In other words, results would come back as valid JSON but in a different arrangement from the one asked for. This new update promises to fix that: developers can now set the new strict flag to true and supply a JSON schema as the response format. What we should do next is couple this with https://schema.org/ formats and other crowdsourced, copyright-free formats for standard return types. Using widely shared schemas will make it easier to connect up many different apps to AI.
  • Diffusion Models as Data Mining Tools. In this paper, researchers describe an approach that uses diffusion models (like those used to generate images) to extract common visual elements from a data set. The data set images were classified with common tags such as a car's date of manufacture, the year of the image, the country, and so on. The hypothesis is that the diffusion model is actually learning a compact representation of the image data set and hence using common image features to indicate which tags apply to an image. This internal representation can be used to surface the key elements of the images that lead to one classification or another (e.g., from the paper, the types of glasses worn in a portrait being indicative of the year). The approach is potentially much more scalable than existing feature-mining approaches and can be used across multiple data sets rather than just one. The PDF is here.
  • Build Agents with Llama 3.1. The Llama 3.1 release from a week or two ago was a big step forward in capability for open-source LLMs. What was less visible was the fact that Meta also released a number of supporting tools (its "stack") that wrap around the LLM to turn it into something that can do multi-step reasoning and trigger actions via APIs. If you're building an AI-powered app you may well need many of these elements of the stack, so it's great that Meta is releasing them. Note, though, that relying on them likely ties you to the Llama family of models going forward.
  • The Soul of Maintaining a New Machine. I'm not sure how I stumbled across this long-form piece, but I love it. This chapter of Stewart Brand's work-in-progress book "Maintenance: Of Everything" tells the story of the legions of Xerox copier maintenance engineers that keep the company's complex machines running all over the United States. Each new model was more complex and introduced more failure modes. User behavior was often a big part of the problem, and the teams developed their own knowledge-sharing to stay on top of not only emerging problems but the idiosyncrasies of individual machines. After reading this, my thought was immediate: will we soon have a cadre of roaming specialist AI maintenance engineers who look after the functioning (and sanity) of our increasingly complex AI models?
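To make the structured-outputs item above concrete, here is a minimal sketch of what the request looks like. The event schema and its field names are invented for illustration (they are not from the post or the OpenAI docs); the actual API call is shown as a comment since it needs an API key, and the sample reply below simulates what a strict, schema-conforming response would contain.

```python
import json

# Hypothetical schema for illustration only: extract an event from free text.
EVENT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "attendees": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "date", "attendees"],
    "additionalProperties": False,
}

# The response_format fragment the Chat Completions API accepts:
# strict=True asks the model to conform exactly to the supplied schema,
# rather than merely returning some valid JSON.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "event",
        "strict": True,
        "schema": EVENT_SCHEMA,
    },
}

# With the official SDK the call would look roughly like this (needs a key):
# from openai import OpenAI
# client = OpenAI()
# completion = client.chat.completions.create(
#     model="gpt-4o-2024-08-06",
#     messages=[{"role": "user",
#                "content": "Extract: team sync on Friday with Ana and Raj"}],
#     response_format=response_format,
# )
# reply = completion.choices[0].message.content

# Simulated strict reply: exactly the required keys, no extras,
# so downstream code can parse it without defensive checks.
reply = '{"name": "Team sync", "date": "Friday", "attendees": ["Ana", "Raj"]}'
event = json.loads(reply)
assert set(event) == set(EVENT_SCHEMA["required"])
```

The point of pairing strict mode with a widely shared schema (say, a schema.org Event) is that any app expecting that shape can consume the output without bespoke glue code.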

Wishing you a great weekend. I just hope you aren't stuck trying to fix your photocopier (or, more likely, your printer).