Sunday Links: FineMath, Robotaxi vs. Delivery Bot, and can we just wait?
This week's links covering robotaxi accidents, what we teach LLMs, and AI fails.
Happy New Year! I hope you had a great start to 2025 and that it will be filled with lots of health, happiness, and success. And AI. I think it'll be filled with quite a bit of AI whether you like it or not :-). Here are the links for this week:
- Hugging Face releases FineMath - a dataset for LLM math training. Hugging Face is doing amazing work curating high-quality datasets for training, and in many ways this is one of them, but I really question the premise. The idea is to help make LLMs better at mathematics (they are currently very poor at it). While we'd like LLMs to be better at everything they do, I really don't think this makes a lot of sense. We already have symbolic reasoning systems that are extremely good at mathematical problems, and LLMs are going to struggle with deep symbolic reasoning for a long time. We might get them to do better some of the time or even most of the time, but getting them to do well all of the time is a herculean task. Among humans, some have a propensity to enjoy and be good at math, and a subset of those become mathematicians and are extremely specialized. Even these humans, though, use calculators, computers, and other tools to do much of the heavy lifting of actual calculation. We'd very likely be better off teaching an LLM to use a calculator (or Wolfram Alpha) than teaching it to do math reliably all by itself (see the calculator sketch after this list).
- A Waymo robotaxi and a Serve delivery robot collided in Los Angeles. I'm sure there have been other bot-on-bot crashes, but it's interesting to see them happening. It sounds like the delivery bot paused on the sidewalk just long enough for the Waymo taxi to assume it was inanimate. In urban environments, it would seem pretty logical to have a public, standard messaging layer over which all autonomous systems broadcast beacon information: I'm here and planning to do XYZ (a sketch of such a beacon follows this list). There are problems with it: it adds cost, there will always be objects not transmitting, the system could be spoofed to disable vehicles, and so on, but it seems like a sensible safety layer that would become more useful as autonomous vehicle traffic increases.
- Working paper: AI and Freelancers: Has the Inflection Point Arrived? I feel a little bad picking on one specific set of authors, but I've seen multiple papers and posts that take a similar theme. These papers model potential benefits from AI and then subsequent replacement. Modeling is not easy, and it can have utility, but... with all due respect to those doing this work, these papers put a scientific/mathematical overlay on a very obvious result. As something increases in capability, it goes from being useless (no economic impact) to a support for the practitioner (uplift for those who use it) to a full replacement of some or all practitioners (potential economic ruin). You could trace much the same trajectory for farming equipment (nothing, plows, plows pulled by animals, engines... all the way to full autonomy today). It's also clear that the trajectory will vary by type of occupation. The more interesting questions are things like: 1) are new abilities unlocked for people up the value chain? 2) does the total economic value change? 3) do markets expand when prices drop? etc.
- Heed The Wait Calculation: Strategy From Ethan Mollick. Inspired by Ethan Mollick's philosophical point that for some things that are hard (in AI or otherwise), you might not be better off "starting now" but instead waiting until the technology gets better and starting later. A good example would be traveling to Mars or the stars. Science fiction is replete with human expeditions out into space that "overtake" the colony ships of previous eras that are traveling more slowly (spoiler alert: the colonists are almost always dead or have turned into zombies; trip rating 1 star, "don't recommend"). I've also personally seen this cycle many times (people trying to run Java VMs on early Palm Pilot devices, only for hardware to double in capacity within 12 months). There is definitely a set of tasks for which "waiting" makes sense (launching your own LLM-building effort now, for example, seems like a low-return activity), but in general it's tough to know what should wait. If you work in a frontier field, then when new things become possible you will be among the first to take advantage. Maybe you can even help make the breakthrough. We need people working on every frontier. The key, to me, is to stay aware of how far away from real applications we are in any given area (a toy version of the wait calculation is sketched after this list). OTOH, if someone offers you a ride in a cryogenic capsule on an Alpha Centauri colony ship, perhaps give the first flight a miss! (P.S.: maybe teaching LLMs to be amazing at math is one of those things that can wait...)
- The biggest AI flops of 2024. MIT Technology Review has an article listing AI failures in 2024. What strikes me is how mundane (apart from a couple) they are. AI is certainly making errors in some applications where it shouldn't, but in many domains the utility already far outstrips the annoyance. The truly damaging things on the list, in my view, are nude deepfakes and the rapid rise of inaccurate AI content on the public web. Both have become much easier with AI and are highly damaging. They are also not intrinsically AI problems: we need better ways to track and prosecute deepfake creators no matter what tools they use, and we need better ways to track the provenance of real content. AI is the wake-up call.
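To make the calculator point concrete, here's a minimal sketch of the tool-use pattern: instead of answering the arithmetic itself, the model emits a structured call and the surrounding code computes the exact answer. The message format and the `handle_model_output` harness are hypothetical (real function-calling APIs differ in their details), but the division of labor is the point.

```python
import ast
import operator

# Arithmetic operators the toy calculator tool is willing to evaluate.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}

def calculate(expression: str) -> float:
    """Safely evaluate a plain arithmetic expression like '1234 * 5678'."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")
    return walk(ast.parse(expression, mode="eval"))

def handle_model_output(message: dict) -> str:
    """Dispatch a (hypothetical) structured tool call emitted by the model."""
    if message.get("tool") == "calculator":
        return str(calculate(message["expression"]))
    return message.get("text", "")

# The LLM handles the reasoning; the calculator handles the digits.
print(handle_model_output({"tool": "calculator", "expression": "123456789 * 987654321"}))
```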
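On the beacon idea, here's a rough sketch of what a single broadcast message might carry. Every field name here, and the `IntentBeacon` type itself, is made up for illustration; a real standard would also need signing, versioning, and an agreed intent vocabulary. The detail that matters for the Waymo/Serve incident is that a speed of 0.0 plus an explicit "paused" intent says "temporarily stopped", not "inanimate".

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class IntentBeacon:
    """One "I'm here and planning to do XYZ" broadcast (illustrative only)."""
    agent_id: str          # stable, registered identifier for the robot/vehicle
    agent_class: str       # e.g. "sidewalk_delivery" or "robotaxi"
    lat: float             # current position
    lon: float
    heading_deg: float     # direction of travel
    speed_mps: float       # 0.0 means "paused", not "inanimate"
    intent: str            # short intent code, e.g. "paused_waiting", "crossing"
    intent_expires: float  # unix time after which this intent is stale

    def to_wire(self) -> str:
        return json.dumps(asdict(self))

# What the delivery bot might have broadcast while paused on the sidewalk:
beacon = IntentBeacon(
    agent_id="bot-0421", agent_class="sidewalk_delivery",
    lat=34.0522, lon=-118.2437, heading_deg=90.0,
    speed_mps=0.0, intent="paused_waiting",
    intent_expires=time.time() + 10.0,
)
print(beacon.to_wire())
```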
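And on waiting: Mollick's framing echoes the original interstellar "wait calculation" - if capability doubles on a fixed schedule, leaving later can mean arriving sooner. Here's a toy version (the trip length and doubling period are made-up numbers, not estimates of anything):

```python
# Toy wait calculation: a trip takes T0 years at today's speed, and
# technology halves the trip time every DOUBLING years. If you wait w
# years before departing, you arrive at: w + T0 / 2**(w / DOUBLING).

def arrival_time(wait: float, t0: float, doubling: float) -> float:
    return wait + t0 / 2 ** (wait / doubling)

T0, DOUBLING = 1000.0, 25.0  # illustrative numbers only
best = min(range(500), key=lambda w: arrival_time(w, T0, DOUBLING))
print(f"leave now:      arrive in {arrival_time(0, T0, DOUBLING):6.1f} years")
print(f"wait {best} years: arrive in {arrival_time(best, T0, DOUBLING):6.1f} years")
```

With these numbers, waiting roughly 120 years gets you there in about 156 years total instead of 1000 - which is exactly why the poor colonists keep getting overtaken.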
In a rather sad footnote, here's a link to The Beginning of the End for ANT+ Wireless, an example of how something that is in principle good (EU regulation on data security) can end up helping to kill something else good (a workable open ecosystem of sports devices). Do we really need encryption on bicycle power-meter data? Hopefully, Bluetooth and other standards will be re-animated to keep the field open.
Happy 2025!