Does ChatGPT have a product in it?

Columbo the Detective and Robby the Robot

I think not. In Where’s the Product in AI? I discussed why there’s low value to be captured by modeling data; using a Large Language Model approach to ask questions or frame answers does not alter that analysis.

There’s been a dream of talking with computers since the first movie director to consider one realized that text printouts and panels of blinky lights make poor actors. Unfortunately, real computers want more precision than people generally use in speech, so in-product implementations of those dreams are limited to toys like Intercom, Alexa, Siri, and Cortana.

Meanwhile, people who work with computers learn a DSL (Domain Specific Language, such as SQL) for the purpose of asking the computer questions and giving it commands. That situation is regarded as ripe for disruption, so academics and vendors offer expression builders and natural language processors and fuzzy logic interpreters. Those tools help a bit but fail to actually disrupt. Because they cannot capture the full complexity of a human’s question or reasoning, they can only work in simplistic use cases. They certainly can’t handle framing a NORA (No One Right Answer) question. They become features of larger products: on-ramps and helpers used mainly to aid infrequent or new users. The power users of the product will stick with the DSL, and since new users go to power users for help, the new users are also encouraged to skip the training wheels and just learn the language.

But wait, all those prior art examples I just linked were focused on the problem of translating human input to a machine language. While GPT tech has use cases for that, it also appears to do something new: translation of machine output to a human language. Is that sufficiently interesting to make a new disruptive product?

No. Unfortunately, a large language model is not actually doing translation… it is pattern matching against what it has seen on the Internet. It is a form of creation: new strings are produced that line up with prior inputs to the model. This is just iteration on the Markov bots. What we’re seeing is a form of sleight of hand: systems designed to take advantage of human propensities to pattern match and anthropomorphize. While boosters wave away any flaws with a promise that the tech will mature, in my opinion it has already hit maturity. It’ll get smoother, just like Penn and Teller are smoother than Houdini, but it’s entertainment rather than production.

Let’s take a step back and ask what we want when we say “translate machine output to human language.” Is the goal to turn a table of numbers into a graph? That’s already a thing, but you can’t get Cortana to read a graph to you. So the goal is more like, turn this table of numbers into a sentence… and that takes reasoning. This is where pattern matching breaks down, because the model cannot tell the difference between a correct and an incorrect answer. It’s showing what’s more probable according to its data set, which makes it prone to being misled. The result is not an answer producer, it’s a bullshit generator.

Another way to think of that goal is automation. Instead of automating an “if you see this input, perform that action” stimulus-response pair, the dreamed-of function of output translation is an automated “if you see data that looks like this, perform that transformation and report the output”. Table of time-sequenced numbers goes down? Then feed “the $y_legend values are going down over the &datediff(&latest(x), &earliest(x)) $x_legend period” into a text-to-speech processor. I’m sure anyone who’s written any sort of script is cringing right now at the number of assumptions glossed over by that pseudocode sentence. What if the numbers in the set go up at first and then down? What if the legends are wrong, or missing, or improperly pluralized? What if the data set mixes quarterly and monthly numbers, or is missing a row, or one cell is accidentally formatted as a string? Are we in a type-sensitive language or not?
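To make that cringe concrete, here’s a rough sketch in Python of what even a toy “table goes down, say so” translator has to check before it can emit one sentence. Everything in it is invented for illustration: the describe_trend name, the legend defaults, and the decision that a dip-and-recovery counts as “mixed” are my assumptions, not anyone’s real implementation.

```python
def describe_trend(rows, x_legend="period", y_legend="values"):
    """rows: list of (x_label, y_value) pairs, assumed to already be in time order."""
    if len(rows) < 2:
        return f"Not enough {x_legend} data to describe a trend."

    values = []
    for x, y in rows:
        try:
            # Tolerate a numeric value stored as a string; give up on "N/A" and friends.
            values.append(float(y))
        except (TypeError, ValueError):
            return f"Cannot describe the trend: non-numeric value at {x!r}."

    deltas = [b - a for a, b in zip(values, values[1:])]
    if all(d < 0 for d in deltas):
        direction = "going down"
    elif all(d > 0 for d in deltas):
        direction = "going up"
    else:
        direction = "mixed"  # up at first and then down? flat in the middle? who decides?

    # Crude stand-in for &datediff(&latest(x), &earliest(x)).
    span = f"{rows[0][0]} to {rows[-1][0]}"
    if direction == "mixed":
        return f"The {y_legend} do not move in one direction over the {span} {x_legend} range."
    return f"The {y_legend} are {direction} over the {span} {x_legend} range."


# Quarterly figures that dip and recover come out as "mixed", not "down".
print(describe_trend(
    [("2022 Q1", 110), ("2022 Q2", 95), ("2022 Q3", 120), ("2022 Q4", 90)],
    x_legend="quarter",
    y_legend="revenue figures",
))
```

And that sketch still punts on mixed quarterly and monthly granularity, missing rows, pluralizing the legends, and every other edge case above; each one is another branch a human has to anticipate in advance, which is exactly the reasoning the pattern matcher doesn’t do.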

This sort of assumption handling is why automation is another area that struggles to find product traction. People can’t easily adopt automation because they can’t define context clearly enough to handle real-life context changes, anomalies, abnormalities, and edge cases. It is useful to look at where automation does succeed: narrowly defined use cases in which it’s a helper to a human. I trust my watch to handle alarm clock duties. I’ve written automatic street sweeping reminders. I use cron and IFTTT all the time. Scripts are very useful for ensuring that all the parts of a complex task get done, and that boring follow-ups are completed. I suspect there is a useful role for tech like ChatGPT as a helper to humans, on the order of a spelling or grammar checker. I don’t think it will be useful as a new search interface, or as a replacement for DSLs, or as a translator of data into speakable sentences. Handling these things takes human reasoning, which is why we celebrate Stanislav Petrov Day. Asking a bullshit generator to fake reasoning is not a good idea.
