Want to know how generative AI works?
Imagine a newborn child. Now, just for fun, imagine that this child – we’ll call him Karl – is born with the ability to read. I know, no way, but suspend your disbelief for just a second.
So Karl can read. And by the way, he can read really, really fast.
Now, just for fun, let’s give poor Karl the entire contents of the internet to read. All of it.
Task done. Now everything Karl knows, he learned from the internet.
Most infants learn basic, foundational things as they grow up. “Hey look, I’ve got hands! Oh wow, feet too! The dog’s got four legs… and a tail…and it barks!”
But Karl never learned these things. Karl only knows what he read on the internet. So if we ask Karl to write an RFP (Request for Proposal, a common business document) that’s like others our company has written, he’ll probably do a fantastic job. Why? Because he’s read zillions of them, knows what they look like, and can replicate the pattern.
However, Karl can’t grasp common-sense relationships, as Gary Marcus elegantly pointed out in this blog post. As he notes, Karl may know that Joe’s mother is Mary, but he can’t deduce that Mary’s son is therefore Joe.
Nor can Karl do math: ask him to calculate 105 divided by 7, and unless he has seen that exact example somewhere in the vast corpus of the internet, he’ll get it wrong.
Worse, he’ll very authoritatively return that wrong answer to you.
That’s a loose analogy for how Large Language Models (LLMs) work. LLMs are trained on huge quantities of text scraped from the internet and apply statistics to analyze queries and generate answers. It’s a ton of math… but it’s just math.
In generating an answer, LLMs like ChatGPT will typically create multiple possible responses and score them “adversarially” using mathematical and statistical algorithms: “Does this look right? How about that? Which one’s better?” These answers, however, are tested only against patterns the model has found – where else? – on the internet.
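To make that concrete, here’s a toy sketch in Python of the “generate a few candidates, score them, keep the best” loop. The function names are hypothetical stand-ins, not any real LLM API; the point is simply that the scoring is statistical, with no appeal to facts.

```python
# A purely illustrative sketch of "generate several candidates, score them
# against learned patterns, return the best." Both helper functions are
# hypothetical stand-ins, not any real LLM API.

def sample_candidates(prompt: str, n: int = 3) -> list[str]:
    # Stand-in: a real LLM would sample n different completions here.
    return [f"candidate answer {i} for: {prompt}" for i in range(n)]

def pattern_score(candidate: str) -> float:
    # Stand-in: a real system scores how well the candidate matches the
    # statistical patterns it learned from internet-scale training data.
    return float(len(candidate) % 7)  # arbitrary placeholder number

def answer(prompt: str) -> str:
    candidates = sample_candidates(prompt)
    # Keep whichever candidate scores best against learned patterns.
    # Note: nothing here checks the answer against actual facts.
    return max(candidates, key=pattern_score)

print(answer("Write an RFP like the ones our company has written."))
```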
What’s missing, in this writer’s humble opinion, is an underlying, core set of common-sense relationships – ontologies, to use the technical term. “A mammal is a lifeform that gives live birth and has hair. A dog is a mammal. A horse is a mammal. Dogs and horses have four legs and tails.” And so on.
LLMs need what is called a “ground truth” – a set of indisputable facts and relationships against which they can validate their responses, so that they can – the word “instinctively” comes to mind – know that if Joe’s mother is Mary, then Mary’s son is Joe.
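Here’s a toy illustration (Python again, nothing like a production ontology) of what that kind of ground truth buys you: once the knowledge base knows that the “has mother” and “has son” relations are inverses, the Mary-and-Joe deduction falls out automatically.

```python
# A toy knowledge base (nowhere near a real ontology) showing how inverse
# relations let a system deduce "Mary's son is Joe" from "Joe's mother is Mary".
# For this toy example we assume the child in question is a son.

facts = {("Joe", "has_mother", "Mary")}

# Pairs of relations the knowledge base treats as inverses of each other.
inverses = {"has_mother": "has_son", "has_son": "has_mother"}

def infer_inverses(facts):
    inferred = set(facts)
    for subject, relation, obj in facts:
        if relation in inverses:
            inferred.add((obj, inverses[relation], subject))
    return inferred

print(infer_inverses(facts))
# {('Joe', 'has_mother', 'Mary'), ('Mary', 'has_son', 'Joe')}
```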
Microsoft claims that Bing Chat leverages Bing’s internal “knowledge graph,” which is a set of facts – biographies of famous people, facts about cities and countries, and so on – and this is a start, for sure. More interestingly, Cycorp, which has been around for decades, has built enormous knowledge bases of exactly this kind. And there are undoubtedly others.
What I’m advocating is that such knowledge bases – facts, properties, relationships, maybe even other things (like Asimov’s Three Laws) – underlie LLMs. In the adversarial process of generating answers, such knowledge bases could, in theory, not only make LLMs more accurate and reliable but also – dare I say it – more ethical.
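To sketch what that might look like (purely hypothetical function names, and a knowledge base of exactly two facts), imagine the candidate-scoring loop from above, but with a consistency check against the knowledge base inserted before the statistical scoring:

```python
# A self-contained toy sketch of the idea: generate candidates, drop any
# that contradict a ground-truth knowledge base, then score what remains.
# Every function here is a hypothetical illustration, not a real system.

FACTS = {("Joe", "has_mother", "Mary"), ("Mary", "has_son", "Joe")}

def generate_candidates(prompt: str) -> list[str]:
    # Stand-in for an LLM sampling several possible answers.
    return ["Mary's son is Joe.", "Mary has no children."]

def consistent_with_facts(candidate: str) -> bool:
    # Stand-in: a real validator would extract the claims in the candidate
    # and check them against the ontology's facts and relationships.
    if "no children" in candidate:
        return not any(s == "Mary" and r == "has_son" for s, r, _ in FACTS)
    return True

def pattern_score(candidate: str) -> float:
    # Stand-in for the usual statistical "does this look right?" score.
    return float(len(candidate))

def grounded_answer(prompt: str) -> str:
    candidates = generate_candidates(prompt)
    # Prefer candidates the knowledge base agrees with; fall back to the
    # ungrounded behavior only if nothing passes the check.
    plausible = [c for c in candidates if consistent_with_facts(c)] or candidates
    return max(plausible, key=pattern_score)

print(grounded_answer("Does Mary have a son?"))  # -> "Mary's son is Joe."
```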
(This post was inspired, in part, by this marvelous paper by the late Doug Lenat and Gary Marcus.)