AI hype is overblown.

Photo by Lalit Kumar on Unsplash

I just watched a strange video published by Figure, a company that builds humanoid robots. They partnered with OpenAI so that, in the video, the robot could respond to a human's commands.

The demo showed a humanoid robot in front of a scene of simple objects, with a man standing awkwardly, his hand on the table. The man asked the robot to describe the scene; at this point, a little yellow flag went up in my head, because I thought, "There's no way this wasn't staged." I wondered how many takes they did, how many scenarios they devised, how precisely they had to position the robot to pick up the apple. It sat in exactly the right spot. When the guy dumps the trash onto the plate, it's evident that this trash is easy to pick up: uniform size, grippy, and light.

I see this, and the Devin AI demo that supposedly shows an AI debugging code, and my first thought is not "Wow, that's impressive"; it's "Wow, this is such a scam." This technology cannot, and will not, reach the level of human capability these demos purport to show. As humans, we're really great at generalizing. I don't just pick up this apple in this scenario; I can pick an apple from a tree. I can see an Apple logo and associate Apple with a computer. The fact that I can generalize is what makes my intelligence general intelligence. But when a robot, or an "AI" like Devin, is given one very narrow task, it no longer feels like intelligence. It feels like a machine designed to fulfill that one task very well, or at least well enough for a demo, without truly delivering the results the demo promises.

Here's another video they published, of the robot making a cup of coffee. Notice that the training took 10 hours:

When you look at what robotics and AI are trying to promise, you wonder if we're headed toward that goal. The ultimate goal is for humans to build a humanoid machine that can perform the labor-intensive tasks humans don't want to do, including intellectual tasks: manufacturing away our most significant source of friction, which is work. You look at robots like Boston Dynamics' and think, "Ah well, yes," there may be some benefit to having a robot that can move material from point A to point B in a dangerous and unpredictable environment. No human wants to risk life or limb to be a pack mule.

But the robot Figure 01, while it looks very advanced, is still far from practical. Figure 01 won't be able to come into your house and clean. It definitely won't be able to drive a car. The choice of materials is surprising: bare metal. This may be on purpose, because the aim is to create an assembly-line-style robot that happens to look humanoid, something you could picture in a kitchen. Still, I question the need for a humanoid-style robot at all.

For example, look at the start-up by Steve Ells. He's working on a restaurant named Kernel, aimed at automating challenging kitchen tasks through robotics. And this is robotics without the pretense: it can cook your fries, portion them into a container, and automate all that labor. These aren't necessarily tasks suited to the human form. Yes, at its most basic, we have designed restaurants to be run efficiently by humans, but that doesn't mean the only way to build a restaurant is one run by humans.

Look at factories, which are machines explicitly built to manufacture specific products. They are highly efficient - much more so than humans. In fact, we have automated material production at such a scale that the entire world has been swept into a higher standard of living. This raises the question: what exactly are robots like Figure 01 trying to solve? What are they trying to fix? We already have a highly efficient way of mass-producing items. We already have purpose-built robots we can use to make products. From the demo, Figure 01 still looks purpose-built, just slightly more general. The question is how general it can get. Is a layer of OpenAI really enough? Is the scene so staged that the robot would have failed if they had replaced the plate with a piece of paper, or the dish rack with a towel? Can the robot improvise, or was this a highly rehearsed scenario? It may be neural networks from the ground up, but that doesn't mean they will work in every situation. It's the equivalent of asking Siri something and getting a canned response. You can ask Siri what time it is, but you can't do much more. If you try to actually treat Siri like a human, you notice very quickly that it's impossible.
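To make the canned-response point concrete, here's a minimal sketch of a scenario-table assistant. Everything in it (the patterns, the replies) is hypothetical, not how Siri is actually built, but the shape is the point: match the utterance against a fixed table, return the scripted reply, and fail on anything outside the table.

```python
import re

# A toy "scenario table" assistant. Every supported request is a
# hard-coded pattern paired with a hard-coded reply; anything outside
# the table simply fails. There is no understanding to fall back on.
SCENARIOS = [
    (re.compile(r"\bwhat time is it\b", re.I), "It's 3:42 PM."),  # reply is faked, of course
    (re.compile(r"\btell me a joke\b", re.I),
     "Why did the robot cross the road? It was programmed to."),
    (re.compile(r"\bset a timer\b", re.I), "Timer set for 10 minutes."),
]

def respond(utterance: str) -> str:
    for pattern, canned_reply in SCENARIOS:
        if pattern.search(utterance):
            return canned_reply
    return "Sorry, I didn't get that."  # everything unscripted lands here

print(respond("Hey, what time is it?"))       # hits a scenario
print(respond("How should I live my life?"))  # falls off the table
```

The only way to make this assistant "smarter" is to keep adding rows to the table, which is exactly the treadmill described below.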

You may be thinking, "Oh, Siri has been around for a while; this is a big step up." But look back: we've had voice assistants since the 1990s. On my first Macintosh, I could ask it to tell me a joke or perform an OS-related task. The only difference between previous iterations and this one is that we have added more supported scenarios - but the problem will always be that the rate at which we can add new scenarios is much slower than the rate at which the world progresses. There's always a lag, a huge gap between what humans want to experience with AI and what they actually can. For example, OpenAI has to train its models, which involves a lengthy web-scraping process, akin to an entire Google-scale crawl done at a specific point in time; then all of that data, terabytes of it, is fed into the language model, and the outcome is the latest version of ChatGPT.

You'll notice the issues with this. Ask ChatGPT about something current and it has no opinion; its knowledge stops months in the past, at its training cutoff. You can't have an ongoing conversation with it. It can't remember who you are. You can't build a relationship with it. But even more than that, you can ask it nonsensical things, and it will still respond in a statistically probable way.

When you look at a robot like Figure 01, you wonder how many takes there were. Was it truly an improvised interaction? And if so, did they cherry-pick the results?

We have AI generators like Midjourney, and the pictures these image-generation sites showcase look fantastic. Still, when you start generating images yourself, you notice a problem. The images make sense at a mechanical level, but something is usually off: the hands, for example, or the understanding of the scene. These models are best when trained on photography data, but will they ever truly understand human anatomy? My position is no, not with this iteration. This iteration of AI is promising something it can't deliver.

When you see an AI image of a beautiful landscape, the model is effectively saying, "Ah, I bet this is statistically the most likely configuration of pixels that matches this prompt." It can do this because it has scraped the web for a vast sampling of images - billions of them. Essentially, it's an AI trained on a supermassive data set.

The problem is that iterating on this approach will not get us to true AI, so the promise that it will keep improving is false. It will improve in some ways. It will be faster. With more sample data it will make more accurate predictions, which means more precise images and better adherence to the prompt you give it. Yet improvement will continue to have diminishing returns, where even increased investment yields fewer results. Eventually it'll succumb to enshittification like every other productized service.

But the ways it won't get better are perhaps the most eye-opening. It won't get to the point where it can draw something with deliberate brushstrokes. It won't surpass a photograph by understanding lighting, physics, and scene composition. It can borrow the statistical likeness of these things from other photos, but it can never truly generate them. And we will soon become frustrated with these limitations.

That's why, when you see AI-generated art, there's always something off-putting about it. Humans are very good at noticing how AI generates content. OpenAI has the same problem. With the current technology, it will never be able to remember your name. It will never be able to take something like an image and produce its own thoughts about it. It will never actually know whether something is true. Instead, it statistically predicts the likelihood that specific words will follow other words in its response. But it can never be certain those words are correct.
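A toy version of that next-word machinery fits in a few lines. This is only a caricature - a bigram counter, nothing like a real transformer, and the tiny corpus is made up - but it shows the core move: continue text with whatever tended to follow, with no notion of whether any of it is true.

```python
import random
from collections import Counter, defaultdict

# A toy next-word predictor: count which word follows which in a tiny
# (made-up) corpus, then extend a prompt with statistically likely
# continuations. It has no notion of truth, only of what tended to
# come next in its training text.
corpus = ("the robot picked up the apple . the robot poured the coffee . "
          "the apple is red . the coffee is hot .").split()

following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def continue_text(word: str, length: int = 8) -> str:
    out = [word]
    for _ in range(length):
        counts = following[out[-1]]
        if not counts:  # nothing ever followed this word
            break
        words, weights = zip(*counts.items())
        out.append(random.choices(words, weights=weights)[0])
    return " ".join(out)

print(continue_text("the"))  # fluent-looking, but nothing is "known"
```

Scale the corpus from one sentence to the whole Internet and the output becomes eerily fluent, but the mechanism - likelihood, not knowledge - is unchanged.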

This points to the root of the problem - our current generation of AI does not think. Well, what is thinking, you might ask? Thinking has been at the root of the AGI question for decades, and it used to be the consensus that chess required thinking. But then we built a purpose-built machine so good at chess that it didn't have to think. People responded, "OK, perhaps our goal wasn't well defined. Chess requires thinking for a human, but apparently you can also do it without thinking: if you have a table of all the possible moves and pick the one most likely to lead to victory, that technically works, but it's not thinking." So we moved the goalposts, and that's probably fair, because brute-forcing chess is not actual thinking.

Then we said, "Well, if you can hold a natural conversation with a human, surely that is thinking." Machine learning experts got to work and found a way to brute-force conversations too: scrape the Internet for every conversation ever recorded and mesh them into something that responds with the most likely compilation of words. It was shocking when ChatGPT came out and you talked to it and it said things that seemed convincing and human. But later, it became clear that it wasn't thinking at all. It's just brute-forcing the conversation: a purpose-built, massive industrial machine churning out sequences of words.
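Chess is too big for a snippet, but the same brute-force idea fits in tic-tac-toe. The sketch below (the position and names are my own, and real chess engines are far more sophisticated) exhaustively scores every continuation and picks the move with the best guaranteed outcome - nothing resembling judgment, just enumeration.

```python
# Brute-force game play: search every continuation and pick the move
# with the best guaranteed outcome. No judgment, just enumeration.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move) for `player`: +1 if X can force a win, -1 for O."""
    w = winner(board)
    if w:
        return (1 if w == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None  # draw
    best = None
    for move in moves:
        board[move] = player                                  # try the move
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[move] = " "                                     # undo it
        if best is None or (score > best[0] if player == "X" else score < best[0]):
            best = (score, move)
    return best

board = list("X O  O X ")   # a made-up mid-game position, X to move
print(minimax(board, "X"))  # the best move, found by pure enumeration
```

It plays perfectly, and at no point does anything resembling thought occur - which is exactly the argument above.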

So what do we want? What is the actual purpose of AI? Since the beginning, it has been to outsource labor-intensive and intellect-intensive tasks to an assistant, or even to something more capable than we are, so that we can focus on what we actually want to do. I don't want to work at a restaurant, I don't want to cook food at home, and I don't want to do chores. I don't want to clean, repair things, or do dangerous work. I only do these things because of the economy, or because I have to, not necessarily because I want to. And yes, some intellectual tasks are fun and some are not; some I'm good at and some I'm bad at.

I might enjoy coding, making art, or playing music. But if an AI does these things, will it take all of those pleasures away? I think not. I think it will drastically change how society functions, but AI making art doesn't have to take away the pleasure of me making art. And if I never enjoyed a task in the first place, and an AI can do it better, why wouldn't I use it? Why wouldn't I profit from its labor? If the concept of a job becomes outdated because all of our work can easily be outsourced to robots, there will be some societal unrest, but I think at the end of the day humans will realize life is for us; we make the rules. Why don't we find an arrangement that is good for us instead of just suffering? When the cost of labor drops dramatically, it will change how the world operates. If you wanted to start a business, start-up costs would suddenly mean investing in a robot for your restaurant or buying an AI solution.

Let's say no business hires employees anymore; no humans are involved except at the executive level. If owning a business is the only path left, funds will have to be redistributed across society. Because at the end of the day, if everybody is born homeless - what do we do, go back to feudalism? I don't think so, not with the level of information we have today. There may be ways for people to live for free, but I doubt it means a return to feudalism. Why hire a human worker who's unreliable and unpredictable when you can hire a worker who works tirelessly? That really is the end goal - replacing human labor, so that we don't need so many human workers and can still meet our needs through surplus production.

However, with the current level of technology, the current state of AI, and any iteration on it, we're not getting closer to that goal. We are building giant, purpose-made machines that ingest massive amounts of data and assemble the pixels, or the words, statistically most likely to satisfy a given input. If we genuinely want to develop general intelligence, the only way is to simulate thought and the process of thinking, not to keep polishing a fundamentally limited technology.