Neural networks won't ever run game engines.
Is it possible to make a video game entirely from neural networks? And if it were, would it be a cost-effective, engineering-light solution that could become the new Unity or Unreal Engine? My intuition is no. Neural networks can't do it. To make that case, I'm going to explore some of the contexts, problems, and bonuses that neural networks bring, along with some of the pitfalls and limiting factors that hold them back.
So, the neural network, we all know by now, is a vastly over-parameterized function. It's like a computer program listing inputs and possible outputs, with some magical thing happening in between called the hidden layers that we won't discuss. However, one of the issues with neural networks is their lack of precision. For example, if you train a neural network to add two numbers together, it will never be as accurate as a calculator. The reason is that neural networks use statistics to calculate the probability of a specific outcome, but the outcomes themselves are predetermined.
If I wanted to add five plus ten together, the neural network would have to know that it could output 15. If I wanted to add together 782 and 1,143, the neural network would have to know it could produce 1,925; that sum would have to exist as an output. And if the output were instead a single floating-point number, typical activations would squash the values into a range like negative one to one.
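To make the output-space problem concrete, here's a back-of-the-envelope sketch of my own (not from any paper): if a network treats addition as classification, every possible sum has to exist as a distinct output class, and the output layer grows with the operand range.

```python
# Illustration: an addition "classifier" needs one output class per
# possible sum. Sums of two numbers in [0, max_operand] span
# 0 .. 2 * max_operand, so the head needs 2 * max_operand + 1 classes.
def output_classes_needed(max_operand: int) -> int:
    """Count the distinct sums a classifier head would have to represent."""
    return 2 * max_operand + 1

print(output_classes_needed(9999))   # two 4-digit operands -> 19999 classes
print(output_classes_needed(10**6))  # grows linearly -> 2000001 classes
```

A calculator handles any of these sums with one adder circuit; the classifier view forces the network to "know about" every answer in advance, which is exactly the predetermined-outcomes problem described above.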
Is there any way to overcome this limitation? Maybe a neural network could output a field of pixels in which it could draw an arbitrary sequence of digits. Maybe you could use an LLM to treat the input like a sentence. But it still wouldn't get it right without extensive training on how addition works. And that's just addition; we'd run into the same issues with multiplication and division. The upshot is that the neural network would have to be highly parameterized, with every possible outcome represented in a training set. Sounds dumb? Check out the Meta network that was trained on 100 million problem-solution pairs.
The training would be a lengthy process involving every possible combination of digits and operators. It's already the wrong first step toward building a deterministic world from which we can create a video game.
Why care about determinism?
Having a deterministic video game means that given the environment in a given state and an action at a given time, the environment will change the same way every time that action is taken. That's great for players because they can make predictions about your world and correctly perform actions to reach a goal. If the world were non-deterministic, players would never be sure an action would achieve what they wanted.
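That definition can be sketched in a few lines. This is a toy update function I made up for illustration, not real game-engine code: the same state plus the same action always yields the same next state, so two runs fed identical inputs can never diverge.

```python
# A minimal deterministic game update: next state is a pure function of
# (state, action), so identical input sequences give identical worlds.
def step(state, action):
    x, vx = state
    vx += {"accelerate": 1.0, "brake": -1.0, "coast": 0.0}[action]
    return (x + vx, vx)

run_a = (0.0, 0.0)
run_b = (0.0, 0.0)
for action in ["accelerate", "accelerate", "coast", "brake"]:
    run_a = step(run_a, action)
    run_b = step(run_b, action)

print(run_a == run_b)  # True: same inputs, same outcome, every time
```

This is the property that lets a player reason "if I press accelerate here, I will end up there" and be right every time.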
For example, if you have a race car in TrackMania and you are accelerating, the acceleration will always produce the same result given the same initial values. Now, there's a fascinating video from Yosh on YouTube where he trains AI to beat TrackMania's pipe challenge. He discovers that there are micro fluctuations in the input values for the motionless car that come from floating-point errors, which are notoriously tricky to compensate for.
These micro fluctuations lead to vastly different outcomes. Even though the game is deterministic, the slightest change, even one millionth, is enough to derail an otherwise normal-seeming outcome. That's because of chaos theory, which he explains in his video. It's highly worth watching. I recommend it.
In short, chaos theory stipulates that minor fluctuations in the starting values of a system can have drastic and outsized consequences later on. If you know what seeds are, you'll know Minecraft uses this fact to generate random-seeming worlds. The worlds are entirely different, yet they behave the same way: the same biomes always exist, and there's always the same logic and inherent structure to the world; it won't be possible to have a checkerboard grid of sand and snow tiles. And in TrackMania, the car will always go left if you press left on the ground while moving.
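The seed trick is easy to sketch. This is a made-up miniature "terrain generator", not Minecraft's actual algorithm: a seeded pseudo-random generator produces an arbitrary-looking world, but the same seed reproduces it exactly.

```python
import random

# Seeded world generation in miniature: the seed fully determines the
# "random" world, so the same seed always rebuilds the same terrain.
def generate_terrain(seed, width=8):
    rng = random.Random(seed)
    return [rng.choice(["grass", "sand", "water", "stone"])
            for _ in range(width)]

world_a = generate_terrain(42)
world_b = generate_terrain(42)    # same seed
world_c = generate_terrain(1337)  # different seed, different-looking world

print(world_a == world_b)  # True: determinism hiding under the randomness
```

Endless variety and perfect reproducibility at the same time, which is precisely the "different worlds, same rules" behavior described above.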
Determinism doesn't guarantee a predictable outcome, which is kind of a scary thought. What's scarier is throwing determinism to the wind and hoping the world makes sense.
It's more expensive to train the AI than it is to just make a game
So, building predictable environments is already challenging, and that's worth keeping in mind: a neural network is not going to improve this process. Let's say we trained a neural network to output the pixels on your screen directly, standing in for the environment. You want to see a basic house, a 3D house with textures in the middle of a field, and you want to walk around the house and interact with it. To get to that point with a neural network, you need training data. You also need training data covering the possible player-provided inputs. And you would need to train this neural network in a way that's less work than the typical pipeline for game creation.
The problem is that creating the training data looks just like the normal pipeline for game development. Someone still needs to make a 3D model. They still need to place it in the world, add a camera, add a player controller, and let that controller move around. But then you run into the problem of input values. Player position is one thing; the player moving, depending on that position, is another. And if you think about the variability of input values, we have 360-degree rotation, horizontal movement along both axes, possibly jumping along the third axis, plus crouching. And we have the house, which needs to be represented from all directions.
The problem is that even with this simple world, the outcome would be highly irregular unless the training data was so extensive that it covered every single input and every single possible output. That would take a lot of training data and a lot of training time. Worse, it would require you to do the same work you would normally do when developing a video game, so you're not saving any time there.
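A crude count makes the explosion visible. Every number below is a made-up, deliberately coarse assumption (a 100x100 grid of standing spots, 1-degree yaw steps, and so on); real continuous inputs are far worse.

```python
# Back-of-the-envelope: distinct (state, input) pairs for the
# "house in a field" scene at a very coarse discretization.
positions    = 100 * 100   # 100x100 grid of standing spots
yaw_angles   = 360         # 1-degree camera yaw steps
pitch_angles = 90          # coarse look up/down steps
buttons      = 2 ** 6      # forward/back/left/right/jump/crouch combos

states = positions * yaw_angles * pitch_angles * buttons
print(states)  # 20,736,000,000 configurations to cover exhaustively
```

Over twenty billion configurations for one static house at toy resolution, each needing a rendered target frame, and every one of those frames has to come from a scene someone already built the traditional way.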
AI is lossy
I would argue that representing all of this data in a neural network is less efficient. It could be that neural networks eventually become the compression algorithm of choice for vast amounts of data, as we've seen with generative AI. However, the problem is that they are statistical compressions of enormous datasets. They are lossy. They don't compress small amounts of data at high fidelity.
Generative AI, like DALL-E, MidJourney, or ChatGPT, has major flaws, and I believe these flaws are irredeemable. You are taking inputs, such as a photo of the Mona Lisa, alongside other inputs, such as a screen grab from The Simpsons, and you're trying to train an AI that can produce both. The current approach is to have one AI trained to downscale images, another trained to upscale images, and a third trained to turn words and images into a set of abstract image values that are later used to produce an image. It essentially looks like an hourglass: information pours in at the top, gets filtered through a narrow waist, and is recreated at the bottom. But the issue here is that while it's useful for a quick visual, it lacks the control that artists need in order to produce high-quality work.
Gen-AI simply upscales low-quality statistical guesses at word combinations; some of the time, the results are bad or wrong, which erodes trust and defeats the point. Even worse, most of the time there's literally nothing that can be done: it's simply a limitation of the medium. You can't take 1 billion images and expect a model that's only a few gigabytes in size to fully represent the entire input space. And you can't get it to be precise, either. It can interpolate between the training data, but it can't give you an accurate reproduction of the Mona Lisa, because the images tagged "Mona Lisa" in the training set range from photos of the painting to imitations in completely different styles, and the model averages across all of that variance. It will give you different results than you expect. It's inevitable.
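Lossy compression is easy to see in miniature. This toy quantizer (my own illustration, far simpler than a diffusion model's bottleneck) squeezes 8-bit pixel values down to 3 bits and back: the gist survives, but the exact original is gone for good.

```python
# Lossy compression in miniature: quantize 8-bit values (256 levels)
# down to 3 bits (8 levels), then reconstruct a best guess.
def compress(pixels):
    return [p >> 5 for p in pixels]          # keep only the top 3 bits

def decompress(codes):
    return [(c << 5) + 16 for c in codes]    # midpoint of each bucket

original = [7, 100, 200, 255, 42, 180]
restored = decompress(compress(original))
error = max(abs(a - b) for a, b in zip(original, restored))
print(error)  # 15: nonzero, because information was destroyed
```

A few-gigabyte model trained on a billion images is the same squeeze scaled up enormously: the bottleneck can only keep statistical regularities, never the exact pixels of any one input.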
What about reskinning?
Well, reskinning is interesting. You want to design the world with just basic shapes and then have an AI come in and say, "Ah, that looks like a house in a field. I'm going to texturize the grass and the house." Maybe technology will be built for that. It could work because you're just leveraging generative AI to take a scene from a video game, convert it into tokens, like house and field, then use those tokens to create assets and inject them into the models.
It's worth considering, but the problem is that if an AI is doing this, you can't control the outcome. The house could look like a house, but you might not want a mid-century home; you might not want a 1600s French farmhouse. You may want a log cabin because a log cabin is important to your story. If you want to customize it, you need to start prompting the AI to generate a specific texture for the house. So human intervention is necessary.
And one thing that's true about video games and every other creative medium is Conservation of Detail. With generative AI, you can spam tons of detail irrelevant to the main plot, but that's wasted effort and time. Players will be confused, wondering, "is this asset plot-relevant?" Game testers will file extraneous bug reports. Your marketing team will be shooting in the dark.
Let's take a step back. Suppose you have an asset catalog like LAION-5B, with a vast number of images all categorized by keyword, and you want to find something for your video game. You already have to go through the asset pipeline: you're creating the textures, setting up lighting, and so on, and you're building a brick texture for the walls. If you ask a generative AI for a "brick texture," will that be the brick texture you're looking for? Will it be as efficient as simply searching an asset library for the texture? Will the color be right? And more importantly: is the work copyrighted? (A whole other can of worms.)
What will happen is that a lot of people without the skills will use this as their artist. We're heading toward the lowest common denominator of people using Gen-AI to create content. But at the end of the day, that content is just going to be worthless, because if you're not skilled enough to know precisely what you want, you aren't operating at the professional level that's required to succeed. Have you ever read a GPT spam website? The content all feels bland and stupid. It says nothing and just wastes your time (similar to what I'm doing, except I alone can take credit for being bland and stupid).
Where AI is Useful
I think, at least for now, AI has some very practical uses, mostly to do with working well with statistical data. For example, as I am speaking into my Voice Memos app, there's no deterministic algorithm that will fully transcribe my speech for me, because the way that I speak is not deterministic. The input values change. The only way to transcribe voice is with a statistical model, and that's where neural networks come in. So, one way of thinking about it is: is this technology trying to replicate something that traditional computing can't do? If so, maybe a neural network is appropriate. If it's trying to replace something computers can already do, or that humans do well, I don't think it is.
And you may wonder, well, how about programmers: can neural networks replace them? Maybe we could make games by asking an AI to write the code that creates the game, which I'm skeptical about. Statistically likely code auto-complete is quite different from feature creation. We may make some progress by eliminating repetitive tasks, but that's what libraries and frameworks already do. That's what abstraction already does. LLMs are just productivity tools, not employee replacers.
A huge reason the code generation approach won't work is that the number of games that can be created is infinite. The number of variations in game styles is endless, and the only way to train an AI is on a limited input data set. That's why I think AI is dumb: there's no way to fully represent the output space using a model. If we take a traditional approach to video game making, like using a game engine, an AI will struggle. If we take a non-traditional approach, like generating a game engine from scratch, there's insufficient training data. As I said, you can't recreate even one scene using AI unless you're being vastly inefficient.
Can AI Run Simulations?
The last question is, what about simulation? Can AI simulate a world? The answer is, to some extent, yes. You can see examples in videos on Two Minute Papers, where scientists train AI models to predict the motion of particles in physics simulations. The simulations are pretty accurate, and they run much faster than the traditional solvers. But part of the problem is that these results are baked, like precomputed lighting: the physics is baked into the model to save compute resources. And so there's no variability. You would need to train the AI on all the different scenarios you wanted it to represent. Step outside of that training data? You'd need to retrain the model.
And then, let's say you created a world purely from particles, where the particles all interact with each other like atoms, and AI is responsible for determining the physics for all of them. Because our understanding of particles is statistical, it could be that an AI could generate the output world, given the input. But we're up against the same limitations a normal computer faces. You might get speed increases; let's say it's even 80% faster than a traditional computing approach. That's pretty cool: you're losing accuracy, but you're saving a lot of time. Now give it a model with 15 million particles, roughly a tablespoon of water at simulation resolution. If that's all we can simulate, it's not that useful. If you wanted to simulate the inside of a bedroom, you're talking about an unfathomable number of molecules or atoms. So in a video game, even a 70%, 1,000%, or 10,000% reduction in processing time doesn't get you where you need to go. You need to simulate far more than computing resources would ever allow.
Video Game Development Already Uses the Right Tools
I think there's no straightforward approach to using AI to generate these worlds for us. I think that what we've built up from the very first video games is a suite of tools and methodologies for creating worlds that appropriately use the hardware we have today. Hardware has been the limiting factor, and the tools have improved to coincide with the increase in processing power. But they've always been appropriately suited. It's been a co-evolution.
I think AI, especially generative AI, is ultimately the wrong approach. I guess we'll see. In my mind, we've already seen the peak phase of AI. So tune in for the next couple of years to find out how it turns out.