We Have No Idea What AIs are Thinking
One Of The Most Terrifying Aspects Of AI
AI Brain vs Human Brain
Using the brain as an analogy, what do you think a memory looks like? Think back to something you did earlier today or yesterday: you likely have little trouble pulling up that memory and recalling specific details from it. But what if you could look inside your head at the neurons that hold those memories? What would those neurons look like, and how could they even store a memory?
In the simplest model of a neuron, it passes on information by being either active or inactive. That activity then propagates to other neurons around the brain until it builds up the complete picture of a memory. Although it may be trivial for us to remember things, it is still a near-impossible task to look at the neurons themselves and understand what the memory actually is. It would be like watching a bunch of randomly blinking lights and trying to extract some complex concept from them.
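This all-or-nothing model can be sketched in a few lines of code, as a classic threshold unit. The weights and threshold below are made up purely for illustration:

```python
# A toy "active or inactive" neuron: it fires (outputs 1) only when the
# weighted sum of its inputs crosses a threshold. The weights and
# threshold here are invented for illustration.

def neuron_fires(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# Two input signals, e.g. activity from two upstream neurons.
print(neuron_fires([1, 0], [0.6, 0.6], 1.0))  # 0: one input alone isn't enough
print(neuron_fires([1, 1], [0.6, 0.6], 1.0))  # 1: together they cross the threshold
```

Chained together, thousands of units like this one blinking on and off are all a memory physically is, which is exactly why staring at them tells us so little.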
This picture was created by an AI prompted with “Neural Brain Activity”
AIs work in a very similar way; in fact, many AIs contain neurons of their own. These neurons are typically represented as grids of numbers connected to other grids of numbers. When you feed the network some input, it passes that information through its grids of numbers, layer by layer, until it produces a final output. In this sense it is much like biological neurons receiving information (for example, from our eyes or ears) and processing it so that we can make sense of what we see or hear.
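A minimal sketch of those "grids of numbers" might look like this. Every number below is made up; a real network learns its numbers from data and has millions or billions of them:

```python
# A tiny neural network as nothing but grids (matrices) of numbers.
# The input is passed through each grid in turn until a final output
# emerges. All the numbers here are invented for illustration.

def matvec(matrix, vector):
    """Pass a vector of inputs through one grid of numbers."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def activate(vector):
    """Keep a positive value, otherwise the 'neuron' stays inactive (0)."""
    return [max(0.0, v) for v in vector]

layer1 = [[0.5, -0.2], [0.1, 0.8]]   # first grid: 2 inputs -> 2 neurons
layer2 = [[1.0, -1.0]]               # second grid: 2 neurons -> 1 output

x = [1.0, 2.0]                       # e.g. brightness values from the "eyes"
hidden = activate(matvec(layer1, x))
output = matvec(layer2, hidden)
print(output)                        # a single number; what it *means* is opaque
```

Notice that the final line prints a number, but nothing in the grids tells you why that number came out, which is the whole problem this article is about.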
But What Do We Really Know?
But the question then becomes: what happens between our eyes seeing light and our brains actually understanding the information? Unfortunately, much of that is still a mystery to us, and the same is true of many of the neural networks (AIs) we create. Even though we can look at the numbers that make up a neural network, there is no convenient way to interpret them, leaving us lost when we try to understand what the AI is thinking. On a practical level, if we create an AI that can identify dogs, we have no idea what characteristics it is looking at to identify them. Does it look at the eyes? The shape of the head? The nose? Or maybe how many legs the animal has? What happens if we give it a picture of a dog with only three legs? That last one may be a conversation for a different day, but the rest clearly pose a challenge in knowing exactly what the AI is thinking.
This is why many of the scientists at the forefront of AI development have started to express concerns about its dangers. Given how challenging it is to know what an AI is thinking, it could be highly problematic if that AI develops “thoughts” that aren’t in line with the goals of the people who created it. Imagine handing over the wheel of your car to an AI when, unknown to you, somewhere in its brain it has prioritized getting you to your destination over keeping you safe in a given situation. The car may then put you in danger, and there would be no way for us to know it had that intent before seeing its behavior.
It’s Not Hopeless
This isn’t all to say that we will never understand what an AI is thinking. In fact, many researchers have come up with clever ways to start poking into the brains of AIs, with at least some success at getting an idea of what they are thinking. One method has been to show the AI incomplete pictures. Using our dog example, if we gave the AI only a picture of a dog’s paw, there is a good chance it wouldn’t be able to tell us it’s a dog, but some of the “lights” in the AI’s network would still respond to the paw. From this, we can sometimes get a rough idea of which sections of the AI are associated with the different characteristics it is looking for.
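The probing idea can be sketched as follows. The "model" here is a stand-in, a single made-up grid of numbers rather than a real dog detector, and the pixel regions and weights are entirely hypothetical:

```python
# A sketch of probing: show a model only a patch of an image and record
# which internal "lights" (hidden units) still respond. Both the weights
# and the image values below are invented for illustration.

def hidden_activations(image, weights):
    """One made-up layer; returns each hidden unit's response."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in weights]

weights = [
    [1.0, 1.0, 0.0, 0.0],   # unit 0: responds to the first two pixel regions
    [0.0, 0.0, 1.0, 1.0],   # unit 1: responds to the last two pixel regions
]

full_image = [0.9, 0.2, 0.8, 0.7]
paw_only   = [0.0, 0.0, 0.8, 0.7]   # everything except the "paw" blanked out

full = hidden_activations(full_image, weights)
paw  = hidden_activations(paw_only, weights)

# Units that stay active on the paw-only patch are, roughly, the parts
# of the network associated with that feature.
paw_units = [i for i, p in enumerate(paw) if p > 0.5]
print(paw_units)  # only unit 1 still lights up for the paw
```

Real interpretability work does this over thousands of patches and units, but the logic is the same: blank out everything except one feature and watch which lights stay on.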
This, however, isn’t a perfect system: you first need to guess the things the AI might be looking for, and even then there is no guarantee that it is looking for what you would expect. For example, we might expect an AI to look for paws, but since many other animals also have paws, the AI may disregard that information entirely, as it isn’t inherently useful for deciding whether an animal is a dog.
The challenge becomes even harder when we consider that the AI may not use any of the normal characteristics we think define a dog. We know that a dog has to have a certain set of pieces, but what if the AI recognizes a piece we haven’t even thought of as a piece? One example could be the geometry of the eyes. Do dogs have eye shapes that are specific to dogs? We as humans don’t see that as a possible way to identify a dog, but maybe the AI is able to use it.
When it comes to self-driving cars, this kind of probing of the AI’s brain faces even more scrutiny, to ensure that the AI behaves both as we expect and safely. By giving the AI hundreds or thousands (or sometimes millions) of different scenarios, we can better explore what it will do, giving us at least a little insight into how it processes information and what kinds of decisions it makes. From there, it is possible to go back to the drawing board if needed.
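Scenario sweeping can be sketched like this. The `policy` function below is a hypothetical hand-written stand-in for a driving AI, and the safety rule used to flag decisions is likewise invented for illustration:

```python
# A sketch of scenario testing: run a driving policy over many synthetic
# situations and flag decisions that look unsafe, before trusting it on
# the road. Both the policy and the safety check are toy stand-ins.

def policy(distance_to_obstacle, speed):
    """Toy decision rule; a real self-driving stack is vastly more complex."""
    if distance_to_obstacle < speed * 2.0:   # too close to stop comfortably
        return "brake"
    return "continue"

# Sweep a grid of scenarios and record any that violate our safety rule:
# never "continue" when an obstacle is closer than 15 units.
unsafe = []
for distance in range(0, 100, 10):
    for speed in range(5, 35, 5):
        action = policy(distance, speed)
        if action == "continue" and distance < 15:
            unsafe.append((distance, speed, action))

print(len(unsafe), "potentially unsafe decisions found")
```

Even this crude sweep surfaces a case the rule-writer might not have anticipated, which is exactly the kind of insight that sends engineers back to the drawing board.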
So What’s The Point?
Although we can create complex programs able to perform fairly challenging tasks, we still know very little about how they go about those tasks. In the same way that we have discovered how different areas of the brain are responsible for different functions, AIs can be assessed, with some success, to identify what a given region is doing. But at the level of a single neuron or node, it becomes almost impossible to know what it is responsible for. However, given the similarities between brains and AIs, it is likely that in the coming years we will push this understanding even further, with each field helping drive discoveries in the other, telling us more about how the brain works, or more about how AIs work.


