AI is not a black box (relatively speaking)

Abstract: An opinion piece for a general TDS audience. I argue that, in a tangible way, artificial intelligence is more transparent than humans. The claim that AI is a “black box” lacks perspective: it lags behind current AI interpretability research and overlooks how opaque human intelligence is by comparison.
You, reader, are a black box. Your thoughts are a mystery to me. I do not know what you are thinking, I do not know what you will do next, and I do not know whether what you say is honest or whether you honestly account for your own actions. We learn to understand and trust humans through years of introspection and of interaction with others. But experience also teaches us that such understanding is largely limited to people whose life contexts resemble our own, and that trust is not necessarily warranted for those whose motivations run counter to ours.
AI, by contrast, while still mysterious, is legible. I can probe the AI equivalents of thoughts and motivations and know that I am getting the truth. Further, the AI equivalent of a “life context” (its training data) and its equivalent of “motivations” (its training objectives) are, if not fully understood, at least knowable and open to review and analysis. And while we still lack years of experience with modern AI systems, I do not think opacity is the problem. On the contrary, the relative transparency of AI systems to inspection, their “white box” nature, can become the basis for understanding and trust.
You may have heard AI called a “black box” in two senses. In the narrower sense, an AI such as OpenAI’s ChatGPT or Anthropic’s Claude is a black box because you cannot inspect its code or parameters (black-box access). In the broader sense, even if you could inspect those things (white-box access), doing so would not help you understand how the AI works to any generalizable degree: you could follow every instruction that defines ChatGPT and gain no more insight than you would from simply reading its output, which is the upshot of the Chinese room argument. Human minds, however, are more opaque than even the most restricted AI. Given the physical barriers and ethical constraints that limit inquiry into the mechanisms of human thought, and given our incomplete models of the brain’s architecture and components, the human mind is more of a black box, albeit one of an organic, carbon-based nature, than even proprietary, that is closed, AI models. Let us compare what current science tells us about the inner workings of the human brain on the one hand and of AI models on the other.
As of 2025, the only complete static neural wiring diagrams (connectomes) we have mapped belong to flies, which account for only a small fraction of the complexity of the human brain. Functionally, experiments using functional magnetic resonance imaging (fMRI) can resolve neural activity down to roughly 1 mm³ of brain matter. Figure 2 shows an example of neural structure captured as part of an fMRI study. The hardware required includes a machine costing upwards of $200,000, a steady supply of liquid helium, and a very patient human willing to hold still while a ton of superconductors spins a few inches from their head. And while fMRI studies can determine that, for example, processing visual depictions of faces and houses is associated with particular brain regions, much of our understanding of brain function comes from literal accidents, which of course cannot be scaled ethically. Ethical, less invasive experimental methods offer a comparatively low signal-to-noise ratio.

Open-source models (white-box access), including large language models (LLMs), are regularly sliced and diced, virtually and otherwise, in ways more invasive than anything the most expensive fMRI machines and the sharpest scalpels could ethically do to humans, and on consumer gaming hardware at that. Every neural connection can be examined repeatedly and recorded consistently across a huge space of inputs. The AI does not tire of the process, nor is it harmed in any way. This level of access, control, and repeatability lets us extract a wealth of signal on which to perform fine-grained analyses. Controlling what the AI observes enables us to connect familiar concepts to components and processes inside and outside the AI in useful ways:
- Relating neural activity to concepts, much as fMRI does. We can tell whether an AI is “thinking about” a specific concept. How would we tell whether a person is thinking about a specific concept? Figures 1 and 3 show two concept renderings from Gemma Scope, which provides annotations for the internals of Google’s Gemma 2 models. (The code sketch after this list illustrates this point and the next.)
- Determining the importance of specific inputs to the output. We can tell whether a particular part of the prompt mattered for the response the AI generated. Can we tell whether a human’s decision was swayed by a specific consideration?
- Tracing concepts, as attributions, along paths through the AI. That is, we can pinpoint how concepts propagate through the neural network from the input words to the final output. Figure 4 shows an example of such a path trace for the syntactic concept of subject number. Can we do the same for humans?

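To make the first two points above concrete, here is a minimal sketch, in Python, of what white-box access buys us, assuming the PyTorch and Hugging Face transformers libraries: it reads hidden activations out of a small open-weight model and scores them against a concept direction (a stand-in for a trained probe or a Gemma Scope-style sparse-autoencoder feature), then computes a simple gradient-based importance score for each input token. The model name, the layer choice, the prompt, and the random “probe” vector are all illustrative placeholders rather than the specific methods behind the figures.

```python
# A minimal sketch of white-box inspection. "gpt2" is used only because it
# is small and ungated; any open-weight causal LM exposes the same hooks.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # illustrative placeholder for any open-weight model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The hiring manager read the resume and decided to"
inputs = tok(prompt, return_tensors="pt")

# (1) fMRI-like readout: hidden activations at a chosen layer, per token.
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
layer = 6                                   # arbitrary middle layer
acts = out.hidden_states[layer][0]          # shape: (seq_len, hidden_dim)

# A real concept probe would be fit on labeled examples (or taken from a
# sparse autoencoder); a random unit vector stands in for it here.
probe = torch.randn(acts.shape[-1])
probe = probe / probe.norm()
concept_score = acts @ probe                # one "concept activation" per token

# (2) Input attribution: gradient of the top next-token logit with respect
# to the input embeddings, a simple saliency-style importance measure.
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach().requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"]).logits
logits[0, -1].max().backward()
saliency = embeds.grad.norm(dim=-1)[0]      # one importance score per token

tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
for token, c, s in zip(tokens, concept_score.tolist(), saliency.tolist()):
    print(f"{token:>12s}  concept={c:+.3f}  importance={s:.4f}")
```

Nothing here requires the model’s cooperation: the activations and gradients are simply read out, as many times and on as many inputs as we like.
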
Of course, humans can self-report answers to the first two questions above. You can ask a hiring manager what they were thinking as they read your resume, or what mattered in the decision to offer you the job (or not). Unfortunately, humans lie, do not themselves know the reasons for their actions, or are biased in ways they do not recognize. While the same is true of generative AI, interpretability methods in the AI space do not rely on the AI’s answers being truthful, fair, self-aware, or otherwise dependable. We do not need to trust the AI’s output to tell whether it is considering a specific concept; we literally read it off the (virtual) probes attached to its neurons. This is trivial for open-source models, compared with what it would take to extract the same information (ethically) from a human.
What about closed AI, where we have only black-box access? A lot can be inferred from black-box access alone. The lineage of these models is known, as is their general architecture, and their basic components are standard. They can also be interrogated at a far higher rate than any human would tolerate, and in a more controlled and reproducible way. Repeatability under chosen inputs is often a substitute for open access. Parts of a model can be inferred, or its behavior can be copied by “distillation”. So black-box access is not an absolute barrier to understanding and trust, though the most straightforward way to make AI more transparent would be public access to its entire specification, current trends among AI builders notwithstanding.
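Even with black-box access only, the controlled and repeatable interrogation mentioned above is easy to automate. The sketch below is provider-agnostic: the `query` function is a placeholder to be wired to whatever chat API is available, and the template, attribute values, and sample size are illustrative assumptions, not a prescribed methodology.

```python
# A sketch of black-box behavioral probing: vary one detail at a time and
# tally how the answers shift across repeated, controlled queries.
from collections import Counter

def query(prompt: str) -> str:
    """Stand-in for a call to a hosted model's chat endpoint.
    Returns a fixed answer here so the sketch runs end to end;
    replace the body with a real API call."""
    return "yes"

def counterfactual_test(template: str, variants: dict[str, list[str]], n: int = 20) -> dict:
    """Swap one attribute at a time and count the model's answers."""
    results = {}
    for attribute, values in variants.items():
        for value in values:
            prompt = template.format(**{attribute: value})
            results[(attribute, value)] = Counter(query(prompt) for _ in range(n))
    return results

# Illustrative use: does a single changed detail move the recommendation?
report = counterfactual_test(
    "A candidate studied at {school}. Recommend an interview? Answer yes or no.",
    {"school": ["a state university", "an Ivy League university"]},
)
print(report)
```

Run against a real endpoint, a shift in the answer distribution between the two variants would be evidence that the swapped detail matters to the model, the kind of controlled counterfactual one cannot ethically run on a hiring manager a thousand times.
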
Humans may be more complex thinking machines, so the comparison above might seem unfair. And, given our years of experience living and interacting with other (putative) humans, we are more inclined to feel that we understand and can trust humans. But our experience with AIs of all kinds is growing rapidly, as are their capabilities. Although the best models keep growing in size, their general architecture has remained stable, and there is no indication that we will lose the kinds of transparency described above even as these systems reach and exceed human abilities. Nor is there any indication that exploration of the human brain is close to breakthroughs significant enough to make it the less opaque intelligence. Artificial intelligence is not, and may never become, what the human mind remains to us: a black box.
Piotr Mardziel, head of AI, Realmlabs.ai.
Sophia Merow and Saurabh Shintre contributed to this article.