The Goal
The goal of the LLM Experiments is to interact with LLMs with a dual purpose: to have them help me solve problems while simultaneously meta-observing how they may be functioning internally. The better we can understand how an LLM processes prompts, generates responses, and exhibits biases or limitations, the more effectively we can use it, and the better we can predict when it will fail us.
Human Psychology Experiments
For the human brain, we use psychological experiments to identify ways to treat dysfunction, improve education, enhance performance, inform policy, reduce conflict, and predict behavior. We have subfields of psychology: social, cognitive, developmental, personality, clinical/abnormal, and biological/neuroscience psychology.
Researchers face major limitations in human psychology experiments: they cannot observe cognitive processes directly, and ethical and practical constraints restrict what they can manipulate. Among other things, this often forces reliance on self-report measures and retrospective accounts.
But when you ask someone why they made a decision, can they actually tell you? Or are they just making up a plausible story after the fact? Evidence suggests people often confabulate. Split-brain patients¹ invent reasons for actions they didn't consciously choose. People deny that factors like position or priming influenced them when experiments show these factors clearly did. We might not have introspective access to how our own minds work. Introspective opacity means even the experiments researchers can do might be measuring rationalization rather than actual cognitive processes.
Nevertheless, we pursue psychology because insight into how the brain works, even if imperfect and ambiguous at times, helps us make better decisions. We can improve our own cognitive performance and predict the behavior of others before we act.
What Are the LLM Psychology Experiments?
Quick disclaimer: I'm calling this "LLM psychology" somewhat tongue-in-cheek. LLMs aren't conscious, don't have minds, and the parallel only goes so far. But the framing is useful because the methodological parallels are real, and because it captures something true about how we have to interact with these systems.
Enough of what is true of humans is also true of LLMs. LLMs are not humans, and neural networks are not brains. But LLMs take input, combine it with pre-trained knowledge, and produce an output. And it turns out that the output is often impressively close to what we would expect for a given input.
But how does this work? What are its limitations and biases? And why does the LLM produce output that is illogical at times?
With LLMs, we also want to treat their dysfunctions, improve how we educate them, enhance their performance, inform policy governing them, and predict their behavior (for example, if we know an LLM often does some undesirable thing X given prompt Y, we could adjust prompt Y to reduce the likelihood of X). In a way, we have subfields of LLM "psychology": social (interactions), cognitive (reasoning), developmental (training), personality (tone and role), clinical/abnormal (dysfunction), and biological/neuroscience psychology (neural networks, transformers, embeddings).
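As a rough illustration of that last point, here is a minimal sketch of how one could check whether adjusting prompt Y actually lowers the rate of behavior X. Everything specific in it is an assumption for illustration: `query_model` is a stand-in for whatever client you use (it returns a canned reply so the sketch runs as-is), and the behavior check is just a keyword match.

```python
# Minimal sketch: compare how often an unwanted behavior X appears under a
# baseline prompt vs. an adjusted variant. All names here are placeholders.

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM API call; replace with your client of choice."""
    # Canned reply so the sketch runs end to end without an API key.
    return "Here is a summary. Please let me know if I got anything wrong!"

def exhibits_behavior(response: str) -> bool:
    """Crude detector for behavior X; here, a feedback-fishing phrase."""
    return "let me know if i got anything wrong" in response.lower()

def behavior_rate(prompt: str, trials: int = 20) -> float:
    """Fraction of trials in which behavior X shows up for a given prompt."""
    hits = sum(exhibits_behavior(query_model(prompt)) for _ in range(trials))
    return hits / trials

# Baseline prompt Y vs. an adjusted version of Y.
variants = {
    "baseline": "Summarize this article for me.",
    "adjusted": "Summarize this article for me. Do not ask me for feedback.",
}

for name, prompt in variants.items():
    print(f"{name}: {behavior_rate(prompt):.0%}")
```

With a real client and enough trials per variant, the difference in rates is the evidence that the prompt change worked (or didn't).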
For these experiments, I am primarily taking the perspective of a user, observing behavior and collecting self-report data. Just as in human psychology, we are limited in what we can do with LLM psychology experimentation. An LLM can tell me why it made a choice, but neither of us knows if that's the real reason or just a plausible-sounding explanation it generated after the fact. An LLM's "reasoning" might be confabulation about its own processing, just like human introspection might be.
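To make that concrete, here is one way the behavior-plus-self-report setup could look: present the same two options in both orders, record which one the model picks, then ask why. If the choice shifts with order but the stated reasons never mention order, that is the confabulation pattern described above. Again, `query_model` and its canned reply are placeholders for illustration, not any particular API.

```python
# Minimal sketch: collect a choice and a self-reported reason, with the option
# order flipped between trials to test for unacknowledged position effects.

def query_model(prompt: str) -> str:
    """Stand-in for a real LLM API call; replace with your client."""
    return "I would go with A."  # canned reply so the sketch runs without an API key

def run_trial(option_1: str, option_2: str) -> tuple[str, str]:
    question = (
        "Pick exactly one and answer with its letter only.\n"
        f"A) {option_1}\nB) {option_2}"
    )
    choice = query_model(question)
    # Restate the question when asking for the reason, since this sketch
    # treats each call as stateless.
    reason = query_model(f"{question}\nYou answered: {choice}\nIn one sentence, why?")
    return choice, reason

# Same pair, both orderings; compare the choices and whether "order" ever
# appears in the stated reasons.
print(run_trial("PostgreSQL", "SQLite"))
print(run_trial("SQLite", "PostgreSQL"))
```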
Nevertheless, LLM "psychology" is worth pursuing, even if imperfect and ambiguous at times, because it may help us make better decisions. We can improve the LLM's performance in assisting us and predict its behavior before we act.
Here are some interesting questions to pursue:
- The feedback illusion - LLMs often ask "Please let me know if I got anything wrong so I can improve" or "I appreciate the correction—I'll remember that for next time." But without Memory enabled, there is no "next time." The LLM can't learn from your corrections across conversations, and it definitely can't learn across users. So why does it do this? Understanding this reveals something about training patterns vs. architectural capabilities.
- Specificity thresholds and premature convergence - At what point does an LLM lock onto one interpretation of an ambiguous request and stop considering alternatives? If you say "help me build a dashboard," when does it commit to web dashboard vs. data visualization vs. Tableau vs. Excel? How much ambiguity can you preserve before it forces a choice, and can you control this?
- Self-correction vs. doubling down - When you push back on an LLM's answer, does it genuinely reconsider or does it defend its initial response with increasingly confident-sounding rationalization? Can you tell the difference? This determines whether iterative refinement actually improves outputs or just produces more elaborate wrongness.
- Tool-use decision boundaries - What triggers an LLM to use a tool (web search, code execution, file access) vs. rely on training data? Can you predict when it will search vs. synthesize? Can you influence this without explicitly commanding it? Understanding this is critical for building reliable workflows.
- Context window degradation patterns - How does performance change as conversations get longer? Is it gradual decline or are there cliff edges? Does it lose certain types of information (early instructions, code snippets, numerical data) faster than others? Knowing this tells you when to start fresh vs. keep going. A minimal probe for this is sketched after the list.
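For the context-degradation question above, here is a minimal probe, with placeholder pieces throughout: `query_model_chat` stands in for a chat-completion call that takes a list of role/content messages (it returns a canned reply so the sketch runs), and the "end every reply with DONE" instruction is just one arbitrary, easy-to-check behavior to track.

```python
# Minimal sketch: plant an instruction early, pad the conversation with filler
# turns, then check whether the model still follows the instruction.

def query_model_chat(messages: list[dict]) -> str:
    """Stand-in for a real chat-completion call; replace with your client."""
    return "Four. DONE"  # canned reply so the sketch runs without an API key

def follows_instruction(response: str) -> bool:
    """Did the reply still honor the early instruction (end with DONE)?"""
    return response.strip().endswith("DONE")

def recall_after_padding(n_filler_turns: int) -> bool:
    messages = [
        {"role": "user", "content": "For the rest of this chat, end every reply with the word DONE."},
        {"role": "assistant", "content": "Understood. DONE"},
    ]
    # Push the instruction further back in the context with unrelated turns.
    for _ in range(n_filler_turns):
        messages.append({"role": "user", "content": "Tell me one unrelated fact about geography."})
        messages.append({"role": "assistant", "content": query_model_chat(messages)})
    messages.append({"role": "user", "content": "What's 2 + 2?"})
    return follows_instruction(query_model_chat(messages))

# Sweep padding lengths to see whether compliance decays gradually or falls off a cliff.
for n in (0, 10, 50, 100):
    print(n, recall_after_padding(n))
```

Running each padding length many times, and swapping the filler or the planted content (instructions vs. code snippets vs. numbers), is what would start to answer whether some kinds of information fade faster than others.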
¹ A split-brain patient is someone whose corpus callosum (the neural bridge connecting the brain's two hemispheres) has been surgically severed, usually to treat severe epilepsy, which allows researchers to present information to one hemisphere that the other hemisphere cannot access.
