ComputeGrover <- function(psi, a, z) {
  ndecks <- length(a)
  # Complex identity matrix
  I <- diag(complex(ndecks, real = 1, imaginary = 0))
  # Phase factors derived from the two angles in psi
  e1 <- 1 - complex(modulus = 1, argument = psi[1])
  e2 <- 1 - complex(modulus = 1, argument = psi[2])
  # Reflection-style operators built from the chosen deck (a) and the current state (z)
  U1 <- I - e1 * (a %*% t(Conj(a)))
  U2 <- e2 * (z %*% t(Conj(z))) - I
  return(U2 %*% U1)
}
Synergizing reinforcement learning with functional deep quantum neural manifold learning
It’s been more than a few years since I’ve actually played around with any gambling tasks, and in the intervening time, models of sequential decision making based on concepts from quantum mechanics were briefly in vogue in several journals devoted to decision making and learning. I haven’t studied any quantum mechanics in a while, either, so I thought it would be interesting to unpack one of these models: specifically, the model of Iowa Gambling Task performance reported by Li et al. (2020), which claims to find better fits to human performance than classical models, such as those based on the delta learning rule.
I admit I’m largely ignorant of most of these models’ uses in the literature. There do seem to be cases where quantum and classical models produce measurably different predictions, and where quantum models appear to explain patterns of preference judgement not captured by classical models (e.g. Epping and Busemeyer 2023); but the cynic in me wonders if a large part of the appeal isn’t the quantum language itself, especially as descriptions of these models in the literature tend to bury the reader in notation without providing insight.
In this case, the problem to solve is a standard Iowa gambling task (IGT), which is just a four-armed bandit disguised as a card game. On each trial, subjects are presented with four decks of cards and, upon choosing one, receive a monetary reward. The rewards associated with each deck are probabilistic, and the subject is told to earn as much money as possible by the end of the task, which they can only do by sampling from the decks and learning which of the four is the most profitable.
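To make the structure concrete, here is a minimal sketch of the task as a four-armed bandit. The payoff distributions below are placeholders I've made up for illustration; the actual IGT uses a fixed schedule of gains and losses for each deck.

```r
# A four-armed bandit stand-in for the IGT. The payoff means and spreads
# here are invented for illustration, not the real IGT schedule.
DrawFromDeck <- function(deck) {
  means <- c(-25, -25, 25, 25)   # hypothetical mean payoff per deck
  sds   <- c(200, 100, 150, 50)  # hypothetical payoff variability
  rnorm(1, mean = means[deck], sd = sds[deck])
}

# One trial: choose a deck, observe a reward.
reward <- DrawFromDeck(3)
```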
A standard model for this task looks something like this: The subject maintains a valuation
Now, onto the quantum model. Let’s dispense with the quantum mechanical formalism, and especially with the quantum mechanical notation, which is needlessly opaque for anyone outside of the field. In finite dimensions (the only case anyone ever actually considers in these applications), nothing happens that can’t be more concisely written using standard matrix notation; and so we’ll treat this as an ordinary model of sequential decision making with a weird update rule, and we’ll use standard linear algebraic notation.
Assume that the subject maintains a valuation
The matrix
To be very explicit, for those uncomfortable working with complex numbers, the notation
The only thing that remains is to define the phases psi as a function of the utility of the observed outcome:
ComputePsi <- function(u, pars) {
  eta <- pars$eta
  b1 <- pars$b1
  b2 <- pars$b2
  # Map the observed utility u to the two phase angles
  psi <- c(pi * (u * cos(pi * eta) + b1),
           pi * (u * sin(pi * eta) + b2))
  return(psi)
}
Once we have the updated valuations, the choice probabilities are just their squared moduli:
ComputeChoiceProbs <- function(z) {
  # Squared modulus of each entry gives the selection probabilities
  Mod(z)^2
}
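Putting the three functions together, here is a sketch of a single trial's update. The initial state, the chosen deck, and the parameter values are all assumptions made for illustration, not the authors' fitted values.

```r
# One trial of the quantum update (illustrative values only).
ndecks <- 4
z <- complex(ndecks, real = 1 / sqrt(ndecks), imaginary = 0)  # assumed uniform initial state
a <- complex(ndecks, real = c(1, 0, 0, 0), imaginary = 0)     # indicator of the chosen deck (deck 1)

u    <- 0.5                                  # utility of the observed outcome
pars <- list(eta = 0.3, b1 = 0.1, b2 = 0.2)  # hypothetical parameter values

psi <- ComputePsi(u, pars)       # phases as a function of utility
U   <- ComputeGrover(psi, a, z)  # Grover-style update operator
z   <- as.vector(U %*% z)        # updated valuation state
ComputeChoiceProbs(z)            # selection probabilities for the four decks
```

Since the operator is unitary (given unit-length a and z), the squared moduli of the updated state still sum to one, so no separate normalization step is needed.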
Complexity aside, what I want to make clear here is that, however the update rule was derived, ultimately we are just fitting curves to subjects’ behavior. In particular, the snippets of code that I’ve provided will implement this learning rule on your very own, boring, non-quantum computer. Using this model, your stupid classical laptop will succeed in learning a bandit task like the IGT, even though it has never seen Oppenheimer and nothing “quantum” is happening in the way your CPU manipulates bits in memory. This is, first and foremost, a flexible curve-fitting algorithm.
Intuition
Consider the more straightforward update of the delta learning rule, where the subject maintains a valuation
That is, if the observed utility was higher than expected, we update our expectation upward; and vice versa. The actual mechanism underlying the update of our quantum learning rule is much more opaque, but we can get some geometric intuition.
Suppose our current valuation of the four decks is
This is easiest to see in the simple two-dimensional example in the figure below, where we imagine two decks
Note that, however we update
This is, in fact, precisely what is happening. The matrices
correspond to the four respective decks, we would like the result of applying
Through a rather elaborate derivation, the authors give the ratio
What does the update look like?
Now, here’s the point of this exercise. We’ll consider two models in parallel, which we’ll call QM (for quantum model) and CM (classical model). Both will use the same utility, but QM will apply the update we’ve just discussed, while CM will use the delta learning rule along with a softmax function to determine deck selection probabilities.
| | CM | QM |
|---|---|---|
| Valuation | | |
| Update | | |
| Selection | | |
| Initial valuations | | |
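For concreteness, here is a sketch of the CM column: a delta-rule valuation update paired with a softmax selection rule. The function names, parameter names, and defaults are mine, not the paper's.

```r
# Classical model: delta-rule valuation update with softmax selection.
# Names and values are placeholders for illustration.
DeltaUpdate <- function(v, deck, u, alpha) {
  v[deck] <- v[deck] + alpha * (u - v[deck])  # move valuation toward the observed utility
  v
}

SoftmaxProbs <- function(v, beta) {
  exp(beta * v) / sum(exp(beta * v))  # deck selection probabilities
}
```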
This is what we’re going to do: We’re going to pick a deck (say, deck 1) and look at the change in selection probability for that deck as a function of the utility of the outcome. Specifically, since we have a closed-form expression for this in the QM model, we’ll be comparing the log-odds of selecting the deck before and after the outcome, as a function of the model parameters. Here it is for model CM:
Note that this is completely sensible: the probability of selecting the deck increases as the utility of the outcome increases, and this effect is greater for higher learning rates
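To reproduce a curve like this, we can sweep the utility and compute the change in log-odds directly, using the DeltaUpdate and SoftmaxProbs sketches above (again, the initial valuations and parameter values are arbitrary choices for illustration):

```r
# Change in log-odds of selecting deck 1 under the CM, as a function of
# the observed utility. Valuations and parameters are illustrative.
LogOdds <- function(p) log(p / (1 - p))

DeltaLogOddsCM <- function(u, v, alpha, beta) {
  p_before <- SoftmaxProbs(v, beta)[1]
  p_after  <- SoftmaxProbs(DeltaUpdate(v, 1, u, alpha), beta)[1]
  LogOdds(p_after) - LogOdds(p_before)
}

u_grid <- seq(-2, 2, length.out = 100)
dlo <- sapply(u_grid, DeltaLogOddsCM, v = rep(0, 4), alpha = 0.3, beta = 1)
```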
Now onto the, uh…difficulty of model QM. First things first, we’ll ignore the parametrization in terms of
For clarity, the red region corresponds to values of
As the authors point out, the mapping
defines a line passing through the point
There are two things I dislike about this parametrization as a model of human behavior. First, the parameters are extremely difficult to interpret: the relationship between utility and the change in deck probability is very hard to read off from the parameter values themselves. If the user knows the landscape of the function, they can compute the effect of zero utility by looking at whether the update implied by
The second is that, because the log-odds are periodic in
We can see this exactly by choosing some specific parameters – say,
Actually, we can be more specific. As the update is periodic in both
The parametrization of the update in terms of (
I suspect that, for a specific payoff schedule, it would be possible to carefully tailor the parameters so that the log-odds over the range of experienced utility lie in a well-behaved region of the space. For example, if the utility experienced by the subject is constrained within
For these specific values of
In fact, we get a pretty close approximation of the curves generated by the CM for different learning rate parameters. But not so for slightly different values
Now,
When we dispense with the quantum language and view it merely as a model of sequential decision making with a particularly flexible policy update, the structure of the model is quite simple: We have a smooth surface – a function of
As a model, it’s interesting, but since it is fundamentally just curve fitting, and since the weird oscillatory learning behavior is clearly inconsistent with the way decision making is likely to be implemented in the brain, I’m not sure that the model’s performance really suggests that the brain is leveraging “quantum” phenomena. Shor’s algorithm is a polynomial time quantum algorithm for integer factorization. And yet, we wouldn’t claim that the ability of a subject to factor small numbers in their head is evidence for some kind of quantum computation in the brain. Classical algorithms can do prime factorization too. Quantum algorithms are designed to take advantage of the architecture of a quantum computer, but they typically solve the same problems. The mere fact that a model based on quantum computation can predict behavior (e.g. the output of some computation) does not, by itself, suggest any kind of quantum computation in the brain unless we can also show that something in the computational architecture of the brain is actually quantized.