This post is a break from our regularly scheduled programming for a bit of computational cognitive science.
In the past, I’ve found myself confused about what’s going on when people give a Bayesian model of some psychological process. What’s that supposed to mean? No one thinks that the mind is always doing full Bayesian updates about everything. More commonly, cognitive scientists say that the mind “approximates” Bayesian updating.
In “Rational analysis, intractability, and the prospects of ‘as if’ explanations”, several cognitive scientists and philosophers (Iris van Rooij, Cory Wright, Johan Kwisthout, and Todd Wareham) point out how surprisingly unclear these claims are.
What computational models are
As they tell it, here’s the basic picture of computational models of the mind–which applies to all kinds of models, not just Bayesian ones.
Marr’s three levels are a famous framework in cognitive science. Suppose there is some capacity, say, object recognition. Think of the problem as explaining how the brain moves from certain inputs – a visual array, say – to certain outputs, like the classification of an object as “apple.” According to Marr, there are three levels at which this capacity can be analyzed:
Computational: a function f that describes the input-output mapping.
Algorithmic: an algorithm for computing that function, that is, a finite procedure for generating f(i) for any input i.
Implementation: how the brain physically carries out that algorithm.
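To make the difference between the first two levels concrete, here is a minimal sketch (my own toy example, not Marr’s), with sorting standing in for a cognitive capacity. The computational level fixes only the input-output mapping; the algorithmic level picks one of many procedures that compute it; the implementation level would then be the story of how a particular machine, or brain, physically runs one of those procedures.

```python
# Toy sketch (not Marr's example): the computational level specifies only the
# input-output mapping f -- "return the same items in nondecreasing order" --
# while the algorithmic level names a concrete procedure for computing f.

def insertion_sort(xs):
    """One algorithm for f."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def merge_sort(xs):
    """A different algorithm for the very same f."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged = []
    while left and right:
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right

# Same computational-level function, different algorithmic-level stories:
assert insertion_sort([3, 1, 2]) == merge_sort([3, 1, 2]) == [1, 2, 3]
```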
One upshot of this framework is that, for many questions about the mind, it might help to work from the “top down”: first characterize what the function is, which constrains what the possible algorithms are, which in turn constrains what the possible physical implementations are. Working from the top down can be easier than simply looking at a huge tangled mess of neurons and trying to tell what’s going on.
However, the key problem with figuring out what function f characterizes a capacity is that we can observe only a limited number of input-output behaviors. These input-output pairs will, by themselves, under-determine what the correct function f is. We need some way of constraining the possible functions.
Here is a key constraint proposed in cognitive modeling: assume that these functions must be fairly close to optimal, since otherwise we wouldn’t have evolved or learned them. As Anderson (1990) put his ‘principle of rationality’: “The cognitive system optimizes the adaptation of the behavior of the organism”. In light of this, van Rooij et al. explain, Anderson proposed a six-step procedure called ‘rational analysis’ for deriving the function.
1. Goal: Specify precisely the goals of the cognitive system (goals G).
2. Environment: Develop a formal model of the environment to which the system is adapted (environment E).
3. Computational limitations: Make minimal assumptions about computational limitations (limitations L).
4. Optimization: Derive the optimal behavior function f given 1–3 above.
5. Data: Examine the empirical evidence to see whether the predictions of the behavior function are confirmed.
6. Iteration: Repeat, iteratively refining the theory.
The problem of intractability
Here’s the problem: a lot of the functions proposed by cognitive scientists using this method are, contra step 3, computationally intractable. That means that there is no algorithm that can compute f in a realistic amount of time for all inputs. Given that we have limited brains and we certainly do (for example) recognize objects very quickly, discovering that a proposed function f is intractable would seem to rule it out as a candidate explanation of what the mind is doing.
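To get a feel for the scale of the problem, here is a back-of-the-envelope sketch (my own toy numbers, not the paper’s): if exact Bayesian updating had, in the worst case, to sum over every combination of n binary features, the number of terms grows as 2^n, and no realistic assumption about processing speed rescues it.

```python
# Toy illustration (not from the paper): worst-case exact inference over n
# binary features must consider every one of the 2**n feature combinations.

def terms_in_exact_posterior(n_binary_features: int) -> int:
    """Worst-case number of terms in the posterior's normalizing sum."""
    return 2 ** n_binary_features

for n in (10, 20, 50, 100):
    terms = terms_in_exact_posterior(n)
    # Even at a generous 10**9 terms per second:
    print(f"n={n:>3}: {terms:.2e} terms, about {terms / 1e9:.2e} seconds")
```

Whatever the details of the brain’s hardware, nothing runs through numbers like these in the few hundred milliseconds it takes to recognize an apple.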
As Chater and Oaksford (2000) acknowledge, “It might appear that there is an immediate contradiction between the limitations of the cognitive system and the intractability of rational explanations”.
This is the problem of intractability. For the rest of the paper, van Rooij et al. consider several related ways that rational analysts have tried to get around this problem. All of these ways, they claim, share the following motivation: “Rather than supposing that the target explanandum is explained by the exact computation of the intractable function modeling it, they instead just construe the target explanandum as if the intractable function modeling it were computed exactly.” But what might they mean by this as if?
According to van Rooij et al., it’s not always clear. And there are at least five (not all compatible) families of positions that people allude to. Unfortunately, none of them really addresses the problem of intractability. Here are the five:
Non-computationalism: the mind isn’t really computing the function
Implicit or sub-symbolic: the mind isn’t explicitly computing the function
Offline calculation: a lot of the computational work has been done “offline”, by evolution
Heuristic computation: the mind is doing a heuristic calculation which isn’t quite the same as f
Approximate computation: the mind is calculating some function which is approximately f
Van Rooij et al. discuss the problems with each of these before proposing their own framework.
1. Non-computationalism
Here’s a proposal: if the mind is only behaving as if it’s computing f, maybe that means that in some sense, it just isn’t computing f? So then, who cares if f is computationally intractable (497)? This thought is hinted at (though probably not endorsed) by computational cognitive scientist Nick Chater when he writes: “[Rational description] does not assume (though it does not rule out) that the thought processes underlying behavior involves any rational calculation. An analogy may be useful: the wings of a bird may approximate the results of a rational calculation of optimal aerodynamic design” (496).
Maybe the thought here is that intractable functions “may be realized tractably by physical means other than computation” (497). But this is false: intractability is so strong a result that it rules even this out. As van Rooij et al. write, “For a function f to be intractable in the sense of being NP-hard just is for there to exist no algorithm that can compute that function in a reasonable (polynomial) time. Here, ‘algorithm’ means any finite procedure under any formalism of computation, including those based in natural physical systems.” And not only does non-computationalism fail to avoid the problem; many cognitive scientists want to give a distinctively computational explanation in any case.
2. Implicit or sub-symbolic computation
Rational analysts point out, quite correctly, that one shouldn’t think of people as explicitly knowing the logic, probability theory, and computer science that are necessary to explain or describe their mental processes. For example, even if my visual system has a prior that most scenes are lit from above (and slightly to the left), that doesn’t necessarily mean that I have any such assumption, or know that such an assumption is shaping my visual perception. These priors and processes are implicit, unconscious. While this thought is correct, it does nothing to circumvent the intractability worry. A function f is intractable whether it is computed consciously or unconsciously, symbolically or sub-symbolically, explicitly or implicitly.
3. Offline calculation
Another obviously correct observation: an organism whose mind calculates function f needn’t itself have developed the function f, proven that it’s optimal, and so on. Presumably, the function arises over evolution and development without the creature having any notion of that process (do you remember building out your object recognition system?). So Chater and Oaksford say things like “Such behavior may be built in by evolution or be acquired via a long process of learning—but it need not require online computation of the optimal solution” (501, emph mine).
The first part is true – certainly “such behavior may be built in by evolution or be acquired via a long process of learning” – but what does the second part mean? The problem with this explanation is the problem of flexibility. Usually, we are in the business of positing a computational model because we want to explain some kind of flexible behavior: how does the visual system recognize such a large class of objects? How can we learn a virtually unlimited number of concepts? The point of proposing a function that the mind computes, as opposed to “hard-coded” responses, is to account for this flexibility. And if no optimal solution is computed online at all, it’s unclear what the organism is doing when it meets a novel input, so the intractability of that online computation is simply left unaddressed.
4. Heuristic computation
A heuristic is “an algorithm known not to compute f exactly; instead, it will output something different from f(i)” for some inputs. Van Rooij et al. neatly argue that these heuristic explanations are inadequate, precisely because of this mismatch. There is a dilemma about the inputs i on which f(i) and fH(i) differ. In these cases of mismatch, either the actual observed output is f(i), or it is fH(i). If the actual output behavior is f(i), then heuristic H does not explain it, for the simple reason that heuristic H doesn’t predict f(i). If the actual output behavior is fH(i), then f is an inadequate model of the behavior.
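Here is a concrete toy picture of such a mismatch (my own example, not one from the paper): let f be the exact solution of a small resource-allocation problem, and fH a greedy heuristic for it. On many inputs they agree, but on the input below they come apart.

```python
from itertools import combinations

# Toy mismatch between an exact function f and a heuristic fH (my example,
# not the paper's): 0/1 knapsack, exact optimum vs. greedy-by-value-density.

items = [(10, 60), (20, 100), (30, 120)]  # (weight, value)
capacity = 50

def f_exact(items, capacity):
    """f: the exact optimum, found by checking every subset (exponential)."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for w, _ in subset) <= capacity:
                best = max(best, sum(v for _, v in subset))
    return best

def f_heuristic(items, capacity):
    """fH: greedy by value density -- fast, but not always equal to f."""
    total = 0
    for w, v in sorted(items, key=lambda wv: wv[1] / wv[0], reverse=True):
        if w <= capacity:
            capacity -= w
            total += v
    return total

print(f_exact(items, capacity), f_heuristic(items, capacity))  # 220 vs. 160
```

On this input only one of the two numbers can match the observed behavior, which is exactly the dilemma van Rooij et al. press.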
5. Approximate computation
The best version of the ‘as if’ explanation is to say that the mind does something that is guaranteed to have outputs approximately equal to those of f. There are three ways that a function fA can approximate f: the outputs of fA are guaranteed to ‘resemble’ those of f in some way; the outputs of fA are guaranteed to be close in value to those of f; or fA is guaranteed to output exactly f(i) with some high probability.
The main problem with this kind of explanation is that if a function f is intractable, it’s usually also intractable to approximate.
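For concreteness, here is what the third, probabilistic sense could look like in a toy case (my own sketch, not the paper’s, and one small enough that the hardness-of-approximation worry doesn’t bite): estimating a posterior by sampling rather than by exact summation over every combination of causes.

```python
import random
from itertools import product

# Toy sketch of "probably approximately f" (my example, not the paper's):
# exact P(cause_1 | effect) sums over all 2**n cause configurations, while
# rejection sampling is, with high probability, close to that exact value.

random.seed(0)
n, prior, leak, strength = 12, 0.1, 0.01, 0.8

def p_effect(causes):
    # Noisy-OR likelihood: the effect occurs unless every present cause
    # "fails" (prob 1 - strength each) and the background leak also fails.
    return 1 - (1 - leak) * (1 - strength) ** sum(causes)

def exact_posterior():
    """f: exact P(cause_1 = 1 | effect = 1) by full enumeration."""
    num = den = 0.0
    for causes in product([0, 1], repeat=n):
        p = p_effect(causes)
        for c in causes:
            p *= prior if c else 1 - prior
        den += p
        num += p * causes[0]
    return num / den

def sampled_posterior(n_samples=20000):
    """fA: rejection sampling, close to f with high probability."""
    hits = accepted = 0
    while accepted < n_samples:
        causes = [random.random() < prior for _ in range(n)]
        if random.random() < p_effect(causes):
            accepted += 1
            hits += causes[0]
    return hits / accepted

print(exact_posterior(), sampled_posterior())
```

The paper’s point is that such guarantees are usually unavailable for genuinely intractable functions; the toy case is easy precisely because n is small.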
Their solution: restrict the inputs
A function is often proven to be intractable with respect to all possible inputs. But if you limit the inputs, it can be tractable. And since, as human beings, we only need to handle the inputs that are likely to occur in our environment, the functions we compute may be tractable on those inputs.
a computational-level theory f may be intractable for an unconstrained input domain, but a cognitive capacity may not need to deal with arbitrary inputs under ‘normal’ conditions. That is, the situation in which a capacity is effectively exercised by cognitive agents in their normal lives may be better modeled by a restricted input domain; and the function f restricted to that input domain. (507, emph mine)
If we explicitly (and correctly?) say what the input restriction is, then this does yield an explanation that is genuinely ‘as if’, but not in a problematic way:
the rational analysts may contend that it is ‘as if’ cognitive agents compute f in the sense that they compute f’ and in the normal range of inputs the functions f and f’ are indistinguishable. Importantly though, this indistinguishability is mere appearance; for if one were to present the cognitive agents with inputs outside the normal range then the behavior of the cognitive agents would no longer be guaranteed to look anything like f.
Thus, part of the job of computational modeling is to figure out what the ‘normal’ inputs are, in a way that both captures facts about the environment and yields a tractable function that the organism plausibly computes.
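One way to picture how an input restriction buys tractability (again a toy sketch of my own, not an example from the paper): suppose exact inference has to score every subset of n candidate causes, which is 2^n hypotheses in general. If ‘normal’ inputs never involve more than some small number k of causes at once, only polynomially many hypotheses ever need scoring.

```python
from math import comb

# Toy sketch (mine, not the paper's): the same computation that is exponential
# over an unrestricted input domain becomes polynomial when inputs are
# restricted to "at most k causes present at once".

def hypotheses_unrestricted(n):
    """All subsets of n candidate causes."""
    return 2 ** n

def hypotheses_restricted(n, k):
    """Only subsets of size at most k: sum of C(n, j) for j = 0..k."""
    return sum(comb(n, j) for j in range(k + 1))

for n in (20, 50, 100):
    print(n, hypotheses_unrestricted(n), hypotheses_restricted(n, k=3))
```

For fixed k the restricted count grows only polynomially in n, which is the kind of gap an input restriction can exploit.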
Now, that is much easier said than done, as the authors no doubt realize. Still, the paper succeeds admirably at clarifying why this approach would be more explanatory than many of the other frameworks they consider.