Who’s in charge here?

In a very useful post, Jonah Lehrer wonders:

…if banal terms like “executive control” or “top-down processing” or “attentional modulation” hide the strangeness of the data. Some entity inside our brain, some network of neurons buried behind our forehead, acts like a little petit tyrant, and is able to manipulate the activity of our sensory neurons. By doing so, this cellular network decides, in part, what we see. But who controls the network?

I posted a comment on Jonah’s blog but it took so long to get approved that probably no one will see it. So I’m posting an enhanced version here.

Jonah’s final sentence, “But who controls the network?” illustrates to me the main obstacle to a sensible view of human thought, identity, and self-regulation.

We don’t ask the same question about the network that controls our heart rate. It is a fairly well defined, important function, tied to many other aspects of our mental state, but it is an obviously self-regulating network. It has evolved to manage its own fairly complex functions in ways that support the survival of the organism.

So why ask the question “Who controls it?” about attentional modulation? We know this network can be self-controlling. There are subjectively strange but fairly common pathologies of attentional modulation (such as hemi-neglect, where we even understand some of the network behavior) that are directly traceable to brain damage, and that reveal aspects of the network’s self-management. We can measure, through various cognitive tasks, the way attention degrades when overloaded. Etc., etc. There’s nothing fundamentally mysterious or challenging to our current theoretical frameworks or research techniques.

Yet many people seem to have a cognitive glitch here, akin to the feeling people had on first hearing that the earth was round, “But then we’ll fall off!” Our intuitive self-awareness doesn’t stretch naturally to cover our scientific discoveries. As Jerry Fodor says “there had… better be somebody who is in charge; and, by God, it had better be me.”

I’ve written some posts (1, 2) specifically on why this glitch occurs but I think it will take a long time for our intuitive sense of our selves to catch up with what we already know.

And I guess I ought to write the post I promised back last April. I’ll call it “Revisiting ego and enforcement costs”. Happily it seems even more interesting now than it did then, and it ties together the philosophy of mind themes with some of my thinking on economics.

Meta: Patterns in my posting (and my audience)

I’ve been posting long enough, and have enough reaction from others (mainly in the form of visits, links and posts on other blogs) that I can observe some patterns in how this all plays out.

My posts cluster roughly around three main themes (in retrospect, not by design):

  • Economic thinking, informed more by stochastic game theory than Arrow-Debreu style models
  • The social impact of exponential increases in computer power, especially coupled with statistical modeling
  • Philosophical analysis of emergence, supervenience, downward causation, population thinking, etc.

These seem to be interesting to readers roughly in that order, in (as best I can tell) a power-law-like pattern — that is, I get several times as many visitors looking at my economic posts as at my singularity / statistical modeling posts, and almost no one looking for my philosophical analysis (though the early Turner post has gotten some continuing attention).

I find the economics posts the easiest — I just “write what I see”. The statistical modeling stuff is somewhat more work, since I typically have to investigate technical issues in more depth than I would otherwise. The philosophical analysis is much harder to write, and I’m typically less satisfied with it when I’m done.

The mildly frustrating thing about this is that I think the philosophical analysis is where I get most of my ability to provide value. My thinking about economics, for example, is mainly guided by my philosophical thinking, and I wouldn’t be able to see what I see without an arduously worked out set of conceptual habits and frameworks. I’d enjoy getting, for the philosophical posts, the same kind of encouragement and useful additional perspectives I get when people react to the other topics.

Reflecting on this a bit, I think mostly what I’m doing with the philosophical work is gradually prying loose a set of deeply rooted cognitive illusions — illusions that I’m pretty sure arise from the way consciousness works in the human brain. Early on, I wrote a couple of posts that touch on this theme — and in keeping with the pattern described above, they were hard to write, didn’t seem to get a lot of interested readers, and I found them useful conceptual steps forward.

“Prying loose illusions” is actually not a good way to describe what needs to be done. We wouldn’t want to describe Copernicus’ work as “prying loose the geocentric illusion”. If he had just tried to do that, it wouldn’t have worked. Instead, I’m building up ways of thinking that I can substitute for these cognitive illusions (partially, with setbacks). This is largely a job of cognitive engineering — finding ways of thinking that stick as habits, that become natural, that I can use to generate descriptions of stuff in the world (such as economic behavior) which others find useful, etc.

In my (ever so humble) opinion this is actually the most useful task philosophers could be doing, although unfortunately as far as I can tell they mostly don’t see it as an important goal, and I suspect in many cases would say it is “not really philosophy”. To see if I’m being grossly unfair to philosophers, I just googled for “goals of x” for various disciplines (philosophy, physics, sociology, economics, …). The results are interesting and I think indicate I’m right (or at least not unfair), but I’ll save further thoughts for a post about this issue. If you’re curious, feel free to try this at home.

The path to a synthesis in statistical modeling

As I discuss in Dancing toward the singularity, progress in statistical modeling is a key step in achieving strongly reflexive netminds. However a very useful post by John Langford makes me think that this is a bigger leap than I hoped. Langford writes:

Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed ….

Langford lists fourteen frameworks:

  • Bayesian Learning
  • Graphical/generative Models
  • Convex Loss Optimization
  • Gradient Descent
  • Kernel-based learning
  • Boosting
  • Online Learning with Experts
  • Learning Reductions
  • PAC Learning
  • Statistical Learning Theory
  • Decision tree learning
  • Algorithmic complexity
  • RL, MDP learning
  • RL, POMDP learning

Within each framework there are often several significantly different techniques, which further divide statistical modeling practitioners into camps that have trouble sharing results.

In response, Andrew Gelman points out that many of these approaches use Bayesian statistics, which provides a unifying set of ideas and, to some extent, formal techniques.

I agree that Bayesian methods are helping to unify the field, but statistical modeling still seems quite fragmented.
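
To make the fragmentation (and the partial Bayesian unification) concrete, here is a minimal sketch — entirely my own illustration, not anything from Langford or Gelman — of the same linear-regression problem solved once as convex loss optimization by gradient descent and once as Bayesian inference with a closed-form posterior mean. The two framings reach essentially the same answer, but their vocabularies (loss and learning rate versus prior and posterior) barely overlap:

```python
# Illustrative sketch only: one problem, two "frameworks". Uses only numpy.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

# Framework 1: convex loss optimization via gradient descent on squared error.
w = np.zeros(3)
lr = 0.01
for _ in range(2000):
    grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
    w -= lr * grad

# Framework 2: Bayesian linear regression with a Gaussian prior on the weights;
# the posterior mean has a closed form (a ridge-like estimate).
prior_precision = 1.0      # assumed prior precision on the weights
noise_precision = 100.0    # assumed observation-noise precision (1 / 0.1**2)
A = noise_precision * X.T @ X + prior_precision * np.eye(3)
posterior_mean = noise_precision * np.linalg.solve(A, X.T @ y)

print("gradient descent estimate: ", np.round(w, 3))
print("Bayesian posterior mean:   ", np.round(posterior_mean, 3))
```

The point of the toy example is not the answer but the mismatch of concepts: a practitioner trained in one framing has to relearn the other almost from scratch, which is exactly the kind of fragmentation Langford describes.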

So in “dancing” I was too optimistic when I wrote that I “doubt that we need any big synthesis or breakthrough” in statistical modeling to create strongly reflexive netminds. Langford’s mini-taxonomy, even with Gelman’s caveats, suggests that we won’t get a unified conceptual framework, applicable to actual engineering practice, across most kinds of statistical models until we have a conceptual breakthrough.

If this is true, of course we’d like to know: How big is the leap to a unified view, and how long before we get there?

Summary of my argument

The current state of statistical modeling seems pretty clearly “pre-synthesis” — somewhat heterogeneous, with different formal systems, computational techniques, and conceptual frameworks being used for different problems.

Looking at the trajectories of other more or less similar domains, we can see pretty clear points where a conceptual synthesis emerged, transforming the field from a welter of techniques to a single coherent domain that is then improved and expanded.

The necessary conditions for a synthesis are probably already in place, so it could occur at any time. Unfortunately, these syntheses seem to depend on (or at least involve) unique individuals who make the conceptual breakthrough. This makes the timing and form of the synthesis hard to predict.

When a synthesis has been achieved, it will probably already be embodied in software, and this will allow it to spread extremely quickly. However it will still need to be locally adapted and integrated, and this will slow down its impact to a more normal human scale.

The big exception to this scenario is that the synthesis could possibly arise through reflexive use of statistical modeling, and this reflexive use could be embodied in the software. In this case the new software could help with its own adoption, and all bets would be off.

Historical parallels

I’m inclined to compare our trajectory to the historical process that led to the differential and integral calculus. First we had a long tradition of paradoxes and special-case solutions, from Zeno (about 450 BC) to the many specific methods based on infinitesimals up through the mid-1600s. Then in succession we got Barrow, Newton and Leibniz. Newton was amazing, but it seems pretty clear that the necessary synthesis would have taken place without him.

But at that point we were nowhere near done. Barrow, Newton and Leibniz had found a general formalism for problems of change, but it still wasn’t on a sound mathematical footing, and we had to figure out how to apply it to specific situations case by case. I think it’s reasonable to say that it wasn’t until Hamilton’s work, published in 1835, that we had a full synthesis for classical physics (which proved extensible to quantum mechanics and relativity).

So depending on how you count, the development of the calculus took around 250 years. We now seem to be at the point in our trajectory just prior to Barrow: lots of examples and some decent formal techniques, but no unified conceptual framework. Luckily, we seem to be moving considerably faster.

One problem for this analogy is that I can’t see any deep history for statistical modeling comparable to the deep history of the calculus beginning with Zeno’s paradox.

Perhaps a better historical parallel in some ways is population biology, which seems to have crystallized rather abruptly, with very few if any roots prior to about 1800. Darwin’s ideas were conceptually clear but mathematically informal, and the current formal treatment was established by Fisher in about 1920, and has been developed more or less incrementally since. So in this case, it took about 55 years for a synthesis to emerge after the basic issues were widely appreciated due to Darwin’s work.

Similarly, statistical modeling as a rough conceptual framework crystallized fairly abruptly with the work of the PDP Research Group in the 1980s. There were of course many prior examples of specific statistical learning or computing mechanisms, going back at least to the early 1960s, but as far as I know there was no research program attempting to use statistical methods for general learning and cognition. The papers of the PDP Group provided excellent motivation for the new direction, and specific techniques for some interesting problems, but they fell far short of a general characterization of the whole range of statistical modeling problems, much less a comprehensive framework for solving such problems.

Fisher obviously benefited from the advances in mathematical technique, compared with the founders of calculus. We are benefiting from further advances in mathematics, but even more important, statistical modeling depends on computer support, to the point where we can’t study it without computer experiments. Quite likely the rapid crystallization of the basic ideas depended on rapid growth in the availability and power of computers.

So it is reasonable to hope that we can move from problems to synthesis in statistical modeling more quickly than in previous examples. If we take the PDP Group as the beginning of the process, we have already been working on the problems for twenty years.

The good news is that we do seem to be ready for a synthesis. We have a vast array of statistical modeling methods that work more or less well in different domains. Computer power is more than adequate to support huge amounts of experimentation. Sources of almost unlimited amounts of data are available and are growing rapidly.

On the other hand, an unfortunate implication of these historical parallels is that our synthesis may well depend on one or more unique individuals. Newton, Hamilton and Fisher were prodigies. The ability to move from a mass of overlapping problems and partial solutions to a unified conceptual system that meets both formal and practical goals seems to involve much more than incremental improvement.

Adoption of the synthesis

Once a synthesis is created, how quickly will it affect us? Historically it has taken decades for a radical synthesis to percolate into broad use. Dissemination of innovations requires reproducing the innovation, and it is hard to “copy” new ideas from mind to mind. They can easily be reproduced in print, but abstract and unfamiliar ideas are very hard for most readers to absorb from a printed page.

However, the situation for a statistical modeling synthesis is probably very different from our historical examples. Ideas in science and technology are often reproduced by “black boxing” them — building equipment that embodies them and then manufacturing that equipment. Depending on how quickly and cheaply the equipment can be manufactured, the ideas can diffuse quite rapidly.

Development of new ideas in statistical modeling depends on computer experiments. Thus when a synthesis is developed, it will exist at least partly in the form of software tools — already “black boxed” in other words. These tools can be replicated and distributed at almost zero cost and infinite speed.

So there is a good chance that when we do achieve a statistical modeling synthesis, “black boxes” that embody it will become available everywhere almost immediately. Initially these will only be useful to current statistical modeling researchers and software developers in related areas. The rate of adoption of the synthesis will be limited by the rate at which these black boxes can be adapted to local circumstances, integrated with existing software, and extended to new problems. This makes adoption of the synthesis comparable to the spread of other innovations through the internet. However, the increase in capability of systems will be far more dramatic than with prior innovations, and the size of subsequent innovations will be increased by the synthesis.

There is another, more radical possibility. A statistical modeling synthesis could be developed reflexively — that is, statistical modeling could be an essential tool in developing the synthesis itself. In that case the black boxes would potentially be able to support or guide their own adaptation, integration and extension, and the synthesis would change our world much more abruptly. I think this scenario is currently quite unlikely because none of the existing applications of statistical modeling lend themselves to this sort of reflexive use. It gets more likely the more we use statistical modeling in our development environments.

A reflexive synthesis has such major implications that it deserves careful consideration even if it seems unlikely.

The (Self)Importance of Ego

At the end of my previous post I asked

Why do we typically, like Fodor, feel uncomfortable with the possibility of being a community of subsystems, often loosely coordinated, with our conscious self acting as an intermittent framing mechanism for their activities? Surely this is a better description of our ordinary experience than vivid, comprehensive internal perception of a luminous self?

I think there are two reasons for our demand that “there had… better be somebody who is in charge; and, by God, it had better be me.” Unfortunately my opinions aren’t grounded in research — I’d love to know of any empirical results in this area.

First, to fulfill its basic functions, our current conscious sense of ourselves in the world must provide a unified frame for our experience. The point of a conscious ego (in Baars’ model, which I think is basically correct) is to coordinate potentially conflicting behaviors, and to recruit the cognitive resources necessary to do the right things at the right times. To perform these functions, the ego must be the dominant narrative frame at any point in time. If we were aware of all the possible ways we could move at a given moment, we’d stumble and fall.

The unity required for ego to function is of limited scope. Ego does not need to maintain a consistent view over long periods — in fact, everyone who tries meditation quickly discovers how little continuity ego really provides! Ego also does not need to unify all the potentially conflicting mental activities at any given time — it just needs to integrate the ones that are expressed in action, and even those only need to be integrated well enough so that they don’t overtly conflict. Finally, ego doesn’t need to “look inside” activities that are well enough rehearsed so that they don’t suffer from internal conflicts, even though those activities can be enormously complex — such as driving a car, or speaking a language, or doing both at the same time.

However these limitations of ego do not weaken our need to sustain its focus at any given moment. To act gracefully and effectively the current narrative frame must dominate our awareness — however incomplete that awareness may be, and however frequently the frame changes.

So the first reason we resist awareness of our own complexity, opacity and limited coherence is that we have a strong drive to “stick to our (current) story”, based on some combination of hard wired and learned patterns of cognitive self-management.

Second, in social interaction, especially verbal interaction, being able to operate smoothly within the current shared narrative frame is very important, and being able to recruit others to one’s own narrative frame provides many advantages. Having a strong, even unshakeable grip on the appropriate narrative frame is a key to making social interaction work. For example, “sometimes wrong, but never uncertain” is an important (if not generally admitted) tactic for being a “leader”. Of course many other specific tactics are important in social frame management — but a firm grip on one’s own frame is a necessary prerequisite to all of them.

Loss of faith in the validity of our own narrative frame would be a serious stumbling block to graceful and effective participation in social activity. A desire to “keep the faith” encourages individual habits of thought, and “folk” models of mental organization, which suppress awareness of our complexity, opacity and limited coherence. Furthermore, since in most social situations we all want to maintain a mutually agreeable narrative frame, our social norms strongly discourage undercutting faith in the frame.

I think that together, these personal and social motivations can explain our resistance to giving up our insistence that “somebody” must be “in charge”.

At this point, the relevance of these thoughts to the idea of a “unitary executive” is probably obvious, but the relevance to enforcement costs may still be obscure. I’ll take these up in my next post.

Ego, enforcement costs, and the unitary executive

Preparing the ground

This is the first of a few posts on this topic. Here I ask the motivating question, and review evidence on human (and animal) consciousness that will help to answer it.

Jerry Fodor, with his usual brilliant phrasing of bad ideas, sums up the issue:

If… there is a community of computers living in my head, there had also better be somebody who is in charge; and, by God, it had better be me.
In Critical Condition

More generally, human beings seem to feel that any time there is a population working together, somebody has to be in charge — and it had better be “one of us”.

But this is at best a debatable proposition, and it sometimes has very bad consequences. Furthermore, as I will discuss in subsequent posts, it acts as a barrier to population thinking. Why do we hold to it so strongly, and often (like Fodor) treat it as an axiom in philosophical, social, or political reasoning? And more practically, how can we adopt a more sensible stance?

I believe we can start to answer the first question by looking at the function of consciousness.

Bernard Baars, in his Cognitive Theory of Consciousness, proposes a functional account of consciousness: an animal needs to be able to make coherent responses to potential threats or opportunities, and more generally to act as a coherent whole. Uncoordinated behavior is likely to be ineffective or even damaging. Consciousness provides a common frame that multiple sensory-motor systems can use to integrate their responses; most importantly, it helps the animal generate appropriate coordinated behavior in novel situations.
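
To make the architecture a little more tangible, here is a toy sketch — my own illustration, not Baars’ formal model — of the global-workspace idea: several independent subsystems each propose an action, a single “workspace” selects the most urgent proposal, and that one frame is then broadcast for everything else to coordinate around:

```python
# Toy illustration of a global-workspace-style coordination scheme.
from dataclasses import dataclass
import random

@dataclass
class Proposal:
    source: str
    action: str
    urgency: float

def subsystem(name, actions):
    """Each subsystem independently proposes an action with some urgency."""
    return Proposal(source=name, action=random.choice(actions), urgency=random.random())

def global_workspace(proposals):
    """Broadcast a single winning frame; losing proposals never become 'conscious'."""
    return max(proposals, key=lambda p: p.urgency)

subsystems = {
    "vision": ["track moving object", "scan surroundings"],
    "motor": ["reach", "withdraw hand"],
    "memory": ["recall similar event"],
}

proposals = [subsystem(name, acts) for name, acts in subsystems.items()]
frame = global_workspace(proposals)
print(f"broadcast frame: {frame.action!r} (from {frame.source})")
# Every subsystem now conditions its behavior on `frame`, even though each
# computed its proposal independently and most proposals were never selected.
```

The sketch is deliberately crude, but it captures the structural point: coordination comes from a single shared frame, not from any subsystem “being in charge” of the others.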

Libet’s experiments throw interesting additional light on the role of consciousness. Summarizing somewhat brutally, Libet found that our conscious awareness lags our perceptual stimulation, response, and even voluntary decisions by 200-500 milliseconds (1/5 to 1/2 second). Sometimes responses triggered by conscious awareness can intervene in an ongoing process and stop it, but in other cases the conscious awareness comes too late. We have probably all had the experience of seeing a glass tip over or fall off a table and being unable to get our body to move quickly enough to avert the mess, even though it seemed like there was enough time. The problem, of course, is that “we” (i.e. our conscious awareness) didn’t “get the news” until it was too late to do anything — and our sensory-motor subsystems weren’t ready to respond pre-consciously to that class of event because we weren’t in the right (conscious) “frame of mind” to prepare them.

All of this is consistent with Baars’ hypothesis. If consciousness mainly provides a “frame of mind” that helps (fairly autonomous) subsystems coordinate their activities, then it can still work fine even if it tracks events with some delay. Routine activities can proceed with minimal conscious awareness. Drastic interruptions or significant violations of expectations will generate an orienting response that pushes conscious awareness to the forefront.

This is a good story and may well be true — the empirical jury is still deliberating, though research continues to generate supporting evidence, making a favorable verdict increasingly likely.

But this story doesn’t account for the rhetorical verve and intensity of Fodor’s comment, or more generally our passionate attachment to feelings of comprehensive self-awareness and self-control. Why do we typically, like Fodor, feel uncomfortable with the possibility of being a community of subsystems, often loosely coordinated, with our conscious self acting as an intermittent framing mechanism for their activities? Surely this is a better description of our ordinary experience than vivid, comprehensive internal perception of a luminous self?

I’ll take up these questions in the next post of this series.