Bubbles of humanity in a post-human world

Austin Henderson had some further points in his comment on Dancing toward the singularity that I wanted to discuss. He was replying to my remarks on a social phase-change toward the end of the post. I’ll quote the relevant bits of my post, substituting my later term “netminds” for the term I was using then, “hybrid systems”:

If we put a pot of water on the stove and turn on the heat, for a while all the water heats up, but not uniformly–we get all sorts of inhomogeneity and interesting dynamics. At some point, local phase transitions occur–little bubbles of water vapor start forming and then collapsing. As the water continues to heat up, the bubbles become more persistent, until we’ve reached a rolling boil. After a while, all the water has turned into vapor, and there’s no more liquid in the pot.

We’re now at the point where bubbles of netminds (such as “gelled” development teams) can form, but they aren’t all that stable or powerful yet, and so they aren’t dramatically different from their social environment. Their phase boundary isn’t very sharp.

As we go forward and these bubbles get easier to form, more powerful and more stable, the overall social environment will be increasingly roiled up by their activities. As the bubbles merge to form a large network of netminds, the contrast between people who are part of netminds and normal people will become starker.

Unlike the pot that boils dry, I’d expect the two phases–normal people and netminds–to come to an approximate equilibrium, in which parts of the population choose to stay normal indefinitely. The Amish today are a good example of how a group can make that choice. Note that members of both populations will cross the phase boundary, just as water molecules are constantly in flux across phase boundaries. Amish children are expected to go out and explore the larger culture, and decide whether to return. I presume that in some cases, members of the outside culture also decide to join the Amish, perhaps through marriage.

After I wrote this I encountered happiness studies showing that the Amish are much happier and dramatically less frequently depressed than mainstream US citizens. I think it’s very likely that people who reject netminds and stick with GOFH (good old-fashioned humanity) will similarly be much happier, on average, than people who become part of netminds.

It isn’t too hard to imagine why this might be. The Amish very deliberately tailor their culture to work for them, selectively adopting modern innovations and tying them into their social practices in specific ways designed to maintain their quality of life. Similarly, GOFH will have the opportunity to tailor its culture and technical environment in the same way, perhaps with the assistance of friendly netminds that can see deeper implications than the members of GOFH.

I’m inclined to believe that I too would be happier in a “tailored” culture. Nonetheless, I’m not planning to become Amish, and I probably will merge into a netmind if a good opportunity arises. I guess my own happiness just isn’t my primary value.

[A]s the singularity approaches, the “veil” between us and the future will become more opaque for normal people, and at the same time will shift from a “time-like” to a “space-like” boundary. In other words, the singularity currently falls between our present and our future, but will increasingly fall between normal humans and netminds living at the same time. Netminds will be able to “see into” normal human communities–in fact they’ll be able to understand them far more accurately than we can now understand ourselves–but normal humans will find netmind communities opaque. Of course polite netminds will present a quasi-normal surface to normal humans except in times of great stress.

By analogy with other kinds of phase changes, the distance we can see into the future will shrink as we go through the transition, but once we start to move toward a new equilibrium, our horizons will expand again, and we (that is, netminds) may even be able to see much further ahead than we can today. Even normal people may be able to see further ahead (within their bubbles), as long as the equilibrium is stable. The Amish can see further ahead in their own world than we can in ours, because they have decided that their way of life will change slowly.

Austin raises a number of issues with my description of this phase change. His first question is why we should regard the population of netminds as (more or less) homogeneous:

All water boils the same way, so that when bubbles coalesce they are coherent. Will bubbles of [netminds] attempt to merge? Maybe that will take more work than their hybrid excess capability provides, so they will expend all their advantage trying to coalesce so that they can make use of that advantage. Maybe it will be self-limiting: the “coherence factor” — you have to prevent it from riding off at high speed in all directions.

Our current experience with networked systems indicates there’s a messy dynamic balance. Network effects generate a lot of force toward convergence or subsumption, since the bigger nexus tends to outperform the smaller one even if it is not technically as good. (Here I’m talking about nexi of interoperability, so they are conceptual or conventional, not physical — e.g. standards.)

Certainly the complexity of any given standard can get overwhelming. Standards that try to include everything break down or just get too complex to implement. Thus there’s a tendency for standards to fission and modularize. This is a good evolutionary argument for why we see compositionality in any general purpose communication medium, such as human language.

When a standard breaks into pieces, or when competing standards emerge, or when standards originally developed in different areas start interacting, if the pieces don’t work together, that causes a lot of distress and gets fixed one way or another. So the network effects still dominate, by making the pieces interact gracefully. Multiple interacting standards ultimately get adjusted so that they are modular parts of a bigger system, if they all continue to be viable.

As for riding off in all directions, I just came across an interesting map of science. In a discussion of the map, a commenter makes just the point I made in another blog post: real scientific work is all connected, while pseudo-science goes off into little encapsulated belief systems.

I think that science stays connected because each piece progresses much faster when it trades across its boundaries. If a piece can’t or won’t connect for some reason it falls behind. The same phenomenon occurs in international trade and cultural exchange. So probably some netminds will encapsulate themselves, and others will ride off in some direction far enough so they can’t easily maintain communication with the mainstream. But those moves will tend to be self-limiting, as the relatively isolated netminds fall behind the mainstream and become too backward to have any power or influence.

None of this actually implies that netminds will be homogeneous, any more than current scientific disciplines are homogeneous. They will have different internal languages, different norms, different cultures, they will think different things are funny or disturbing, etc. But they’ll all be able to communicate effectively and “trade” questions and ideas with each other.

Austin’s next question is closely related to this first one:

Why is there only one phase change? Why wouldn’t the first set of [netminds] be quickly passed by the next, etc. Just like the generation gap…? Maybe, as it appears to me in evolution in language (read McWhorter, “The Word on the Street” for the facts), the speed of drift is just matched by our length of life, and the bridging capability of intervening generations; same thing in space, bridging capability across intervening African dialects in a string of tribes matches the ability to travel. Again, maybe mechanisms of drift will limit the capacity for change.

Here I want to think of phase changes as occurring along a spectrum of different scales. For example, in liquid water, structured patterns of water molecules form around polar parts of protein molecules. These patterns have boundaries and change the chemical properties of the water inside them. So perhaps we should regard these patterns as “micro-phases”, much smaller and less robust than the “macro-phases” of solid, liquid and gas.

Given this spectrum, I’m definitely talking about a “macro-phase” transition, one that is so massive that it is extremely rare in history. I’d compare the change we’re going through to the evolution of the genetic mechanisms that support multi-cellular differentiation, and to the evolution of general purpose language supporting culture that could accumulate across generations. The exponential increases in the power of digital systems will have as big an impact as these did. So, yes, there will be more phase changes, but even if they are coming exponentially closer, the next one of this magnitude is still quite some time away:

  • Cambrian explosion, 500 Million Years ago
  • General language, 500 Thousand Years ago
  • Human / Digital hybrids (netminds), now
  • Next phase change, 500 years from now?

Change vs. coherence is an interesting issue. We need to distinguish between drift (which is fairly continuous) and phase changes (which are quite discontinuous).

We have a hard time understanding Medieval English, as much because of cultural drift as because of linguistic drift. The result of drift isn’t that we get multiple phases co-existing (with rare exceptions), but that we get opaque history. In our context this means that after a few decades, netminds will have a hard time understanding the records left by earlier netminds. This is already happening as our ability to read old digital media deteriorates, due to loss of physical and format compatibility.

I imagine it would (almost) always be possible to go back and recover an understanding of historical records, if some netmind is motivated to put enough effort into the task — just as we can generally read old computer tapes, if we want to work hard enough. But it would be harder for them than for us, because of the sheer volume of data and computation that holds everything together at any given time. Our coherence is very very thin by comparison.

For example the “thickness” of long term cultural transmission in western civilization can be measured in some sense by the manuscripts that survived from Rome and Greece and Israel at the invention of printing. I’m pretty sure that all of those manuscripts would fit on one (or at most a few) DVDs as high resolution images. To be sure these manuscripts are a much more distilled vehicle of cultural transmission than (say) the latest Tom Cruise DVD, but at some point the sheer magnitude of cultural production overwhelms this issue.

Netminds will up the ante at an exponential rate, as we’re already seeing with digital production technology, blogging, etc. etc. Our increasing powers of communication pretty quickly exceed my ability to understand or imagine the consequences.

Who’s in charge here?

In a very useful post, Jonah Lehrer wonders:

…if banal terms like “executive control” or “top-down processing” or “attentional modulation” hide the strangeness of the data. Some entity inside our brain, some network of neurons buried behind our forehead, acts like a little petit tyrant, and is able to manipulate the activity of our sensory neurons. By doing so, this cellular network decides, in part, what we see. But who controls the network?

I posted a comment on Jonah’s blog but it took so long to get approved that probably no one will see it. So I’m posting an enhanced version here.

Jonah’s final sentence, “But who controls the network?” illustrates to me the main obstacle to a sensible view of human thought, identity, and self-regulation.

We don’t ask the same question about the network that controls our heart rate. It is a fairly well defined, important function, tied to many other aspects of our mental state, but it is an obviously self-regulating network. It has evolved to manage its own fairly complex functions in ways that support the survival of the organism.

So why ask the question “Who controls it?” about attentional modulation? We know this network can be self-controlling. There are subjectively strange but fairly common pathologies of attentional modulation (such as hemi-neglect where we even understand some of the network behavior) that are directly traceable to brain damage, and that reveal aspects of the network’s self-management. We can measure the way attention degrades when overloaded through various cognitive tasks. Etc. etc. There’s nothing fundamentally mysterious or challenging to our current theoretical frameworks or research techniques.

Yet many people seem to have a cognitive glitch here, akin to the feeling people had on first hearing that the earth was round, “But then we’ll fall off!” Our intuitive self-awareness doesn’t stretch naturally to cover our scientific discoveries. As Jerry Fodor says “there had… better be somebody who is in charge; and, by God, it had better be me.”

I’ve written some posts (1, 2) specifically on why this glitch occurs but I think it will take a long time for our intuitive sense of our selves to catch up with what we already know.

And I guess I ought to write the post I promised last April. I’ll call it “Revisiting ego and enforcement costs”. Happily it seems even more interesting now than it did then, and it ties together the philosophy of mind themes with some of my thinking on economics.

Social fixed points

Austin Henderson in his comment on Dancing toward the singularity starts by remarking on an issue that often troubles people when dealing with reflexive (or reflective) systems:

On UI, when the machine starts modeling us then we have to incorporate that modeling into our usage of it. Which leads to “I think that you think that ….”. Which is broken by popping reflective and talking about the talk. Maybe concurrently with continuing to work. In fact that may be the usual case: reflecting *while* you are working.

We need the UIs to support this complexity. You talk about the ability to “support rapid evolution of conventions between the machine and the human.” … As for the “largely without conscious human choice” caveat, I think that addresses the other way out of the thinking-about-thinking infinite regress: practice, practice, practice.

I think our systems need to be reflexive. Certainly our social systems need to be reflective. But then what about the infinite regress that concerns Austin?

There are many specific tricks, but really they all boil down to the same trick: take the fixed point. Fixed points make recursive formal systems, such as lambda calculus, work. They let us find stable structure in dynamic systems. They are great.

Fixed points are easy to describe, but sometimes hard to understand. The basic idea is that you know a system is at a fixed point when you apply a transformation f to the system, and nothing happens. If the state of the system is x, then at the fixed point, f(x) = x — nothing changes. If the system isn’t at a fixed point, then f(x) = x’ — when you apply f to x, you “move” the system to x’.

A given system may have a unique fixed point — for example, well behaved expressions in the lambda calculus have a unique least fixed point. Or a system may have many fixed points, in which case it will get stuck at the one it gets to first. Or it may have no fixed points, in which case it just keeps changing each time you apply f.
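Here is a minimal numeric sketch of the idea (the map f(x) = cos(x) is just a convenient toy with a single attracting fixed point, not anything from the discussion above): iterate f until the state stops moving, i.e. until f(x) = x to within a tolerance.

    import math

    # Minimal sketch: find a fixed point by iterating f until nothing changes.
    # The map f(x) = cos(x) is just a convenient toy with one attracting fixed point.

    def find_fixed_point(f, x, tol=1e-12, max_iters=10_000):
        for _ in range(max_iters):
            x_next = f(x)
            if abs(x_next - x) < tol:   # f(x) ~= x: the system no longer moves
                return x_next
            x = x_next                  # otherwise f "moves" the system from x to x'
        raise RuntimeError("no fixed point reached; the system keeps changing")

    print(find_fixed_point(math.cos, 1.0))   # ~0.739085, the x where cos(x) = x

If f had several fixed points, which one this loop lands on would depend on the starting x; if f had none, the loop would just keep chasing a moving target.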

Now suppose we have a reflective system. Let’s say we’re modeling a computer system we’re using (as we must to understand it). Let’s also say that at the same time, the system is modeling us, with the goal of (for example) showing us what we want to see at each point. We’d like our behavior and the system’s behavior to converge to a fixed point, where our models don’t change any more — which is to say, we understand each other. If we never reached a fixed point, we’d find it very inconvenient — the system’s behavior would keep changing, and we’d have to keep “chasing” it. This sort of inconvenience does arise, for example, in lists that try to keep your recent choices near the top.

Actually, of course, we probably won’t reach a truly fixed point, just a “quiescent” point that changes much more slowly than it did in the initial learning phase. As we learn new aspects of the system, as our needs change, and perhaps even as the system accumulates a lot more information about us, our respective models will adjust relatively slowly. I don’t know if there is a correct formal name for this sort of slowly changing point.

People model each other in interactions, and we can see people finding fixed points of comfortable interaction that drift and occasionally change suddenly when they discover some commonality or difference. People can also get locked into very unpleasant fixed points with each other. This might be a good way to think about the sort of pathologies that Ronald Laing called “knots”.

Fixed points are needed within modeling systems, as well as between them. The statistical modeling folks have recently (say in the last ten years) found that many models containing loops, which they previously thought were intractable, are perfectly well behaved with the right analysis — they provably converge to the (right) fixed points. This sort of reliably convergent feedback is essential in lots of reasoning paradigms, including the iterative encoding / decoding algorithms that come closest to the Shannon bound on channel capacity.
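To make the “loopy but well behaved” point concrete, here is a minimal sum-product (loopy belief propagation) sketch on a toy three-node cycle. The potentials are invented purely for illustration, and this is only a sketch of the technique, not production inference code; on a graph this benign the messages typically settle into a fixed point within a few sweeps.

    import numpy as np

    # Toy loopy belief propagation on a binary model with a single loop
    # (a three-node cycle). Potentials are invented for illustration only.

    nodes = [0, 1, 2]
    edges = [(0, 1), (1, 2), (2, 0)]

    rng = np.random.default_rng(0)
    phi = {i: rng.random(2) + 0.5 for i in nodes}        # unary potentials over {0, 1}
    psi = {e: rng.random((2, 2)) + 0.5 for e in edges}   # pairwise potentials psi[x_i, x_j]

    def neighbors(i):
        out = []
        for a, b in edges:
            if a == i:
                out.append(b)
            elif b == i:
                out.append(a)
        return out

    # m[(i, j)] is the message from node i to node j, a distribution over x_j
    m = {}
    for i, j in edges:
        m[(i, j)] = np.ones(2) / 2
        m[(j, i)] = np.ones(2) / 2

    for sweep in range(100):
        delta = 0.0
        for (i, j) in list(m):
            # combine phi_i with all incoming messages to i except the one from j
            prod = phi[i].copy()
            for k in neighbors(i):
                if k != j:
                    prod = prod * m[(k, i)]
            # orient the pairwise potential as P[x_i, x_j]
            P = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T
            new = P.T @ prod           # sum over x_i
            new = new / new.sum()      # normalize for numerical sanity
            delta = max(delta, float(np.abs(new - m[(i, j)]).max()))
            m[(i, j)] = new
        if delta < 1e-10:              # the messages have stopped changing: a fixed point
            print("converged after", sweep + 1, "sweeps")
            break

    # approximate marginals read off at the message fixed point
    for i in nodes:
        b = phi[i].copy()
        for k in neighbors(i):
            b = b * m[(k, i)]
        print("node", i, "belief:", b / b.sum())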

Unfortunately we typically aren’t taught to analyze systems in terms of this sort of dynamics, and we don’t have good techniques for designing reflexive systems — for example, UIs that model the user and converge on stable, but not excessively stable, fixed points. If I’m right that we’re entering an era where our systems will model everything they interact with, including us, we’d better get used to reflexive systems and start working on those design ideas.

Information technology and social change

I’ve recently written up some strategic thoughts for a university (which shall remain nameless) and will post them here, since they develop some themes that I’ve discussed in other posts.

Information technology driving social change

Our information environment is rapidly being transformed by digital systems. Today’s students will work most of their lives in a world transformed by digital information. Their success will depend to a large extent on how well they cope with, understand and anticipate the social and institutional consequences of these technology trends.

The technical trend is that the cost of storing, transmitting and processing digital information has been declining exponentially for decades, and will continue to decline at more or less the same rate for decades. This creates immense pressure for economic and social changes unprecedented in history.

The economic trend is radical factor substitution. Any activity that can take advantage of the declining cost of digital information gets “sucked into” the digital domain. In many cases, the costs become so low that they are effectively zero, like the cost of napkins or a glass of water in a restaurant — there is indeed a cost, but it is below the threshold of individual accounting or control.
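A back-of-the-envelope sketch of why a cost can fall below the threshold of accounting (the starting cost and the two-year halving time below are invented assumptions, not measurements): a per-item cost that halves every two years shrinks by roughly a factor of a thousand in twenty years.

    # Invented numbers, purely to illustrate exponential cost decline.
    start_cost = 1.00        # hypothetical cost of some unit of storage or transmission today
    halving_years = 2.0      # assumed halving time
    threshold = 0.001        # made-up level below which nobody bothers to account for the cost

    for years in range(0, 31, 5):
        cost = start_cost * 0.5 ** (years / halving_years)
        note = "below the accounting threshold" if cost < threshold else ""
        print(f"year {years:2d}: ${cost:.6f} {note}")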

While these are simple trends, their social implications are far from simple, because we have no easy way to anticipate what changes are possible or likely. These factor substitutions are radical because they typically involve reinvention of a business, and such drastic changes can only be discovered through innovation and testing in the real world. We have been repeatedly surprised by personal computers, the internet, the world wide web, search engines, Wikipedia, YouTube, etc.

So the overall effect is that large social changes will be driven by simple, easily stated technical trends for at least several more decades. Even though we know the cause, we will be continually surprised by these changes because they arise from technical, business and social innovation that takes advantage of exponentially falling costs.

The ground rules of information goods

Information goods are very different from material goods. Scientific and scholarly communities have always operated largely by the ground rules of information goods, but since material goods were dominant in most areas of society, information goods haven’t gotten major attention from scholars, until recently.

Until the early 1990s, essentially all information goods were embedded in material goods (books, vinyl records, digital tapes, etc.). High-speed digital communication finally split material and information goods completely, and enabled new modes of production. We are finally beginning to understand how the different ground rules for material and information goods arise from very different transaction costs, coordination costs, and levels of asymmetric information between producers and consumers.

One key ground rule is becoming clear: Voluntary contribution and review are essential and often dominant aspects of information good production.

This ground rule has always been important in scholarship. Scholars have always done research, written articles and performed peer review primarily because producing information goods was intrinsic to their vocation. Now, due to the exponential shifts in the cost of information technology, this ground rule is applying to a much wider swath of society.

The successful businesses of the internet era, such as Amazon, Google and eBay, depend almost entirely on external content voluntarily contributed and reviewed by their stakeholders — buyers, sellers, creators of indexed web sites, people who create and post video (in the case of YouTube), etc.

The same pattern applies to major new social enterprises enabled by information technology. For example Wikipedia, Linux and Apache all produce information goods (software and content) that are dominant in their very large and important markets, and they produce them through voluntary contributions and review by their stakeholders.

Meta: Patterns in my posting (and my audience)

I’ve been posting long enough, and have enough reaction from others (mainly in the form of visits, links and posts on other blogs) that I can observe some patterns in how this all plays out.

My posts cluster roughly around three main themes (in retrospect, not by design):

  • Economic thinking, informed more by stochastic game theory than Arrow-Debreu style models
  • The social impact of exponential increases in computer power, especially coupled with statistical modeling
  • Philosophical analysis of emergence, supervenience, downward causation, population thinking, etc.

These seem to be interesting to readers roughly in that order, in (as best I can tell) a power-law-like pattern — that is, I get several times as many visitors looking at my economics posts as at my singularity / statistical modeling posts, and almost no one looking for my philosophical analysis (though the early Turner post has gotten some continuing attention).

I find the economics posts the easiest — I just “write what I see”. The statistical modeling stuff is somewhat more work, since I typically have to investigate technical issues in more depth than I would otherwise. Philosophical analysis is much harder to write, and I’m typically less satisfied with it when I’m done.

The mildly frustrating thing about this is that I think the philosophical analysis is where I get most of my ability to provide value. My thinking about economics, for example, is mainly guided by my philosophical thinking, and I wouldn’t be able to see what I see without an arduously worked out set of conceptual habits and frameworks. I’d enjoy getting, for the philosophical posts, the kind of encouragement and useful additional perspectives I get from seeing people react to the other topics.

Reflecting on this a bit, I think mostly what I’m doing with the philosophical work is gradually prying loose a set of deeply rooted cognitive illusions — illusions that I’m pretty sure arise from the way consciousness works in the human brain. Early on, I wrote a couple of posts that touch on this theme — and in keeping with the pattern described above, they were hard to write, didn’t seem to get a lot of interested readers, and I found them useful conceptual steps forward.

“Prying loose illusions” is actually not a good way to describe what needs to be done. We wouldn’t want to describe Copernicus’ work as “prying loose the geocentric illusion”. If he had just tried to do that, it wouldn’t have worked. Instead, I’m building up ways of thinking that I can substitute for these cognitive illusions (partially, with setbacks). This is largely a job of cognitive engineering — finding ways of thinking that stick as habits, that become natural, that I can use to generate descriptions of stuff in the world (such as economic behavior) which others find useful, etc.

In my (ever so humble) opinion this is actually the most useful task philosophers could be doing, although unfortunately, as far as I can tell, they mostly don’t see it as an important goal, and I suspect in many cases would say it is “not really philosophy”. To see if I’m being grossly unfair to philosophers, I just googled for “goals of x” for various disciplines (philosophy, physics, sociology, economics, …). The results are interesting and I think indicate I’m right (or at least not unfair), but I’ll save further thoughts for a post about this issue. If you’re curious, feel free to try this at home.

A good example of post-capitalist production

This analysis of the Firedoglake coverage of the Libby trial hits essentially all the issues we’ve been discussing.

  • Money was required, but it was generated by small contributions from stakeholders (the audience), targeted to this specific project.
  • A relatively small amount of money was sufficient because the organization was very lightweight and the contributors were doing it for more than just money.
  • The quality was higher than that of the work done by the conventional organizations (the news media) because the FDL group was larger and more dedicated. They had a long prior engagement with this story.
  • FDL could afford to put more feet on the ground than the (much better funded) news media, because they were so cost-effective.
  • The group (both the FDL reporters and their contributors) self-organized around this topic so their structure was very well suited to the task.
  • Entrepreneurship was a big factor — both long-term organization of the site, and short-term organization of the coverage.
  • FDL, with no prior journalistic learning curve, and no professional credentials, beat the professional media on their coverage of a high-profile hard-core news event.

This example suggests that we don’t yet know the inherent limits of this post-capitalist approach to production of (at least) information goods. Most discussions of blogs vs. (traditional) news media have assumed that the costs inherent in “real reporting” meant blogs couldn’t do it effectively. The FDL example shows, among other things, that the majority of those costs (at least in this case) are due to institutional overhead that can simply be left out of the new model.

We’re also discovering that money can easily be raised to cover specific needs, if an audience is very engaged and/or large. Note that even when raising money, the relationship remains voluntary rather than transactional — people contribute dollars without imposing any explicit obligations on their recipient. No one incurs the burden of defining and enforcing terms. In case of fraud or just disappointing performance, the “customers” will quickly withdraw from the relationship, so problems will be self-limiting.

It is interesting to speculate about how far this approach could go. To pick an extreme example, most of the current cost of new drugs is not manufacturing (which will remain capital intensive for the foreseeable future), but rather the information goods — research, design, testing, education of providers, etc. — needed to bring drugs to market. At this point it seems impossible that these processes could be carried out in a post-capitalist way. But perhaps this is a failure of imagination.

Rationality is only an optimization

I’m reading a lovely little book by H. Peyton Young, Individual Strategy and Social Structure, very dense and tasty. I checked out what he had done recently, and found “Individual Learning and Social Rationality” in which, as he says, “[w]e show how high-rationality solutions can emerge in low-rationality environments provided the evolutionary process has sufficient time to unfold.”

This reminded me of work by Duncan Foley on (what might be called) low-rationality economics, beginning with “Maximum Entropy Exchange Equilibrium” and moving to a more general treatment in “Classical thermodynamics and economic general equilibrium theory“. Foley shows that the equilibria of neoclassical economics, typically derived assuming unbounded rationality, can in fact be approximated by repeated interactions between thoughtless agents with simple constraints. These results don’t even depend on agents changing due to experience.
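As a heavily hedged illustration of the flavor of such results (this is not Foley’s construction, just a toy in the same spirit): the agents below meet in random pairs and thoughtlessly redistribute their combined wealth, subject only to conservation and non-negativity. No one optimizes anything, yet the wealth distribution settles into a stable statistical equilibrium.

    import numpy as np

    # Toy "low-rationality" exchange model: an illustration, NOT Foley's construction.
    # Random pairwise redistribution under a conservation constraint settles into a
    # roughly exponential (Boltzmann-Gibbs-like) wealth distribution.

    rng = np.random.default_rng(1)
    n_agents = 1000
    wealth = np.full(n_agents, 100.0)          # equal endowments to start

    for _ in range(300_000):
        i, j = rng.integers(n_agents, size=2)
        if i == j:
            continue
        total = wealth[i] + wealth[j]
        share = rng.uniform()                  # thoughtless split of the pair's total
        wealth[i], wealth[j] = share * total, (1 - share) * total

    print("mean wealth:", wealth.mean())                                 # conserved at ~100
    print("fraction below the mean:", (wealth < wealth.mean()).mean())   # ~0.63 if roughly exponential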

So from the careful, well grounded results by these two scholars, I’d like to take an alarmingly speculative leap: I conjecture that all rationality is an optimization, which lets us get much faster to the same place we’d end up after sufficiently extended thoughtless wandering of the right sort. This hardly makes rationality unimportant, but it does tie it to something less magical sounding.

I like this way of thinking about rationality, because it suggests useful questions like “What thoughtless equilibrium does this rational rule summarize?” and “How much rationality do we need to get close to optimal results here?” In solving problems a little rationality is often enough; trying to add more may just produce gratuitous formality and obscurity.

At least in economics and philosophy, rationality is often treated as a high value, sometimes even an ultimate value. If it is indeed an optimization of the path to thoughtless equilibria, it is certainly useful but probably not worthy of such high praise. Value is to be found more by comparing the quality of the equilibria and understanding the conditions that produce them than by getting to them faster.

Capital is just another factor

Wow! Lots of people came to see Capitalists vs. Entrepreneurs, via great responses by Tim Lee, Jesse Walker, Tech and Science News Updates, and Logan Ferree (scroll down) and maybe others I didn’t see. Thanks! Reading over those posts and comments, I think perhaps the issue is simpler than I realized, although the implications certainly aren’t.

Really we are talking about a very basic idea: Capital is just another factor in production, like labor or material resources.

Since capital is just a factor, its importance in production will change over time. Specifically, right now the importance of capital is falling. As we get richer and industry gets more productive, any given capital item gets cheaper. Things like a fast computer, a slice of network bandwidth, etc. are so cheap that any professional in a developed economy can do their own production of information goods with no outside capital.

It seems that we’ve confused free markets with “capitalism”. This only makes sense as long as the key issue in markets is the availability of capital. From a long term perspective, naming our economic system after one factor of production is just silly.

On the other hand, free markets depend essentially on individual judgment, choice, creativity, and on people’s ability to sustain a network of social relationships. These make free markets possible, and taken together they constitute entrepreneurship.

So unlike capital, entrepreneurship is central to any possible free market system.

The inevitability of peer production

In this context, rather than being strange or subversive, or even needing to be explained, peer production is viable when:

  1. capital costs (needed for production) fall far enough and
  2. coordination costs fall far enough.

Cheap computing and communication reduce both of these exponentially, so peer production becomes inevitable.

This was not apparent until recently, and even now is hard for many people to believe. People are still looking for an “economic justification” for peer production. “How does it help people make money?” they ask. But this confuses the means with the end. Money is a means of resource allocation and coordination. If we have other means that cost less or work better, economics dictates that we will use them instead of money.

A digression on coordination

Economists typically talk about “transaction costs” but I’m deliberately using the term “coordination costs”. Transactions (a la Coase) typically involve money, and certainly require at least contractual obligations. Coordination by contrast only depends on voluntary cooperation. Transaction costs will always be higher than coordination costs, because transactions require the ability to enforce the terms of the transaction. This imposes additional costs — often enormously larger costs.

As I point out in “The cost of money”, introducing money into a relationship creates a floor for costs. I didn’t say it there, but it is equally true that contractual obligations introduce the same kind of floor. Only when a relationship is freely maintained by the parties involved, with no requirement to monitor and enforce obligations, can these costs be entirely avoided.
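A tiny worked example of that floor (all numbers invented): a fixed per-transaction overhead makes small contributions uneconomic to transact for, while voluntary coordination, with next to no overhead, can still capture them.

    # Invented numbers, purely to illustrate the "cost floor" argument: a contribution
    # is worth handling as a transaction only if its value exceeds the fixed overhead
    # of negotiating, billing and enforcing terms. Voluntary coordination has almost
    # no such floor.

    enforcement_overhead = 20.00    # hypothetical fixed cost per enforced transaction
    coordination_overhead = 0.05    # hypothetical near-zero cost per voluntary contribution

    for value in [0.50, 5.00, 50.00, 500.00]:   # value of a single contribution, in dollars
        print(f"${value:>7.2f} contribution:",
              f"viable as a transaction: {value > enforcement_overhead},",
              f"viable as voluntary coordination: {value > coordination_overhead}")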

Not surprisingly, peer production succeeds in domains where people can coordinate without any requirement to enforce prior obligations. Even the most limited enforcement costs typically kill it. Clay Shirky develops this argument in the specific case of Citizendium (a replacement for Wikipedia that attempts to validate the credentials of its contributors).

A shift of perspective

I’m only beginning to see the implications of this way of thinking about capital, but it has already brought to mind one entertaining analogy.

In the late middle ages, feudalism was being undermined by (among other things) the rise of trade. Merchants, previously beneath notice, began to get rich enough so that they could buy clothes, furniture and houses that were comparable to those of the nobility.

One response of the “establishment” was to institute sumptuary laws, strictly limiting the kinds of clothes, furniture, houses, etc. merchants could own. There was a period where rich merchants found ways to “hack” the laws with very expensive plain black cloth and so forth, and then the outraged nobility would try to extend the laws to prohibit the hack. Of course this attempt to hold back the tide failed.

I think that in the current bizarre and often self-damaging excesses of copyright and patent owners, we’re seeing something very like these sumptuary laws. Once again, the organization of economic activity is changing, and those who’ve benefited from the old regime aren’t happy about that at all. They are frantically throwing up any legal barriers they can to keep out the upstarts. But once again, attempts to hold back the tide will fail.

The path to a synthesis in statistical modeling

As I discuss in Dancing toward the singularity, progress in statistical modeling is a key step in achieving strongly reflexive netminds. However a very useful post by John Langford makes me think that this is a bigger leap than I hoped. Langford writes:

Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed ….

Langford lists fourteen frameworks:

  • Bayesian Learning
  • Graphical/generative Models
  • Convex Loss Optimization
  • Gradient Descent
  • Kernel-based learning
  • Boosting
  • Online Learning with Experts
  • Learning Reductions
  • PAC Learning
  • Statistical Learning Theory
  • Decision tree learning
  • Algorithmic complexity
  • RL, MDP learning
  • RL, POMDP learning

Within each framework there are often several significantly different techniques, which further divide statistical modeling practitioners into camps that have trouble sharing results.

In response, Andrew Gelman points out that many of these approaches use Bayesian statistics, which provides a unifying set of ideas and to some extent formal techniques.

I agree that Bayesian methods are helping to unify the field, but statistical modeling still seems quite fragmented.

So in “dancing” I was too optimistic when I wrote that I “doubt that we need any big synthesis or breakthrough” in statistical modeling to create strongly reflexive netminds. Langford’s mini-taxonomy, even with Gelman’s caveats, suggests that we won’t get a unified conceptual framework, applicable to actual engineering practice, across most kinds of statistical models until we have a conceptual breakthrough.

If this is true, of course we’d like to know: How big is the leap to a unified view, and how long before we get there?

Summary of my argument

The current state of statistical modeling seems pretty clearly “pre-synthesis” — somewhat heterogeneous, with different formal systems, computational techniques, and conceptual frameworks being used for different problems.

Looking at the trajectories of other more or less similar domains, we can see pretty clear points where a conceptual synthesis emerged, transforming the field from a welter of techniques to a single coherent domain that is then improved and expanded.

The necessary conditions for a synthesis are probably already in place, so it could occur at any time. Unfortunately, these syntheses seem to depend on (or at least involve) unique individuals who make the conceptual breakthrough. This makes the timing and form of the synthesis hard to predict.

When a synthesis has been achieved, it will probably already be embodied in software, and this will allow it to spread extremely quickly. However it will still need to be locally adapted and integrated, and this will slow down its impact to a more normal human scale.

The big exception to this scenario is that the synthesis could possibly arise through reflexive use of statistical modeling, and this reflexive use could be embodied in the software. In this case the new software could help with its own adoption, and all bets would be off.

Historical parallels

I’m inclined to compare our trajectory to the historical process that led to the differential and integral calculus. First we had a long tradition of paradoxes and special case solutions, from Zeno (about 450 BC) to the many specific methods based on infinitesimals up through the mid 1600s. Then in succession we got Barrow, Newton and Leibniz. Newton was amazing, but it seems pretty clear that the necessary synthesis would have taken place without him.

But at that point we were nowhere near done. Barrow, Newton and Leibniz had found a general formalism for problems of change, but it still wasn’t on a sound mathematical footing, and we had to figure out how to apply it to specific situations case by case. I think it’s reasonable to say that it wasn’t until Hamilton’s work, published in 1835, that we had a full synthesis for classical physics (which proved extensible to quantum mechanics and relativity).

So depending on how you count, the development of the calculus took around 250 years. We now seem to be at the point in our trajectory just prior to Barrow: lots of examples and some decent formal techniques, but no unified conceptual framework. Luckily, we seem to be moving considerably faster.

One problem for this analogy is that I can’t see any deep history for statistical modeling comparable to the deep history of the calculus beginning with Zeno’s paradox.

Perhaps a better historical parallel in some ways is population biology, which seems to have crystallized rather abruptly, with very few if any roots prior to about 1800. Darwin’s ideas were conceptually clear but mathematically informal, and the current formal treatment was established by Fisher in about 1920, and has been developed more or less incrementally since. So in this case, it took about 55 years for a synthesis to emerge after the basic issues were widely appreciated due to Darwin’s work.

Similarly, statistical modeling as a rough conceptual framework crystallized fairly abruptly with the work of the PDP Research Group in the 1980s. There were of course many prior examples of specific statistical learning or computing mechanisms, going back at least to the early 1960s, but as far as I know there was no research program attempting to use statistical methods for general learning and cognition. The papers of the PDP Group provided excellent motivation for the new direction, and specific techniques for some interesting problems, but they fell far short of a general characterization of the whole range of statistical modeling problems, much less a comprehensive framework for solving such problems.

Fisher obviously benefited from the advances in mathematical technique, compared with the founders of calculus. We are benefiting from further advances in mathematics, but even more important, statistical modeling depends on computer support, to the point where we can’t study it without computer experiments. Quite likely the rapid crystallization of the basic ideas depended on rapid growth in the availability and power of computers.

So it is reasonable to hope that we can move from problems to synthesis in statistical modeling more quickly than in previous examples. If we take the PDP Group as the beginning of the process, we have already been working on the problems for twenty years.

The good news is that we do seem to be ready for a synthesis. We have a vast array of statistical modeling methods that work more or less well in different domains. Computer power is more than adequate to support huge amounts of experimentation. Sources of almost unlimited amounts of data are available and are growing rapidly.

On the other hand, an unfortunate implication of these historical parallels is that our synthesis may well depend on one or more unique individuals. Newton, Hamilton and Fisher were prodigies. The ability to move from a mass of overlapping problems and partial solutions to a unified conceptual system that meets both formal and practical goals seems to involve much more than incremental improvement.

Adoption of the synthesis

Once a synthesis is created, how quickly will it affect us? Historically it has taken decades for a radical synthesis to percolate into broad use. Dissemination of innovations requires reproducing the innovation, and it is hard to “copy” new ideas from mind to mind. They can easily be reproduced in print, but abstract and unfamiliar ideas are very hard for most readers to absorb from a printed page.

However, the situation for a statistical modeling synthesis is probably very different from our historical examples. Ideas in science and technology are often reproduced by “black boxing” them — building equipment that embodies them and then manufacturing that equipment. Depending on how quickly and cheaply the equipment can be manufactured, the ideas can diffuse quite rapidly.

Development of new ideas in statistical modeling depends on computer experiments. Thus when a synthesis is developed, it will exist at least partly in the form of software tools — already “black boxed” in other words. These tools can be replicated and distributed at almost zero cost and infinite speed.

So there is a good chance that when we do achieve a statistical modeling synthesis, “black boxes” that embody it will become available everywhere almost immediately. Initially these will only be useful to current statistical modeling researchers and software developers in related areas. The rate of adoption of the synthesis will be limited by the rate at which these black boxes can be adapted to local circumstances, integrated with existing software, and extended to new problems. This makes adoption of the synthesis comparable to the spread of other innovations through the internet. However the increase in capability of systems will be far more dramatic than with prior innovations, and the size of subsequent innovations will be increased by the synthesis.

There is another, more radical possibility. A statistical modeling synthesis could be developed reflexively — that is, statistical modeling could be an essential tool in developing the synthesis itself. In that case the black boxes would potentially be able to support or guide their own adaptation, integration and extension, and the synthesis would change our world much more abruptly. I think this scenario is currently quite unlikely, because none of the existing applications of statistical modeling lends itself to this sort of reflexive use. It gets more likely the more we use statistical modeling in our development environments.

A reflexive synthesis has such major implications that it deserves careful consideration even if it seems unlikely.

Leaving knowledge on the table

Yesterday I had a very interesting conversation with an epidemiologist while I was buying a cup of coffee (it’s great to live in a university town).

She confirmed a dark suspicion I’ve had for some time — large population studies do a terrible job of extracting knowledge from their data. They use basic statistical methods, constrained by the traditions of the discipline and by peer review that has an extremely narrow and wasteful view of what counts as valid statistical tools. She also said that even if they had the freedom to use other methods, they don’t know how to find people who understand better tools and can still talk their language.

The sophisticated modeling methods that have been developed in fields like statistical learning aren’t being applied (as far as either of us know) to the very large, rich, expensive and extremely important datasets collected by these large population studies. As a result, we both suspect a lot of important knowledge remains locked up in the data.

For example, her datasets include information about family relationships between subjects, so the right kind of analysis could potentially show how specific aspects of diet interact with different genotypes. But the tools they are using can’t do that.
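For what it’s worth, here is a sketch of the kind of analysis this points toward, using one off-the-shelf possibility: a mixed-effects model with a random intercept per family, so related subjects aren’t treated as independent, plus a diet-by-genotype interaction. The file and column names are hypothetical stand-ins, and a real analysis would need far more care; the point is only that such tools already exist.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hedged sketch only: "cohort.csv" and the column names ("outcome", "diet",
    # "genotype", "family_id") are hypothetical stand-ins for a real study dataset.

    data = pd.read_csv("cohort.csv")

    # Random intercept per family accounts for relatedness between subjects;
    # the diet * genotype term asks how diet interacts with genotype.
    model = smf.mixedlm("outcome ~ diet * genotype", data, groups=data["family_id"])
    result = model.fit()
    print(result.summary())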

We’d all be a lot better off if some combinations of funding agencies and researchers could bridge this gap.
