As I discuss in Dancing toward the singularity, progress in statistical modeling is a key step in achieving strongly reflexive netminds. However, a very useful post by John Langford makes me think that this is a bigger leap than I hoped. Langford writes:
Attempts to abstract and study machine learning are within some given framework or mathematical model. It turns out that all of these models are significantly flawed ….
Langford lists fourteen frameworks:
- Bayesian Learning
- Graphical/generative Models
- Convex Loss Optimization
- Gradient Descent
- Kernel-based learning
- Boosting
- Online Learning with Experts
- Learning Reductions
- PAC Learning
- Statistical Learning Theory
- Decision tree learning
- Algorithmic complexity
- RL, MDP learning
- RL, POMDP learning
Within each framework there are often several significantly different techniques, which further divide statistical modeling practitioners into camps that have trouble sharing results.
In response, Andrew Gelman points out that many of these approaches use Bayesian statistics, which provides a unifying set of ideas and to some extent formal techniques.
I agree that Bayesian methods are helping to unify the field, but statistical modeling still seems quite fragmented.
So in “dancing” I was too optimistic to “doubt that we need any big synthesis or breakthrough” in statistical modeling to create strongly reflexive netminds. Langford’s mini-taxonomy, even with Gelman’s caveats, suggests that we won’t get a unified conceptual framework, applicable to actual engineering practice, across most kinds of statistical models until we have a conceptual breakthrough.
If this is true, of course we’d like to know: How big is the leap to a unified view, and how long before we get there?
Summary of my argument
The current state of statistical modeling seems pretty clearly “pre-synthesis” — somewhat heterogeneous, with different formal systems, computational techniques, and conceptual frameworks being used for different problems.
Looking at the trajectories of other more or less similar domains, we can see pretty clear points where a conceptual synthesis emerged, transforming the field from a welter of techniques to a single coherent domain that is then improved and expanded.
The necessary conditions for a synthesis are probably already in place, so it could occur at any time. Unfortunately, these syntheses seem to depend on (or at least involve) unique individuals who make the conceptual breakthrough. This makes the timing and form of the synthesis hard to predict.
When a synthesis has been achieved, it will probably already be embodied in software, and this will allow it to spread extremely quickly. However it will still need to be locally adapted and integrated, and this will slow down its impact to a more normal human scale.
The big exception to this scenario is that the synthesis could possibly arise through reflexive use of statistical modeling, and this reflexive use could be embodied in the software. In this case the new software could help with its own adoption, and all bets would be off.
Historical parallels
I’m inclined to compare our trajectory to the historical process that led to the differential and integral calculus. First we had a long tradition of paradoxes and special case solutions, from Zeno (about 450 BC) to the many specific methods based on infinitesimals up through the mid-1600s. Then in succession we got Barrow, Newton and Leibniz. Newton was amazing but it seems pretty clear that the necessary synthesis would have taken place without him.
But at that point we were nowhere near done. Barrow, Newton and Leibniz had found a general formalism for problems of change, but it still wasn’t on a sound mathematical footing, and we had to figure out how to apply it to specific situations case by case. I think it’s reasonable to say that it wasn’t until Hamilton’s work published in 1835 that we had a full synthesis for classical physics (which proved extensible to quantum mechanics and relativity).
So depending on how you count, the development of the calculus took around 250 years. We now seem to be at the point in our trajectory just prior to Barrow: lots of examples and some decent formal techniques, but no unified conceptual framework. Luckily, we seem to be moving considerably faster.
One problem for this analogy is that I can’t see any deep history for statistical modeling comparable to the deep history of the calculus beginning with Zeno’s paradox.
Perhaps a better historical parallel in some ways is population biology, which seems to have crystallized rather abruptly, with very few if any roots prior to about 1800. Darwin’s ideas were conceptually clear but mathematically informal, and the current formal treatment was established by Fisher in about 1920, and has been developed more or less incrementally since. So in this case, it took about 55 years for a synthesis to emerge after the basic issues were widely appreciated due to Darwin’s work.
Similarly, statistical modeling as a rough conceptual framework crystallized fairly abruptly with the work of the PDP Research Group in the 1980s. There were of course many prior examples of specific statistical learning or computing mechanisms, going back at least to the early 1960s, but as far as I know there was no research program attempting to use statistical methods for general learning and cognition. The papers of the PDP Group provided excellent motivation for the new direction, and specific techniques for some interesting problems, but they fell far short of a general characterization of the whole range of statistical modeling problems, much less a comprehensive framework for solving such problems.
Fisher obviously benefited from the advances in mathematical technique, compared with the founders of calculus. We are benefiting from further advances in mathematics, but even more important, statistical modeling depends on computer support, to the point where we can’t study it without computer experiments. Quite likely the rapid crystallization of the basic ideas depended on rapid growth in the availability and power of computers.
So it is reasonable to hope that we can move from problems to synthesis in statistical modeling more quickly than in previous examples. If we take the PDP Group as the beginning of the process, we have already been working on the problems for twenty years.
The good news is that we do seem to be ready for a synthesis. We have a vast array of statistical modeling methods that work more or less well in different domains. Computer power is more than adequate to support huge amounts of experimentation. Sources of almost unlimited amounts of data are available and are growing rapidly.
On the other hand, an unfortunate implication of these historical parallels is that our synthesis may well depend on one or more unique individuals. Newton, Hamilton and Fisher were prodigies. The ability to move from a mass of overlapping problems and partial solutions to a unified conceptual system that meets both formal and practical goals seems to involve much more than incremental improvement.
Adoption of the synthesis
Once a synthesis is created, how quickly will it affect us? Historically it has taken decades for a radical synthesis to percolate into broad use. Dissemination of innovations requires reproducing the innovation, and it is hard to “copy” new ideas from mind to mind. They can easily be reproduced in print, but abstract and unfamiliar ideas are very hard for most readers to absorb from a printed page.
However, the situation for a statistical modeling synthesis is probably very different from our historical examples. Ideas in science and technology are often reproduced by “black boxing” them — building equipment that embodies them and then manufacturing that equipment. Depending on how quickly and cheaply the equipment can be manufactured, the ideas can diffuse quite rapidly.
Development of new ideas in statistical modeling depends on computer experiments. Thus when a synthesis is developed, it will exist at least partly in the form of software tools — already “black boxed” in other words. These tools can be replicated and distributed at almost zero cost and infinite speed.
So there is a good chance that when we do achieve a statistical modeling synthesis, “black boxes” that embody it will become available everywhere almost immediately. Initially these will only be useful to current statistical modeling researchers and software developers in related areas. The rate of adoption of the synthesis will be limited by the rate at which these black boxes can be adapted to local circumstances, integrated with existing software, and extended to new problems. This makes adoption of the synthesis comparable to the spread of other innovations through the internet. However, the increase in capability of systems will be far more dramatic than with prior innovations, and the size of subsequent innovations will be increased by the synthesis.
There is another, more radical possibility. A statistical modeling synthesis could be developed reflexively (that is, statistical modeling could be an essential tool in developing the synthesis itself). In that case the black boxes would potentially be able to support or guide their own adaptation, integration and extension, and the synthesis would change our world much more abruptly. I think this scenario currently is quite unlikely because none of the existing applications of statistical modeling lends itself to this sort of reflexive use. It gets more likely the more we use statistical modeling in our development environments.
A reflexive synthesis has such major implications that it deserves careful consideration even if it seems unlikely.