Turking! The idea and some implications

I recently read an edited collection of five stories, Metatropolis; the stories are set in a common world the authors developed together. This is a near future in which nation state authority has eroded and in which new social processes have grown up and have a big role in making things work. In some sense the theme of the book was naming and exploring those new processes.

One of those processes was turking. The term is based on Amazon’s Mechanical Turk. Google shows the term in use as far back as 2005 but I hadn’t really wrapped my mind around the implications; Metatropolis broadens the idea way beyond Amazon’s implementation or any of the other discussions I’ve read.

Turking: Getting bigger jobs done by semi-automatically splitting them up into large numbers of micro-jobs (often five minutes long or less), and then automatically aggregating and cross-checking the results. The turkers (people doing the micro-jobs) typically don’t have or need any long term or contractual relationship with the turking organizers. In many cases, possibly a majority, the turkers aren’t paid in cash, and often they aren’t paid at all, but do the tasks as volunteers or because they are intrinsically rewarding (as in games).

One key element that is distinctive to turking is some sort of entirely or largely automated process for checking the results — usually by giving multiple turkers the same task and comparing their results. Turkers who screw up too many tasks aren’t given more of those tasks. Contrast this with industrial employment where the “employer” filters job candidates, contracts with some to become “employees”, and then enforces their contracts. The relationship in turking is very different: the “employer” lets anybody become an “employee” and do some tasks, doesn’t (and can’t) control whether or how the “employee” does the work, but measures each “employee’s” results and decides whether and how to continue with the relationship.
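The cross-checking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Mechanical Turk or any real service works: the function names, the strict-majority rule, and the 0.7 accuracy cutoff are all hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


def aggregate(answers):
    """Cross-check redundant answers.

    answers maps task_id -> {turker_id: answer}.
    Returns (consensus, accuracy): the agreed answer per task, and each
    turker's share of agreement with consensus on the tasks that reached it.
    """
    consensus = {}
    agreed = defaultdict(int)
    attempted = defaultdict(int)
    for task_id, by_turker in answers.items():
        top_answer, top_votes = Counter(by_turker.values()).most_common(1)[0]
        # Assumption: accept an answer only when a strict majority agrees.
        if top_votes * 2 <= len(by_turker):
            continue
        consensus[task_id] = top_answer
        for turker, answer in by_turker.items():
            attempted[turker] += 1
            if answer == top_answer:
                agreed[turker] += 1
    accuracy = {t: agreed[t] / attempted[t] for t in attempted}
    return consensus, accuracy


def eligible(accuracy, threshold=0.7):
    """Post hoc filter: turkers below the (hypothetical) cutoff get no more tasks."""
    return {t for t, share in accuracy.items() if share >= threshold}


# Made-up data: three turkers, four micro-tasks.
answers = {
    "t1": {"ann": "cat", "bob": "cat", "cy": "dog"},
    "t2": {"ann": "7", "bob": "7", "cy": "7"},
    "t3": {"ann": "red", "bob": "red", "cy": "blue"},
    "t4": {"ann": "x", "bob": "y", "cy": "z"},  # no majority, so no consensus
}
consensus, accuracy = aggregate(answers)
still_working = eligible(accuracy)
```

Nobody is screened in advance here. Anyone can submit answers, and the filtering happens afterward, based only on measured results.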

This is an example of a very consistent pattern in the transition from industrial to networked relationships: a movement from gatekeeping and control to post hoc filtering. Another example is academic publishing. The (still dominant) industrial model of publishing works through gatekeeping — articles and books don’t get published until they are approved through peer review. The networked model works through post hoc processes: papers go up on web sites, get read, commented on and reviewed, often are revised, and over time get positioned on a spectrum from valid/valuable to invalid/worthless. The networked model is inexorably taking over, because it is immensely faster, often fairer (getting a few bad anonymous reviews can’t kill a good paper), results in a wider range of better feedback to authors, etc.

It seems quite possible — even likely — that post hoc filtering for work will produce substantially better results than industrial style gatekeeping and control in most cases. In addition to having lower transaction costs, it could produce better quality, a better fit between worker and task, and less wasted effort. It also, of course, will change how much the results cost and how much people get paid — more on that below.

Amazon’s Mechanical Turk just involves information processing — web input, web output — and this is typical of most turking today. However there are examples which involve real world activities. In an extreme case turking could be used to carry out terrorist acts, maybe without even doing anything criminal — Bruce Sterling has some stories that explore this possibility. But there are lots of ordinary examples, like counting the empty parking spaces on a given block, or taking a package and shipping it.

Examples

  • Refugees in camps are turking for money. The tasks are typical turking tasks, but the structure seems to be some more standard employment relationship. If there were enough computers, I bet a high percentage of the camp residents would participate, after some short period in which everyone learned from each other how to do the work. Then the organizers would have to shift to turking methods because the overhead of managing hundreds of thousands of participants using contracting and control would be prohibitive.
  • A game called FoldIt is using turking to improve machine solutions to protein folding. Turns out humans greatly improve on the automatic results but need the machine to do the more routine work. The turkers have a wide range of skill and a variety of complementary strategies, so the project benefits from letting many people try and then keeping the ones who succeed. (This is an example where the quality is probably higher than an industrial style academic model could generate.) The rewards are the intrinsic pleasure of playing the game, and also maybe higher game rankings.
  • There’s a startup named CrowdFlower that aims to make a business out of turking. CrowdFlower has relationships with online games that include the turking in their game play. So the gamers get virtual rewards (status, loot). I can easily imagine that the right turking tasks would actually enhance game play. CrowdFlower is also doing more or less traditional social science studies of turking motivations etc. Of course the surveys that generate data for the research are also a form of turking.
  • Distributed proofreading. OCR’d texts are distributed to volunteers and the volunteers check and correct the OCR. (They get both the image and the text.) The front page goes out of its way to note that “there is no commitment expected on this site beyond the understanding that you do your best.” This is an early turking technology, and works in fairly large chunks, a page at a time. It may be replaced by a much finer grained technology that works a word at a time — see below.
  • Peer production (open source and open content). An important component of peer production is small increments of bug reporting, testing, code review, documentation editing, etc. Wikipedia also depends on a lot of small content updates, edits, typo fixes, etc. These processes have the same type of structure as turking, although they typically haven’t been called turking. The main difference from the other examples is there’s no clear-cut infrastructure for checking the validity of changes. This is at least partly historical: these processes arose before the current form of turking was worked out. The incentive — beyond altruism and the itch to correct errors — is that one can get credit in the community and maybe even in the product.
  • I just recently came across another good example that deserves a longer explanation: ReCaptcha. It is cool because it takes two bad things, and converts them into two good things, using work people were doing anyway.

    The first bad thing is that OCR generates lots of errors, especially on poorly printed or scanned material — which is why the distributed proofreading process above is required. These can often be identified because the results are misspelled and/or the OCR algorithm reports low confidence. From the OCR failures, you can generate little images that OCR has trouble recognizing correctly.

    The second bad thing is that online services are often exploited by bad actors who use robots to post spam, abusively download data, etc. Often this is prevented by captchas, images that humans can convert into text, but that are hard for machines to recognize. Since OCR failures are known to be hard for machines to recognize correctly, they make good captchas.

    ReCaptcha turns the user effort applied to solving captchas, which would otherwise be wasted, into turking to complete the OCR — essentially very fine grained distributed proofreading. ReCaptcha figures out who’s giving the correct answers by having each user recognize both a known word and an unknown word, in addition to comparing answers by different users. Users are rewarded by getting the access they wanted.

    Note that if spammers turk out captcha reading (which they are doing, but which increases their costs significantly) then they are indirectly paying for useful work as well. Potentially ReCaptcha could be generalized to any kind of simple pattern recognition that’s relatively easy for humans and hard for machines, which could generate a lot of value from human cognitive capacities.
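The known-word / unknown-word check can be sketched as follows. This is a toy illustration of the idea as described above, not ReCaptcha's actual implementation or API: the class name, the `votes_needed` parameter, and the image identifiers are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class WordPairChecker:
    """Toy model: each challenge pairs a known word with an unknown one."""

    def __init__(self, votes_needed=3):
        # Hypothetical rule: an unknown word counts as solved once this
        # many trusted users have typed the same reading for it.
        self.votes_needed = votes_needed
        self.votes = defaultdict(Counter)  # unknown image id -> reading counts
        self.resolved = {}                 # unknown image id -> accepted text

    def submit(self, control_answer, control_truth, unknown_id, unknown_answer):
        """Return True if the user passes the captcha (gets access)."""
        # The known word decides whether this user is trusted at all.
        if control_answer.strip().lower() != control_truth.lower():
            return False
        # A trusted user's reading of the unknown word counts as one vote.
        if unknown_id not in self.resolved:
            guess = unknown_answer.strip().lower()
            self.votes[unknown_id][guess] += 1
            if self.votes[unknown_id][guess] >= self.votes_needed:
                self.resolved[unknown_id] = guess
        return True


checker = WordPairChecker(votes_needed=2)
first = checker.submit("morning", "morning", "img42", "upon")
robot = checker.submit("m0rn1ng", "morning", "img42", "spam")   # fails control
second = checker.submit("Morning ", "morning", "img42", "Upon")
```

The user's reward is simply the return value, access to the site. The proofreading is a side effect, collected only from users who got the known word right.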

    Some implications

    It seems that over time a huge variety and quantity of work could be turked. The turking model has the capacity to productively employ a lot of what Clay Shirky calls our “cognitive surplus”, and also whatever time surplus we have. Many unemployed people, refugee populations and I’m sure lots of other groups have a lot of surplus. As Shirky points out, even employed people have a discretionary surplus that they spend watching TV, reading magazines, playing computer games, etc. However right now there’s no way to bring this surplus to market.

    Switching from industrial relationships (heavyweight, gatekeeping and control) to networked relationships (lightweight, post hoc filtering) reduces per task transaction costs to a tiny fraction of their current level, and makes it feasible to bring much of this surplus to market.

    The flip side of that of course is that the more this surplus is available for production, the less anyone will get paid for the work it can do. Already in a lot of existing turking, the participants aren’t getting paid — and in many cases the organizers aren’t getting paid either. Also, more or less by definition, the surplus that would be applied to turking currently isn’t being used for any other paid activity, so potential workers aren’t giving up other pay to turk. Therefore, I expect the average payment for a turked task to approach zero, for both turkers and organizers. Usually there will still be rewards, but they will tend to be locally generated within the specific context of the tasks (online community, game, captcha, whatever). Often the entity that generates the rewards won’t get any specific benefit from the turking — for example, in the case of ReCaptcha, the sites that use it don’t particularly benefit from whatever proofreading gets done.

    Mostly turking rewards won’t be measurable in classical monetary terms — in some cases rewards may involve “in game” currency but this doesn’t yet count in the larger economy. In classical monetary terms, the marginal cost of getting a job turked will probably approach the cost of building, maintaining and running the turking infrastructure — and that cost is exponentially declining and will continue to do so for decades.

    This trend suggests that we need to find some metric complementary to money to aggregate preferences and allocate large scale social effort. But I’m not going to pursue that question further here.

    Obviously it will be important to understand what types of work can be turked and what can’t. For example, could the construction of new houses be turked? That may seem like a stretch, but Habitat for Humanity and other volunteer groups do construct houses with a process very much like turking — and of course this has a long history in the US, with institutions like barn raising. Furthermore the use of day labor isn’t that different from turking. I’d guess that within ten years we’ll be turking much of the construction of quite complex buildings. It is interesting to try to imagine what this implies for construction employment.

    Realistically, at this point we just don’t know the limits of turking. My guess is that the range of things that can be done via turking will turn out to be extremely broad, but that it will take a lot of specific innovations to grow into that range. Also of course there will be institutional resistance to turking many activities.

    When a swarm of turkers washes over any given activity and devours most of it, there will typically be a bunch of nuggets left over that can’t be turked reliably. These will probably be things that require substantial specialized training and/or experience, relatively deep knowledge of the particular circumstances, and maybe certification and accountability. Right now those nuggets are embedded in turkable work and so it is hard or impossible to figure out their distribution, relative size, etc. For a while (maybe twenty years or so) we’ll keep being surprised — we’ll find some type of nuggets we think can’t be turked, and then someone will invent a way to make most of them turkable. Only if and when turking converges on a stable institution will we be able to state more analytically and confidently the characteristics that make a task un-turkable.

    Another issue is security / confidentiality. Right now, corporations are willing to use turking for lots of tasks, but I bet they wouldn’t turk tasks involving key market data, strategic planning, or other sensitive material. On the other hand, peer production projects are willing to turk almost anything, because they don’t have concerns about maintaining a competitive advantage by keeping secrets. (They do of course have to keep some customer data private if they collect it at all, but usually they just avoid recording personal details.) I’d guess that over time this will give entities that keep fewer secrets a competitive advantage. I think this is already the case for a lot of related reasons because broadly speaking “Trying to keep secrets imposes huge transaction costs.” Eventually keeping big secrets may come to be seen as an absurdly expensive and dubious proposition and the ability to keep big pointless secrets will become an assertion of wealth and power. (Every entity will need to keep a few small secrets, such as the root password to their servers. But we know how to safely give everyone read only access to almost everything, and still limit changes to those who have those small secrets.)

    There’s lots more to say, but that’s enough for now.

    2 Responses to “Turking! The idea and some implications”

    1. Elfie
      August 28th, 2010 | 3:30 pm

      Hi Jed,
      Interesting stuff. I can’t agree that building will be turked out in the near future because you don’t have the luxury of having multiple versions of the same thing done by different people, and barns were very simple buildings indeed. I photograph the groups that work for Habitat and their investment is pretty big time and money wise. For Habitat I think the payoff is more in social involvement than in actual monetary savings; the corporations that donate (and that is a big saving) do it to polish their public profile. Notably Habitat doesn’t use volunteers for anything complex or structural. In a complex building (and even the simplest one gets more complex every day), one mistake on an essential part of the project (concrete, electric, carpentry etc) and the whole project is in trouble. The costs of mistakes would be huge. Designing the building could possibly be turked but even then, one small mistake, one big problem. Buildings cannot be easily taken apart and redone right. I wonder, however, about design for prefab elements and buildings made from them. That is an area where a lot of different knowledge bases and perspectives could potentially contribute hugely.

      Elfie

    2. Jed
      August 28th, 2010 | 4:47 pm

      Excellent points about the risk of errors — applicable much more widely than just to buildings, but certainly relevant there.

      I tried to address this in passing when I said the un-turkables “will probably be things that require substantial specialized training and/or experience, relatively deep knowledge of the particular circumstances, and maybe certification and accountability.” But your points make clear there’s at minimum a big missing aspect of that.

      Certainly in building etc. I’d expect lots of aspects that require skilled trades — e.g. electricians, plumbers, etc. That was more or less the issue I had in mind.

      But your examples make clear that in many domains whoever’s running the process will need to protect against errors and unqualified turkers throughout, not just in a subset of the tasks.

      However a more industrial model, in which the owner of the process contracts with individuals or companies to do the work, and then monitors and controls their efforts, by no means guarantees error free, correct results.

      So the question isn’t how we get things done perfectly, but how we do at least as well as the industrial model, and preferably better.

      Here’s at least a piece of an answer.

      In cases where errors are costly, the people turking probably need to have adequate reputations (maintained in some very explicit, easily checkable form). In relatively extreme cases, they may even need to be bonded to a level sufficient to pay for normal mistakes.

      Also, the turking model already implies that any work is going to be checked. As you point out in the case of building that couldn’t be done by building multiple copies and comparing them, but checking can certainly be turked out, and if necessary multiple turkers can do the same check and their results can be compared.

      I bet using normal contracting to get the work done, and just turking out checking would prevent and/or identify a lot of problems. So that might be a safer intermediate step.

      This process would automatically generate reputations for the checkers and the people doing the work (even if the work was done using normal contracting). Those could then be fed back in to make the process more efficient and reliable.

      The closest we come now to this sort of checking is that architects typically monitor and to some degree control the building process, and that city inspectors do final checks for conformance to code. But both of these are very spotty, neither is self-enforcing through cross-checks, and neither leads to public reputations for workers.

      Obviously there’s a lot of hand waving here, but this process is quite similar to what I do when I hire contractors. I have several look at the job and explain what they want to do and why, and I evaluate them by cross-checking their explanations. Then I check references for the one or two I like the best. My working assumption is that I have to prevent mistakes by picking the right people, not by contracting and monitoring.

      My process would work much better if I had a detailed database of what these contractors had done in the past and how it checked out afterward. Also I’d feel more confident if I could rely on good checking by experts along the way.

      Concretely, suppose I wanted to replace my roof (something I’ve done) and rather than spending weeks interviewing roofers and then checking their references (as I did), I could just put a job description up on some web site. A few days before the job, the site would quote me a team of turkers qualified to do roofing. Actually it would need to be several sub-teams, since for example the team that tears off the old roof is significantly bigger and less skilled than the team that puts the new one on.

      Thinking about this example, I guess there’d probably be a team boss who would actually check and coordinate the team. But the boss wouldn’t need to have a company or any contracts with the team members, they might not select or even know the team members, and someone who’s the boss for one team might be just a member on another team.

      I don’t know if this makes sense for you but I think it would work for me.
