I-O and the Crowd: Frequently Asked Questions About Using Mechanical Turk for Research 

Patricia Barger*
Kronos, Inc.

Tara S. Behrend
The George Washington University

David J. Sharek
North Carolina State University

Evan F. Sinar
Development Dimensions International (DDI)


I-O researchers are constantly seeking sources for large and representative participant samples. A relatively new option that has seen growing use is Amazon’s Mechanical Turk (MTurk; www.mturk.com). MTurk is an online marketplace connecting two groups: requesters offering payment for completion of human intelligence tasks (HITs) and workers willing to complete such tasks. Though human subjects research was not the intended purpose of this marketplace, it has proved viable as a source of participants. Using a frequently asked questions (FAQ) format, our goal in this article is to provide an overview of MTurk for the I-O community and to discuss practical applications of this method for academic and applied research, drawing on recent studies and our own experiences.

Frequently Asked Questions

How Does MTurk Work?
Launched in 2005 by Amazon.com, MTurk was originally used as an internal tool (for example, searching for duplicate products). It has since grown rapidly and now lists over 100,000 public HITs on average. The number of registered workers has also grown substantially, from 100,000 in 2007 (Pontin, 2007) to over 500,000 today, spanning more than 190 countries. Workers can browse a listing of all available HITs or search by keyword to find tasks they would like to work on. Anyone can sign up to use MTurk as either a requester or a worker (or both).

Who Are These People?
The first questions asked by every researcher we have spoken to about MTurk, and indeed by ourselves, focus on the workers themselves: Who are these mysterious individuals interested in and willing to complete microtasks for minuscule payments? This is a nontrivial issue, as the generalizability of research findings is bounded by the characteristics of the sample. In addition, an entirely new data collection method such as MTurk is likely to face particular scrutiny given its major deviation from status quo approaches to data gathering. We elaborate below.

What are the characteristics of MTurk workers? Because the MTurk system is set up to strictly protect workers’ anonymity, self-report surveys must be used to gauge this information. Two such studies have shown that MTurk worker populations are multinational yet primarily from the U.S. and India (Ipeirotis, 2010; Ross, Zaldivar, Irani, & Tomlinson, 2010). Of the U.S. workers, approximately 65% are female and 60% are older than 30. The modal household income for these U.S. workers is $40,000 to $60,000, and 78% have at least a bachelor’s degree. Our experience has also revealed that workers come from a wide range of industries and work backgrounds.

How representative is an MTurk sample? The qualifications feature included in MTurk gives researchers substantial flexibility to limit their sample to a subset of the total worker population. However, it is also important to understand the characteristics of MTurk workers as a whole. Past research has shown that, compared to the U.S. Internet population, MTurk workers are slightly younger, more likely to be female (Paolacci, Chandler, & Ipeirotis, 2010), and have lower household incomes (Ipeirotis, 2010). Compared to the U.S. workforce, MTurk workers are younger, more likely to be female, more highly educated, and fairly similar in household income.

Generalizability can also be viewed relative to other available data sources. One relevant comparison is between an MTurk sample and the university-based samples common to many psychological studies. Many researchers (e.g., Behrend, Sharek, Meade, & Wiebe, 2011; Buhrmester, Kwang, & Gosling, 2011; Paolacci et al., 2010) have viewed this comparison favorably for MTurk, finding that samples obtained through this method are substantially more representative of the general population than student samples. Based on the available research and our own experience, we feel that the sample representativeness of MTurk workers makes them well suited to employee-focused research, particularly in comparison to many other alternatives.

Why do individuals participate in MTurk? Motivation for MTurk participation has been a key question for several researchers. In short, workers complete these tasks because they find them interesting enough to fill time between other activities while also earning some money. The general conclusion is that although payment is not irrelevant, it is not the primary motivating factor for most participants (Behrend et al., 2011; Buhrmester et al., 2011; Ross et al., 2010).

How Are Workers Paid?
Workers can be rewarded in one of two ways. Most commonly, a predetermined payment is automatically transferred to a worker once he or she completes a task (reward per assignment). As a second method, a bonus can be directed toward specific workers based on requester-determined criteria. This can be used as an incentive for workers to produce high quality data. Bonuses can also be used as part of the research design if you want to give top performers additional compensation, for example.
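
For instance, a researcher could identify bonus-eligible workers from the downloaded results before granting the bonuses in MTurk. The sketch below is only illustrative: the file name, the WorkerId and score columns, and the top-25% criterion are assumptions, not part of MTurk itself.

```python
import pandas as pd

# Hypothetical results file with one row per completed assignment;
# "WorkerId" and "score" are assumed column names for illustration.
results = pd.read_csv("hit_results.csv")

# Requester-determined criterion (illustrative): bonus the top 25% of scorers.
cutoff = results["score"].quantile(0.75)
bonus_workers = results.loc[results["score"] >= cutoff, "WorkerId"].tolist()

print(f"{len(bonus_workers)} workers qualify for a bonus")
```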

Payment is integrated into the Amazon interface. Requesters prepay for HITs by transferring money to an Amazon requester account. Amazon manages all payments to workers, thus protecting anonymity. When budgeting for an experiment, note that Amazon charges a 10% fee on top of the worker payments.
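
As a quick budgeting illustration, a minimal sketch follows; the sample size and per-HIT reward are hypothetical, and the 10% fee is the rate noted above.

```python
# Rough budget estimate for an MTurk study (illustrative numbers).
n_participants = 200     # target sample size (hypothetical)
reward_per_hit = 0.75    # payment per completed HIT in USD (hypothetical)
amazon_fee_rate = 0.10   # Amazon's fee on top of worker payments

worker_payments = n_participants * reward_per_hit            # $150.00
total_to_prepay = worker_payments * (1 + amazon_fee_rate)    # $165.00
print(f"Worker payments: ${worker_payments:.2f}")
print(f"Total to prepay: ${total_to_prepay:.2f}")
```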

In order for a worker to receive payment, their work must be approved. If a worker does not complete a HIT adequately, their work can be rejected by the requester. This track record of approval and rejection (HIT approval rate) is public. This mechanism encourages high quality work, although it can also introduce ethical considerations (discussed below).
HITs can be approved in one of two ways. Requesters can review and approve each worker’s output on a per-case basis, or they can specify that all work should be automatically approved after a set time period. Workers generally expect that their work will be approved within a few days of completion.

How Much Should I Pay for my HIT?
There are no hard and fast rules for setting the reward amount, but, as a requester, you will be competing with many other HITs. Before settling on a payment, we recommend that you visit the HITs page and look at the other tasks that are available so you can more thoughtfully price your HIT based on the current payment landscape and time required. Some requesters aim to pay close to the equivalent of minimum wage (e.g., $1 for a 10-minute assignment). Other requesters opt to pay significantly less than this, as low as 50 cents per hour. We have found that approximately 75 cents is a reasonable rate for a 30-minute survey, though if you need to collect data very quickly, or have a complex task or study, then consider paying more per HIT.

How Good Are the Data That Come out of an MTurk Survey?
Given the relatively low cost and speed of gathering data on MTurk, it is natural to wonder about the quality of the data. Multiple studies have examined this question, and the results have largely been favorable. For example, Paolacci et al. (2010) replicated three well-established decision-making experiments with MTurk participants, online discussion boards, and a college student sample; the results revealed only slight quantitative differences between the samples. Buhrmester et al. (2011) compared MTurk participants to a large Internet sample and found no differences in task scores. Behrend et al. (2011) studied data quality differences between a worker sample and a college student sample on a variety of measures (e.g., Big Five personality and goal orientation). For both quantitative and qualitative data, they found slightly higher data quality in the MTurk sample, along with higher social desirability and measurement invariance for the majority of items across groups. Lastly, a series of studies by Barger and Sinar (2011) demonstrated no differences between MTurk and applicant populations on personality and situational judgment assessment item scores when data quality enhancement techniques were applied. Overall, the evidence suggests that data collected from MTurk are as valid as data collected from other sources, provided that data quality assurance steps are taken, as noted below.

What Should I Do To Ensure High Data Quality?
Although initial evidence is encouraging regarding the validity of MTurk data, careful study design is critical. The nature of MTurk is such that workers earn more money by completing more HITs. Thus, some workers may be incentivized to “rush through” HITs. One way to mitigate such careless responding is to embed quality control items within a survey: items that require the same amount of effort as other items but have a verifiable response. For example, a Likert-type personality scale may include a quality control question that directs the participant to “please select ‘agree’ if you are paying attention.” Participants who fail to provide the correct answers to these items can be filtered out of the analysis, yielding a cleaner sample (Barger & Sinar, 2011; Kittur, Chi, & Suh, 2008). Keep in mind that these participants will still need to be paid unless you specifically state otherwise in your HIT. Bonus payments can also be used to improve data quality. For example, previous research has shown that offering a bonus to the top 25% of performers on a series of situational judgment items increased data quality across the entire sample (Barger & Sinar, 2011). Examining the time spent completing HITs may also be an easy way to identify careless responders (Kittur et al., 2008; Mason & Suri, 2011). Lastly, we advise including a text box in HITs for participants to report confusing items or instructions, as these issues may contribute to poor data quality (Mason & Suri, 2011), and open-ended feedback from participants can be very useful in improving future research efforts.
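
Screening of this kind is straightforward to script once responses are exported. The following is a minimal sketch, assuming a CSV export with hypothetical column names (attention_check, completion_seconds); the 300-second cutoff is likewise an arbitrary example that should be tuned to your own task.

```python
import pandas as pd

# Hypothetical survey export; column names are assumed for illustration.
responses = pd.read_csv("survey_export.csv")

# 1. Drop participants who failed the embedded quality control item
#    (e.g., did not select "agree" when instructed to).
passed_check = responses["attention_check"] == "agree"

# 2. Flag implausibly fast completions; tune the cutoff to your task length.
not_rushed = responses["completion_seconds"] >= 300

clean = responses[passed_check & not_rushed]
print(f"Retained {len(clean)} of {len(responses)} responses")
```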

Will Journal Editors Be Receptive to MTurk-Collected Data?
Given the potential for high-quality data collected through MTurk, it stands to reason that journals would be receptive to articles using this platform. Because this is a new data collection source, few studies using this methodology have been published (beyond the articles that explore the platform itself). However, studies using similar online panels (e.g., the StudyResponse project) have previously been published in top journals (e.g., Piccolo & Colquitt, 2006). Thus, assuming care is taken with sample representativeness, study design, and data quality assurance (in addition to other important factors such as solid theory, methods, measures, and analyses), we see no inherent obstacles to publishing studies using MTurk in traditional I-O outlets.
 
Do I Need a Strong Technology Background to Use MTurk?
The level of Web development skill required to use MTurk is low in many cases but will vary depending on the specifics of your experimental design. Any study that can be deployed online in an unproctored setting can be managed within MTurk; complex designs may require some technical proficiency, but simple designs should be within the reach of most I-O researchers.

Surveys can be developed and administered directly in a HIT template using the design layout function. Data are stored in the HIT management interface, and results can be viewed and downloaded as a comma-separated values (CSV) file. In this scenario, the experiment is embedded directly into MTurk, and knowledge of HTML and some Amazon-specific functionality is required. Amazon provides thorough documentation, including video tutorials, to assist novices with this approach.

External survey tools such as SurveyMonkey or Qualtrics can also be used. In these situations, the HIT displays a link to the questionnaire. Once the survey is completed, the survey software can provide a completion code that participants must enter into the HIT to indicate they have finished the experiment, as no other linking mechanism exists. We have posted a step-by-step tutorial for this design at http://www.playgraph.com/mturk. In this case only a minimal level of Web development skill is required, although some knowledge of HTML would be useful.
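
Matching completion codes against the MTurk results is then a simple bookkeeping step. The sketch below is illustrative only: the file names and the submitted_code and completion_code columns are assumptions about how your particular HIT and survey export are set up.

```python
import pandas as pd

# Hypothetical files: the MTurk batch results (with the code each worker
# typed into the HIT) and the external survey export (with the code the
# survey software generated). Column names are illustrative.
mturk = pd.read_csv("mturk_batch_results.csv")   # includes a submitted_code column
survey = pd.read_csv("survey_responses.csv")     # includes a completion_code column

# Keep only MTurk submissions whose code matches an actual survey response;
# these are the assignments to approve and pay.
valid_codes = set(survey["completion_code"].astype(str).str.strip())
matched = mturk["submitted_code"].astype(str).str.strip().isin(valid_codes)

print(f"{matched.sum()} submissions match a survey response; "
      f"{(~matched).sum()} need manual review before approval.")
```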

Some types of tasks may require custom-designed experiments built with Web technologies such as Adobe Flash or Java. These designs will require a skilled Web programmer with experience using server-side scripting languages and databases. Though developing these types of HITs can quickly become complicated, tools are becoming available to support complex research designs, such as TurKit’s API for running iterative tasks (http://groups.csail.mit.edu/uid/turkit/). It is also worth noting that vendors exist who will manage and deploy a HIT for a fee if you lack the technical expertise or time to manage the HIT yourself.

How Is MTurk Different From Other Online Survey Panels?
One unique feature of MTurk is that social communities have formed around the use of the tool. Researchers planning to set up experiments on MTurk should be aware of the MTurk community forums on the Web. In these forums, workers commonly share information on which HITs are worth completing. For example, TurkerNation (http://turkers.proboards.com/) hosts multiple threads where workers describe their positive and negative experiences with surveys; one such thread has over 48,000 views. Workers also share examples of how they have been treated by requesters, such as in the “Requester Hall of Fame/Shame” section, which has over 10,000 posts.

MTurk also differs from other data collection methods in its speed. Most studies can gather several hundred participants in a few days, though this may vary depending on many factors such as payment amount, task complexity and length, uniqueness of the task, and whether your study requires multiple sessions.

What Are the Ethical Factors to Consider?
The ethical treatment of participants should be of primary concern for all researchers who work with human subjects. Given that MTurk represents a new way to gather data, best practices to ensure ethical treatment of workers are not yet well defined. However, the fundamental principles of research ethics still apply.

Informed consent. It’s important to provide workers with a transparent description of the research study, level of effort required, and associated payment so they can make a fully informed decision about whether to participate. Given that their approval rating can be affected by unfinished work, workers may feel pressure to complete the task even if they wish to withdraw. In addition, because data quality can affect the rate of payment, participants may also feel pressured to respond in socially desirable ways. For these reasons, it is particularly important to provide a robust informed consent so that workers fully understand what they are expected to do and the associated reward contingencies (Behrend et al., 2011).

Privacy and confidentiality. Workers are anonymous to requesters; they are identified by an alphanumeric worker ID containing no personally identifying information. If participants complete a HIT using external survey software, individual responses are kept separate from worker IDs, reducing concerns about how to safely store sensitive data (Paolacci et al., 2010). Note, however, that the MTurk interface also allows requesters to contact individual workers via e-mail, and worker replies to these messages could be sent from the worker’s personal e-mail. In addition, such contact by a requester might be considered spam. We advise that, for any study requiring a follow-up, workers be asked explicitly whether they agree to be contacted for future studies.

What Are the Legal Factors to Consider?
User agreement. Researchers using MTurk are bound by Amazon’s user agreement and must comply with its terms. For example, MTurk’s user agreement specifically prohibits the collection of personally identifiable information and e-mail addresses. Thus, researchers are advised to carefully read the user agreement to ensure compliance with all policies before posting. For more detail about the MTurk agreement, visit https://requester.mturk.com/policies/conditionsofuse.

Taxes. Participants in studies on MTurk are technically independent contractors whom researchers “hire” to complete their HITs. Requesters in the United States are required to report payments to the IRS for any individual worker paid more than the tax reporting threshold in a single year (currently $600). Given the relatively low payment that most studies offer, though, it’s unlikely that a single worker would reach this threshold through research participation.


Conclusion

The use of MTurk is growing quickly within the I-O community, but questions still remain as to the best way to use this tool. We hope this article has provided an initial introduction to some of the issues I-O researchers may want to consider when using MTurk, and we look forward to continued conversations on this cutting-edge topic in the future.


* Author’s Note: Authors are listed alphabetically; all authors contributed equally. Correspondence concerning this article should be directed to Tara S. Behrend; Department of Organizational Sciences & Communication; The George Washington University; 600 21st St. NW, Washington, D.C. 20052; behrend@gwu.edu.

References

Barger, P. B., & Sinar, E. F. (2011, April). Psychological data from Amazon.com’s MTurk: Rapid and inexpensive—But high-quality? Poster presented at the 26th Annual Conference of the Society for Industrial and Organizational Psychology, Chicago, IL.

Behrend, T. S., Sharek, D. J., Meade, A. W., & Wiebe, E. N. (2011). The viability of crowdsourcing for survey research. Behavior Research Methods, 43, 1–14.

Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6, 3–5.

Ipeirotis, P. (2010). Demographics of Mechanical Turk. Retrieved from http://hdl.handle.net/2451/29585

Kittur, A., Chi, E. H., & Suh, B. (2008). Crowdsourcing user studies with Mechanical Turk. In Proceedings of CHI 2008. Seattle, WA: ACM Press.

Mason, W., & Suri, S. (2011). Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods, 43, 1–23.

Mason, W., & Watts, D. J. (2010). Financial incentives and the “performance of crowds.” ACM SIGKDD Explorations Newsletter, 11(2), 100–108. doi: 10.1145/1809400.1809422

Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5, 411–419.

Piccolo, R. F., & Colquitt, J. A. (2006). Transformational leadership and job behaviors: The mediating role of core job characteristics. Academy of Management Journal, 49, 327–340.

Pontin, J. (2007, March 25). Artificial intelligence: With help from the humans. The New York Times. Retrieved from http://www.nytimes.com/2007/03/25/business/yourmoney/25Stream.html

Ross, J., Zaldivar, A., Irani, L., & Tomlinson, B. (2010, April). Who are the crowdworkers? Shifting demographics in Mechanical Turk. Paper presented at the Annual CHI Conference, Atlanta, GA.

Zhu, D., & Carterette, B. (2010). An analysis of assessor behavior in crowdsourced preference judgments. In M. Lease, V. Carvalho, & E. Yilmaz (Eds.), Proceedings of the ACM SIGIR 2010 Workshop on Crowdsourcing for Search Evaluation (CSE 2010) (pp. 21–26). Geneva, Switzerland.