Jim Rebar / Friday, July 1, 2016 / Categories: 541
Auto-Detection Versus Self-Report: Best Practices in Mobile Device Research
Jessica R. Petor, Ted B. Kinney, Luye Chang, Amie D. Lawrence, and Don Moretti
Notes: This paper is a revision of a poster presented at the 2016 SIOP conference in Anaheim, CA. Morrie Mullins was the editor for this article.
Testing on mobile devices is becoming increasingly popular among test takers and organizations. A survey conducted by Pew Research Center in 2015 found that 64% of U.S. adults now own a smartphone of some kind, up from 35% in 2011 (Pew Research Center, 2015). Among those who own a smartphone, 18% have used their phone to submit a job application in the last year. This finding is consistent with Select International’s data showing that, in 2014, 17.2% of applicants used a tablet or smartphone to complete screening assessments. More interestingly, those with relatively low income, younger adults, and non-Whites are likely to be “smartphone-dependent”; 10% of Americans own a smartphone but do not have broadband at home, and 15% own a smartphone but say they have limited options for going online other than their cell phone (Smith, 2015). In addition to the growing number of individuals who have access to mobile technologies, advances in technology entice organizations to adopt unproctored Internet testing.¹ These factors have contributed to the growing use of mobile devices in unproctored test administration. By allowing individuals to take assessments on mobile devices, organizations can not only improve the diversity of their applicant pool (Kinney, Lawrence, & Chang, 2014) but also lower assessment costs by reducing or eliminating the need for administrators (Arthur, Doverspike, Muñoz, Taylor, & Carr, 2014).
Many organizations find their selection systems fast-forwarding into this emerging mobile technology trend, which has pushed I-O psychologists (e.g., Arthur et al., 2014; Doverspike, Arthur, Taylor, & Carr, 2012; Illingworth, Morelli, Scott, & Boyd, 2015; Illingworth, Morelli, Scott, Moon, & Boyd, 2013; Impelman, 2013; Lawrence, Wasko, Delgado, Kinney, & Wolf, 2013; Morelli, Illingworth, Scott, & Lance, 2012; Morelli, Mahan, & Illingworth, 2014) to investigate the impact of mobile technology on employment testing and assessment. Last year, mobile assessment was voted the top of 10 workplace trends by nearly 8,000 SIOP members (https://www.siop.org/article_view.aspx?article=1343; Below, 2014). In the recent SIOP conference program (2016, Anaheim), at least 10 sessions centered on mobile device testing; the present paper was one of the poster presentations. Research topics included measurement equivalence (Grossenbacher, Brown, & Nguyen, 2016; LaPort, Huynh, Moretti, Stemer, & Ryer, 2016; Rossini, 2016), validity and applicant reactions (Dudley, Carpenter, Ferrell, Boyer, & Reeves, 2016), performance changes in cognitive ability tests and simulations (Huff, 2016; O’Connell, Chang, Lawrence, & Kinney, 2016), device characteristics (Chang, Lawrence, O’Connell, & Kinney, 2016), mobile optimization (Tomczak, 2016), and correction for device responsiveness (Lawrence, Chang, O’Connell, & Kinney, 2016). When mobile administrations have been compared with their nonmobile counterparts, several studies have shown measurement equivalence for noncognitive, text-based measures (Doverspike et al., 2012; Illingworth et al., 2013; Impelman, 2013; Kinney et al., 2014; Lawrence et al., 2013; Morelli et al., 2012), whereas others that examined cognitive ability tests and/or interactive simulations found moderate differences, with mobile users scoring lower than nonmobile users (Doverspike et al., 2012; Impelman, 2013; O’Connell et al., 2016).
To fully understand the assessment issues related to mobile devices, the first step is to accurately classify test takers into the right device group or category (e.g., mobile/nonmobile). Moreover, as devices continue to evolve, it is necessary to discriminate in greater detail than just mobile versus nonmobile device types. Device features such as screen size, touch screen, and mouse usage are examples of critical variables to understand as they relate to each device. In several presentations at SIOP this year, studies have found that screen size of the devices was related to mobile performance (Chang et al., 2016; O’Connell et al., 2016). Therefore, it becomes critically important how researchers detect devices in their research. In fact, if we are not confident in the accuracy of our device detection, it calls all mobile device results into question. For example, a conclusion that mobile and nonmobile devices performed equivalently on a test may be questionable if the mobile group in the study is mixed with nonmobile test takers as a result of inaccurate device detection. Even if the proportion of misidentification does not suffice to overturn a conclusion, it very likely led to biased magnitudes of relationships and effect sizes. Getting this device detection correct is a critical starting point to this entire stream of research, and as non-IT experts, I-O psychologists are learning that there is some reason for concern about whether or not the research reported to this point has identified devices accurately in the samples under investigation. As we investigate device detection, it is apparent that in most cases, researchers do not know with 100% confidence the device type applicants used to take assessments in their samples. 
Several approaches to automatic device detection have been deployed with varying degrees of success; however, it is not clear that these automatic detection approaches are more accurate than simply asking participants to self-report device type. As research on mobile devices develops, it is critical to understand whether or not our classification of participants to device groups is accurate, if we are to draw meaningful conclusions. The purpose of this paper is to compare the accuracy of device detection methods (auto-detection vs. self-report) using two applicant samples. Results from the study will guide future research on mobile devices in selection by providing a best practice recommendation for identifying the device type of participants in research samples.
Current Mobile Research Device Detection
To uncover how applicants’ device types are captured in current mobile device research, we corresponded with the authors of various mobile studies to understand the device detection strategies (e.g., Arthur et al., 2014; Illingworth et al., 2014; Kinney et al., 2014; Lawrence et al., 2013; Morelli et al., 2014). Each study compared mobile devices to nonmobile devices to understand measurement equivalence across device types. The researchers categorized mobile devices in different ways and used different device detection approaches. Primarily, these studies employed two different strategies for device identification: automatic device detection and self-report.
Automatic device detection. Automatic device detection systems mechanically code “user agent” data, where a “user-agent string” (UA) is extracted from the HTTP request headers sent to assessment web applications. Examples of the type of information included in user agent strings include browser type (e.g., Internet Explorer, Safari) and operating system (e.g., OS X, Android). These types of automatic systems utilize large device databases.
These databases (which need to be updated frequently to keep up with technology) are used to compare the user’s device information to “automatically” code device type. Auto detection using these data usually relies on the operating systems or browsers that are indicated in the UA string, which can typically indicate whether a device is mobile (smartphone, tablet) or nonmobile (desktop PC, laptop). It is important to note that there are cases where the device’s operating system or browser does not necessarily indicate mobile versus nonmobile because the operating system or browser is used on both types of devices (e.g., Windows 8) and cases where tablets and smartphones use the same operating system (e.g., Android). By primarily relying on operating systems or browsers for auto detection, a researcher could easily misidentify devices used by participants in the dataset. Although this theoretically poses a problem for all research using this strategy, the extent and frequency of these errors have not been investigated. These auto-detection strategies have been used in a number of mobile device studies. For example, Morelli et al. (2014) collected applicant device usage (N ≈ 220,000) through an auto-detection method in which the online testing platform automatically recorded the device type when an applicant accessed the online assessment. Arthur et al. (2014) used an auto-detection method in which the web-based assessment delivery software determined and recorded the type of device (N = 3,575,207) by means of the unique HTTP signature (of the operating system) of the device job applicants used to log on to the assessment software. Illingworth et al.
(2014), like the other two studies, recorded the device (mobile, n = 7,743; nonmobile, n = 929,341), browser, and operating system for every applicant who accessed the organization’s web portal and used these data to automatically assign device type. For the results of these studies to be interpretable, these auto-detection strategies must be reliable.
Self-report. Several studies have also simply used self-report measures to identify devices (e.g., Kinney et al., 2014; Lawrence et al., 2013), because test takers arguably have the best knowledge about the device they were using. This approach is used partly because of the relative ease of collection and partly because it is often the only feasible way to access device information when auto detection is not possible. Although participants may respond in socially desirable ways to many self-report measures, test takers have little motivation to misrepresent their devices. There is no perceived preference for taking a test with any particular device, so there is no grounding for a hypothesis suggesting that self-report would be inaccurate for this type of question. Although auto-detection systems appear to be popular in mobile equivalence research, self-report strategies could be more accurate. Given the known issues with current user agent data regarding operating systems and browsers, it is very likely that actual test takers do a better job of identifying the device category than auto-detection methods.
Hypothesis: Between auto-detection and self-report device identification strategies, self-reporting of device type will be more accurate than auto-detection in two samples of job applicants.
Methods
Sample and Procedure
To test the hypothesis that self-report data will lead to fewer errors in identifying actual device type compared to auto-detection methods, we used two samples representing a variety of industries to cross-validate results. Sample A included candidates from a large retail organization who completed the hiring assessment in March 2015. Sample B represented candidates from a wide variety of organizations within the manufacturing sector who completed the hiring assessment from December 2014 through August 2015. Most candidates from the two samples had both self-report device data and UA data. Three approaches (automatic detection, self-report detection, and manual coding) were applied to detect devices and place them into one of three groups: personal computers (PC), smartphones, and tablets. With the automatic detection approach, devices were categorized based on the operating system and/or browser data automatically recorded by the assessment web applications. A categorical distinction was made automatically based on this information. Self-report detection relied on test takers’ responses to the question: “What device are you using to complete this assessment?” The choices available to test takers were PC/laptops, smartphones, tablets, and other (please specify). In the third approach, researchers manually coded each test taker’s device by using the full UA string, which includes much more information than browser and operating system. The full string often includes key information such as device model, device type, and browser. Although this information is available, it is often provided inconsistently, and different operating systems, devices, and browsers return different kinds of information in a UA string.
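To make the difference between the first and third approaches concrete, the sketch below contrasts a rule that looks only at the operating system with one that also reads device tokens from the full UA string. It is a minimal illustration, not the coding scheme used in this study: the function names, the rules, and the three UA strings are hypothetical, and the convention that Android phones include a “Mobile” token while Android tablets omit it is common but not universal.

```python
# Illustrative UA strings (assumed examples, not taken from the study's data).
UA_ANDROID_PHONE = (
    "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36"
)
UA_ANDROID_TABLET = (
    "Mozilla/5.0 (Linux; Android 5.1.1; Nexus 9 Build/LMY48T) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/46.0.2490.76 Safari/537.36"
)
UA_WINDOWS_PC = (
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/46.0.2490.76 Safari/537.36"
)


def classify_by_os(ua: str) -> str:
    """Hypothetical OS-only rule: any mobile operating system counts as a smartphone."""
    if "Android" in ua or "iPhone" in ua or "iPad" in ua:
        return "smartphone"
    return "PC"


def classify_by_tokens(ua: str) -> str:
    """Hypothetical rule that also reads device tokens from the full UA string."""
    if "iPad" in ua:
        return "tablet"
    if "iPhone" in ua:
        return "smartphone"
    if "Android" in ua:
        # Assumed convention: Android phones send "Mobile", tablets usually do not.
        return "smartphone" if "Mobile" in ua else "tablet"
    return "PC"


for name, ua in [("phone", UA_ANDROID_PHONE),
                 ("tablet", UA_ANDROID_TABLET),
                 ("pc", UA_WINDOWS_PC)]:
    print(name, classify_by_os(ua), classify_by_tokens(ua))
```

Under these assumed rules, the two classifiers agree on the phone and the PC but disagree on the tablet, which is the kind of misclassification that an operating-system-only rule can produce.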
By leveraging all potentially useful features and information from the UA string, the research team was able to identify the specific device (e.g., Samsung Galaxy S6) and thus categorize the devices accurately. This strategy was therefore theoretically the most accurate and served as the criterion check in this study. The accuracy of this approach raises the question, “Why doesn’t mobile device research just use this method, if it is the most accurate?” Unfortunately, this approach is not practical in large-scale device research because sorting through all the data and device models in any given sample is resource intensive. Still, manual coding provides a good comparison group for testing the accuracy of auto-detection and self-report strategies and for identifying the best strategy for device identification in future research. Results of device categorization from auto detection and self-report were compared against the manual coding approach to determine their accuracy rates. We present the comparative results from Sample A, Sample B, and a combination of the two samples.
Results
Descriptives
Table 1 summarizes the frequencies of manually coded device type (e.g., PC/laptop, smartphone, or tablet) for each sample. General device usage across the samples shows that PC/laptops are the most frequently used device (81.1%), followed by smartphones (13.1%) and tablets (5.7%). Overall, mobile device users made up approximately 19% of the applicant pool across the samples included in this study.
Device Identification Differences
The auto-detection approach and self-report approach were compared against the manually coded approach. If a device was categorized differently from the manual coding, it was marked as an error. In Sample A, the retail sample, the auto-detection strategy (n = 787) showed a 19.6% error rate, whereas the self-report strategy (n = 800) was much more accurate with only a 0.9% error rate.
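The error-marking rule just described (a case counts as an error when its category differs from the manual coding) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name and the five-candidate data set are made up, and skipping cases with no category for a given method is an assumption about how the separate n for each method arose.

```python
def error_rate(assigned, criterion):
    """Proportion of scored cases whose category differs from the criterion.

    Cases with no assigned category (None) are skipped; this is an assumed
    way of handling candidates who lack data for a given detection method.
    """
    scored = [(a, c) for a, c in zip(assigned, criterion) if a is not None]
    if not scored:
        raise ValueError("no cases with an assigned category")
    errors = sum(1 for a, c in scored if a != c)
    return errors / len(scored)


# Made-up toy data for five hypothetical candidates.
manual = ["PC", "smartphone", "tablet", "PC", "tablet"]          # criterion
auto = ["PC", "smartphone", "smartphone", "PC", "smartphone"]    # auto-detection
self_report = ["PC", "smartphone", "tablet", None, "PC"]         # self-report

auto_err = error_rate(auto, manual)          # 2 of 5 cases differ
self_err = error_rate(self_report, manual)   # 1 of 4 scored cases differs
print(f"auto-detection: {auto_err:.1%}, self-report: {self_err:.1%}")
```

The toy values are chosen only to show the mechanics; they are not the study's data.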
In Sample B (manufacturing), an 18.4% error rate was found for the auto-detection approach compared to 4.8% for self-report. Sample A and Sample B were combined to obtain an overall error rate across the various industries. The auto-detection strategy (n = 4,058) had an overall error rate of 18.6%, which was significantly higher than the self-report error rate of 3.7% (n = 3,005), as indicated by a t test, t(7061) = 18.88, p < .001 (see Table 2).
Post-Hoc Investigation of the Types of Errors in Self-Report Strategies
Although self-report strategies are considerably more accurate than currently available methods of auto detection, they are not without errors. Consequently, we investigated further to understand the nature of the errors present in the self-reported data. Cross tabulations were run between self-reported data and manually reviewed data to determine the level of agreement in device types. For Sample A (n = 748, χ² = 1368.80, p < .001), results revealed that 0.8% of the candidates who reported using a smartphone actually used a PC/laptop. Of those who reported using a tablet, 14.3% actually used a PC/laptop. Last, among those who reported using a PC/laptop, 0.2% were found to have in fact used a smartphone (see Table 3, Sample A). In the manufacturing sample (n = 2,138, χ² = 3644.85, p < .001), 0.8% of candidates who self-reported using a smartphone actually used a PC/laptop and 2.0% used a tablet. Of those who reported using a tablet, manual coding showed that 2.5% had used a PC/laptop and 2.5% had used a smartphone. Last, of those who self-reported as PC/laptop users, 0.2% used a smartphone and 1.2% used a tablet (see Table 3, Sample B).
Discussion
Results from the study supported the hypothesis that self-reporting of device type is considerably more accurate than auto-detection strategies in two applicant samples.
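As a rough arithmetic check on the combined-sample comparison reported in the Results, the sketch below computes a pooled-variance independent-samples t statistic on 0/1 error indicators. Several things here are assumptions rather than the authors' procedure: the function name, the use of a pooled-variance test, and the error counts, which are reconstructed from the rounded percentages (18.6% of 4,058 and 3.7% of 3,005). Because of that rounding, the result is not expected to reproduce the reported statistic exactly, only to land in the same neighborhood with the same degrees of freedom.

```python
import math


def pooled_t_for_error_rates(errors1, n1, errors2, n2):
    """Pooled-variance t statistic for two groups of 0/1 error indicators."""
    p1, p2 = errors1 / n1, errors2 / n2
    # For 0/1 data, the sum of squared deviations from the mean is n * p * (1 - p).
    ss1 = n1 * p1 * (1 - p1)
    ss2 = n2 * p2 * (1 - p2)
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (p1 - p2) / se, df


# Counts reconstructed from rounded percentages (assumption, not source data).
auto_errors, auto_n = 755, 4058   # about 18.6% of 4,058
self_errors, self_n = 111, 3005   # about 3.7% of 3,005

t_stat, df = pooled_t_for_error_rates(auto_errors, auto_n, self_errors, self_n)
print(f"t({df}) = {t_stat:.2f}")
```

The degrees of freedom follow directly from the two sample sizes (4,058 + 3,005 − 2), which matches the value reported with the t test.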
In both the retail and manufacturing samples, we found lower error rates for the self-report device detection strategies (0.9% and 4.8%, respectively) compared to auto-detection strategies (19.6% and 18.4%, respectively). When both samples were combined, the overall error rate was 3.7% for the self-report strategy, only one-fifth of the overall error rate observed for the auto-detection strategy (18.6%). Although the self-report strategy was found to be the most accurate, post-hoc investigation showed that, in both samples, the largest disagreement occurred among those who reported using a tablet but actually used a PC/laptop. One possible explanation is that newer laptops (e.g., MacBook Air) are so slim that they can be confused with tablets. It is important that researchers code the right device types if we want our conclusions to be reliable. In this study, we found that auto-detection strategies, which have been widely used in mobile device research to this point, can lead to errors, specifically when looking further into device categories (e.g., smartphone, tablet, PC, touch screen) as opposed to just making a distinction between mobile and nonmobile. With the pervasiveness of unproctored Internet testing and virtual research participants, researchers may be unable to verify the actual devices people used and must rely upon accurate detection technologies and/or strategies. At this point, self-report strategies may be the most accurate way to detect devices in future mobile device research. From a practical standpoint, organizations and testing companies make decisions with regard to mobile devices. For instance, because of potential issues with measurement equivalence, some organizations and personnel psychologists are recommending that certain assessments and assessment methods not be completed on mobile devices (Leeson, 2006).
In fact, SIOP this year has seen intensive discussion about whether or not mobile devices should be allowed on simulations and cognitive ability tests (e.g., Chang et al., 2016; Grossenbacher et al., 2016; Huff, 2016; Lawrence et al., 2016; O’Connell et al., 2016; Tomczak, 2016). Some test providers are even considering using the device information to “correct” individuals’ scores to account for response latencies and other artifacts related to device. For either of these strategies to work effectively and reliably, an accurate device categorization is an essential prerequisite. Future research should continue to look at auto-detection strategies that are more accurate (i.e., those that further detect mobile vs. non-mobile and are able to more specifically identify devices with greater precision). As device identification strategies with more precision become available, mobile device researchers will be able to research more specific features of different devices to understand what features may or may not lead to performance decrements across devices (e.g., screen sizes, touch screen, and peripheral usage). This study is not without limitations. The “accurate” device comparisons group in this study was based on manual coding strategy of 3,692 participants. Although this approach is time consuming and impractical in an applied setting, we believe this strategy leads to the most accurate possible approach to detection. Still, if devices were miscoded or misunderstood, it could impact the results here; although there is no reason to believe such errors have occurred. Another limitation of this study is that it represents a cross section of the accuracy/reliability of the device detection strategies employed by researchers up to this point in time; however, as rapidly as mobile devices change, so do detection strategies. 
The current auto-detection strategies used in most mobile device research, although somewhat imprecise now, could drastically improve as IT experts devise more accurate strategies. Consequently, although this study calls into question datasets built on the current state of auto-detection strategies, future research relying on improved auto-detection strategies could be much more accurate. In conclusion, we are not “throwing away” past research: at this point, all device detection strategies are largely accurate, and the research questions pursued so far have been broad enough that precision in device detection has not yet been an issue that would lead to misleading decisions. As we refine our research questions where mobile device usage in UIT settings is concerned, researchers are beginning to investigate finer grained differences than simply mobile versus nonmobile (e.g., separating phones from tablets, looking at the impact of screen size and touch screen). As the research programs on mobile device testing progress, getting device detection right becomes critically important. For now, it would appear that the most accurate way to understand the nature of the device a candidate uses to take an assessment is to simply ask rather than rely on automated technology.
Note
1 Although mobile device testing is one important subtopic of unproctored Internet testing, this paper is mainly focused on the device itself and the method of detecting devices. A comprehensive discussion of unproctored Internet testing is beyond the scope of this paper.
References
Arthur, W. J., Doverspike, D., Muñoz, G. J., Taylor, J. E., & Carr, A. E. (2014). The use of mobile devices in high-stakes remotely delivered assessments and testing. International Journal of Selection and Assessment, 22(2), 113–123.
Below, S. (2014, Dec.). New year, new workplace! SIOP announces top 10 workplace trends for 2015.
Retrieved from https://www.siop.org/article_view.aspx?article=1343
Chang, L., Lawrence, A. D., O’Connell, M. S., & Kinney, T. B. (2016, April). Mobile versus PC delivered simulations: Screen size matters. In T. D. McGlochlin (Chair), Mobile equivalence: Expanding research across assessment methods, levels and devices. Paper presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Doverspike, D., Arthur, Jr., W., Taylor, J., & Carr, A. (2012, April). Mobile mania: The impact of device type on remotely delivered assessments. In J. Scott (Chair), Chasing the tortoise: Zeno’s paradox in technology-based assessment. Paper presented at the 27th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Dudley, N., Carpenter, J., Ferrell, J., Boyer, A. L., & Reeves, M. (2016, April). Examining equivalence, validity, and reactions in three mobile-optimized simulations. In J. Ferrell & M. Hudy (Co-chairs), Going mobile: Empirical evidence from higher-fidelity mobile simulations. Paper presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Grossenbacher, M., Brown, M., & Nguyen, D. (2016, April). Assessing the equivalence of mobile-based GMA testing. Poster presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Huff, K. (2016, April). Do mobile devices have an impact on working memory? Poster presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Illingworth, A. J., Morelli, N., Scott, J. C., & Boyd, S. (2015). Internet-based, unproctored assessments on mobile and nonmobile devices: Usage, measurement equivalence, and outcomes. Journal of Business and Psychology, 30, 325–343.
Illingworth, A. J., Morelli, N., Scott, J. C., Moon, S., & Boyd, S. (2013, April).
Equivalency of non-cognitive assessments on mobile and nonmobile devices: The influence of device browsers and operating systems. In N. Morelli (Chair), Mobile devices in talent assessment: Where are we now? Paper presented at the 28th Annual Conference of the Society for Industrial and Organizational Psychology, Houston, TX.
Impelman, K. (2013, April). Mobile assessment: Exploring candidate differences and implications for selection. In N. Morelli (Chair), Mobile devices in talent assessment: Where are we now? Paper presented at the 28th Annual Conference of the Society for Industrial and Organizational Psychology, Houston, TX.
Kinney, T. B., Lawrence, A. D., & Chang, L. (2014, May). Understanding the mobile candidate experience: Reactions across device and industry. In T. Kantrowitz & C. Reddock (Co-chairs), Shaping the future of mobile assessment: Research and practice update. Symposium presented at the 29th Annual Conference of the Society for Industrial and Organizational Psychology, Honolulu, HI.
LaPort, K., Huynh, C. T., Moretti, D. M., Stemer, A., & Ryer, J. A. (2016). Mobile assessment: Comparing traditional cognitive, cognitive-reasoning, and non-cognitive performance. In T. D. McGlochlin (Chair), Mobile equivalence: Expanding research across assessment methods, levels and devices. Paper presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Lawrence, A. D., Chang, L., O’Connell, M. S., & Kinney, T. B. (2016). Mobile simulations: Can you control for device? In T. D. McGlochlin (Chair), Mobile equivalence: Expanding research across assessment methods, levels and devices. Paper presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Lawrence, A. D., Wasko, L., Delgado, K., Kinney, T., & Wolf, D. (2013). Does mobile assessment administration impact psychological measurement? In N.
Morelli (Chair), Mobile devices in talent assessment: Where are we now? Paper presented at the 28th Annual Conference of the Society for Industrial and Organizational Psychology, Houston, TX.
Leeson, H. V. (2006). The mode effect: A literature review of human and technological issues in computerized testing. International Journal of Testing, 6(1), 1–24. doi:10.1207/s15327574ijt0601_1
Morelli, N. A., Illingworth, A. J., Scott, J. C., & Lance, C. E. (2012). Are Internet-based, unproctored assessments on mobile and nonmobile devices equivalent? In J. C. Scott (Chair), Chasing the tortoise: Zeno’s paradox in technology-based assessment. Symposium presented at the 27th Annual Conference of the Society for Industrial and Organizational Psychology, San Diego, CA.
Morelli, N. A., Mahan, R. P., & Illingworth, A. J. (2014). Establishing the measurement equivalence of online selection assessments delivered on mobile versus nonmobile devices. International Journal of Selection and Assessment, 22(2), 124–138.
O’Connell, M. S., Chang, L., Lawrence, A. D., & Kinney, T. B. (2016, April). PC–mobile equivalence of four interactive simulations: A within-subject design. In J. Ferrell & M. Hudy (Co-chairs), Going mobile: Empirical evidence from higher-fidelity mobile simulations. Paper presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Pew Research Center. (2015). Pew Research Center Internet project survey. Retrieved from http://www.pewInternet.org/fact-sheets/mobile-technology-fact-sheet/
Rossini, J. (2016, April). Mobile device testing: A five-year look across job level. In T. D. McGlochlin (Chair), Mobile equivalence: Expanding research across assessment methods, levels and devices. Paper presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.
Smith, A. (2015, April). U.S. smartphone use in 2015.
Retrieved from http://www.pewInternet.org/2015/04/01/us-smartphone-use-in-2015/
Tomczak, K. (2016, April). Vertical analysis of mobile-optimized simulations in Fortune 100 organizations. In J. Ferrell & M. Hudy (Co-chairs), Going mobile: Empirical evidence from higher-fidelity mobile simulations. Paper presented at the 31st Annual Conference of the Society for Industrial and Organizational Psychology, Anaheim, CA.