Jenny Baker

The Bridge: Connecting Science and Practice

Kimberly Adams, LeadPath Solutions, LLC; Stephanie Zajac, UT MD Anderson Cancer Center; and Tara Myers, American Nurses Credentialing Center

 

“The Bridge: Connecting Science and Practice” is a TIP column that seeks to facilitate additional learning and knowledge transfer to encourage sound, evidence-based practice. It can provide academics with an opportunity to discuss the potential and/or realized practical implications of their research as well as learn about cutting-edge practice issues or questions that could inform new research programs or studies. For practitioners, it provides opportunities to learn about the latest research findings that could prompt new techniques, solutions, or services that would benefit the external client community. It also provides practitioners with an opportunity to highlight key practice issues, challenges, trends, and so forth that may benefit from additional research. In this issue, David Futrell and colleagues provide an overview of the development, validation, and implementation of a prehire assessment that won the team the 2019–2020 Human Resource Management (HRM) Impact Award.

 

High-Velocity Selection: Predicting Performance and Retention at Walmart

David Futrell and Josh Allen

Walmart is honored to be recognized as one of the winners of the 2019–2020 Human Resource Management (HRM) Impact Award given by the Society for Industrial and Organizational Psychology (SIOP) and SHRM (Society for Human Resource Management), along with their foundations. Walmart was recognized for its work on developing, validating, and implementing an assessment for hourly associates working in Walmart stores and Sam’s Clubs called the Retail Associate Assessment (RAA).

Beginning in 1962 with a single discount store in Rogers, Arkansas, Walmart has opened thousands of stores in the US and internationally. Through continuous, customer-focused innovation, we have also created a seamless experience that allows our customers to shop online anywhere, anytime. Every Day Low Price (EDLP) is the cornerstone of our strategy, and our price focus has never been stronger. We currently operate over 11,300 stores under 58 banners in 27 countries with e-commerce websites in 10 countries. We employ approximately 2.2 million associates around the world—1.4 million in the US alone.

A job at Walmart is an opportunity to build a career. About 79% of our store management team members were hired as hourly associates. Last year, we promoted more than 215,000 of these associates to jobs with more responsibility and higher pay.

The Challenge

The selection system used to hire entry-level associates at Walmart is likely the highest volume system in the world. In a typical year, we receive millions of applications and hire hundreds of thousands of new associates. We receive more applications (10k+ is not unusual) in a single day than many companies process in an entire year. Each of our 7,000+ Walmart stores and Sam’s Clubs is required to use our selection tools and processes to screen and select its new associates. The selection and assessment team collaborates with HR operations associates in each business unit to ensure that the hiring teams in the stores and clubs understand the system and manage their selection processes in a consistent manner. Our primary challenge is meeting the often competing needs and desires of a variety of stakeholders while ensuring that the assessment and selection processes are being properly managed across thousands of stores.

Because of the high applicant volume, it’s essential that we provide an efficient method for identifying the best candidates with minimal time and energy expended by both the applicants and hiring teams in the stores. To accomplish this, we rely on pre-employment assessments. This assessment process, however, must balance many competing requirements and demands. The assessment must be

  • Predictive of both job performance and turnover
  • As brief as possible
  • Optimized for mobile devices
  • Candidate friendly
  • Legally defensible
  • Easily adapted to changing business requirements
  • Effective at predicting outcomes across a broad range of jobs
  • Easy to use for both applicants and the hiring teams in the stores

Two of our biggest challenges are the changing nature of the job and the frequent need to demonstrate impact. Validation research conducted 12 months ago may not be recent enough to convince some key stakeholders (e.g., business and HR leaders) of the value of assessments. Jobs sometimes change substantially between the time assessments are developed and when they’re actually implemented. Both the jobs and the work context (e.g., working hierarchically vs. working in teams) evolve continuously. This requires us to predict a moving target while still meeting the requirements described above. Unfortunately, maintaining a relevant assessment is not simply a matter of adding new content. Each second of the applicant’s time must be considered, justified, and defended. With our applicant volume, the addition of a single 5-second biodata item adds up to almost four FTEs’ worth of time spent by applicants over the course of a year. Some recent additions of content required over 20 years of applicant testing time for us to gather the necessary validation data.
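The arithmetic behind the FTE figure can be sketched as follows. This is an illustration only: the 2,080-hour work year (40 hours × 52 weeks) and the function names are assumptions, not figures or methods from the text. Under that assumption, a 5-second item costs four FTE-years of applicant time at roughly 6 million completions per year, which is consistent with "almost four" at a somewhat lower volume.

```python
# Assumption (not from the article): one FTE-year is 40 hours x 52 weeks.
HOURS_PER_FTE_YEAR = 40 * 52


def item_cost_in_ftes(seconds_per_item, completions_per_year,
                      hours_per_fte=HOURS_PER_FTE_YEAR):
    """Applicant time consumed by one item, expressed in FTE-years."""
    total_hours = seconds_per_item * completions_per_year / 3600
    return total_hours / hours_per_fte


def completions_for_ftes(ftes, seconds_per_item,
                         hours_per_fte=HOURS_PER_FTE_YEAR):
    """Annual completions at which one item consumes the given FTE-years."""
    return ftes * hours_per_fte * 3600 / seconds_per_item


# Volume at which a 5-second item costs exactly four FTE-years.
implied_completions = completions_for_ftes(4, 5)
print(f"{implied_completions:,.0f} completions per year")
```

The same function can be run in the other direction when weighing a proposed item: `item_cost_in_ftes(5, 3_000_000)` gives the cost of the item at a hypothetical 3 million completions.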

Our Solution

To meet these requirements and constraints, we sought a prehire assessment that would be developed based on a rigorous job analysis and supported by a large concurrent validation study. Walmart selected Modern Hire, the creators of the Virtual Job Tryout® (VJT) pre-employment simulation, to develop and validate a multimethod, pre-employment assessment. Assessments using multimethod approaches, such as those integrating measures of biodata, situational judgment, work samples, and work style, are among the most valid tools for predicting workplace performance (Bettencourt et al., 2001; Hough & Ones, 2002; Hunter & Hunter, 1984; Mael, 1991; Salgado et al., 2002; Schmidt & Hunter, 1998; Weekley & Ployhart, 2005).

As a team of industrial-organizational psychologists, our top priority in creating the Retail Associate Assessment (RAA) was that it be firmly grounded in science. Our team worked with the Modern Hire team throughout the design and validation process and made every effort to follow the principles of rigorous and objective test validation established by the Uniform Guidelines on Employee Selection Procedures (1978) and the Principles for the Validation and Use of Personnel Selection Procedures (2018). Our team also actively works to contribute new findings to the field (e.g., Futrell, 2018) and continuously looks for new research that may have implications for the RAA (e.g., Sajjadiani et al., 2019).

A concurrent validation study was conducted to determine the validity and fairness of the assessment. This study used a representative sample of over 1,000 incumbents that varied by age, gender, ethnicity, business (Walmart vs. Sam’s Club), job focus (customer service vs. productivity), job group, and location (Uniform Guidelines, 1978). Subsequent enhancements to the tool have been made and validated using data from actual job applicants.

Measuring performance for the retail associate proved challenging because consistent, formal performance appraisal data were not readily available. In addition to the problems inherent with the ratings in most performance management systems (DeNisi & Murphy, 2017), these types of retail jobs often have high turnover. Using typical annual performance ratings would limit the validation study to only incumbents who were retained for at least 1 year. This could have introduced a substantial amount of bias. Instead, we created a custom performance evaluation form. The content for this evaluation form was derived from Walmart’s universal competency model and the findings from a comprehensive job analysis. Supervisors evaluated the performance of associates across 28 items covering seven competencies. In addition to the competency measures, we collected single-item ratings on eight additional dimensions, including attendance, cleanliness and safety, identifying customers’ needs, and making appropriate sales recommendations. These performance dimensions were found to be critical behaviors during the job analysis and served to broaden the performance evaluation domain. The validation study results revealed strong uncorrected validities across a variety of performance dimensions.

Measured Outcomes and Feedback

One of the largest business problems facing retail organizations today is turnover. A primary requirement from our stakeholders is that the assessment predict and ultimately improve employee retention. Accordingly, our current scoring algorithm is evenly balanced between performance and retention prediction.

Development and validation of this retention predictor were conducted separately from the initial validation study, which focused on predicting job performance. The retention predictor study was conducted by administering a set of items (primarily biodata) to applicants. These items were not scored in the selection algorithm during the study. The study used data from 200,000 applicants who were subsequently hired. After several months, we had adequate data to conduct an empirical keying study for these items. Scoring keys were created and cross-validated, and the resulting algorithm showed a large difference in retention between the top and bottom assessment bands in a holdout sample. A conservative return-on-investment analysis showed savings in the hundreds of millions of dollars annually. This estimate only includes eliminating replacement costs from not hiring the lowest scoring candidates. If we included estimates of lost productivity and training costs, the real impact might exceed $1 billion annually. To achieve such strong prediction of turnover, we have gone beyond the scholarly literature to identify new constructs and methods, such as novel methods for identifying applicant faking (Futrell, 2018), which is a strong predictor of subsequent turnover.
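The general shape of an empirical keying study with a holdout check can be sketched as follows. This is a minimal illustration, not the RAA's scoring algorithm: the data are synthetic, the option-level weighting (each response option weighted by its retention rate minus the overall rate) is one simple keying approach chosen for the example, and every function name is hypothetical.

```python
import random
from collections import defaultdict


def build_key(responses, retained):
    """Weight each (item, option) by its retention rate minus the base rate."""
    base = sum(retained) / len(retained)
    counts = defaultdict(lambda: [0, 0])  # (item, option) -> [n_retained, n_total]
    for resp, kept in zip(responses, retained):
        for item, option in enumerate(resp):
            counts[(item, option)][0] += kept
            counts[(item, option)][1] += 1
    return {k: r / n - base for k, (r, n) in counts.items()}


def score(resp, key):
    """Sum the keyed weights; options unseen in development score zero."""
    return sum(key.get((item, option), 0.0) for item, option in enumerate(resp))


def band_retention(responses, retained, key):
    """Retention rate in the bottom and top thirds of keyed scores."""
    ranked = sorted(zip((score(r, key) for r in responses), retained),
                    key=lambda pair: pair[0])
    third = len(ranked) // 3
    bottom = [kept for _, kept in ranked[:third]]
    top = [kept for _, kept in ranked[-third:]]
    return sum(bottom) / third, sum(top) / third


# Synthetic data (hypothetical): items 0 and 1 relate to retention, item 2 is noise.
rng = random.Random(0)
responses = [[rng.randrange(3) for _ in range(3)] for _ in range(6000)]
retained = [int(rng.random() < 0.3 + 0.1 * r[0] + 0.1 * r[1]) for r in responses]

# The key is built on the development sample only, then checked on the holdout.
dev_x, dev_y = responses[:4000], retained[:4000]
hold_x, hold_y = responses[4000:], retained[4000:]

key = build_key(dev_x, dev_y)
bottom_rate, top_rate = band_retention(hold_x, hold_y, key)
print(f"holdout retention: bottom band {bottom_rate:.2f}, top band {top_rate:.2f}")
```

Because the weights are estimated on one sample and evaluated on another, a gap between the bands in the holdout is less likely to reflect capitalization on chance than the same gap in the development sample.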

At Walmart, similar to many other organizations, our applicants are also our customers. Accordingly, our selection experience must be positive, whether the candidate is offered a position or not. To measure that experience, the RAA tracks candidate feedback using research-only items administered following assessment completion. Feedback from millions of candidates reveals that because of the assessment, they

  • have a better understanding of the role,
  • would recommend applying to others, and
  • did not experience significant technology issues.

Agreement with these items was very high, ranging from 98% to 100%.

Completion rates from the assessment are quite strong, averaging over 95%, which is extraordinarily high for an entry-level assessment (Hardy et al., 2017). These data suggest the application process is a positive experience for candidates and provides them with greater understanding of the role.

We believe the best way to track internal feedback is to examine the extent to which the end users are actually using the results to make hiring decisions. If the hiring managers did not believe the RAA added value, we would expect to see roughly random hiring across the RAA bands. Our analysis of the hiring patterns clearly shows that significantly more hires come from the top band than from the lower bands. Our surveys of hiring managers support this finding; they indicate that the assessment score is one of the most important factors when deciding whom to interview or ultimately hire.
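One way to make the "roughly random hiring" comparison concrete is a chi-square goodness-of-fit check of observed hires against the hires expected if selection ignored the bands. The sketch below is an illustration under stated assumptions: the band labels and counts are hypothetical, and the article does not say which test was used.

```python
def chi_square_vs_random(applicants_by_band, hires_by_band):
    """Goodness-of-fit statistic: observed hires vs. hires proportional to band size."""
    total_applicants = sum(applicants_by_band.values())
    total_hires = sum(hires_by_band.values())
    statistic = 0.0
    for band, n_applicants in applicants_by_band.items():
        # Expected hires if managers hired without regard to band.
        expected = total_hires * n_applicants / total_applicants
        statistic += (hires_by_band[band] - expected) ** 2 / expected
    return statistic


# Hypothetical counts for three bands.
applicants = {"top": 3000, "middle": 4000, "bottom": 3000}
hires = {"top": 500, "middle": 350, "bottom": 150}

statistic = chi_square_vs_random(applicants, hires)
CRITICAL_DF2_05 = 5.991  # chi-square critical value, 2 degrees of freedom, alpha = .05
print(f"chi-square = {statistic:.1f}; differs from random: {statistic > CRITICAL_DF2_05}")
```

With samples in the hundreds of thousands, almost any departure from proportional hiring will be statistically significant, so the size of the difference in hire rates between bands is the more informative number.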

Conclusion

The assessment process was designed from the outset to minimize or eliminate any group differences. We continuously track the results for any evidence of adverse impact. With sample sizes in the hundreds of thousands, these results are stable and provide strong evidence that the assessment is fair for all applicant groups with little risk of any substantial adverse impact.
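A common first screen for adverse impact is the four-fifths rule described in the Uniform Guidelines: a group's selection rate below 80% of the highest group's rate is generally regarded as evidence of adverse impact. The sketch below shows that calculation only; the group labels and counts are hypothetical, and it is not a description of the monitoring actually applied to the RAA.

```python
def impact_ratios(passed, applied):
    """Selection rate of each group divided by the highest group's rate."""
    rates = {group: passed[group] / applied[group] for group in applied}
    highest = max(rates.values())
    return {group: rate / highest for group, rate in rates.items()}


# Hypothetical counts of applicants and of those who passed the assessment.
applied = {"group_a": 1000, "group_b": 800}
passed = {"group_a": 600, "group_b": 440}

ratios = impact_ratios(passed, applied)
FOUR_FIFTHS = 0.8  # threshold from the Uniform Guidelines (1978)
flagged = [group for group, ratio in ratios.items() if ratio < FOUR_FIFTHS]

for group, ratio in ratios.items():
    print(f"{group}: impact ratio {ratio:.2f}")
print("below four-fifths:", flagged or "none")
```

The ratio is a rule of thumb rather than a statistical test, which is why it is usually read alongside significance tests and practical effect sizes.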

In many ways, launching the RAA was just the beginning of the work. Since implementation, we have conducted multiple predictive validations and other analytic work. Specifically, we have

  • Built an enhanced retention predictor.
  • Reweighted the combined performance retention predictor.
  • Added in teamwork content to reflect the changing nature of the job.
  • Shortened the assessment; average completion time is now under 15 minutes.
  • Renormed the assessment.

Walmart is a very data-driven environment with strong competition for internal resources. This selection system has been held to the highest standards and required to show impact and results across many stakeholders and time periods. Implementing what is almost certainly the highest volume selection system in the world has been a massive undertaking, requiring years of planning, persistence, and effort.

Given the high-stakes nature of this endeavor, we are proud of the results and the enormous financial impact we have achieved. With so many stakeholders and their competing priorities, it has been a balancing act to achieve the diverse goals of the RAA.

Of course, more remains to be done. As with most research, every answer leads to more questions and ideas. We plan to continue developing new items, algorithms, and methodology (including machine-learning techniques) to continually improve prediction while maintaining fairness and minimizing applicant time.

References

Bettencourt, L. A., Gwinner, K. P., & Meuter, M. L. (2001). A comparison of attitude, personality, and knowledge predictors of service-oriented organizational citizenship behaviors. Journal of Applied Psychology, 86(1), 29–41.

DeNisi, A. S., & Murphy, K. R. (2017). Performance appraisal and performance management: 100 years of progress? Journal of Applied Psychology, 102(3), 421–433.

Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43(166), 38290–38315.

Futrell, D. (2018). Big fakers and bigger data: Emerging insights from practice. Paper presented at the 33rd Annual Conference of the Society for Industrial and Organizational Psychology, Chicago, IL.

Hardy, J. H., Gibson, C., Sloan, M., & Carr, A. (2017). Are applicants more likely to quit longer assessments? Examining the effect of assessment length on applicant attrition behavior. Journal of Applied Psychology, 102(7), 1148–1158. doi: 10.1037/apl0000213

Hough, L. M., & Ones, D. S. (2002). The structure, measurement, validity, and use of personality variables in industrial, work, and organizational psychology. In N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran (Eds.), Handbook of industrial, work and organizational psychology, Vol. 1. Personnel psychology (pp. 233–277). Sage Publications Ltd.

Hunter, J. E., & Hunter, R. F. (1984). Validity and utility of alternate predictors of job performance. Psychological Bulletin, 96, 72–98.

Mael, F. A. (1991). A conceptual rationale for the domain and attributes of biodata items. Personnel Psychology, 44(4), 763–792.

Sajjadiani, S., Sojourner, A. J., Kammeyer-Mueller, J. D., & Mykerezi, E. (2019). Using machine learning to translate applicant work history into predictors of performance and turnover. Journal of Applied Psychology, 104(10), 1207–1225.

Salgado, J. F., Viswesvaran, C., & Ones, D. S. (2002). Predictors used for personnel selection: An overview of constructs, methods and techniques. In N. Anderson, D. S. Ones, H. K. Sinangil, & C. Viswesvaran (Eds.), Handbook of industrial, work and organizational psychology, Vol. 1. Personnel psychology (pp. 165–199). Sage Publications Ltd.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Society for Industrial and Organizational Psychology. (2018). Principles for the validation and use of personnel selection procedures (5th ed.). Bowling Green, OH: Author.

Weekley, J. A., & Ployhart, R. E. (2005). Situational judgment: Antecedents and relationships with performance. Human Performance, 18(1), 81–104.
