
Navigating the Open Seas of AI-Based Hiring Technologies: An Open Fishbowl Discussion

Felix Y. Wu, Karyssa A. Courey, and Frederick L. Oswald, Rice University; S. Morton McPhail, Independent Researcher; & Nancy T. Tippins, The Nancy T. Tippins Group, LLC

Introduction

Like other applications of artificial intelligence (AI) technologies, the use of AI-based assessments for employment purposes is increasing rapidly. These assessments may incorporate a variety of applications of AI, such as (a) complex algorithms that combine data from traditional sources (e.g., Likert-scale personality tests) and/or nontraditional ones (e.g., social media data), (b) the analysis and interpretation of content from employment interviews through natural language processing (NLP), and (c) the evaluation of job candidate characteristics that may have questionable job relevance, such as facial and vocal features. In addition, ChatGPT and other large language model (LLM) applications, which have seen an explosion of interest and use, can usefully aid employment test development (e.g., preliminary item content generation) but may also facilitate applicant cheating on employment tests.

Both the Equal Employment Opportunity Commission (EEOC) and the Office of Federal Contract Compliance Programs (OFCCP) have made it clear that, as with traditional employment tests, AI-based assessments must comply with Title VII of the Civil Rights Act of 1964 (amended in 1991) and the Uniform Guidelines on Employee Selection Procedures (UGESP, 1978; OFCCP, 2019; EEOC, 2023). The White House and federal agencies, including the Department of Commerce, have also voiced their belief that AI-based assessments should take into account fairness, equity, and privacy (National Institute of Standards and Technology, 2016; White House Office of Science and Technology Policy, 2022).

State and local governments have begun enacting legislation regulating the use of AI-based assessments for employment purposes. For example, the state of Illinois enacted the Artificial Intelligence Video Interview Act (2020, amended in 2022), which requires informing candidates of the use of AI to analyze interviews, obtaining their consent, providing alternative selection procedures, and destroying recordings on a strict time schedule. That schedule may conflict with other requirements, such as OFCCP rules on data retention and on reporting race/ethnicity when the video interview is used to determine who receives an in-person interview. New York City recently passed Local Law 144, which regulates the use of automated employment decision tools (AEDTs) and requires employers to conduct and post an annual bias audit of algorithmic hiring tools that “substantially [assist] or [replace]” an employer’s discretion when hiring (Automated Employment Decision Tools, 2021).

The Society for Industrial and Organizational Psychology (SIOP) recently created a set of recommendations specifically for the use of AI-based employment assessments, reinforcing the applicability of the Principles for the Validation and Use of Personnel Selection Procedures (2018) to all employment tests, including AI-based assessments (SIOP, 2023). Other professional groups outside of SIOP and industrial-organizational (I-O) psychology have also weighed in on the use of AI-based assessments. For example, the Institute for Workplace Equality assembled a Technical Advisory Committee composed of I-O psychologists, attorneys, and human resource leaders who produced a report, EEO and DEI&A Considerations in the Use of Artificial Intelligence in Employment Decision Making (The Institute for Workplace Equality, 2022). Additionally, many organizations and individuals have discussed the issues surrounding the use of AI for employment, as well as the standards that have been created or proposed (Sonderling & Kelley, 2023).

With the increasing use of AI-based assessments, coupled with heightened legal and ethical concerns, we have been working on a research project collecting and synthesizing opinions from a wide range of employment testing and machine learning (ML) experts in I-O psychology, both practitioners and academics. At the 2023 SIOP Annual Conference in Boston, we moderated an alternative open fishbowl session in which we first presented a summary of some of this survey work and then gathered open-ended comments from the audience to supplement our ongoing research. Both the survey summary and the open-ended comments covered a wide range of AI-based assessment topics:

  1. The role of theory
  2. The necessity of job analysis
  3. Applicant reactions and experiences
  4. Reliability and validity
  5. Performance metrics for ML algorithms
  6. Assessing adverse impact
  7. Ethical issues

The 30–40 people in attendance were split into smaller groups. Each group randomly received two of the seven topics above for discussion, with each topic addressed by at least one group. Each topic was introduced with some descriptive information, followed by several specific questions as prompts. Participants were given 15 minutes per topic, for a total of 30 minutes. Each group documented its responses to the questions on an online platform (Padlet.com); these responses were then displayed on a large screen at the front of the room and used for subsequent moderator-facilitated discussion with the entire audience. Given that there was not sufficient time to cover all participants’ comments, the following summaries contain the major ideas generated in these small- and large-group discussions.

Content of Discussion

Small Group Discussion

To what extent are theoretical justifications necessary in employment testing (e.g., deciding among relevant predictor constructs, developing tests)?

The small group participants generally took the position that theory should support employment testing but that it is not always required. Participants also acknowledged that new or refined theory can be derived from data.

How might a job analysis that supports an AI-based assessment differ from any other assessment (e.g., in terms of sampling requirements, unique KSAOs that can be assessed)?

Three primary issues were raised by the participants. First, job analysis is important, if not essential, in maintaining the legal defensibility of assessments. Second, without a job analysis, we may miss measuring important knowledge, skills, abilities, and other attributes (KSAOs) or may have difficulty explaining and interpreting AI results. Third, participants noted an important need for specificity in the job analyses used for AI-based assessments (e.g., a comprehensive set of KSAOs tailored to the job, direct ratings of tasks and KSAOs). This need might be greater for AI-based assessments than for traditional assessments due to the inability to calculate traditional psychometric measures (e.g., scale reliability, dimensionality).

What are general applicant perceptions regarding the fairness of AI-based selection procedures? How might applicant reactions to AI-based assessments change over time as they become more commonplace?

Some group responses described negative applicant reactions as stemming from a lack of transparency and of human interaction, both of which contribute to negative perceptions of procedural or interactional justice. Another group suggested that applicant reactions may be idiosyncratic, depending on the applicant’s specific preferences and prior experience with a given AI-based assessment.

How should reliability be appropriately assessed and reported for relatively complex AI-based assessments? Specifically, what approaches to reliability might be appropriate for AI-based assessments (e.g., types or extensions of internal consistency, test–retest, and alternate forms measures of reliability)?

Participants voiced the general need for assessing reliability in AI-based assessments. Although there was no consensus, and many challenges remain in how to do so, several suggestions were made: for example, assessing internal consistency; using approaches similar to reliability assessment in computer adaptive testing; and assessing measurement consistency over time (e.g., test–retest reliability). Moreover, participants differentiated between the types of assessment data influencing the possible ways to measure reliability. It was unclear how many of these suggestions could be implemented in practice or, in some cases, how they differed from traditional reliability assessments.
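As a concrete illustration of two of these suggestions, the short Python sketch below computes internal consistency (Cronbach’s alpha) and a test–retest correlation for a simulated assessment. The data, sample sizes, and scoring structure are invented for illustration and are not drawn from any actual AI-based assessment discussed in the session.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Internal consistency for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

rng = np.random.default_rng(0)
# Hypothetical data: 200 applicants, 10 scored components of an assessment,
# administered twice with independent measurement error each time
true_score = rng.normal(size=(200, 1))
time1 = true_score + rng.normal(scale=0.7, size=(200, 10))
time2 = true_score + rng.normal(scale=0.7, size=(200, 10))

alpha = cronbach_alpha(time1)
retest_r = np.corrcoef(time1.mean(axis=1), time2.mean(axis=1))[0, 1]
print(f"Cronbach's alpha (time 1): {alpha:.2f}")
print(f"Test-retest r of composite: {retest_r:.2f}")
```

Of course, applying either index to an AI-based assessment presupposes that its scored components and retest occasions can be clearly identified, which is precisely the challenge participants noted.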

Can machine learning (ML) performance metrics (e.g., mean-squared error, area under the curve) be interpreted as demonstrating validity, and if so, how might they be (and not be) usefully compared to correlational validities? Related to this, what defines acceptable levels (and perhaps types) of prediction or model accuracy when ML algorithms are used?

The small-group responses did not directly address the technical aspects of these questions. Nonetheless, they raised several interesting issues specific to AI-based assessments, such as problems with “stealth data” collected without the applicant’s knowledge, which may contain job-irrelevant information (e.g., word choice, facial features, internet data) that introduces bias as well as random noise. Concurrent validation was suggested as a practical method for validating ML-derived predictions, but problems with the nature and quality of criteria were noted, along with questions about how to handle unreliable data and restriction of range. Further issues were discussed, such as the difficulty of generalizing complex ML models to other contexts and their lack of explainability, even if ML-based criterion-related validity were somehow supported.
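To make the contrast between ML performance metrics and correlational validities concrete, the sketch below scores the same simulated predictions with both an AUC and a point-biserial correlation. The data and effect size are fabricated for illustration, and the example assumes scikit-learn is available.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 500
# Simulated dichotomous criterion (1 = successful hire) and an ML-derived score
criterion = rng.integers(0, 2, size=n)
score = 0.8 * criterion + rng.normal(size=n)

auc = roc_auc_score(criterion, score)       # ML classification metric
r_pb = np.corrcoef(criterion, score)[0, 1]  # point-biserial validity coefficient

print(f"AUC = {auc:.2f}, point-biserial r = {r_pb:.2f}")
# Both index predictor-criterion relatedness but on different scales:
# chance-level prediction corresponds to AUC = .50 and r = .00.
```

Such a side-by-side comparison says nothing, however, about criterion quality, range restriction, or generalizability, which are the very problems the participants raised.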

How should various forms of bias (predictive bias, algorithmic bias) be assessed with respect to ML algorithms and AI implementations?

Participants suggested using AI itself to monitor the AI program for bias (e.g., adverse impact monitoring, item analysis auditing). Other suggestions included testing large samples, using cognitive assessments, and constantly reevaluating testing programs. Note that many responses to these questions lacked sufficient detail or did not directly address the critical problem of bias.
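One piece of this monitoring that is straightforward to automate is the UGESP “four-fifths rule” comparison of selection rates, which an audit routine could recompute on an ongoing basis. The minimal Python sketch below uses made-up applicant and hire counts and hypothetical group labels.

```python
# Adverse impact screen using the UGESP four-fifths rule (counts are hypothetical).
applicants = {"group_a": 200, "group_b": 150}
hired = {"group_a": 60, "group_b": 30}

rates = {g: hired[g] / applicants[g] for g in applicants}
highest_rate = max(rates.values())

for group, rate in rates.items():
    impact_ratio = rate / highest_rate
    flag = "potential adverse impact" if impact_ratio < 0.80 else "ok"
    print(f"{group}: selection rate = {rate:.2f}, "
          f"impact ratio = {impact_ratio:.2f} ({flag})")
```

A fuller audit would also examine predictive bias (e.g., testing for group differences in regression intercepts and slopes), which goes beyond this simple comparison of selection rates.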

Is it fair for employers to seek out and use internet data that are outside the control of an applicant, accessed without the applicant’s knowledge, or are out of date?

Participants reached the highest level of consensus on this question, converging on the opinion that using internet data without applicants’ knowledge and consent is problematic. The most commonly mentioned data were those scraped from social media, but such data could also include less obvious information, such as in-game elements (e.g., latency, mouse clicks) and individual responses to interview questions. The need to be transparent about what data are collected and how they are used was heavily emphasized. One group averred that using such data is likely more ethical if the applicant knows what information is being examined and consents to its use. Participants also expressed concern about what will happen to these data after the selection process ends, echoing a growing concern voiced by the general public, the media, and regulators and legislators (e.g., the Illinois law). Generally, a major concern is the repurposing of data without informed consent.

Large Group Discussion

Following 30 minutes in small groups, we proceeded to a large-group discussion involving the whole audience, in which we summarized the Padlet responses to the questions and invited additional comments. Topics discussed more extensively by the audience included age differences in how applicants might perceive the use of AI in personnel selection, the integration of AI into the workplace (e.g., for worker education), and the automation of adverse impact assessment. It was pointed out that applicant reactions are often related to whether the applicant was hired. Therefore, reactions may fluctuate not only across types of tests or selection systems but also across selection ratios that lead to more versus less favorable hiring outcomes.

Conclusion

The area of strongest agreement and greatest concern involved the use of “stealth” data that are not explicitly provided by, or with the knowledge of, applicants. The underlying issue is the need for AI-based selection systems to be sufficiently transparent that applicants are adequately informed, can provide informed consent, and, if needed, can request accommodations or alternative selection procedures.

Discussions of assessing reliability of AI-based assessments were diverse, but participants identified several important issues. Of particular interest were two types of differentiation: (a) appropriate reliability evaluation based on static data (data that do not change over time, e.g., college degree obtained) and dynamic data (data that do change over time, e.g., credit scores; Wang et al., 2016), and (b) formative or summative data (combining multiple data points; Edwards & Bagozzi, 2000) versus single-variable measures (a measure based on one data point). Participants readily acknowledged the difficulties introduced by inclusion of a wide array of data types and methods of collection in evaluating ML results. Concerns about the extent and appropriateness of generalizing ML results to new situations were also apparent.

Despite broad agreement on some issues, there remained considerable diversity of thought on other topics. For example, the small-group discussions about the role of theory in AI-based selection suggested that some participants saw a limited need for theory as a basis for developing such assessments, although respondents recognized the value of using theory to explain ML findings, especially in a legal defense. The small groups also stressed that theory should not limit the development and application of innovative methods and tools that might not otherwise be available to researchers (an idea that was also mentioned in the large-group discussion).

The small-group responses to the question about job analysis identified such information as valuable in developing AI-based assessments, primarily in terms of defensibility and identification of job-relevant KSAOs. The large-group discussion tended to assign job analysis a less central role (e.g., “concerns about legal defensibility”), although the responses also included a recommendation that job analysis in support of AI-based assessments be more “specific” than for traditional assessments that “might be better psychometrically validated.” These differences, though subtle, are consistent with the breadth of opinion that seems to be developing within I-O psychology about the methods for, need for, and value of job analysis in the AI-based selection context.

In general, the discussions during our session highlighted continuing concerns about privacy and the use of AI in employee selection. Many I-O psychologists have important yet divergent opinions about how this area of our field might proceed. Some participants expressed worry about protecting individual rights and acknowledged their own uncertainty about some of the technical issues around AI and ML; others expressed a desire for clarity in the ways in which AI-based tools are developed and deployed; and still others, even those with personal knowledge of the topics, conveyed unease about how AI/ML developments are speeding far ahead of regulation addressing privacy and transparency. At the same time, there was a general sense of the inevitability of continued AI/ML developments in the employment arena and, therefore, of the need for I-O psychology to be deeply involved. We fully expect I-O psychologists to be involved in discussions and development of AI/ML employment tools in future years. Of utmost importance is that I-O psychology continue to work with other professions relevant to AI/ML and employment (e.g., lawyers, computer scientists, policymakers) to help ensure scientific integrity, effectiveness, and the protection of human dignity and worth in the workplace. AI and ML are supposed to improve human resources, after all.

References

Artificial Intelligence Video Interview Act, Pub. L. No. 101–260 (2020). https://www.ilga.gov/legislation/ilcs/ilcs3.asp?ActID=4015&ChapterID=68

Automated Employment Decision Tools, Local Law No. 144 (2021). https://rules.cityofnewyork.us/rule/automated-employment-decision-tools-updated/

Edwards, J. R., & Bagozzi, R. P. (2000). On the nature and direction of relationships between constructs and measures. Psychological Methods, 5(2), 155–174. https://doi.org/10.1037/1082-989X.5.2.155

Equal Employment Opportunity Commission. (2023, May 18). EEOC releases new resource on artificial intelligence and Title VII. https://www.eeoc.gov/newsroom/eeoc-releases-new-resource-artificial-intelligence-and-title-vii

National Institute of Standards and Technology. (2016). About the RMF: NIST risk management framework. https://csrc.nist.gov/projects/risk-management/about-rmf

Office of Federal Contract Compliance Programs. (2019, July 23). Validation of employee selection procedures. https://www.dol.gov/agencies/ofccp/faqs/employee-selection-procedures#Q6

Society for Industrial and Organizational Psychology. (2018). Principles for the validation and use of personnel selection procedures (5th ed.). Cambridge University Press. https://www.apa.org/ed/accreditation/personnel-selection-procedures.pdf

Society for Industrial and Organizational Psychology. (2023, January). Considerations and recommendations for the validation and use of AI-based assessments for employee selection. https://www.siop.org/Portals/84/SIOP%20Considerations%20and%20Recommendations%20for%20the%20Validation%20and%20Use%20of%20AI-Based%20Assessments%20for%20Employee%20Selection%20010323.pdf?ver=5w576kFXzxLZNDMoJqdIMw%3d%3d

Sonderling, K. E., & Kelley, B. J. (2023). Filling the void: Artificial intelligence and private initiatives. North Carolina Journal of Law & Technology, 24(4), 153–200.

The Institute for Workplace Equality. (2022). Artificial Intelligence Technical Advisory Committee report: EEO and DEI&A considerations in the use of artificial intelligence in employment decision making. https://www.theinstitute4workplaceequality.org/download-ai-tac-files

Uniform Guidelines on Employee Selection Procedures, 43 FR 38295 § 60-3 (1978). https://www.uniformguidelines.com/uniform-guidelines.html

Wang, M., Zhou, L., & Zhang, Z. (2016). Dynamic modeling. Annual Review of Organizational Psychology and Organizational Behavior, 3(1), 241–266. https://doi.org/10.1146/annurev-orgpsych-041015-062553

White House Office of Science and Technology Policy. (2022). Blueprint for an AI Bill of Rights: Making automated systems work for the American people. https://www.whitehouse.gov/ostp/ai-bill-of-rights
