On the Legal Front: Parsing Big Data Risk
Rich Tonowski, U.S. Equal Employment Opportunity Commission
The opinions expressed in this article are those of the author and not necessarily those of any government agency. The article should not be construed as legal advice.
Rewards and Risks
Big Data is often characterized by the “3Vs”: volume (very, very big), velocity (generated fast and furiously), and variety (multiple sources and types of data, both structured and unstructured). The Big Data phenomenon is the confluence of the data, sophisticated analytic methodology, and readily available computing power. Big Data is making its splash with I-O psychology. Given the coverage provided in the 2018 SIOP annual conference, there could have been a theme track, like “Machine Learning for the Masses.” This is good news because some of the more esoteric aspects of dealing with Big Data are becoming accessible thanks to the pioneers among us who have summarized and disseminated what they have learned. Part of this education has dealt with analytic methods applicable to “data big and small” (Putka & Landers, 2018), a call to I-O psychology that we now have more tools available, the better to answer questions of interest, and not only for data by the deluge.
The bad news is that the legal concerns over abuses of Big Data and analytic methods generally have not gone away. Insofar as these concerns are real issues, they should not simply go away until they have been addressed.
Risks: A First Cut
One simple division of the legal issues is between the data itself and the algorithms used to analyze data. Data issues involve collection, storage, and access: responsibility for security; who owns those bytes of data we continuously generate about ourselves; who has the right to harvest and sell them; what information we can choose to divulge for a specific purpose, but not for others. Besides privacy, there is the matter of accuracy. If the data are untimely, not attributed to the correct person at the individual level, not representative of the population of interest at the aggregate level, or simply full of reporting errors and miscodes, use of the data can have negative consequences, possibly severe, for the users and the objects of their use. Then there is the way the data are analyzed.1 In some of the popular media, “algorithm” applied to personnel matters has become the boogeyman, an all-knowing yet mindless entity that integrates information (or misinformation) with cold mathematical calculation, shape-shifting as it learns to use the data better, with complexity beyond the ken of mortals, to decide who gets jobs. A specific concern is equal employment opportunity (EEO) law: how the algorithm might be tapping into protected employee and applicant characteristics (e.g., race, sex, age, disability) in those decisions, or perpetuating the biases (unconscious or intentional) of its developers.
Table 1 (see note 2) focuses on the dimensions of legal risk. It joins summaries touching on other aspects of Big Data risk. Privacy protection strategies were covered by Guzzo, Fink, King, Tonidandel, and Landis (2015).3 Handler, Bergman, and Taylor (2018) provided a table of suppliers and products, and a chart summarizing technologies, uses, and risks. Whelan and DuVernet’s (2015) table summarizes threats to validity from statistical and construct considerations.
A Five-Factor Model of Big Data Risk
A process whose workings are apparent is less risky than one that is obscure.
- Measurement proximity: Directly measuring the variables is less risky than dealing with data mediated by other processes and thus subject to interference.
- Transparency of the analysis: Being obvious with what is being done is less risky than running a black box (unless the process is bogus and needs concealment).
- Unambiguity of the predictor: It is less risky when a measure’s meaning is universally recognized rather than vaguely defined. (Cf. discussion of visual or verbal mannerisms that may reflect cultural attributes more than individual difference characteristics.)
- Observability of the criterion: A measurable outcome is less risky than inferring what happened. Simply differentiating applicants without evidence that the differentiation has organizational benefits (PreVisor, 2010) is risky.
Exploratory studies are less risky than implementations that drive operational decisions. Reduce risk by doing studies in advance of final implementation. Consider firewalling the development, maintenance, and internal workings of the system from operational users to avoid possible misuse by decision makers.
- Recruitment: Traditionally there has been a separation of recruitment (establishing an applicant pool with one or more prospects) and selection (decisions made on applicants regarding employment), unless the recruitment process was tantamount to selection. Extending an invitation is lower risk than making the selection. Using multiple recruitment sources is less risky than reliance on one which may be tilted toward or away from a particular group. This may be an issue particularly with passive recruitment.
- Suitability: An applicant might be fully qualified for the job but there is good reason not to employ this applicant.
A sweep of social media and other public sources for indicators of problematic behavior (e.g., membership in a hate group) likely would be low risk. Applying this only to finalists would further lower risk; the numbers are smaller, reducing the possibility of legally cognizable adverse impact. Exclusions based on stereotypes rather than specific business-critical reasons are risky.
- Organizational development: Included are various practices that promote communication, teamwork, and efficient organization.
Processes that promote inclusion without being intrusive are less risky. Analysis of work patterns and interests that lead to invitations to a shared work community may exemplify this and be low risk. Interaction monitoring is less risky for relatively short studies compared to long-term or time-intensive (24/7) monitoring. Explaining the reason for the monitoring and getting employee buy-in diminishes risk. Attitude surveys with little guarantee of privacy or anonymity can be risky unless supported by a compelling business rationale and an accepting culture; as with monitoring, some employee groups may feel singled out for unwanted attention if the purpose of the information gathering is not accepted.
- Selection: There is a differentiation of personnel with important consequences.
Because there are legally protected classes for employment selection, there is always risk. Risk is lessened when the process is supportable by traditional validation concepts and practices, validation information comes from more than one validation strategy, there are no alternatives equally effective with less adverse impact, there is a review mechanism, and cross-validation or other review is built into the process. Practices likely to increase risk include dust bowl empiricism, lack of the job analysis and criterion development that define the job and why the criterion is important, and machine learning with continuous automatic model revision so that the decision rules change for each successive cohort of selections.
Per Kurt Lewin (social psych and organizational development pioneer), there is nothing so practical as a good theory. Theory organizes isolated facts into a coherent story, highlights where parts of the story are missing, predicts new facts to be established, and helps troubleshoot the predictions when they come up short. Processes built on a reasonably sound framework of theory are less risky. Analyzing anything that’s conveniently lying around can be very risky.
- Narrative: There is less risk if the process can be described with a coherent narrative. This is not only for the benefit of the audience; it makes the designers/users walk through the process, allowing inconsistent or vague areas to be examined.
- Vulnerability: A good theory should specify what should happen, under given conditions, when a process is engaged; if it doesn’t happen, that signals something needs to be checked. The theory’s specific claims are vulnerable to disconfirmation; of course, confirmation is likely the preferred outcome. Higher risk comes with the atheoretical situation where, whatever happens, the process is self-validating.
- Nomological network: A process is less risky to the extent that it is embedded in a network of constructs and measurement practices that support it.
As the saying goes: Garbage in, garbage out. Data with known provenance that establishes accuracy and completeness are less risky.
- Data: Risk is reduced to the extent that there is data validation (cleansing to ensure correctness and usefulness) and data integrity (accuracy and consistency of data over its entire life cycle).
- Representativeness: The data represent the population of interest for the specific purpose. This reduces risk, particularly of the kind where algorithms are trained to identify characteristics that are demographically biased.
- Analyzability: The data can be meaningfully analyzed for the intended purpose; there are the tools and expertise to perform the analyses. Preferably, there is generally accepted professional practice to back this up, not something put together for the immediate purpose.
The meaning of the data analyses does not change. Having a means to ensure this makes the process less risky.
- Temporal stability: To the extent that there is reason (argument and/or data analysis) to believe that the relationship to the criterion will endure as long as necessary for the intended purpose, there is less risk.
- Statistical stability: The relationship to the criterion is less risky to the extent that it is not built on chance or overfitting the data initially at hand. Cross-validation as an ongoing activity comes in here.
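The statistical stability point can be made concrete with a small simulation. What follows is only a sketch under stated assumptions, not a validation procedure: every predictor is pure random noise with no true relationship to the criterion, the sample sizes and seed are arbitrary, and the function names (`pearson_r`, `best_noise_predictor`) are hypothetical. The strongest predictor is picked on a development sample and then re-checked on a holdout sample.

```python
import random
import statistics


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_noise_predictor(n_train=50, n_holdout=200, n_predictors=200, seed=0):
    """Pick the strongest of many pure-noise predictors on a development
    sample, then re-check that same predictor on a fresh holdout sample.

    Returns (development_r, holdout_r). All data are simulated noise, so
    any development-sample "validity" is capitalization on chance.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = n_train + n_holdout
    criterion = [rng.gauss(0, 1) for _ in range(n)]
    predictors = [[rng.gauss(0, 1) for _ in range(n)]
                  for _ in range(n_predictors)]

    def r_on(pred, lo, hi):
        # Correlation computed only on cases lo..hi-1
        return pearson_r(pred[lo:hi], criterion[lo:hi])

    # "Dust bowl" selection: keep whatever correlates best in the data at hand
    best = max(predictors, key=lambda p: abs(r_on(p, 0, n_train)))
    return r_on(best, 0, n_train), r_on(best, n_train, n)


if __name__ == "__main__":
    dev_r, holdout_r = best_noise_predictor()
    print(f"development r = {dev_r:+.3f}, holdout r = {holdout_r:+.3f}")
```

Because the predictors are noise by construction, the development-sample correlation of the winner reflects only the search across many candidates; the holdout figure is the honest estimate. Repeating the holdout check on each new cohort is one simple form of cross-validation as an ongoing activity.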
The above is hardly the last word on how risk can be parsed. But if it clarifies what we mean by risk and stimulates discussion on how to control it, then its purpose will have been served. It can also be seen that many risk factors apply to data regardless of size.
But it is not enough to describe risk. As a profession, we need to deal with it.
Define the Professional and Legal Matters to Be Addressed
The boogeyman, as noted above, is machine learning run amok. That is not the totality of Big Data and its analytics. The picture is further obscured by commentators who seem to have little awareness of personnel selection science and practice that originated over a century ago. Big Data is providing some interesting possibilities for selection methods. For now, that contribution is more incremental than revolutionary. It is not yet supplanting established methodology. Morrison and Abraham (2015) noted that the 3Vs are generally not essential for selection work, at least not yet. Identifying applicants with sufficient reading comprehension ability to deal with materials used on the job could be done with a computer-administered reading test. Presumably the employer could data mine proxy indicators of reading comprehension from various sources, but why? If Big Data is more suited to new areas (neuroscience games, micro-behaviors during interviews), it may avoid traditional areas that are important but burdened by adverse impact (e.g., cognitive ability and race, upper body strength and sex). If not used to perpetuate patterns of exclusion, applications regarding interpersonal skills, corporate citizenship, and organizational fit may be promising areas with low adverse impact.
Although we can hypothesize on how Big Data can generate fresh concerns regarding adverse impact, there is little available information on how specific applications are confronting adverse impact and job relatedness issues. Dunleavy and Morris (2017) note this lack and recommend monitoring events as they unfold. This suggests that although some issues may be addressed now, responding to the challenges of Big Data is a long-term enterprise.
But there are some issues that can be discussed and acted upon in the short term.
There is anecdotal evidence that “profiling” (PreVisor, 2010) has gained some traction with employers. This practice involves the assessment, usually on multiple factors, of “top performers” identified by the client organization’s management (perhaps a very small number of people). The profile of assessment scores is then used as the benchmark for new hires. The supposition is that the profile denotes factors which, if present in the new hires, will result in identifying more top performers. The PreVisor white paper identifies seven problems with this supposition. In essence, selection can be based on superficial factors rather than anything substantive for the organization, producing risk that those selected simply share a superficial similarity with high performers (worse if what’s shared is protected class group), and thwarting the opportunity to bring to the organization those without the superficial factors but having substantive capabilities. This is a problem previously associated with empirically keyed biodata.
The federal Uniform Guidelines on Employee Selection Procedures (1978, § 14B(5); hereafter UGESP, with associated questions and answers [Q&As]) provide a simple standard for criterion validity: statistical significance, α = 0.05. Statistical power increases with sample size, and with big samples it will be possible to get the requisite probability with a tiny effect size. Whelan and DuVernet (2015) add concern for violation of statistical assumptions underlying the tests, inflation of statistically significant findings due to repeated testing of the same data and post-hoc hypothesizing (peek at the data first, then state a hypothesis suggested by peeking), and heterogeneity of units thrown together in one big analysis. Add to this the questionable stability of relationships derived from the analysis, with no theoretical backing to suggest stability and no cross-validation to check for overfitting the relationship so that it holds only for the initial data. None of these concerns are new, but they get new salience with the analysis of Big Data.
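Two of these points are simple arithmetic, and a short sketch may help. It rests on assumptions that are not part of UGESP or of Whelan and DuVernet's commentary: the p-value uses a normal approximation to the usual t test for a correlation, the repeated tests are treated as independent, and the validity coefficient of 0.02 is a made-up illustration. The function names are hypothetical.

```python
import math


def p_value_for_correlation(r: float, n: int) -> float:
    """Two-sided p-value for H0: rho = 0.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and a normal approximation
    to the t distribution, which is adequate for the large n of interest.
    """
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def familywise_error_rate(alpha: float, k: int) -> float:
    """Chance of at least one false positive across k independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    r = 0.02  # hypothetical coefficient; r^2 = 0.0004 of criterion variance
    for n in (100, 10_000, 1_000_000):
        p = p_value_for_correlation(r, n)
        print(f"n = {n:>9,}   r = {r}   p = {p:.4g}")

    for k in (1, 20, 100):
        fwer = familywise_error_rate(0.05, k)
        print(f"{k:>3} tests at alpha = 0.05 -> familywise error = {fwer:.3f}")
```

The same trivial correlation that is nowhere near significance in a sample of 100 clears α = 0.05 comfortably in a sample of a million, and the chance of at least one spurious "finding" climbs quickly as more tests are run on the same data. Reporting effect sizes alongside p-values is one common response.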
King and Mrkonich (2016) raise a more fundamental issue, causality versus correlation4 in adverse impact cases, noting “the affirmative defense of job-relatedness protects Big Data methodologies only to the extent that courts countenance criteria that are not directly job-related but instead correlate with job performance.” Their argument seems to rest on a distinction between job-related assessment and assessment “not necessarily job-related but highly correlated with behavior relevant to the job.” This may seem a semantic quibble, but a similar issue has come up in other legal discussions: the difference between competencies viewed as enabling an applicant’s performance now, and less immediate considerations such as tardiness, absenteeism, turnover, and susceptibility to on-the-job injury.
Their argument seems to be directed primarily at web scraping and incorporates concerns for statistical significance of trivial effects and instability of correlations. They raise the possibility of differential validity by demographic group, depending on what data are used. They also mention the less discriminatory alternative problem: “Whether an algorithm’s marginally greater predictive ability is sufficient to justify its greater adverse impact, if indeed the law recognizes any trade-off between the two.” How big a potential difference in adverse impact demands action as a matter of law? If the application does not self-adjust to maximize validity and minimize adverse impact, how often must the user check whether an adjustment needs to be done? Q&A 49 declares that continually investigating alternatives is not necessary after the initial investigation is done for the validity report, “until such time as a new study is called for.” But likely this was not written with self-adjusting algorithms in mind.
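For readers who want to see what a periodic check might look like, here is a minimal sketch. It uses the impact ratio and the four-fifths rule of thumb from UGESP § 4D, which the discussion above does not name; that rule is an enforcement screening guideline, not a legal test, and it does not answer the questions of how large a difference demands action or how often to check. The class and function names, cohort labels, and counts are all hypothetical, and the reference group is assumed to be the group with the highest selection rate.

```python
from dataclasses import dataclass


@dataclass
class CohortCounts:
    """Applicants and selections for one group in one hiring cohort."""
    applicants: int
    selected: int

    @property
    def selection_rate(self) -> float:
        # An empty applicant pool is treated as a rate of zero
        return self.selected / self.applicants if self.applicants else 0.0


def impact_ratio(focal: CohortCounts, reference: CohortCounts) -> float:
    """Focal-group selection rate divided by reference-group selection rate."""
    if reference.selection_rate == 0:
        raise ValueError("reference group has no selections; ratio undefined")
    return focal.selection_rate / reference.selection_rate


def flag_cohorts(cohorts, threshold=0.8):
    """Return labels of cohorts whose impact ratio is below the threshold.

    `cohorts` is an iterable of (label, focal_counts, reference_counts).
    The 0.8 default is the four-fifths rule of thumb.
    """
    return [label for label, focal, reference in cohorts
            if impact_ratio(focal, reference) < threshold]


if __name__ == "__main__":
    # Made-up counts for two successive cohorts scored by the same tool
    history = [
        ("cohort-1", CohortCounts(100, 30), CohortCounts(200, 70)),
        ("cohort-2", CohortCounts(120, 24), CohortCounts(180, 63)),
    ]
    print(flag_cohorts(history))
```

With a self-adjusting algorithm, a check of this kind would presumably need to be rerun whenever the decision rules change, and small cohorts would call for a significance test as well, since ratios from small numbers are unstable.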
Some of their concerns seem well-taken, although not necessarily unique to Big Data; other concerns, maybe not so much. Causality inferred from any observational study, in contrast to a controlled experiment, is tenuous. That has not produced an insurmountable hurdle to litigation for either side. Reliance on UGESP may be misplaced5; UGESP does not directly address the causality matter, but Q&A 58 states regarding criterion studies that “measures such as absenteeism, tardiness or turnover may be used without a full job analysis if these behaviors are shown by a review of information about the job to be important in the specific situation.” This would seem to blunt in one stroke the causation argument and the assertion (p. 577) that all validation under UGESP requires a job analysis.
The Internet Applicant Rule used by the U.S. Department of Labor’s Office of Federal Contract Compliance Programs may need a look. Essentially, the rule allows contractors to forgo applicant demographic record keeping for those who do not meet “basic qualifications.” But the rule is restrictive in how much consideration can be given to applicants other than a quick screen; there is no elaboration on whether specific Big Data applications would or would not comply.
Educate Vendors, Users, and Enforcement Agencies
Three maxims apply here:
- Good practice drives out bad practice.
- Nature abhors a vacuum.
- When self-interest joins with the public interest, great things are possible.
The Guzzo et al. (2015) recommendations started as the product of a SIOP ad hoc committee. The evolving nature of Big Data applications does not allow for a one-shot approach to professional guidance. That guidance can be done without calling out specific service providers, overly constraining professional judgment, or inhibiting innovation. But there are risks to manage, and some practices with known flaws to avoid. The precise mechanism for doing this within SIOP, such as an ad hoc or standing committee, can be worked out as needed. There are reasons for I-O psychology to take ownership, certainly not of all Big Data issues, but those that impinge on employment practices. Although the current federal administration is not into regulation proliferation, various federal agencies have expressed concern for how Big Data applications impact consumers and employees. Protecting these constituencies is their mission; professional inattention invites filling the vacuum in the public interest. Efforts to provide that protection would be enhanced by guidance on professional issues from professional sources. It also is in the interest of the profession that government agencies understand how guidance or action that they take could impinge on professional practice. There also is a need to protect our brand. Handler, Bergman, and Taylor’s (2018) abatement of risk included the presence of I-O psychologists in the development of Big Data applications. There is also a marketing aspect. A plaintiff-side employment attorney was describing practices that could fit under the Big Data header and explaining why some were ineffective and potentially discriminatory. When asked why employers would sign on to these applications, he said that the marketing presentations were attractive, and the applications were cheap and quick to implement. 
If the SIOP constituency seeks to compete with data scientists and other non-I-Os in the employment sphere on quality and effectiveness rather than on being cheap and quick, education on the issues is critical, for us and for potential clients.
Go Back to School
See Guzzo, Park, Stanton, McAbee & Landis (2018) for the new curriculum. Or, if you, gentle reader, are not already into Big Data and the new analytics, at least avail yourself of materials that can be found on the SIOP website, presenters’ blog sites, or general sources. Then use what you learned. Doing good things with Big Data and modern analytics is more fun than the legal stuff!
Notes
1 Here are simple definitions for terminology associated with analyzing Big Data: Web-scraping, searching the Web across huge amounts of data and downloading relevant portions of it; data mining, algorithms for finding patterns in massive data, regardless of source; unsupervised learning, methods for reliable grouping or classification, rather than making specific predictions; supervised learning, methods where, given a criterion and predictors, the algorithm “learns” how to predict the criterion from data.
2 This originated with Guzzo, MacLane, Mondragon, Tison, and Tonowski (2017). Special thanks are due to Gary Behrens for editing the table. Praise the editor, you the reader, if you like the table; blame this writer if you don’t.
3 This is a focal article regarding Big Data practice guidelines in SIOP’s Industrial and Organizational Psychology: Perspectives on Science and Practice, accompanied by a set of refereed commentaries.
4 Ross and Merrill (2017) also have a causal concern, but with a specific instance: “big data” application of statistical patterns to individual decisions without considering causal factors, thus committing the ecological fallacy.
5 UGESP has gone 40 years without revision and there are concerns for its relevance apart from Big Data. See McDaniel, Kepes, and Banks’s (2011) focal article and associated commentaries.
References
Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, & Department of Justice. (1978). Uniform guidelines on employee selection procedures. Federal Register, 43, 38290–38315. Q&As 1-90, Federal Register, 44, March 2, 1979. Q&As 91-93, Federal Register, 45, May 2, 1980. Retrieved from www.uniformguidelines.com.
Dunleavy, E.M. & Morris, S.B. (2017). Some conclusions and emerging issues in adverse impact measurement. In S.B. Morris & E.M. Dunleavy (Eds.), Adverse impact analysis: Understanding data, statistics, and risk. New York, NY: Routledge.
Guzzo, R.A., Fink, A.A., King, E.B., Tonidandel, S., & Landis, R.S. (2015). Big Data recommendations for industrial-organizational psychology. Industrial and Organizational Psychology: Perspectives on Science and Practice, 8, 491-508.
Guzzo, R.A., MacLane, C.N., Mondragon, N.J., Tison, E.B., & Tonowski, R.F. (2017). Making better business decisions? Risks and rewards in Big Data. Presented at the 32nd annual meeting of the Society for Industrial and Organizational Psychology, Orlando, FL.
Guzzo, R.A., Park, M., Stanton, J.F., McAbee, S.T. & Landis, R.S. (2018). Teaching Big Data methods in I-O graduate curriculum 2.0. Presented at the 33rd Annual Conference of the Society for Industrial and Organizational Psychology, Chicago, IL. See also S.T. McAbee (Chair, 2017), Teaching big data methods in I-O graduate curriculum: A primer. Presented at the 32nd Annual Conference of the Society for Industrial and Organizational Psychology, Orlando, FL.
Handler, C., Bergman, S., & Taylor, B. (2018). How to use advanced technology in selection (and feel good about it!). Presented at the 33rd annual meeting of the Society for Industrial and Organizational Psychology, Chicago, IL.
King, A.G. & Mrkonich, M. (2016). “Big Data” and the risk of employment discrimination. Oklahoma Law Review, 68, 555-584.
McDaniel, M.A., Kepes, S., & Banks, G.C. (2011). The Uniform Guidelines are a detriment to the field of personnel selection. Industrial and Organizational Psychology: Perspectives on Science and Practice, 4, 494-514.
Morrison, Jr., J.D. & Abraham, J.D. (2015). Reasons for enthusiasm and caution regarding Big Data in applied selection research. The Industrial-Organizational Psychologist, 52, 134-139.
PreVisor, Inc. (2010). Do you want to profile performance or predict it? Retrieved from https://keenalignment.com/do-you-want-to-profile-performance-or-predict-it/.
Putka, D.J. & Landers, R.N. (2018). Modern analytics for data big and small. Presented at the 33rd annual meeting of the Society for Industrial and Organizational Psychology, Chicago, IL.
Ross, D.B. & Merrill, G. (2017). A defense attorney’s perspective. In S.B. Morris & E.M. Dunleavy (Eds.), Adverse impact analysis: Understanding data, statistics, and risk. New York, NY: Routledge.
Whelan, T.J. & DuVernet, A.M. (2015). The big duplicity of big data. Industrial and Organizational Psychology: Perspectives on Science and Practice, 8, 509-515.