Volume 55     Number 4    Spring 2018      Editor: Tara Behrend


Lost in Translation: Visually Communicating Validity Evidence

Michael L. Litano, Andrew B. Collmus, & Don C. Zhang

In our previous column, we discussed the complexity and nuance of measuring unobservable psychological phenomena and the importance of verbally communicating the value of reliability and validity evidence to non-I-O psychologists. Our interviews with Fred Oswald, Jeff Jolton, and Don Zhang were insightful, impactful, and extremely well-received by the SIOP community. However, we also received feedback from I-O psychology practitioners emphasizing how much more frequently they communicate in the form of charts, figures, and PowerPoint decks than in conversation, which made it difficult to apply the lessons from our last column to their current roles. In fact, it is common in the business world for your PowerPoint decks to “walk” around the organization after your presentation, meaning you must create slides that are interpretable and easily understood even without talking points to accompany them.

The focus of the current column is just that: How does one effectively communicate validity evidence using only visualization? Despite the interesting and relevant topic, we must express how difficult it was for us to obtain visualizations that effectively represented validity evidence. We had several academics and practitioners volunteer contributions that either (a) we had difficulty understanding, (b) mirrored an output you might see in the Journal of Applied Psychology rather than a business meeting, or (c) weren’t representative of validity evidence. For example, although some business audiences may generally be able to interpret a scatterplot (see Don Zhang’s contribution below), we as a field can and should do a better job of taking a validity coefficient and presenting it such that the value can be grasped by anyone.

The goal of this column is to show that conducting rigorous research on the back end is important, but only insofar as the evidence and results are effectively communicated to those who decide whether and how to act. We hope to demonstrate how one might effectively visualize validity evidence, including how one might visually present a rigorous analysis or complex model in such a way that non-I-O audiences can understand and appreciate its value. Like previous columns, our hope is that this shift in focus will provide anyone who finds themselves “lost in translation” with resources that not only help the individual I-O psychologist but also build awareness and use of I-O psychology in organizations. In the sections that follow, Don Zhang, Michael Litano, AJ Thurston, and Daniel Hawthorne each share some of the ways they have effectively presented validity evidence.

Don C. Zhang

In a recent conversation, a director of human resources lamented the low predictive power of most selection methods. “Only about 50%,” he said, referring to the meta-analytic validities reported in the seminal Schmidt and Hunter (1998) paper. Two problems: First, a correlation of .50 translates to 25%, not 50%, of variance explained. Second: only 50%? As a field, we are not doing a good job of communicating the value of our work when our most treasured selection methods are received with an unimpressed “meh” by the people in positions to use them.

Traditional validity indices such as the correlation and coefficient of determination do not convey the practical impact that a selection method has on organizations. Percent of variance explained is not the kind of metric business professionals care about, and 25% is not a number that will impress many. But, 25% is about as high as we can get due to the “validity ceiling” of predictions (Rundquist, 1969). Clearly, there is a need to communicate our evidence in a way that resonates with the public.

There are many alternative graphical and nongraphical displays of validity information (Kuncel & Rigdon, 2012), but not all data visualization techniques are created equal. The scatter plot, for example, displays the relationship between a predictor and a criterion on the x- and y-axes. The plot below illustrates the relationship between ACT scores and college GPA, which has a validity of r = .30: not particularly impressive, and at a glance the relationship depicted in the scatter plot appears equally unconvincing. Yet scatter plots remain the primary method for visualizing linear relationships.

Alternatively, one can use an expectancy chart (Lawshe & Bolda, 1958). An expectancy chart communicates the relationship between two variables (e.g., ACT score and GPA) by presenting the proportion of the sample scoring above a criterion cut-off (e.g., GPA above 3.5) at a given score interval on the predictor (e.g., ACT score between 25 and 27). Believe it or not, the expectancy chart below is generated from the same data as the scatter plot. Based on the expectancy chart, one can easily see the predictive efficiency of the ACT. Students in the top quintile of the ACT have approximately a 70% chance of making the Dean’s List (GPA above 3.5) at most universities, whereas students in the bottom quintile have less than a 25% chance (25% takes on a whole new meaning in this context).
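To make the mechanics concrete, here is a minimal sketch of how expectancy-chart proportions can be computed. The ACT/GPA-like data are simulated with a validity of roughly r = .30; all scales and numbers are illustrative stand-ins, not the data behind the figures in this column.

```python
# Sketch: computing expectancy-chart proportions from simulated data.
import numpy as np

rng = np.random.default_rng(0)
n, r = 5000, 0.30

# Simulate a standardized predictor/criterion pair with correlation r,
# then rescale to rough ACT-like and GPA-like metrics (illustrative only).
z1 = rng.standard_normal(n)
z2 = r * z1 + np.sqrt(1 - r**2) * rng.standard_normal(n)
act = np.clip(21 + 5 * z1, 1, 36)
gpa = np.clip(3.0 + 0.5 * z2, 0.0, 4.0)

# Split the predictor into quintiles and compute, per quintile, the share
# of students clearing the criterion cut-off (GPA >= 3.5, "Dean's List").
edges = np.quantile(act, [0.2, 0.4, 0.6, 0.8])
quintile = np.digitize(act, edges)  # 0 = bottom 20%, 4 = top 20%
expectancy = [np.mean(gpa[quintile == q] >= 3.5) for q in range(5)]
for q, p in enumerate(expectancy, start=1):
    print(f"ACT quintile {q}: {p:.0%} reach GPA >= 3.5")
```

The bar heights of an expectancy chart are exactly these per-quintile proportions, which is why the chart can be built from the same raw data as a scatter plot.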


Sometimes decision makers are interested in dichotomizing the predictor. For example, an academic administrator might be concerned with choosing an appropriate ACT cut-off score for admitting students into the honors college. To visualize the distribution of GPAs for individuals above and below a cut-off, we can use an overlapping density plot. The figure below illustrates the distribution of GPAs for students with ACT scores above and below 26. One way to translate the results is with the common language effect size (CLES; McGraw & Wong, 1992). The CLES for the validity of the ACT can be described as, “a randomly chosen person with an ACT score greater than 26 has a 62% chance of obtaining a higher GPA than a random person with an ACT score less than 26.”
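The pairwise logic behind an empirical CLES can be sketched directly: compare every member of one group against every member of the other and count the wins. The group means, sample sizes, and cut-off below are hypothetical stand-ins for illustration.

```python
# Sketch: empirical common language effect size (CLES; McGraw & Wong, 1992).
import numpy as np

def cles(above, below):
    """P(a random draw from `above` outscores a random draw from `below`),
    counting ties as half, computed over all cross-group pairs."""
    a = np.asarray(above, dtype=float)[:, None]
    b = np.asarray(below, dtype=float)[None, :]
    return (a > b).mean() + 0.5 * (a == b).mean()

rng = np.random.default_rng(1)
gpa_above_cut = rng.normal(3.3, 0.4, 300)  # students with ACT > 26 (illustrative)
gpa_below_cut = rng.normal(3.1, 0.4, 700)  # students with ACT <= 26 (illustrative)
p = cles(gpa_above_cut, gpa_below_cut)
print(f"A random 'above' student out-GPAs a random 'below' student {p:.0%} of the time")
```

The resulting probability is the single number quoted in the sentence above, which is what makes CLES so easy to narrate to a lay audience.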

Nontraditional effect size indices and displays are particularly useful when communicating the practical value of selection tests. Research from my lab has found that the lay public tends to judge nontraditional displays as easier to understand than traditional validity statistics and, more importantly, forms much more favorable judgments about selection instruments (e.g., the ACT, structured interviews) when validity information is presented with nontraditional graphical displays.

Despite the benefits of these nontraditional displays of validity, there are, unfortunately, no accessible tools for generating them. Common statistical packages (e.g., SPSS) do not readily produce these alternative displays. To facilitate the calculation of nontraditional effect size displays, I’ve created a free-to-use web application that allows scholars and practitioners to easily generate and visualize a variety of nontraditional effect sizes, such as expectancy charts, CLES, and binomial effect size displays, with their own data (

Michael L. Litano

Entering the applied world, I incorrectly assumed that senior leaders and other organizational decision makers would be able to understand and interpret analyses of “people” data that extended beyond simple descriptives, such as favorability scores and agree percentages (i.e., the percent of employees who “agree” or “strongly agree” with a single question or set of questions). That’s not to say these leaders didn’t care or were incompetent; rather, the way validity evidence tends to be presented is too dense and unintuitive. If we assume the perspective of a senior leader with no I-O background and/or a limited understanding of statistics, what would we do with correlation coefficients, coefficients of determination, or (un)standardized parameter estimates?

I have conducted numerous interviews and focus groups with senior leaders, and three themes generally emerge when it comes to digesting and interpreting survey results: They want (a) easily interpretable findings, (b) to see “trend” (how much scores have improved or declined since the last survey), and (c) to know what to focus on and how to drive meaningful change. As I-O psychologists, we are well equipped to meet these needs, and I have found the best way to present these data is by doing my due diligence as a scientist in the background, then presenting findings that are easily interpretable, even if they do not show the exact analysis I conducted. After all, in people analytics the goal is to use data-driven approaches to inform people- and organization-related practices, programs, and processes, not to publish in a peer-reviewed journal.

Here’s a scenario: Employee engagement is a high-value metric at Organization X. Employees are surveyed quarterly and each survey includes ten 3-item scales to measure constructs research suggests are primary antecedents of engagement (e.g., leader–employee relationship, professional development opportunities, etc.). Here’s one way I have analyzed such data in the background and then simplified the presentation of findings for a non-I-O audience.

Analysis: Every company has its own guidelines for presenting descriptive data (see the translation section below), but when analyzing relationships between variables it is still essential to methodically clean the data, check assumptions, assess the reliability of your measures, and demonstrate construct validity (correlations are fine with a low N, but use confirmatory factor analysis with a high N). At this point, we have some confidence that we are measuring the distinct constructs we intend to measure. I don’t present any of these analyses to leaders outside of my team.

Because we have already identified employee engagement as the high-value metric of interest (let’s assume engagement is related to important HR and/or business outcomes), our goal is to be able to tell senior leaders with some degree of confidence what they should focus on to drive meaningful change. Especially when collected at one point in time, variables in organizational surveys generally share moderate-to-strong correlations with one another. Rather than rely on correlations (which do not provide accuracy metrics and consider only the focal relationship of interest) or multiple regression (which can provide misleading results when independent variables are highly correlated), I typically use relative importance analysis. This analysis transforms the independent variables into orthogonal counterparts so that you can more reliably assess the unique contribution of each. It easily identifies the most important “predictors” and sets us up well to translate.
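One common implementation of relative importance analysis is Johnson's (2000) relative weights, which builds exactly this orthogonal transformation. Here is a minimal sketch assuming standardized variables; the correlation values are illustrative, not results from any real survey.

```python
# Sketch: Johnson's (2000) relative weights from a predictor correlation
# matrix Rxx and predictor-criterion correlations rxy. Variable names are ours.
import numpy as np

def relative_weights(Rxx, rxy):
    # Eigendecomposition of the (symmetric) predictor correlation matrix.
    vals, vecs = np.linalg.eigh(Rxx)
    # lam maps an orthogonal counterpart Z of X back onto X (X = Z @ lam).
    lam = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    lam_inv = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    beta = lam_inv @ rxy          # regression of y on the orthogonal Z
    return (lam**2) @ (beta**2)   # each predictor's share of R^2

# Illustrative correlations among three survey scales and engagement.
Rxx = np.array([[1.0, 0.5, 0.4],
                [0.5, 1.0, 0.3],
                [0.4, 0.3, 1.0]])
rxy = np.array([0.50, 0.45, 0.25])

eps = relative_weights(Rxx, rxy)
r2 = rxy @ np.linalg.solve(Rxx, rxy)  # model R^2 for comparison
print("relative weights:", eps.round(3), "| sum:", eps.sum().round(3), "| R^2:", round(r2, 3))
```

A useful property, and the reason these weights translate well for leaders, is that they are nonnegative and sum exactly to the model R², so each one can be read as a driver's share of explained variance.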

Note: Scott Tonidandel and James LeBreton developed free programs for calculating relative weights in multiple, multivariate, or logistic regression (Tonidandel & LeBreton, 2011, 2014):

Finally, it’s important to consider the magnitude of the findings you are sharing. For example, if I were only able to explain 2% of the variance in engagement, how much of an impact will these variables really have? Conversely, in large samples everything is statistically significant, so it is important to examine the practical significance (direction and magnitude) of effect sizes. These and other considerations are part of your duty as a scientist.

Translation: Considering what I told you leaders care about, I could present this analysis in two very easy-to-understand slides:

1. They want easily interpretable findings. Raw mean scores and standard deviations aren’t meaningful to most people, so there is a data-transformation aspect to this step. Each organization has a “preferred” way of communicating results, so you can be more effective if you cater to its norms. In Organization X, managers are used to seeing “agree percentages.” Therefore, I recode each individual survey response variable to be binary (1 = agree or strongly agree on a 5-point scale).


2. They like to see “trend.” This isn’t validity evidence, nor is it something we often care about as I-O psychologists. But again, if you take the leader’s perspective, it’s logical to wonder, “How were my scores this survey?”, “How do my scores compare to the organization?”, and “How do my scores compare to my historical scores?” Generally, Slide 1 answers these questions for the senior leader and prepares him or her for our “validity story”:


3. They want to know what to focus on and how to drive change. Here, we take the general results of our relative importance analysis and translate them for senior leaders. In this analysis, assume we’ve identified the two variables with the strongest relationships to employee engagement: leader–employee relationship quality and professional development opportunities.

Even when I am not conducting a random forest or another tree-based analysis, presenting this finding as a decision tree has been one of the more effective ways to communicate it, because it demonstrates what employee engagement scores look like when these two variables are rated favorably versus unfavorably. There are a lot of rules that you—as an analyst, team, and company—have to decide on (e.g., how many tree branches to present, what the cut-offs are for “yes” and “no” decisions, etc.). Once those rules are determined, you can use your stats program of preference to codify each of the four decision-tree buckets (high–high, low–low, high–low, low–high) and create a pivot table that generates a unique engagement score for each. I always present these rules in a footnote so that even when the deck “walks,” questions about the analysis can be answered.

As you can see in the chart below, it’s easy to see the percentage of associates who are engaged in each of the four scenarios. Again, I’m trying to communicate the two things senior leaders should act on—not publish in an academic journal—so I cut the branches off at two layers and codified the predictors into binary agree scores so that the “yes” and “no” decisions are more easily interpretable (i.e., “yes” means an associate “agrees” or “strongly agrees,” on average, with that set of questions). Slide 2 highlights that 85% of employees who have high-quality relationships with their leaders and have their professional development needs met are engaged, compared with only 15% of those who have neither.
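The bucket-and-pivot step described above can be sketched as follows. The scale names, thresholds, and simulated responses are hypothetical; only the recode-then-pivot structure mirrors the approach in the text.

```python
# Sketch: binary "agree" recoding plus a pivot table of engagement rates
# for the four decision-tree buckets. Data and column names are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({
    "relationship": rng.integers(1, 6, n),  # 1-5 Likert scale means
    "development": rng.integers(1, 6, n),
})
# Engagement loosely driven by both predictors, for illustration only.
df["engaged"] = (df["relationship"] + df["development"] + rng.normal(0, 2, n)) >= 7

# Binary recode: "yes" = agree or strongly agree (4 or 5 on a 5-point scale).
df["rel_agree"] = np.where(df["relationship"] >= 4, "yes", "no")
df["dev_agree"] = np.where(df["development"] >= 4, "yes", "no")

# One engagement rate per bucket: high-high, high-low, low-high, low-low.
pivot = df.pivot_table(values="engaged", index="rel_agree",
                       columns="dev_agree", aggfunc="mean")
print((pivot * 100).round(0))  # % engaged in each of the four buckets
```

Each cell of the pivot is the number that goes on the corresponding leaf of the two-layer decision tree, so the slide can be rebuilt from this table alone.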


For those senior leaders who are more statistically and analytically proficient, you can also try to present the results of a relative importance analysis like the one below:

Daniel R. Hawthorne

When using this data visualization method, it is important to explain the different ways it shows the importance of each variable to the job(s) in question. Green shows the direction of the desired scores, and row height indicates the strength of the influence. A layperson can rapidly scan down the scores and see which attributes are most important to a particular job.

After a layperson understands how the data are visualized, I can use this method to get feedback about our data-driven regression equations and learn whether I need to consider adjusting weights based on attributes important to the client that we may not have picked up in data collection. We can also use it to have new clients give feedback about transportable solutions: how they might fit their jobs and, ultimately, how they might need to be adjusted to account for organizational differences between clients.

Additionally, if we’ve conducted a concurrent validation study, this data visualization method can be used to adjust for an organizational change in which an organization wants to select for different employees than it currently has. For example, incumbent employees might show a strong association with problem solving and ingenuity, while a client might want to shift to a model where caring and compassion are the most strongly weighted predictors. This visualization would show the existing employees’ current organizational profile and make it easy to have a conversation with the client about how new employees might be selected to push the organization toward a new competency model in line with the organizational change.

A.J. Thurston

Suppose I have an organization with an expensive training program. The program concludes with a final test that demonstrates a minimum level of competence and must be passed to graduate (score > 45). The pass rate is declining, and I’ve been asked to develop a screen-out system for the program. Here are the validation results of that system.


I start by orienting them to the plot: The x-axis shows predicted criterion score quantiles, and the y-axis shows the mean actual criterion score for each quantile. I’ll explain what the quantiles are and how they were developed, with quantile 1 representing trainees who scored in the bottom 20% on the predicted score and quantile 5 representing those in the top 20%. In describing how the quantiles were made, I also emphasize that this is a simplification of a continuous scale.



Because this is a screen-out procedure, most of the discussion will focus on those in quantile 1. Here, I will add back some of the detail that’s been simplified to create the plot; specifically, the 95% confidence interval for this quantile. In this case, the 95% CI of the actual scores ranges from approximately 40 to 44. Conveniently, I can say anyone who falls into this predicted quantile has a 95% chance of actual failure.
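The numbers behind a plot like this can be assembled in a few lines. The simulated predictor–criterion relationship below is illustrative; only the structure (predicted-score quintiles, per-quintile means, normal-approximation 95% CIs) mirrors the description above.

```python
# Sketch: per-quantile mean actual scores with 95% CIs, the ingredients of
# the plot described in the text. Data are simulated for illustration.
import numpy as np

rng = np.random.default_rng(3)
n = 1000
predicted = rng.normal(50, 10, n)
actual = 0.6 * predicted + rng.normal(20, 5, n)  # illustrative relationship

# Quintiles of the predicted score: 0 = bottom 20%, 4 = top 20%.
edges = np.quantile(predicted, [0.2, 0.4, 0.6, 0.8])
quintile = np.digitize(predicted, edges)

means = []
for q in range(5):
    scores = actual[quintile == q]
    m = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(len(scores))  # normal-approx CI for the mean
    means.append(m)
    print(f"quantile {q + 1}: mean actual = {m:.1f}, "
          f"95% CI [{m - 1.96 * se:.1f}, {m + 1.96 * se:.1f}]")
```

Plotting quantile number against these means (with the CI for quantile 1 added back in) reproduces the kind of figure the walkthrough relies on.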

If the cost of failure is quantified, the discussion continues in dollars, but here it’s important to be conservative. I’ll do so by giving a range of potential savings to temper expectations. The discussion ends with implementing a criterion-referenced cut score based on the training needs of the organization, the cost of failure, and any other client considerations.

AJ also shared an app he created in R that he uses when teaching about validity in classification decisions. We took screenshots to show how it works. The first example shows a predictor–criterion correlation of .47, with a predicted cut score of 0.3 and actual cut score of 0.5.

In the second example, we changed the correlation to .3 and both the predicted and actual cut scores to 0. The interactive app automatically updates the accuracy statistics in the column to the left of the graph.
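The accuracy statistics an app like this might report can be derived from the four quadrants formed by the two cut scores. Here is a sketch under the second example's settings (r = .30, both cuts at 0, standardized scores); the data are simulated and the specific statistics shown are common choices, not necessarily the app's exact output.

```python
# Sketch: classification accuracy statistics from a predicted (selection)
# cut score and an actual (criterion) cut score. Simulated, standardized data.
import numpy as np

rng = np.random.default_rng(4)
n, r = 10_000, 0.30
pred = rng.standard_normal(n)
crit = r * pred + np.sqrt(1 - r**2) * rng.standard_normal(n)

pred_cut, crit_cut = 0.0, 0.0
selected = pred >= pred_cut
successful = crit >= crit_cut

tp = np.sum(selected & successful)    # correct acceptances
fp = np.sum(selected & ~successful)   # erroneous acceptances
fn = np.sum(~selected & successful)   # erroneous rejections
tn = np.sum(~selected & ~successful)  # correct rejections

hit_rate = (tp + tn) / n              # overall decision accuracy
sensitivity = tp / (tp + fn)          # share of successes we would accept
specificity = tn / (tn + fp)          # share of failures we would reject
print(f"accuracy {hit_rate:.2f}, sensitivity {sensitivity:.2f}, "
      f"specificity {specificity:.2f}")
```

Moving either cut score shifts cases between the four quadrants, which is exactly the tradeoff an interactive app makes visible as the statistics update.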


This column was the second in our series on translating validity evidence in applied settings. Whereas last time we asked I-Os for verbal translation examples, this time we asked how I-Os use visualizations to help explain these concepts. To that end, several practitioners shared some of the creative ways they simplify validity evidence for nonacademic audiences. We are extremely grateful for these contributions and hope that you will be able to use these resources to craft better, clearer, and more impactful visual representations of research findings.

We generally feel that I-Os excel at carefully designing and conducting applied research, then methodically cleaning and analyzing the data to uncover meaningful and impactful results. However, scientific rigor means nothing if decision makers do not understand the results and the benefits and consequences of acting on them versus inaction. With this theme in mind, we’d like to close our Lost in Translation series with a rather critical and possibly controversial assessment of our field. This may seem like an odd critique, but we believe that, generally, I-O psychology graduate programs are not doing an adequate job of preparing students to be effective communicators of our science and ambassadors of the field. Despite claiming to train in the scientist–practitioner model, our formal education asymmetrically emphasizes traditional academic science communication while virtually ignoring the burgeoning and crucial practice of public science communication (i.e., sci-comm). That is not a bad thing! But it does help our inspiration for this series come full circle. Maybe graduate programs aren’t currently equipped to teach the translation of science into practice.

Internships and consulting opportunities certainly exist for students in some programs, but perhaps part of the solution to maintaining our relevance is to formally incorporate “translation” into the graduate curriculum. Intuitively, academics seem best suited to prepare students for academic careers: They are scientists who are incentivized to publish new theory and rigorous research in peer-reviewed journals for an academic audience. Wouldn’t it make sense to hire practitioners as adjunct professors—not to teach core I-O classes but to advise on how the complex topics we learn function and are talked about in businesses?

For a field that traces its roots back to the late 1800s and whose flagship journal recently celebrated its centennial, we are woefully irrelevant to the majority of HR and business professionals (Rose, McCune, Spencer, Rupprecht, & Drogan, 2013). One of the most popular and provocative TIP articles of 2017 bluntly questioned whether I-O psychology has lost its way and identified other fields that are becoming better recognized than we are for the things I-Os do best (Ones, Kaiser, Chamorro-Premuzic, & Svensson, 2017). We believe that our general inability to effectively translate research findings into simple and actionable knowledge for organizational decision makers is a momentous impediment to our field’s continued relevance.

Many scientists are now using R, Python, and other languages to create custom and flexible visualization tools. Science communication continues to emerge as an independent field as the scientific community and general public realize the criticality and shared goals of open-source, accessible science; this includes communicating complex research in a way that can be digested by the general public. For the sake of our field’s effectiveness and relevance, it is time for I-O to get with the program. We hope that graduate programs will take note and better prepare I-Os to effectively communicate science to a larger, more diverse, and nonacademic audience.

In addition to the contributions above, we’d also like to thank Evan Sinar for pointing us toward several validity and general data visualization resources that we found to be particularly useful:

Visualization resource (author(s)/creator(s)/host(s)):

- Dynamic analytics platform that allows visual exploration of univariate and bivariate scatterplot matrices, group means, moderator analyses, and regression assumption checks (Louis Tay, Scott Parrigon, & James LeBreton)

- Seeing Theory: a visual introduction to probability and statistics (Daniel Kunin & team)

- Interpreting Correlations: an interactive visualization tool that helps with the interpretation of correlations, Cohen’s d effect sizes, null hypothesis significance testing, and confidence intervals (Kristoffer Magnusson)

- A guest post on Stephanie Evergreen’s website that focuses on guidelines for effective visualization of regression (William Faulkner, Joao Martinho, & Heather Muntzer)

- Show Me the Data!: an ongoing compilation of key data visualization links and resources, originally developed for a SIOP 2016 workshop (Evan Sinar, Eric Doversberger, & Kristin Charles)

- Effect size: an interactive app that allows you to upload your data, set criterion cut-offs and other parameters, and visualize expectancy charts, density plots, and common language effect size indices (Don Zhang)

We have had quite the journey over the past two years and six columns. In addition to the 40+ interviewees who contributed to our columns, we’d like to express our sincerest gratitude to Tara Behrend for trusting two relatively unproven and decidedly naïve graduate students with an opportunity to make a substantive contribution to our professional organization’s official publication. She even suggested “Lost in Translation” as a potential title. Thank you for trusting us, Tara. We hope this column lived up to your expectations.

What’s Next for Lost in Translation?

This is the last column in our recurring Lost in Translation series. We are going to take some time to consider how we might take Lost in Translation to the next level. If you just can’t get enough, please consider attending our SIOP 2018 Executive Board Session (I-O Value [No Longer] Lost in Translation

Interviewee Biographies

Don C. Zhang is an Assistant Professor in the Department of Psychology at Louisiana State University. He received his PhD from Bowling Green State University. His research focuses on decision making, statistical communication, and employee selection. He is particularly interested in why many managers are reluctant to use evidence-based hiring practices such as structured interviews and mechanical data combination methods. He can be reached at:

Michael L. Litano is an independent consultant and a principal associate on the People Analytics team at a large Fortune 100 company. He earned his PhD in I-O psychology from Old Dominion University. He specializes in measurement, assessment, and advanced applied statistics. Michael’s research interests lie mostly in leadership, employee engagement, and diversity and inclusion, and he is passionate about increasing the awareness and use of I-O psychology in organizations. He can be reached at

Daniel R. Hawthorne is the director of I-O Solutions, WorkFORCE Innovation, within the Global Education and Workforce Division at Educational Testing Service in Princeton, NJ. Dan leads the I-O Solutions team by providing consultative, evidence-based solutions to internal and external clients, acting as a liaison between the ETS Strategic Business unit and ETS Research and Development, and advancing US- and global-based business opportunities for ETS. He also provides direct client-facing leadership in client management and account development through the development and implementation of scalable and repeatable I-O solutions.

AJ Thurston is an I-O Psychology PhD candidate at the University of South Florida.  He specializes in assessment and selection as they apply in military and veteran contexts.  He is passionate about communicating I-O using data visualization, animation, and interactive web applications.


Kuncel, N. R., & Rigdon, J. (2012). Communicating research findings. In Handbook of Psychology (pp. 43–58). New York, NY: John Wiley & Sons.


Lawshe, C. H., & Bolda, R. A. (1958). Expectancy charts: I. Their use and empirical development. Personnel Psychology, 11(3), 353–365.


Litano, M. L. (2017). Lost in translation: Verbally communicating reliability and validity evidence. The Industrial-Organizational Psychologist, 55(2). Retrieved from:


Litano, M. L. & Collmus, A. B. (2016). Communicating the practical value of I-O. The Industrial-Organizational Psychologist, 54(1). Retrieved from:


McGraw, K. O., & Wong, S. P. (1992). A common language effect size statistic. Psychological Bulletin, 111(2), 361–365.


Ones, D. S., Kaiser, R. B., Chamorro-Premuzic, T., & Svensson, C. (2017). Has Industrial-Organizational Psychology lost its way? The Industrial-Organizational Psychologist, 54(4). Retrieved from


Rose, M. R., McCune, E. A., Spencer, E. L., Rupprecht, E. A., & Drogan, O. (2013). Increasing I-O and SIOP brand awareness among business and HR professionals: What’s the baseline? The Industrial-Organizational Psychologist, 50(4). Retrieved from


Rundquist, E. A. (1969). The prediction ceiling. Personnel Psychology, 22(2), 109–116.


Schmidt, F., & Hunter, J. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.


Tonidandel, S., & LeBreton, J. M. (2011). Relative importance analyses: A useful supplement to multiple regression analyses. Journal of Business and Psychology, 26, 1–9.


Tonidandel, S., & LeBreton, J. M. (2014). RWA-Web: A free, comprehensive, web-based, and user-friendly tool for relative weight analysis. Journal of Business and Psychology. doi: 10.1007/s10869-014-9351-z


