Good Science-Good Practice 


Jamie Madigan
Ameren Services

 

Marcus W. Dickson
Wayne State University

Like many of our colleagues in this issue, we chose to highlight some of the fantastic research we saw presented at the 22nd Annual SIOP Conference in New York. Although not quite as flashy as the scenes taking place just outside the conference hotel in Times Square, a lot of what we saw focused neatly on the mission of this column: the confluence of science and practice within the realm of I-O psychology. In this issue we cover research on performance appraisals, translation issues in standardized testing, alternative test validation strategies, mentorship programs, safety culture, and organizational promise keeping to employees. Just like at SIOP itself, it’s a pretty diverse set of topics.

One of the first presentations of the conference was entitled “Performance Appraisal in the Real World: Bridging the Science–Practice Gap.” The majority of the five presentations focused on the thorny problem of forced distribution systems in job performance ratings. This, as the panel’s chair noted, is a huge issue in the real world and was even referenced in the conference’s keynote address. Many business leaders and strategic thinkers are attracted to forced distribution models because they create stratifications in the performance ratings of employees by robbing reluctant raters of their ability to give everyone similar scores. Presumably this allows managers to identify and react to performance—good or bad—more effectively and apply the correct coaching, rewards, or punishments. And yet the presenters noted that, as with other tools that gained popularity in advance of scientific scrutiny, research has lagged behind the use of these kinds of systems, leaving I-O psychologists with ample opportunities to contribute.

The first presentation on the forced distribution topic was by Blume, Baldwin, and Rubin and discussed things from the ratees’ point of view, noting that four important factors could drive reactions: how low performers are treated, how top performers are treated, what groups employees would be compared against (think departmental vs. company-wide), and how frequently the ratings would happen. In a policy-capturing study with college students, the researchers found that all of the factors noted above drove perceptions of forced distribution rating systems as they expected, with preference for less severe consequences for lower performers, higher rewards for top performers, large comparison groups, and frequent ratings. It was also interesting that those with higher cognitive ability were more attracted to forced distribution systems, which might make such systems a good recruiting tool for these candidates.

The second presentation by Bull, Schleicher, and Green looked at the reactions of those who provide the ratings, specifically focusing on perceptions of fairness. Their research, which the presenter claimed to be the first empirical examination of rater reactions to forced distribution rating systems, presented another set of policy-capturing studies that manipulated the severity of consequences for poor performance and the variability in true performance among ratees while examining individual differences like needs for dominance, achievement, or harmony. Not much was found related to the personality variables, but the researchers did find that raters disliked the system when there were high consequences for low ratings and low variability in discernible performance. The researchers noted that to maximize rater satisfaction, organizations should use forced distribution systems when there is sufficient variability in performance. Personally, I find this somewhat odd because those are exactly the kinds of situations that remove the need for forced distributions in the first place.

The third study examined intentional distortion of scores by raters in forced distribution systems, using subjects with more experience in conducting performance ratings. A fourth study looking at performance raters in Singapore found that even in forced distribution systems raters would give consideration to trends in performance over time—if a ratee’s performance was on an upward trajectory, raters tended to be more lenient, especially if the appraisals were being done for development purposes. As the discussant noted, both of these studies added to our understanding of how context affects performance ratings under such conditions and structures.

Other symposiums at the SIOP convention focused, of course, on other topics, such as selection and employment testing. One, entitled “Using Applied Research to Better Understand How Language Impacts Assessments,” spawned a number of interesting research questions from the assumption that changes in language as a product of test translation will affect more than just the content of the test. For example, one researcher found that proficiency in English as a second language had an effect on test validity independent of the skills needed to take the assessment. Other researchers demonstrated how best to conduct equivalency studies between an English version of a test and one that had been translated to another language.

Later in the conference a panel of other experts on employment testing gathered and discussed how certain scientific and methodological advances in test validation were faring in the field. Specifically, the symposium, entitled “Validity Generalization in the Workplace,” discussed alternatives to traditional test validation strategies that are widely accepted as useful by researchers and other experts in the testing industry, but which are sometimes regarded as inscrutable or untested (pardon the pun) by others. Examples included job component validity, validity transportability, and meta-analysis. The panelists, all of whom use these validation tactics in their everyday work, explained that they are most often useful and necessary when traditional approaches like criterion-related validation are impossible due to time constraints or the lack of enough incumbents to achieve adequate statistical power for the required procedures.
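To make the statistical power constraint concrete, here is a minimal sketch of how many incumbents a criterion-related study might need. It is not something the panelists presented: the function name `required_n`, the Fisher z approximation, and the defaults (two-tailed alpha of .05, power of .80) are all assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def required_n(r, alpha=0.05, power=0.80):
    """Approximate sample size needed to detect a population correlation r.

    Illustrative assumption: a two-tailed test of r = 0 using the
    Fisher z approximation, where the standard error of atanh(r)
    is 1 / sqrt(n - 3).
    """
    if not 0 < abs(r) < 1:
        raise ValueError("r must be nonzero and strictly between -1 and 1")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = standard_normal.inv_cdf(power)           # quantile for the desired power
    n = ((z_alpha + z_beta) / math.atanh(abs(r))) ** 2 + 3
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical validity coefficients, not values reported at the session.
    for validity in (0.20, 0.30):
        print(f"r = {validity:.2f}: about {required_n(validity)} incumbents")
```

Under these assumptions, an expected validity of .30 calls for roughly 85 incumbents and a validity of .20 for nearly 200, which helps explain why jobs with few incumbents push practitioners toward the alternative strategies.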

The panelists were also forthcoming about many of the sometimes irritating realities of this kind of research, including the fact that one must be able to replicate job analysis procedures for transportability studies and that there is still a certain amount of legal risk involved in these processes, given that the courts have yet to build up a strong history of either support or opposition for these approaches despite their widespread acceptance among academics and other science-minded practitioners. The presenters also sheepishly provided a somewhat unsatisfying answer to the question of “how close is close enough” when it comes to comparing the components or requirements of two jobs for purposes of transporting validity: “That’s up to you to decide.” It seems there is still a place for professional judgment in the brave new world of alternative validation approaches.

We also spent a good bit of time wandering the poster sessions and found several posters to highlight for this column. Sticking for the moment with a selection focus, Daniel Newman and Dana Rhodes took a different look at the role of emotional intelligence and whether it can provide incremental validity in selection, or reduce adverse impact, in their poster “Is Emotional Intelligence Worthwhile? Assessing Incremental Validity and Adverse Impact.” The line that seems to be emerging in other work on emotional intelligence is that “mixed models” of EI (e.g., Goleman’s model) are problematic and poorly defined, but “ability-based” models may have promise because of their greater construct validity. However, Newman and Rhodes’ meta-analysis found that neither model contributed much to predictive validity (though the mixed-model measures did add slightly above personality and cognitive ability) but that incorporation of mixed-model measures of EI could substantially reduce adverse impact without reducing predictive validity. This is of course a thorny question: Do we include measures known to have little predictive validity solely to reduce adverse impact? Further, the measures seen as more scientifically sound (the ability-based measures) contribute to neither prediction nor diversity of selection. On the one hand, these results may raise more questions than they answer. On the other, organizations need to know the value and effect of selection tools that seem more and more popular.

Kristina Matarazzo and Lisa Finkelstein’s poster “An Examination of Best Practices Within a Formal Mentoring Program” looked at such components of formal mentoring programs as objective setting, kickoff events, and using mentors who were previously mentees in the same program. They also examined perceptions of the mentor–mentee relationship from the perspectives of both dyad members. Among several interesting and useful findings, they unexpectedly found that mentees reported the highest levels of learning when they were in mentor–mentee pairs in which both had prior experience in their roles or neither had experience in their roles. When either the mentor had been a mentor in the program before but the mentee was new, or the mentee had had a prior mentor but the present mentor was new to the program, reported learning was significantly lower. The authors speculate that in the mixed-experience cases, the experienced person may have difficulty “unlearning” the prior relationship, but in either of the other cases, the two partners could more easily negotiate a successful new relationship. They also found that attendance at “kickoff programs” was not really crucial for successful mentor–mentee relationships to develop and that perceptions of similarity and communication quality were actually higher when the mentor did not attend a kickoff event, perhaps because this compelled the dyad to be more intentional about establishing time to work together and get to know each other. Finally, and not surprisingly, the data showed that mentor–mentee relationships are most successful when the dyad actually establishes objectives for the relationship. Nonetheless, many mentorship programs do not include formal objective-setting components, and these data suggest that this is a mistake.

Moving from dyads to divisions, the poster “Predicting Negative Incidents in Hospitals at Individual and Unit Levels” by Theresa Kline, Chelsea Willness, and William Ghali analyzed over 8,000 hospital admissions across 40 units of three hospitals, focusing on adverse events: significant complications that arise from the treatment itself and result in death, disability, or a significantly longer hospitalization. Given many of the alarming (and some would say alarmist) statistics reported in the last several years about medical treatment errors, we care about this work as potential patients. But all organizations can draw on this work as it relates to decision making in complex situations with high stakes. Kline and colleagues’ HLM analyses showed (not surprisingly) that complex cases are more likely to result in adverse events. They also showed, however, that the extent to which unit members perceive the unit to have a high focus on safety and to hold safety as a high priority accounts for over 9% of the unit-level variance in incident severity. The work is cross-sectional (experimental work would be a bit ethically dicey, I think), but the authors conclude that the healthcare industry could benefit by drawing on safety culture/climate development interventions of the sort utilized in other industries, such as commercial air transportation and nuclear power generation. Certainly the paper suggests that shared perceptions and values among unit employees can have significant effects on unit outcomes.

Maintaining our unit-level focus, we turn to Gunnar Schrah and Paige Graham’s poster “Keeping Values-Based Promises to Employees: Implications for Business-Unit Turnover.” The title is pretty descriptive of the purpose of the study, which drew on a sample of over 12,000 employees at over 200 local restaurants within a national chain. Using a measure of organizational promise keeping that focuses on perceptions of behaviors and attitudes as opposed to affect (e.g., managers at this restaurant are open to suggestions from employees), Schrah and Graham tested and found support for a restaurant-level model in which perceptions of organizational promise keeping led to increased affective commitment, which led to decreased unit-level turnover, even after accounting for turnover related to average tenure in each restaurant. This study adds to a wealth of prior work on the importance of attending to employees’ affective commitment; such efforts are often relatively low cost yet can have a high impact on turnover and other commitment-related outcomes.

This column continues to keep us energized about the work being done that simultaneously advances theory and provides practical guidance to organizations, and coming away from the SIOP conference, we’re pleased but not surprised at the wide range of presentations and papers we had to choose from for this issue. We always welcome suggestions for articles and research to review, and to all of you we spoke with in New York who promised to send us ideas for future columns, we look forward to hearing from you! Jamie can be reached at hmadigan@ameren.com and Marcus can be reached at marcus.dickson@wayne.edu.

References

(To save space, we refer to sessions according to their listing in the SIOP program.)
6. Performance Appraisal in the Real World: Bridging the Science–Practice Gap. Deidra J. Schleicher and Lisa M. Keeping, Chairs.
96. Using Applied Research To Better Understand How Language Impacts Assessments. Autumn D. Krauss, Chair.
173-12. Predicting Negative Incidents in Hospitals at Individual and Unit Levels. Theresa J. B. Kline, Chelsea Willness, & William A. Ghali.
173-21. Keeping Values-Based Promises to Employees: Implications for Business-Unit Turnover. Gunnar E. Schrah & Paige K. Graham.
214. Validity Generalization in the Workplace. John A. Weiner, Chair.
229-7. An Examination of Best Practices Within a Formal Mentoring Program. Kristina Matarazzo & Lisa Finkelstein.
249-27. Is Emotional Intelligence Worthwhile? Assessing Incremental Validity and Adverse Impact. Dana Rhodes & Daniel A. Newman.