
Effect Size Reporting in Applied Psychology:  How Are We Doing?

Eric M. Dunleavy
American Institutes for Research

Christopher D. Barr
University of Houston

Dana M. Glenn
Transportation Security Administration

Kristina Renee Miller
University of Houston 

I believe that the almost universal reliance on merely refuting the null hypothesis as a standard method for corroborating substantive theories in the soft areas is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology. (Meehl, 1978)

Over the last decade, quantitative practices in psychological research have changed, with the role of null hypothesis significance testing (NHST) being questioned and more emphasis being placed on effect sizes. In fact, the American Psychological Association (APA) and quantitative scholars recommend that NHST always be accompanied by other indices, including relevant derived effect sizes and confidence intervals (Cohen, 1990; 1994; Falk & Greenbaum, 1995; Kirk, 1996). This prioritization of effect size affects academics and practitioners of I-O psychology in at least two ways. First, revisions to the APA Publication Manual and quantitative reporting guidelines for submission in top-tier journals affect the way I-O psychologists analyze and present their results in research articles. Similarly, professional standards also prioritize effect size reporting. For example, the 4th edition of the Principles for the Validation and Use of Personnel Selection Procedures (Society for Industrial and Organizational Psychology, 2003) requires the reporting of effect sizes in all applied research where effect sizes are available.

Second, effect size reporting ensures the availability of information for meta-analysis. This is important for academics and practitioners. For example, in the academic realm, meta-analytic work allows for a better understanding of relations between constructs and appropriate research design and sampling via a priori power analysis. Meta-analytic work is also valuable for practitioners because it provides a way to evaluate the (a) practical significance of an intervention, (b) viability of transporting validity, and (c) potential adverse impact associated with a selection procedure when conducting an actual study is not feasible. 

Given the changes in quantitative practices presented above, we thought a TIP article describing effect size reporting would be useful. The purpose of this paper is to provide a snapshot of effect size reporting in applied psychology and other subdisciplines to determine whether effect sizes are reported and how applied journals compare to journals from other psychological subdisciplines. We also thought it would be useful to identify potential cases where effect sizes are commonly omitted. 

Statistical Significance Tests

Although NHST has been the dominant quantitative paradigm in psychology, researchers have long acknowledged its shortcomings. NHST has typically been criticized because (a) the null hypothesis is literally never true and (b) it violates a cardinal quantitative principle by arbitrarily dichotomizing a continuous variable (Cohen, 1994; Kirk, 1996). Drawing on these criticisms, the APA taskforce on statistical inference and quantitative scholars have concluded that NHST is not an informative method of answering many psychological research questions and recommended that psychologists consider reporting effect sizes (Wilkinson, 1999).

Despite this push toward effect size reporting, academics and practitioners alike have been somewhat reluctant to give up NHST, as evidenced by the still common rejection of research that does not achieve the criterion of p less than .05 (Cohen, 1990; Vacha-Haase, 2001) and a lack of effect size reporting in the psychological literature (Kirk, 1996). In a review of a subset of the psychological literature, including the Journal of Applied Psychology (JAP), the Journal of Educational Psychology (JEP), the Journal of Personality and Social Psychology (JPSP), and the Journal of Experimental Psychology, Learning, and Memory (JEPLM), Kirk (1996) demonstrated that effect size reporting was inadequate. Kirk noted that more studies from applied psychology (77%) reported at least one effect size than studies from educational (55%), social psychological (47%), and experimental (12%) journals. 

Kirk reasoned that applied psychologists were more likely to report effect sizes because they typically utilize survey data and conduct correlation analyses. Kirk concluded that differences in effect size reporting by subdiscipline were a function of methodological and data-analytic norms rather than superior quantitative practices in applied psychology. Regardless of interpretation, Kirk's article demonstrated the continued reliance on NHST in all four subdisciplines of psychology in the mid-1990s and concluded that best practices were not being followed by the field.

No studies have followed up on the Kirk article. This is a meaningful omission given the increased scrutiny that quantitative practices have received since 1996. For example, a new set of guidelines for statistical methods in psychology journals was reported in the American Psychologist (Wilkinson, 1999) and stated: "Always present effect sizes for primary outcomes" (p. 599). The APA Publication Manual has also changed significantly since the Kirk article. Although the 1994 edition of the APA Publication Manual included an "encouragement" (p. 18) to report effect sizes (American Psychological Association, 1994), the newest APA Publication Manual (American Psychological Association, 2001) now lists a failure to report effect sizes as a defect in the reporting of research:

No approach to probability value directly reflects the magnitude of an effect or the strength of a relation. For the reader to fully understand the importance of your findings, it is almost always necessary to include some index of effect size or strength of relation in your Results section. (p. 25)

Likewise, specific journal standards have also changed since Kirk's article. For example, JAP instructs authors to:

indicate in the results section of the manuscript the complete outcome of statistical tests including significance levels, some index of effect size or strength of relation, and confidence intervals (Zedeck, 2003, p. 4).

Personnel Psychology (PPSYCH) also published an article review checklist (Campion, 1993) that discusses effect sizes and directs reviewers to evaluate whether research includes effect sizes (p. 13).

Given these changes in the requirements of quantitative practices, it is important to reassess current effect size reporting trends. The purpose of this study is to replicate and expand Kirk's review by surveying quantitative practices and effect size reporting. It is expected that effect size reporting (2002-2003) has improved since Kirk's review due to formal changes to the APA Publication Manual and specific journal requirements. We also investigated effect size reporting across journals from different subdisciplines to determine how applied psychology journals compared to journals from other subdisciplines. 

Methodology

We examined all empirical articles from JAP, PPSYCH, JPSP, JEP, and JEPLMC [1] in 2002 and 2003. Four I-O graduate students were trained to code articles for research methodology, data analytic method, types of effect size reported, and effect size omissions. Note that coding effect size omissions is somewhat subjective. Thus, Kirk's (1996) list of effect sizes was used as a framework for identifying effect size omission. Of the 921 studies coded, 736 were included in this study. One hundred and eighty-five studies were excluded because they used analyses that produce effect sizes as primary outcomes without inferential analyses (e.g., meta-analysis, classical test theory, generalizability theory, Bayesian methods, and item response theory).

[1] JEPLM has recently become the Journal of Experimental Psychology, Learning, Memory and Cognition (JEPLMC).

Results

Table 1 displays percentages of article research methodology across journals. Overall, the majority of studies were experimental in nature (47.5% random assignment, 6.3% quasi-experimental); most other studies employed a survey design (35.1%). As expected, the percentage of research designs varied across journals: Applied journals published more survey designs and fewer experiments. For example, JAP published the lowest percentage of experimental studies (22.5% true experiments, 6.5% quasi-experiments) and JEPLMC published the highest (87.2% true experiments, 4.9% quasi-experiments). JAP also published the most survey designs (50.5%), and JEPLMC published the fewest (0.5%).



Table 2 shows data analytic methods across journals. Overall, univariate analyses were the predominant analysis, represented by regression (20.0%), nonregression univariate methods like ANOVA and t-tests (48.4%), and univariate combinations (10.0%). Structural equation models were the most frequent multivariate analysis across journals (10.2%). 

Table 3 displays effect size reporting across journals. Overall, 62.5% of all articles reported effect sizes. As expected, effect size reporting varied by journal. JAP published the highest percentage of studies reporting effect sizes (94.0%), and JEPLMC the lowest (17.2%). PPSYCH published the second highest percentage of articles reporting effect sizes (86.7%). 



Effect size omissions are presented in Table 4. We used a stringent operationalization of omission and considered cases where effect sizes could be computed by hand from other reported statistics to be omissions. Hence, if an article included means and standard deviations for two groups and reported a t statistic without a d statistic, we considered this an omission. Although a d statistic is simple to compute by hand, this additional burden may be too much to ask of the audience. 
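To make concrete the hand computation described above, the following is a minimal sketch of how a standardized mean difference (Cohen's d) can be recovered from statistics an article already reports. It assumes two independent groups and a pooled standard deviation; the function names and the sample numbers in the usage note are illustrative assumptions and do not come from any coded study.

```python
import math


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d for two independent groups, using the pooled SD."""
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def d_from_t(t, n1, n2):
    """Recover d from an independent-samples t statistic and group sizes."""
    # For the pooled-variance t test, t = d / sqrt(1/n1 + 1/n2).
    return t * math.sqrt(1.0 / n1 + 1.0 / n2)
```

For instance, with hypothetical groups of 20 participants each, means of 10 and 8, and a standard deviation of 2 in both groups, `cohens_d(10, 2, 20, 8, 2, 20)` gives a d of 1.0. The conversion from t holds only for the pooled-variance independent-samples test; repeated-measures designs need a different formula.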



Univariate analyses testing mean differences had the greatest number of omitted effect sizes. For example, 240 ANOVA and 74 other mean difference omissions were identified. In these cases, variance-accounted-for statistics for the overall model, like η, partial η, and η², were typically omitted. Likewise, mean difference analyses were often not accompanied by d, odds ratios, or other indices of effect. Common omissions related to regression analyses included missing correlations among multiple predictor variables and omissions of either model statistics like multiple R and R² or predictor-level coefficients like b or β.
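As an illustration of how little is needed to supply the variance-accounted-for statistics that ANOVA reports tended to omit, here is a brief sketch using the standard textbook formulas for eta squared and partial eta squared. The function names are assumptions made for this example, and the numbers in the usage note are hypothetical.

```python
def eta_squared(ss_effect, ss_total):
    """Proportion of total variance attributable to the effect."""
    return ss_effect / ss_total


def partial_eta_squared_from_f(f, df_effect, df_error):
    """Partial eta squared recovered from a reported F and its degrees of freedom."""
    # Equivalent to SS_effect / (SS_effect + SS_error).
    return (f * df_effect) / (f * df_effect + df_error)
```

For a hypothetical reported result of F(2, 57) = 4.0, `partial_eta_squared_from_f(4.0, 2, 57)` returns 8/65, or about .12. In a one-way design the partial and non-partial statistics coincide; in factorial designs they generally differ, so a report should say which one is given.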

We computed odds ratios to describe effect size reporting differences across journals (Table 5). An odds ratio greater than 1 indicates that studies from the journal listed first in the contrast were more likely to report effect sizes. JAP and PPSYCH reported more effect sizes than journals from other subdisciplines. For example, JAP articles were approximately five times more likely to report effect sizes and PPSYCH articles were approximately two times more likely to report effect sizes than articles from other subdisciplines. Note that JEPLMC was much less likely to report effect sizes than were JAP and PPSYCH. In addition, there was no meaningful difference in effect size reporting between JAP and PPSYCH. Odds ratios were also computed for articles using regression and nonregression analyses. Articles employing regression techniques were 23 times more likely to report effect sizes than articles employing nonregression univariate methods and non-SEM multivariate methods. 
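For readers less familiar with the index, the sketch below shows one common way an odds ratio for such a contrast can be computed, either from two reporting proportions or from a 2 x 2 table of counts. It is an illustration under stated assumptions, not a reconstruction of the Table 5 analysis: the function names are hypothetical, and the only figures used are the JAP (94.0%) and JEPLMC (17.2%) reporting rates given in the text.

```python
def odds(p):
    """Convert a proportion (strictly between 0 and 1) to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("proportion must be strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio(p_first, p_second):
    """Odds ratio for the journal listed first relative to the second."""
    return odds(p_first) / odds(p_second)


def odds_ratio_from_counts(a, b, c, d):
    """Odds ratio from a 2 x 2 table.

    a, b: first journal's counts of articles with / without effect sizes
    c, d: second journal's counts of articles with / without effect sizes
    """
    return (a * d) / (b * c)


# Illustrative contrast using the reporting rates quoted in the text.
jap_vs_jeplmc = odds_ratio(0.940, 0.172)
```

A value above 1 means the first journal had higher odds of reporting an effect size, a value of 1 means no difference, and a value below 1 favors the second journal. Because odds and probabilities diverge when rates are high, an odds ratio should not be read as a ratio of percentages.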

Conclusion

It appears that effect size reporting has improved during the last decade. All four journals originally examined by Kirk in 1996 have a higher percentage of studies reporting effect sizes for the years 2002-2003. For example, JAP improved from 77.0% to 94.0%, JEP from 55.0% to 72.3%, JEPLMC from 12.0% to 17.2%, and JPSP from 47.0% to 72.3%. It appears that APA initiatives have positively influenced effect size reporting. 

However, our results indicated that three of the five journals published at least one quarter of their articles without effect sizes. This lack of quantitative information may create an impediment for interpreting results and meaningful meta-analytic work and may also leave future studies without a benchmark for a priori power analyses. Thus, journals can improve their quantitative practices. 

For example, journals should formally describe quantitative practices in their submission guidelines, require authors to consult APA sources, and require reviewers to take effect size reporting seriously. Another potential avenue for influencing quantitative practices is within the graduate curriculum. Although it may be difficult to change quantitative coursework, building effect size work into graduate statistics may ensure that the next generation of journal authors is ready and able to report effect sizes. For example, coding exercises similar to those conducted in this study would expose graduate students to various effect sizes and their appropriate use. 

Results showed that more effect sizes are published in applied journals. It is difficult to determine whether superior quantitative practices are in place for applied journals or if this can be attributed to automated effect sizes produced by correlation analyses. However, some articles in applied journals used ANOVA/MANOVA frameworks, and most included relevant effect sizes. We think that the high rate of applied journal effect size reporting is a combination of both factors. 

In conclusion, effect size reporting has improved since the work of Kirk in 1996, and applied psychology journals continue to report effect sizes more often than journals from other psychological subdisciplines. Given the importance of meta-analytic work and practical significance in psychological research, quantitative practices should provide the journal reader with as much pertinent information about the data of interest as possible. Effect sizes allow academics and practitioners alike to go beyond probabilistic criteria in answering our true question of interest: What do my data really mean? 

References

     American Psychological Association. (1994). Publication manual of the American Psychological Association (4th edition). Washington, D.C.: Author. 
     American Psychological Association. (2001). Publication manual of the American Psychological Association (5th edition). Washington, D.C.: Author.
     Campion, M. (1993). Article review checklist: A criterion checklist for reviewing research articles in applied psychology. Personnel Psychology, 46, 1-14. 
     Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312. 
     Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003. 
     Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory and Psychology, 5, 75-98. 
     Kirk, R. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746-759. 
     Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834. 
     Society for Industrial and Organizational Psychology. (2003). Principles for the validation and use of personnel selection procedures (4th edition). Bowling Green, OH: Author. 
     Vacha-Haase, T. (2001). Statistical significance should not be considered one of life's guarantees: Effect sizes are needed. Educational and Psychological Measurement, 61, 219-224. 
     Wilkinson, L., and the Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604. 
     Zedeck, S. (2003). Instructions for authors. Journal of Applied Psychology, 88, 3-5. 


April 2006 Table of Contents | TIP Home | SIOP Home