
Opening Up: A Primer on Open Science for Industrial-Organizational Psychologists

Christopher M. Castille, Nicholls State University


My purpose with this “Opening Up” article is to orient TIP readers, both practicing I-O psychologists and academics, to open science. Think of this particular article as your first class on open science. I will share an interesting story to give you some historical context on the open science movement, highlight facts that should make it abundantly clear why you are taking this course, and clarify key terms that you need to know. Subsequent articles will touch on the many ways we might make our collective work more replicable, reproducible, and credible by adopting open science practices. By the end of this course, I hope you will be more aware of the need for open science and also equipped with effective tactics for opening up our science. Crucially, I hope to leave you with more questions than answers.

A Brief History of the Replication Crisis

In 2011, an article published in a top-tier journal, the Journal of Personality and Social Psychology (JPSP), fundamentally challenged the methodological foundations of psychological science writ large. The article, by eminent Cornell University social psychologist Daryl Bem, presented nine experiments involving over 1,000 participants on precognition (i.e., conscious awareness of future events) and premonition (i.e., affective apprehension regarding future events), eight of which provided statistically significant evidence that individuals could feel the future (Bem, 2011). The experiments took classical psychological phenomena, such as how erotic images cause arousal and how training can affect recall, and time-reversed the effects (e.g., if someone received training in the future, that would facilitate performance in the present). Bem estimated that such psi effects were rather robust for psychological phenomena: a Cohen’s d of .22. Just imagine pitching psi to an executive via a utility analysis to garner support for a future training investment: “If we train employees in the future, they will perform better in the present because of psi.”

Bem’s work caught the attention of the media and the broader scientific community. He appeared on MSNBC claiming to provide strong evidence of psychic phenomena (MSNBC, 2011) as well as on Comedy Central’s Colbert Report (see “Time-Traveling Porn”, Colbert, 2011). Quickly thereafter, his work was roundly criticized by the academic community. Psi had long been a debunked idea (see Alcock, 2011). Bem’s paper also raised serious questions about the quality of statistical thinking in psychology and whether we should abandon null hypothesis significance testing (see Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011). Scholars called into question the quality of the peer review process in general and the values of the academic community (i.e., quantity over quality, publish or perish; see Gad-el-Hak, 2011) and raised the possibility that science itself was broken (Engber, 2017).

Though many attempted replications failed to support Bem’s findings (see Galak, LeBoeuf, Nelson, & Simmons, 2012; Ritchie, Wiseman, & French, 2012), the debate over the substantive nature of Bem’s contributions is still unresolved. In fact, a large-scale collaborative effort to replicate Bem’s work is under way at present (Kekecs et al., 2019). Notably, one replication attempt (i.e., Ritchie et al.) was rejected by JPSP because the journal does not publish replications (see Aldhous, 2011). This reflects a deeper trend among publications in psychology. A review of psychological studies since the beginning of the 20th century using the term “replication” suggests that only about 1 in 100 published studies is a replication (Makel, Plucker, & Hegarty, 2012). Interestingly, replications are more likely to be “successful” when the seminal author co-wrote the replication (Makel et al., 2012). We can all agree that such lack of replication can impede scientific progress (Meehl, 1978).

Bem’s work prompted some soul searching among psychologists and a shift toward greater openness. If research on such an implausible hypothesis met the bar for methodological rigor necessary for admittance to a top-tier journal, what else might also meet (or have met) the bar and yet be similarly implausible (Chambers, 2017)? What followed were researchers trying and often failing to replicate both novel and classical findings in psychology. For instance, a large-scale effort to replicate 100 studies in both social and cognitive psychology suggested that less than half of the published literature is replicable (Open Science Collaboration, 2015). Other replication attempts on topics that are quite relevant to I-O psychology also encountered replicability issues. I will highlight just two for now.

  • Ego depletion—the notion that self-control is a finite resource and can be exhausted—informs research on the regulation and control of behavior in organizational life (e.g., Rosen, Koopman, Gabriel, & Johnson, 2016). And though over 600 studies have corroborated the ego depletion hypothesis, mainstream social psychology is now questioning whether the effect is real (Inzlicht & Friese, 2019). A large-scale pre-registered replication attempt involving over 2,000 participants failed to support the foundational experimental paradigm (Hagger et al., 2016). Similar research areas involving delay of gratification (i.e., the infamous marshmallow experiments; see Shoda, Mischel, & Peake, 1990) have also been considered to lack replicability (see Hagger et al., 2016).
  • Stereotype threat, the notion that cues in an environment can confirm a negative stereotype about one’s social group, harming performance (Steele & Aronson, 1995), informs research on helping women and minorities succeed (Kinias & Sim, 2016; Nguyen & Ryan, 2008). However, a large-scale replication involving over 2,000 participants did not find any evidence supporting the stereotype threat effect (see Flore, Mulder, & Wicherts, 2018). Indeed, a recent meta-analysis of this literature suggests that, if there is a stereotype threat effect in practice, it is small to trivial (Shewach, Sackett, & Quint, 2019). Moreover, publication bias, which has long been known as a flaw in the scientific record (for a review, see Kepes & McDaniel, 2013), suggests that published effect sizes have been inflated.

To be fair toward scholars working in these areas, I should point out that the above sources of evidence are not without their own critics. For instance, the Open Science Collaboration was criticized for not sampling psychological studies representatively (see Gilbert, King, Pettigrew, & Wilson, 2016). And though we do not know how credible any body of scientific evidence is (Blastland, 2019), others are interested in answering this question. For instance, the Department of Defense is building an artificial intelligence that can help identify which studies are replicable or reproducible (Resnick, 2019). DARPA has promised $7.6 million to the Center for Open Science, which will create a database of 30,000 claims from the social sciences and attempt to replicate 3,000 of them. Importantly, they will ask experts to bet on whether a claim will replicate (Center for Open Science, 2019). Such prediction markets can, indeed, predict which effects will replicate. A study of 21 experimental studies published in Nature and Science between 2010 and 2015 (of which ~62% of effects were replicated) found a strong correlation between scientists’ beliefs regarding replicability and actual replicability (Spearman correlation coefficient: 0.842; Camerer et al., 2018). Additionally, such expert judgment could be built into an artificial intelligence that could then scour the literature and score work for credibility.

Now that you have some historical context on the open science movement as well as some understanding of present events, let’s turn to key terms and ideas.

Key Terms and Ideas

Open science. This term refers to an umbrella of practices intended to promote openness, integrity, reproducibility, and replicability in research (Banks et al., 2018; Nosek et al., 2015). These practices are broad: they range from making peer reviews open and accessible (e.g., via PubPeer) and preregistering a study to simply sharing reference libraries (e.g., via Zotero), and they include many more (see Kramer & Bosman, 2018). The Center for Open Science has emerged as part of a broader movement to make scientific disciplines more transparent, open, and reproducible, so I strongly recommend that you visit this resource if you have not already done so.

Reproducibility. The American Statistical Association (ASA) distinguished between reproducibility and replicability (see Broman et al., 2017). A study is reproducible if you can take the original data and code and reproduce the numerical findings from the study. Although this may sound trivial, in practice this standard may not be met. For instance, in a sample of 88 strategic management studies published in the top-tier Strategic Management Journal, about 70% did not disclose enough detail to permit independent tests of reproducibility and, of those that did, almost one-third of supported hypotheses were not corroborated (Bergh, Sharp, Aguinis, & Li, 2017). Closer to home, research into the reproducibility of psychology has revealed that statistical reporting errors in articles published in top-tier psychology journals are quite prevalent. An examination of publications from 1985–2013 suggests that roughly half of all published articles contain at least one statistical reporting error, and this includes about one-third of articles from the Journal of Applied Psychology (Nuijten, Hartgerink, van Assen, Epskamp, & Wicherts, 2016).
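To make the idea of a statistical reporting error concrete, here is a minimal sketch of a consistency check: recompute the p value implied by a reported test statistic and compare it with the p value the authors reported. It is only an illustration, not the procedure used by Nuijten et al. (2016). It assumes two-sided z tests only, and the function names, the rounding tolerance, and the two example reports are all hypothetical.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p value for a standard normal (z) test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_consistent(z: float, reported_p: float, decimals: int = 2) -> bool:
    """True if the reported p agrees with the recomputed p, allowing for
    rounding to the given number of decimals (an assumed tolerance)."""
    tolerance = 0.5 * 10 ** (-decimals) + 1e-12
    return abs(two_sided_p_from_z(z) - reported_p) <= tolerance


# Hypothetical reported results, invented for illustration: (label, z, reported p)
reports = [
    ("Result A", 1.96, 0.05),
    ("Result B", 2.20, 0.01),
]

for label, z, reported_p in reports:
    status = "consistent" if is_consistent(z, reported_p) else "inconsistent"
    print(f"{label}: z = {z}, reported p = {reported_p}, "
          f"recomputed p = {two_sided_p_from_z(z):.3f} ({status})")
```

A check like this only flags internal inconsistencies in what was reported. It cannot say whether the statistic or the p value is the one in error, which is one reason access to the original data and code matters.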

Replicability. By contrast, the ASA defined replicability as the extent to which an entire study can be repeated independently of the original investigator, without use of the original data, and reach the same conclusions. Again, while this may sound trivial, it too is a standard that may not be met in practice. For example, an effort to replicate 50 highly influential studies in cancer biology was scaled down to fewer than 20 studies, in part because of insufficient detail in the method sections (Kwon, 2018).

Preregistration. This is the act of specifying hypotheses and methods for testing them in advance of gathering data. A key aim of preregistration is to create a clear separation between hypothesis generation (which occurs under exploratory conditions, a.k.a. postdiction) and hypothesis testing (which occurs under confirmatory conditions, a.k.a. prediction).

Simply put, preregistration allows results to be reported transparently. In practice, such transparent reporting does not seem to be the norm, owing to a strong publish-or-perish culture in academia, and the result can be a literature biased toward positive results. For instance, in a review of papers from hard sciences (e.g., space science) to soft sciences (e.g., business, economics, psychology), Fanelli (2010) found that although journals from all sciences favor positive results, psychology is especially prone to publishing positive/confirmatory results: 91–97% of articles refute the null or confirm some alternative hypothesis. Fanelli also showed that positive results are more likely to be reported in softer sciences described by weaker paradigms and higher rejection rates, such as psychology, business, and economics. However, preregistration can cut down on such biased reporting. One early study suggests that preregistered studies report positive results only 43% of the time (Schijen, Scheel, & Lakens, 2019).

To be fair, preregistration is not a panacea. Indeed, a study of preregistered studies eventually published in the journal Psychological Science revealed that researchers often deviated from preregistered protocols and did not report their reasons for deviating (Adam, 2019). Now, deviations can occur for a number of reasons that are not so questionable (e.g., forgetting, changing circumstances). Further, it has been argued recently that preregistration will not save us if we cannot adequately map the theories we are testing onto the statistical models that we are utilizing (Szollosi et al., 2019).

Still, preregistration is a valuable tool in the scientist’s toolkit because it offers an opportunity to plan and then reveal what was planned and not planned. In other words, it is less a policing effort and more a mechanism for encouraging transparency. It helps to shed light on assumptions and decisions used by researchers in their work, which may explain why certain effects are observed (while others are not observed). For instance, consider preregistration within the context of meta-analysis, where so many decisions can influence an estimated effect size. A meta-analysis of double-blind randomized controlled trials offered 9,216 ways to compute an effect size depending on the decision rules that were applied. While the largest density of results hovered just under the p = .05 threshold for establishing statistical significance, the effect sizes ranged from -.38 to .18 (see Palpacuer et al., 2019).
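To see how a handful of defensible decision rules can multiply into many different answers, consider the following toy sketch. It is not the analysis of Palpacuer et al. (2019). The trial data are made up, the two decision rules (a minimum sample size and whether to keep only published trials) are hypothetical, and a simple fixed-effect (inverse-variance weighted) average stands in for a full meta-analytic model.

```python
from itertools import product

# Made-up trials for illustration only:
# (effect size, sampling variance, sample size, published?)
TRIALS = [
    (0.30, 0.040, 40, True),
    (0.10, 0.010, 200, True),
    (-0.05, 0.020, 120, False),
    (0.45, 0.090, 25, True),
    (0.00, 0.015, 150, False),
]

# Hypothetical decision rules an analyst might defend.
MIN_SAMPLE_SIZES = [0, 30, 100]
PUBLISHED_ONLY = [False, True]


def pooled_effect(trials):
    """Fixed-effect (inverse-variance weighted) mean effect size."""
    weights = [1.0 / variance for _, variance, _, _ in trials]
    weighted_sum = sum(w * trial[0] for w, trial in zip(weights, trials))
    return weighted_sum / sum(weights)


def multiverse(trials):
    """Pooled effect for every combination of the decision rules."""
    results = {}
    for min_n, published_only in product(MIN_SAMPLE_SIZES, PUBLISHED_ONLY):
        kept = [t for t in trials
                if t[2] >= min_n and (t[3] or not published_only)]
        if kept:  # skip combinations that exclude every trial
            results[(min_n, published_only)] = pooled_effect(kept)
    return results


results = multiverse(TRIALS)
print(f"{len(results)} analyses; pooled effects range from "
      f"{min(results.values()):.3f} to {max(results.values()):.3f}")
```

The number of analyses is the product of the number of options for each decision, so every additional rule multiplies the total. A preregistration pins down one path through this space before the data are seen.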

Registered reports. Journals can encourage open science by having a registered report submission track. Registered reports involve two stages whereby reviewers evaluate the theory and methods of a study (ideally) prior to data collection, offering a conditional acceptance to designs that answer meaningful questions with rigor. In I-O psychology, journals adopting this format of submission include the Journal of Business and Psychology, the Journal of Occupational and Organizational Psychology, the International Journal of Selection and Assessment, and the Journal of Personnel Psychology. A full list of journals accepting registered reports is available at the Center for Open Science.

Early reports suggest that preregistration may help strengthen a paradigm when journals adopt a registered report submission process. For instance, the journal Cortex observed that when authors submitted their work as a registered report, only 10% of submissions were rejected. This is in stark contrast to the conventional submission track, in which 90% of submissions are rejected (see Chambers, 2019).

A Call to Open Up Industrial and Organizational Psychology

We are at our best when our work informs evidence-based practice by challenging beliefs about the way the world works. Adopting open science furthers evidence-based practice (Banks et al., 2018). As pointed out by Banks et al.: “Evidence-based management stands to benefit from these practices as practitioners will gain increased access to scientific content, which in turn could ultimately reduce the science-practice gap (Banks & McDaniel, 2011; Schmidt & Oh, 2016).”         

As industrial and organizational psychologists, we use the scientific method while applying the principles of psychology to enhance organizational performance. Just as medical researchers have produced insights into how suffering might be reduced, our collective body of evidence provides insights into how individual, team, and organizational performance might be enhanced for the benefit of society. Indeed, by comparison to the medical sciences, our interventions are often more effective, though the public may not see them as such (see Erez & Grant, 2014).

Using the scientific method is part of what sets our science apart from the pseudosciences (e.g., psi, the anti-vax movement, homeopathy), but it is far from the only thing. As Richard Feynman (1974) put it in his famous speech on cargo cult science, even pseudosciences can look like science. They may “follow all the apparent precepts and forms of scientific investigation, but they are missing something essential.” He was referring to a sort of “bending over backwards” to specify the constraints on a claim and a lingering sense that we may be deeply mistaken. Such an attitude is needed for a field to build a cumulative character of knowledge (Meehl, 1978). Open science practices will be the kind of “bending over backwards” that is needed for us to attain such a cumulative character.

Certain norms are also what set us apart from the pseudosciences. According to sociologist Robert K. Merton, these norms compel scientists within a community to collectively share data openly, evaluate contributions on their own merit, approach knowledge claims in a disinterested manner, and consider all relevant evidence—even contrary evidence—for a claim (see Merton, 1973). Merton also suggested that communities that deviate from these norms are not sciences, but pseudosciences that see nature not as it is revealed via rigorous methodology, but as they would like to see it.

I would like you to entertain a set of provocative questions: Is industrial and organizational psychology (and, for that matter, related disciplines such as management and organizational behavior) more like science or pseudoscience? While this may be an uncomfortable comparison, it is important to consistently avoid pseudoscientific practices while maintaining and improving good scientific practices. Open science does exactly this, helping I-O psychology grow its accountability and reputation as a science-based field. Accusations of unreliability or of being a pseudoscience have long been leveled at psychology (see Landis & Cortina, 2015; Popper, 1962). Are we a soft science, as the public (including funding agencies) perceives us to be (Landis & Cortina, 2015)? Or is our science just harder than the hard sciences (Diamond, 1987), and so we must be thoughtful in informing others about who we are and what we can do? How you think of these questions matters because it relates to the need for open science.

Despite what you may think after my admittedly brief and cursory review, science is not broken. I hope, however, that the review suggests that we have room for improvement. We all probably want more robust evidence precisely so that our decisions and actions can be more reliable. As such, I ask that you consider the possibility that the rules of the game we call science be challenged (Chambers, Feredoes, Muthukumaraswamy, & Etchells, 2014). If in this challenge we find that there is room to grow, this is a sign of strength because it demonstrates that our discipline is truly a science.

Next Time on Opening Up...

We will answer a key question: how credible is the I-O psych literature (and related literatures)? As a preparatory reading, take a look at Kepes and McDaniel (2013). If you want a deeper dive, consider Management Studies in Crisis (Tourish, 2019) and The Seven Deadly Sins of Psychology (Chambers, 2017). I will also discuss more key terms (e.g., questionable research practices, QRPs) and other high-profile cases of irreproducibility that are closer to home. Additionally, consider resources made available by the Consortium for the Advancement of Research Methods and Analysis (CARMA), which is providing unlimited free access to open science resources produced by scholars in the organizational sciences. There, you will find an interview that Mike Morrison and I conducted with Dr. Larry Williams, who is the director of CARMA, and Dr. George Banks, who is one of the leading figures in open science. We talk about QRPs and scientific reform, and we ask for tips for opening up our science.



References

Adam, D. (2019, May 23). A solution to psychology’s reproducibility problem just failed its first test. Science. Retrieved November 29, 2019 from

Alcock, J. (2011, January 6). Back from the future: Parapsychology and the Bem Affair. Skeptical Inquirer. Retrieved from

Aldhous, P. (2011, May 5). Journal rejects studies contradicting precognition. New Scientist. Retrieved from

Banks, G. C., Field, J. G., Oswald, F. L., O’Boyle, E. H., Landis, R. S., Rupp, D. E., & Rogelberg, S. G. (2018). Answers to 18 questions about open science practices. Journal of Business and Psychology.

Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425.

Bergh, D. D., Sharp, B. M., Aguinis, H., & Li, M. (2017). Is there a credibility crisis in strategic management research? Evidence on the reproducibility of study findings. Strategic Organization, 15(3), 423–436.

Blastland, M. (2019). The hidden half: How the world conceals its secrets. London, England: Atlantic Books.

Broman, K., Cetinkaya-Rundel, M., Nussbaum, A., Paciorek, C., Peng, R., Turek, D., & Wickham, H. (2017). Recommendations to funding agencies for supporting reproducible research. American Statistical Association.

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour.

Center for Open Science. (2019, February 5). Can machines determine the credibility of research claims? Retrieved December 2, 2019, from

Chambers, C. (2017). The seven deadly sins of psychology: A manifesto for reforming the culture of scientific practice. Princeton, NJ: Princeton University Press.

Chambers, C. (2019). Registered reports: Concept and application PhD workshop. Retrieved from

Chambers, C., Feredoes, E., Muthukumaraswamy, S., & Etchells, P. (2014). Instead of “playing the game” it is time to change the rules: Registered reports at AIMS Neuroscience and beyond. AIMS Neuroscience, 1(1), 4–17.

Colbert, S. (2011, January 27). Time-traveling porn—Daryl Bem. The Colbert Report [Video Clip]. Retrieved November 28, 2019 from

Diamond, J. (1987). Soft sciences are often harder than hard sciences. Discover. Retrieved from

Engber, D. (2017, May 17). Daryl Bem proved ESP is real, which means science is broken. Slate. Retrieved November 28, 2019, from

Erez, A., & Grant, A. M. (2014). Separating data from intuition: Bringing evidence into the management classroom. Academy of Management Learning & Education, 13(1), 104–119.

Fanelli, D. (2010). “Positive” results increase down the hierarchy of the sciences. PLOS ONE, 5(4), e10068.

Feynman, R. (1974). Cargo cult science. Caltech.

Flore, P. C., Mulder, J., & Wicherts, J. M. (2018). The influence of gender stereotype threat on mathematics test scores of Dutch high school students: A registered report. Comprehensive Results in Social Psychology, 3(2), 140–174.

Gad-el-Hak, M. (2011, January 7). When peer review falters. The New York Times. Retrieved from

Galak, J., LeBoeuf, R. A., Nelson, L. D., & Simmons, J. P. (2012). Correcting the past: Failures to replicate psi. Journal of Personality and Social Psychology, 103(6), 933–948.

Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on “estimating the reproducibility of psychological science.” Science, 351(6277), 1037–1037.

Hagger, M. S., Chatzisarantis, N. L. D., Alberts, H., Anggono, C. O., Batailler, C., Birt, A. R., … Zwienenberg, M. (2016). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11(4), 546–573.

Inzlicht, M., & Friese, M. (2019). The past, present, and future of ego depletion. Social Psychology, 50(5–6), 370–378.

Kekecs, Z., Aczel, B., Palfi, B., Szaszi, B., Szecsi, P., Kovacs, M., … Liu, H. (2019). Raising the value of research studies in psychological science by increasing the credibility of research reports: The Transparent Psi Project. PsyArXiv Preprints, 44.

Kepes, S., & McDaniel, M. A. (2013). How trustworthy is the scientific literature in industrial and organizational psychology? Industrial and Organizational Psychology, 6(03), 252–268.

Kinias, Z., & Sim, J. (2016). Facilitating women’s success in business: Interrupting the process of stereotype threat through affirmation of personal values. Journal of Applied Psychology, 101(11), 1585–1597.

Kramer, B., & Bosman, J. (2018). Rainbow of open science practices.

Kwon, D. (2018, August 1). Effort to reproduce cancer studies scales down to 18 papers. The Scientist. Retrieved August 12, 2019, from

Landis, R. S., & Cortina, J. M. (2015). Is ours a hard science (and do we care)? In C. E. Lance & R. J. Vandenberg (Eds.), More statistical and methodological myths and urban legends (pp. 9–35). New York, NY: Routledge.

Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in psychology research: How often do they really occur? Perspectives on Psychological Science, 7(6), 537–542.

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.

Merton, R. (1973). The sociology of science: Theoretical and empirical investigations. Chicago, IL: University of Chicago Press.

MSNBC. (2011). Professor: Strong evidence ESP is real. [Video]. Retrieved from

Nguyen, H.-H. D., & Ryan, A. M. (2008). Does stereotype threat affect test performance of minorities and women? A meta-analysis of experimental evidence. Journal of Applied Psychology, 93(6), 1314–1334.

Nosek, B. A., Alter, G., Banks, G. C., Borsboom, D., Bowman, S. D., Breckler, S. J., … Yarkoni, T. (2015). Promoting an open research culture. Science, 348(6242), 1422–1425.

Nuijten, M. B., Hartgerink, C. H. J., van Assen, M. A. L. M., Epskamp, S., & Wicherts, J. M. (2016). The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods, 48(4), 1205–1226.

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716–aac4716.

Palpacuer, C., Hammas, K., Duprez, R., Laviolle, B., Ioannidis, J. P. A., & Naudet, F. (2019). Vibration of effects from diverse inclusion/exclusion criteria and analytical choices: 9216 different ways to perform an indirect comparison meta-analysis. BMC Medicine, 17(1), 174.

Popper, K. (1962). Conjectures and refutations. New York, NY: Basic Books.

Resnick, B. (2019, February 25). The military wants to build a bullshit detector for social science studies. Vox. Retrieved March 20, 2019, from

Ritchie, S. J., Wiseman, R., & French, C. C. (2012). Failing the future: Three unsuccessful attempts to replicate Bem’s “retroactive facilitation of recall” effect. PLOS ONE, 7(3), e33423.

Rosen, C. C., Koopman, J., Gabriel, A. S., & Johnson, R. E. (2016). Who strikes back? A daily investigation of when and why incivility begets incivility. Journal of Applied Psychology, 101(11), 1620–1634.

Schijen, M., Scheel, A., & Lakens, D. (2019). Positive result rates in psychology: Registered reports compared to the conventional literature.

Shewach, O. R., Sackett, P. R., & Quint, S. (2019). Stereotype threat effects in settings with features likely versus unlikely in operational test settings: A meta-analysis. Journal of Applied Psychology, 104(12), 1514–1534.

Shoda, Y., Mischel, W., & Peake, P. K. (1990). Predicting adolescent cognitive and self-regulatory competencies from preschool delay of gratification: Identifying diagnostic conditions. Developmental Psychology, 26(6), 978–986.

Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797–811.

Szollosi, A., Kellen, D., Navarro, D., Shiffrin, R., van Rooij, I., Van Zandt, T., & Donkin, C. (2019). Is preregistration worthwhile? [Preprint].

Tourish, D. (2019). Management studies in crisis: Fraud, deception and meaningless research. Cambridge, UK: Cambridge University Press.

Wagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426–432.

