
Opening Up: Sharing Is Caring (About the Science): Tips for Requesting and Providing Data in Support of Open Science

Lillian T. Eby, University of Georgia; Frederick L. Oswald, Rice University; and Tammy D. Allen, University of South Florida

“My name is [redacted] and I am a team member of the project ‘How Biased Is the Literature on Psychological Science?’ led by…”

“I am contacting you because your 2016 article, ‘XXXXXX,’ was randomly selected for inclusion in the program. A research claim from your paper will be provided to subject experts and teams working on prediction algorithms to assess the claim’s likelihood of being reproducible or replicable. Can you review below the claim and associated inferential test selected from your paper and let us know whether revisions are required?”

You may have received email data and reanalysis requests that look like these, sent by open science researchers who are seeking to understand the transparency, reliability, and replicability of our research. I-O psychology researchers may make similar data requests for the purpose of reanalysis and understanding. How can requesters better ensure their data requests are appropriate and nonaggressive? Likewise, how might authors and journals better ensure they honor data requests in a manner that is appropriate and nondefensive? Overall, how can we better navigate data requests where the requested data can only be partially shared or cannot be shared? We will address these questions in what follows.

To provide some broader context: Best practices for supporting open science are emerging, and numerous professional associations, granting agencies, and journals are encouraging open science practices. For example, the American Psychological Association (APA) created an Open Science and Methodology Committee to identify ways that the association can promote open science in APA journals and identify practices for specific subfields in psychology. As part of this initiative, APA is now an organizational signatory of the TOP (Transparency and Openness Promotion) Guidelines (https://www.cos.io/initiatives/top-guidelines) and requires a minimum of TOP Level 1 (disclosure) for each of the eight TOP domains across its core titles. Taking these requirements closer to home, starting November 1, 2021, the Journal of Applied Psychology is requiring authors to follow TOP guidelines when submitting to the journal (https://www.apa.org/pubs/journals/apl).

As with any major new initiative, “the devil is in the details,” and as noted above, one aspect of open science practices that can be especially thorny is data sharing. Although it is only one of the many ways to increase transparency, reproducibility, and replicability in our science, data sharing is fraught with many challenges and conundrums. In this column, we discuss issues related to data sharing to help requesters and authors navigate this open science practice successfully. As we all know, data are the currency of academic productivity and success, and collecting data often requires investigators to make a considerable investment (e.g., time, labor, money). These factors, coupled with several high-profile cases involving replication failure (e.g., Schimmack, 2020), suggest that the open science practice of data sharing in particular may give authors pause. Moreover, as the opening quotes indicate, data-sharing requests can often seem to assume wrongdoing or an impending investigation, reflecting an unnecessary tone that can be off-putting and threatening to the receivers. Below we offer “food for thought” related to data sharing, from the perspective of both requesters and authors. At the heart of our discussion is the need for collegiality, respect, and good faith.

Everyone Be Collegial

In our collective opinion, one of the many strengths of the profession of I-O psychology is that we are generally a collegial group. Whether requesting data or providing data, being respectful (if not downright nice) is of utmost importance. Unfortunately, as both our experience and the opening quotes illustrate, this is not always the case. In our editorial roles, we have seen requests for data that contain any or all of the following components: (a) demanding data sharing, without discussing the purpose of such sharing or any appropriate or necessary limitations (e.g., timeline for provision and analysis); (b) threatening to take various professional or legal forms of action if data are not shared, without consulting relevant parties, policies, or processes involved; and (c) shaming authors on social media if they do not share data, without providing recipients with context or any opportunity for dialogue (e.g., subtweeting, where the target is known and shamed but not named).

To be clear, data requests are an essential part of the scientific conversation and are entirely appropriate to make. Unfortunately, legitimate data requests can be exceedingly slow or may not be honored when they should be. Sometimes these issues reside within authors who are resistant or who claim to have lost their data. Other times these problems might reside within the journal and editorial systems. One of the authors of the current article (Oswald) himself had been slowed by a request for data and reanalysis (it was blogged about here: https://statmodeling.stat.columbia.edu/2011/12/13/data-sharing-update/). But just as clear as it is that the data-requesting process needs to be improved, equally clear is that the behaviors we outline above (demands, threats, and shaming) are entirely unacceptable. These behaviors are especially concerning when junior authors end up feeling powerless in receiving a data request (e.g., when working with a senior faculty member who has more influence over the situation). We say this while fully understanding that data requesters may feel frustrated when their legitimate efforts are, in fact, being thwarted. Open science hopes to assist authors and journals alike to further the appropriateness of the process, content, and outcomes of data requests. We are not there yet.

It is also important to remember that not all data are shareable. To provide just a few examples: Organizational data can be proprietary, institutional review board (IRB) data restrictions may be in place to protect subjects, employee health data might be protected under HIPAA, or a combination of data (demographics, education, employment) might lead to deanonymizing employees. Many professional associations have developed ethical and publication guidelines regarding data sharing. They need to be considered more carefully on all sides: those of authors, editors, and requesters. The recently revised APA Publication Manual (2020), in Section 1.14 (Data Retention and Sharing), states the following regarding data sharing after publication (emphasis added):

Authors must make their data available after publication, subject to conditions and exceptions, within the period of retention specified by their institution, journal, funder, or other supporting organization. This permits other competent professionals to confirm the reported analyses using the data on which the authors’ conclusions are based or to test alternative analyses that address the article’s hypotheses.

Moreover, Section 8.14 of the APA Ethics Code (2017) states the following (emphases added):

(a) After research results are published, psychologists do not withhold the data on which their conclusions are based from other competent professionals who seek to verify the substantive claims through reanalysis and who intend to use such data only for that purpose, provided that the confidentiality of the participants can be protected and unless legal rights concerning proprietary data preclude their release. This does not preclude psychologists from requiring that such individuals or groups be responsible for costs associated with the provision of such information.

(b) Psychologists who request data from other psychologists to verify the substantive claims through reanalysis may use shared data only for the declared purpose. Requesting psychologists obtain prior written agreement for all other uses of the data.

Importantly, these guidelines are general and should therefore be used to inform and improve both policies and particular data request cases in an appropriate professional manner. As stated, the Ethics Code of APA “prohibits authors from withholding data…in most circumstances” (emphasis added; p. 14, APA, 2017), yet we cannot forget that a variety of circumstances in organizational research may legitimately make full or even partial data sharing impossible. That said, organizations and authors can both work more carefully to provide aggregated forms of data that allow for replication and reanalysis (e.g., authors could provide requesters with variance–covariance matrices and observed means for use in CFA/SEM analyses).
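As a minimal sketch of the aggregated-data option described above, the following Python computes observed means and a sample variance-covariance matrix from raw scores, so that only these summaries (not participant-level rows) would be shared. The function name, variable names, and values are hypothetical illustrations, and complete data (no missing values) are assumed.

```python
from statistics import mean


def summarize(data):
    """Return (means, covariance matrix) for complete-case data.

    `data` maps each variable name to a list of scores, one per participant.
    The covariance uses the sample (n - 1) denominator.
    """
    names = list(data)
    n = len(data[names[0]])
    if n < 2 or any(len(data[v]) != n for v in names):
        raise ValueError("need at least 2 cases and equal-length variables")
    means = {v: mean(data[v]) for v in names}
    cov = {
        a: {
            b: sum((x - means[a]) * (y - means[b])
                   for x, y in zip(data[a], data[b])) / (n - 1)
            for b in names
        }
        for a in names
    }
    return means, cov


# Hypothetical scores, for illustration only
raw = {
    "job_satisfaction": [3.0, 4.0, 5.0, 4.0],
    "turnover_intent": [4.0, 3.0, 1.0, 2.0],
}
means, cov = summarize(raw)
```

Whether such summaries are sufficient depends on the planned reanalysis; they support covariance-based models such as CFA/SEM but not analyses that need individual-level records.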

Authors are also responsible for demonstrating collegiality regarding data-sharing requests. This includes the courtesy of acknowledging receipt of requests for data in a timely manner (e.g., within 7–10 business days), also informing the requester of additional information that may be needed to determine whether data can be shared. Often such decisions do not solely rest with the corresponding author. For example, data may be jointly owned by multiple authors (necessitating each author’s approval to share). One of us (Allen) was involved in a multiple-PI study in which there was a fundamental disagreement—with some approving sharing the data requested whereas others disapproved—ultimately necessitating the denial of the request. In other cases, some or all of the data may be owned by organizations, which may have applied in-house restrictions and legal restrictions to allow the research to be conducted in the first place. Alternatively, a lead author’s institution may not allow data sharing per IRB requirements (or without review and approval of additional documentation from the requester regarding how the data will be used and over what time period).

Everyone Be Clear

Requesters should be clear and specific regarding data requests. Even if the initial request may have originated with an in-person conversation or through informal emails, the actual request to the author should be in writing, unambiguously stating the rationale and nature of the request, with a way to verify that the message was received (e.g., over email with acknowledgement). The purpose may involve attempts to replicate the study findings or, although it is less common, the raw data might usefully contribute to a new study. As an example of the latter, in a data-sharing request received by one of us (Eby), the requester explained how the data would be added to other primary datasets to create a larger dataset to explore a phenomenon of interest. In other words, the requester had done their homework: The request presented a clear rationale as to how the data would be used and how credit would be provided if the data were shared.

When honoring a data-sharing request, the author should also be clear regarding any terms and conditions of data use. In the aforementioned example, the author asked for clarification regarding the use of shared data and informed the requester that if the data were provided, it would require crediting the granting agency in any subsequent publications (in this case, per NIH policy). The author also asked for written documentation from the requester that the data would be only used for the stated purpose. Through this process, trust was established, and there was greater clarity regarding the scope of data use for both parties.

Requesters Be Patient

It is important for requesters to understand that fulfilling data-sharing requests can require considerable author time and effort. Sharing data may require reducing data sets, converting files, anonymizing data, creating codebooks, and perhaps obtaining organizational and/or local IRB approval to share data. Authors have many competing responsibilities, and sharing data adds to this workload. That said, data-sharing requests can be less burdensome, or even unnecessary, when researchers use the Open Science Framework (OSF) to share documented data and their summaries, as appropriate (e.g., variable codebooks, materials, data, and summary statistics). At the same time, it is incumbent upon the requester to make compliance with the request as easy as possible for the researcher. Moreover, there may be multiple authors involved in the original dataset who need to be contacted for assistance and/or approvals for sharing. We encourage requesters to consider these factors when making data-sharing requests and to plan ahead to ensure sufficient time for authors to prepare and provide data. Requests for immediate data sharing are not realistic or acceptable.
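To give a sense of the preparation work involved, here is a rough Python sketch of three of the steps mentioned above: reducing a dataset to the requested variables, replacing identifying fields with a study ID, and drafting a simple codebook. All names, fields, and values are hypothetical, and dropping direct identifiers this way is only a first step; combinations of remaining variables may still identify participants, so any real release would still need review against the applicable IRB and organizational requirements.

```python
def prepare_for_sharing(rows, keep, labels):
    """Keep only the requested variables, assign sequential study IDs,
    and build a basic codebook for the retained variables."""
    shared = []
    for i, row in enumerate(rows, start=1):
        out = {"study_id": f"P{i:03d}"}  # replaces name, email, etc.
        out.update({v: row[v] for v in keep})
        shared.append(out)

    codebook = []
    for v in keep:
        values = [r[v] for r in shared if r[v] is not None]
        codebook.append({
            "variable": v,
            "label": labels.get(v, ""),
            "n_valid": len(values),
            "min": min(values) if values else None,
            "max": max(values) if values else None,
        })
    return shared, codebook


# Hypothetical records, for illustration only
rows = [
    {"name": "A. Smith", "email": "a@example.org",
     "tenure_years": 3, "engagement": 4.2},
    {"name": "B. Jones", "email": "b@example.org",
     "tenure_years": 7, "engagement": None},
]
shared, codebook = prepare_for_sharing(
    rows,
    keep=["tenure_years", "engagement"],
    labels={"tenure_years": "Years with employer",
            "engagement": "Mean engagement score (1-5)"},
)
```

Even in this toy form, each step calls for decisions (which variables to keep, how to label them, how to handle missing values) that take author time, which is part of why immediate turnaround is unrealistic.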

Authors Be Generous

From the perspective of the author, data-sharing requests can feel overwhelming to deal with or somehow threatening to the work that was conducted. Instead, we encourage authors to think about requests for data sharing as part of a conversation with the requester and as an opportunity to strengthen the work and otherwise further scientific advancement, rather than merely a burden to be overcome. Also, authors might usefully and creatively consider the request as an opportunity for graduate students. For example, one of us (Eby) recently received a data-sharing request. The request involved culling down a large, complex longitudinal dataset to include only the variables of interest and creating new codebooks for the requester. It also required working with the local IRB to secure approval to share the data per university guidelines. The author negotiated with the requester to cover the costs for a graduate student to fulfill the data-sharing request (and offered to oversee the process at no cost to the requester). Everyone benefitted. The author and research team promoted open science; the graduate student obtained first-hand experience with a data-sharing request and was introduced to the “nuts and bolts” of data sharing (and was paid); and the requester obtained the data.

Concluding Thoughts

In many ways, open science is a new frontier for industrial-organizational psychology. Although there are signs that both momentum and enthusiasm are growing for open science practices, in order to realize its benefits, it is important to continue the dialogue about how to do open science well in terms of both process and outcomes. We hope that this column begins to set in motion some conversations about data sharing and helps clarify the benefits of this practice for the advancement of our science.

References

American Psychological Association. (2017). Ethical principles of psychologists and code of conduct. https://www.apa.org/ethics/code

American Psychological Association. (2020). Publication manual of the American Psychological Association (7th ed.).

Schimmack, U. (2020). A meta-psychological perspective on the decade of replication failures in social psychology. Canadian Psychology/Psychologie canadienne, 61(4), 364–376. https://doi.org/10.1037/cap0000246

 

               
