

Matthew Haynes

Progress and Opportunities With Big Data in I-O

In another installment in our celebration of #SmarterWorkplace Awareness Month, Dr. Scott Tonidandel answers some questions about big data and how he sees it through the lens of I-O psychology.

Scott Tonidandel is a professor of Management in the Belk College of Business at the University of North Carolina – Charlotte. Scott’s research interests include issues related to leadership effectiveness, the impact of diversity in organizations, and research methods and statistics. He co-edited the SIOP Frontiers series volume titled Big Data at Work: The Data Science Revolution and Organizational Psychology. Scott serves as an associate editor for the Journal of Business and Psychology, is a former associate editor for Organizational Research Methods, and is a fellow of the Association for Psychological Science, the American Psychological Association, and the Society for Industrial and Organizational Psychology.

Could you give us some background on big data as it applies to I-O psychology researchers and practitioners?

This is a difficult question because I think big data is a loaded term, so it becomes hard to clearly identify what big data is. Historically, big data has often been defined by different Vs: volume, velocity, and variety. But there doesn't seem to be any consistent requirement. For example, how much volume do you need? With the advances in computer storage and processor speed, what would have been considered large a few years ago no longer is. That makes volume a bit of a moving target. In addition, do you need all three Vs, or just some combination of them, to have big data? I'm not sure there is a perfect definition, but I'll share how I like to think about it. When I think about big data and I-O psychology, I am usually thinking about two things. The first is what I would call big data sources. By big data sources, I mean new and different kinds of data that we aren't traditionally accustomed to and that arise from technology's impact on our lives. One example might be unstructured text at scale. Using text data is not new to us, but we have never been able to easily capture, store, and mine such massive quantities of it. Another big data source might be the electronic trace data that we leave behind in our daily interactions, such as email or location data.

The second thing that makes something big data to me is the tools and techniques used to analyze that data. So, when I think about text data being big data, there is certainly a volume component to it, but the other key feature is how we make sense of that data. When we move beyond our traditional approaches of qualitative analysis, content coding, or even computer-assisted text analysis (which can be applied to lots of data relatively easily) to more advanced approaches such as natural language processing, word embeddings, and deep learning, that is when text becomes big data to me. For I-O researchers and practitioners, the application of big data is all about leveraging these new and evolving sources of data, together with a whole set of methodologies originating in fields like computer science, to propel our field forward.

It’s been a few years since you wrote about big data's past impact and potential future impact on I-O psychology research and practice. What types of impact are you seeing now and how much change do you see in the future?

Change has been slow, especially in our literature. Although there have been some great pieces published, they most often appear in special issues devoted to the topic of big data. We haven't seen this type of work appear regularly in our top-tier journals. The picture is a bit different in industry. Many of my colleagues in applied settings are embracing big data more readily. They are hiring computer and data scientists to be part of their talent analytics teams; they are pursuing algorithmic solutions to human talent problems; they are investing in technologies to capture more data, and more varied kinds of data. I don't mean to suggest that my academic brethren aren't doing these things, but the pace of progress seems a bit slower in academia. However, I think the pace at which we embrace these things as a field will advance quite rapidly in the next few years. Companies are clearly interested in the potential of this kind of work, and that will drive change in both academic and applied settings. We are also seeing students entering graduate programs with an explicit interest in this kind of work, and that is going to have a big impact as well.

What are some of the more tangible opportunities and strategies for utilizing big data to enhance existing methods of data collection and analysis within I-O psychology?

I think there are lots of opportunities. If your main interest is in prediction, then you should really be making use of the more advanced prediction models coming out of data science. I realize we were all trained in OLS regression, but we need to expand our bag of tricks to include these more robust and powerful methodologies that excel at prediction.

Another piece of low-hanging fruit is unstructured text data. This is one area where the amount of data being accumulated is growing at a massive rate, and text can capture a richness of information that might be lacking from a more conventional data source like a survey. It is also exceedingly difficult and time-consuming to rely on our traditional approach of manually coding text data, especially when there are massive quantities of it. As a result, there is a ton of potential to discover new knowledge, answer new questions, and have a more profound impact by leveraging this data source. Importantly, we are continually improving our ability to develop algorithms that can accurately make sense of unstructured text data, so the whole process is becoming easier and more exact. We're not there yet, but I'm very optimistic about what the future holds.

One area where I'd like to see us make more inroads as a field is in capturing actual behaviors. This is where we can leverage technology like sensors and video to see what people are actually doing rather than rely on self-reports of their behaviors. Videos, like text, are difficult to code manually, but computers are getting better and better at recognizing and classifying behavior. When I think about strategies for utilizing big data more in I-O, the example of video illustrates the importance of one particular strategy we need to embrace: integrating more computer science talent into our research and applied teams. There are people at my university, in departments such as computer science and information systems, who rarely interact with psychology and management. These people are doing some incredible things with text and video data that I didn't even know were possible. We really need to be pulling those people into our projects.

Any words of caution or warning for I-O practitioners who may be working with big data? For example, in your 2018 Organizational Research Methods (ORM) paper, you discussed concerns about data quality, inadequate training in big data methods among I-O scientists, and some ethical issues (e.g., being able to identify respondents). Have you seen improvements in these areas that have reduced your concerns? Has much progress been made over the last few years?

Well, I think this is a bit of a mixed bag. Improvements have definitely been made, but there are also some additional hurdles to overcome. The GDPR certainly provides additional protections to individuals regarding their data and how it can be used. In terms of privacy and participant protections, this is positive. However, the GDPR makes it extremely difficult to engage in the same types of data science projects that we may have pursued before it took effect. This is definitely a developing challenge as the legal landscape changes.

There is no question that bias in algorithms continues to be a daunting problem. Nevertheless, we have improved our ability to diagnose potential bias. In addition, these biased algorithms are revealing something that our field has known for a long time but doesn't want to admit: many of our current practices are themselves extremely biased. These algorithms are so good at reproducing our biased decision making that they are bringing these unfair practices to light. That also provides an opportunity for change. Initially, we may build a biased algorithm, but we can take steps to debias that process. This may not produce perfect decisions, but it can lead to decision making that is less biased than the status quo.

When it comes to data protection, privacy, and bias, what ends up being really important is people's perceptions of these issues. For these big data tools and methods to have the kind of impact that I think they are capable of, we need access to high-quality data, and people need to believe in the predictions being made. Right now, public sentiment around AI is quite mixed, and we seem to largely neglect studying these perceptions. We are more interested in developing and deploying the latest cool methodology or technology than in considering how people might react to its application. Given our history of studying things like applicant reactions to selection systems, this seems like an area where we could make an immediate contribution.

I still worry a lot about data quality. Many applications in the big data space still seem to ignore measurement as an important principle. Many of the failures you see with AI in HR involve systems that don't seem to consider measurement at all, and that is likely partly responsible for the failures. In my own experience, I have had a considerable amount of success when measurement is an integral part of developing an AI system.

In terms of our training, there is a long way to go, but progress is being made. People are now learning R and Python in graduate school. Workshops at venues like the annual conference and CARMA now include big data topics. Though we will likely never have the same skills as someone trained in computer science, we need to know enough to work effectively on an interdisciplinary team. We also need to remember that we have a lot to offer. For example, even though some of our data science skills may be lacking, we are all very well trained in measurement, which needs to be a foundational element of any data science project.

This interview was contributed by Dr. Jerel E. Slaughter. Jerel is a member of SIOP’s Visibility Committee and Eller Professor of Management and Organizations in the Eller College of Management at the University of Arizona.

September is Smarter Workplace Awareness Month! Smarter Workplace Awareness Month is all about celebrating and promoting the science and practice of I-O psychology and how I-O psychology can help improve the workplace. This year, we are focusing on the Top Ten Workplace Trends for 2019. Smarter Workplace Awareness Month and the SIOP Top 10 Trends are initiatives of the SIOP Visibility Committee. Watch the 2019 Top 10 Trends Overview Video and visit the Top 10 Trends web page for more historical context on the trends.
