A Crash Course in
Crash Course in I-O Technology
Richard N. Landers
Old Dominion University
This issue, I’ll be digging into the daunting world of big data visualization, sometimes called “data viz.” This represents one of the four major application areas of big data techniques to I-O psychology and HR, alongside data gathering, data storage, and data analytics (Landers, Fink & Collmus, in press). Importantly, I’m distinguishing data visualization in the big data sense (data viz) from data visualization in the traditional SPSS-ish sense. “Visualizing data” is something we’ve been doing for a very long time with histograms, scatterplots, pie charts and so on. Data viz, in contrast, is a specific type of data visualization, one that focuses on interactive exploration of highly complex datasets. When you create a scatterplot, you’re trying to illustrate to someone the relationship between two variables. When you create a data viz, you’re trying to empower the viewers of that data viz to explore whatever particular relationships they’re personally interested in without much, if any, expertise in statistics required. In either case, the creator of data visualization must have expertise in both the subject matter being visualized and also in the art of visualization itself; historically, the training of scientists has focused more on the former, which may explain why scientists have not generally been very good at creating visualizations (Gelman, Pasarica & Dodhia, 2002).
A great example of data viz comes from visualization guru Evan Sinar, who, as part of the SIOP Content Initiative, interactively visualized SIOP conference submissions between 2008 and 2016 and made the resulting data viz publicly available. Using this online tool, I can easily examine any subset of the conference data I might want. You may not be personally interested in checking Testing/Assessment submissions over time, but if I am, I can dial up that analysis myself and see it instantaneously. Despite this increased analytic power, my ability to order extra analyses does not unnecessarily increase the complexity of your or anyone else’s experience interacting with the data viz. Every viewer ultimately has an individualized experience with precisely the data they want.
In these boom times for big data, there are myriad tools available for creating a data viz like this, but perhaps the most popular of these is Tableau, the brainchild of two computer science PhDs and an MBA graduate from Stanford. Tableau’s mission, which you can view for yourself at http://tableau.com/about/mission, is one centered on accessibility of data. As they state, “we believe helping people to see and understand data is one of the most important missions of the 21st century. We proudly wear the mantle of ‘data geek’” (Tableau, 2016). As an I-O psychologist, I find that this mission resonates a great deal with me. We often lament that I-O psychology is not taken as seriously as we would like, that I-O psychology’s insistence on high-quality data is often drowned out by the siren songs of “consultants” with pretty PowerPoint presentations, all flash and no substance. We are the original HR data geeks! So if there’s a data viz platform that is worth our attention, this is probably it.
Even if you don’t see a need to adopt a data visualization platform yourself, this is a trend worth watching. Tableau and programs like it have been called “self-service business intelligence,” which can be interpreted in terms of the Silicon Valley tradition of “disrupting” existing industries. Because I-O psychologists often (perhaps even “usually”) practice in the area of business intelligence, this software is intended, in part, to automate and replace the job functions of I-O psychologists. If your job consists primarily of administering pre-existing surveys to employees and making standardized reports to summarize your findings, this software may be able to replace you – if not now, then soon. So if you’re at risk, I’d particularly recommend learning Tableau to see what it is capable of and to ensure you still add value beyond what a VP of HR playing with this software in her office can do. As you learn, remember that this sort of platform is only going to become more powerful, more user-friendly, and less expensive as time moves forward.
In my interviews with I-O psychologists currently using Tableau, their emphasis was clearly on two outcomes: speed and client experience. Brett M. Wells, chief research officer at Talent Plus, Inc., gave me a rundown of how his organization does it:
With its interactive, real-time reporting, Tableau allows us to communicate data insights in more meaningful ways. Through this, we empower clients to answer their own pressing questions. What took days to gather, blend, clean and analyze seemingly disparate sources of data, now can be accomplished with a few clicks, and without the need of continually requesting time from a software developer and/or data scientist. For example, preparing for an executive meeting, the CHRO of a large health system can run a report to describe how recommendation and selection rates have varied across X variable.
Let’s See It in Action
Tableau currently has six core products, and the one you will need depends on what you want to do with it. If you’re just thinking about data viz for yourself and your own presentations, you want Tableau Desktop, which is a desktop application, like SPSS or R. The next two products, Tableau Server and Tableau Online, are essentially identical to each other. Both move Tableau visualization tools to the Internet so that you can create and share data viz via a web browser, and this can be done within your organization, for clients, or for the public. The key difference between Server and Online is that Server requires your organization to host the software, whereas Online is hosted on Tableau’s servers (i.e., it is Tableau in the cloud). Tableau Public is a free version of Tableau Online that requires your dataset and your visualization to be publicly available; Public is thus intended for public-facing projects. See this Tableau Public visualization for example, which allows you to track home ownership by psychologists over time! The final product, Tableau Reader, allows people who don’t license Desktop to view visualizations created by that program on their own computers. For the description below and the linked demonstration video, I’ll be focusing on Tableau Desktop.
So what does Tableau actually look like? Unfortunately, the first time you open Tableau, you may find it a bit confusing; you are prompted to “connect” to a file, server, or other data source. It is here that you get your first glimpse of the perspective from which Tableau was designed, that of a computer programmer. You might ask yourself why you need to “connect” to a file rather than simply open one. The answer to that question actually reveals a bit about Tableau.
As I-O psychologists, we’re accustomed to a particular technology workflow. We open a data file, work in that file, save that file, and close it to work on it again later. Tableau is not built this way in relation to its data sources. Instead, data sources are assumed to be live and changing things that Tableau should not modify. Tableau merely takes a snapshot of whatever data file you connect it to, manipulates that snapshot to accomplish whatever tasks you request, and then discards that snapshot once you’re done for the day. When you open the program again, data sources are refreshed and your visualizations are recreated. This is a very different way of thinking about data from the one to which you’re probably accustomed.
Figure 1. Annotated screenshot of Tableau.
Once you connect to a file, the experience doesn’t get much better. In fact, Tableau doesn’t seem to do anything at all. The secret next step is that you need to click “Sheet 1” at the bottom of the screen, which you can see in Figure 1. Once you have opened a Sheet, you finally gain access to Tableau’s data visualization platform, and from this point, how to use this software becomes much more apparent.
To start, you’ll want to check to be sure your variables appeared in the right places; specifically, any variable you want to summarize (usually DVs, typically interval- or ratio-level measurement) should appear as Measures whereas any variable you want to split your analysis by (usually IVs, typically nominal- or ordinal-level measurement) should appear as Dimensions. Tableau will take a guess, but it is not always correct. For example, in the demonstration video I created for this article, Tableau interpreted my Likert-type survey items to be nominal dimensions because they contained an “X” representing “Do Not Know” in addition to scores numbered 1 through 5. In such cases, simply click-drag the variable to the correct location. If your variables remain miscategorized, right-click on them to access a range of other settings.
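The miscategorization described above can also be addressed in the data file before connecting to it. As a minimal sketch in Python, assuming the “Do Not Know” responses are stored as the literal string "X", the item can be recoded so that it contains only numbers and missing values. The function name `recode_likert` and the sample responses are hypothetical and are not taken from the demonstration video.

```python
from statistics import mean


def recode_likert(responses, missing_code="X"):
    """Convert Likert responses to integers, treating the missing code as None."""
    recoded = []
    for value in responses:
        text = str(value).strip()
        if text == missing_code or text == "":
            recoded.append(None)  # "Do Not Know" or blank becomes missing
        else:
            recoded.append(int(text))
    return recoded


# Hypothetical responses to one survey item (1-5 scale plus "X")
raw_q15 = ["4", "5", "X", "2", "3", "X", "5"]
q15 = recode_likert(raw_q15)

# Summaries should be computed on the valid responses only
valid = [score for score in q15 if score is not None]
print(q15)
print(round(mean(valid), 2))
```

Whether a visualization tool then treats the recoded item as a Measure without further adjustment depends on how the file is saved and read, so it is still worth checking the categorization after connecting.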
Once your variables have been categorized correctly, you’ll want to click-drag them to one of three places: Columns, Rows, or Marks. These are what they seem. For example, if you wanted to visualize the two-way interaction between Gender and Supervisor Status on answers to Q15, you’d probably drag both Gender and Supervisor Status to Columns, then Q15 to Rows. Tableau will guess as to the ideal visualization based upon the data types you’ve given it and where you’ve put them, creating a multibar chart, one bar for each unique combination of Gender and Supervisor Status. If you then wanted to break each bar up into pieces, with different colors by category of a third variable, you’d drag that third variable to Marks.
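For readers who want to see the arithmetic behind such a chart, the sketch below computes one summary value per unique combination of Gender and Supervisor Status. It assumes the measure is aggregated as an average, which is only one of several possible aggregations. The function `cell_means`, the field names, and the six survey rows are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cell_means(rows, factors, outcome):
    """Return the mean of `outcome` for each unique combination of `factors`."""
    cells = defaultdict(list)
    for row in rows:
        if row[outcome] is None:  # skip missing responses
            continue
        key = tuple(row[factor] for factor in factors)
        cells[key].append(row[outcome])
    return {key: mean(values) for key, values in cells.items()}


# Hypothetical survey records
survey = [
    {"Gender": "F", "Supervisor": "Yes", "Q15": 4},
    {"Gender": "F", "Supervisor": "Yes", "Q15": 5},
    {"Gender": "F", "Supervisor": "No", "Q15": 3},
    {"Gender": "M", "Supervisor": "Yes", "Q15": 2},
    {"Gender": "M", "Supervisor": "No", "Q15": 4},
    {"Gender": "M", "Supervisor": "No", "Q15": None},
]

# One value per bar: each Gender x Supervisor Status cell
result = cell_means(survey, ["Gender", "Supervisor"], "Q15")
for cell, value in sorted(result.items()):
    print(cell, value)
```

Each key in the result corresponds to one bar in the multibar chart. Adding a third factor to the `factors` list is the tabular analogue of splitting each bar by color.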
The most impressive visualizations that Tableau can create are undoubtedly geography-based. One such analysis appears in Figure 1. Here, I have given Tableau a list of ZIP codes and asked it to fill in the geographic area represented by each ZIP code with a color based upon its mean. ZIP codes with higher means (in this particular case, higher scores on a Satisfaction with Telework survey item) appear bluer, whereas ZIP codes with lower means appear more orange. Although you can see “Latitude” and “Longitude” in the Columns and Rows areas, these were generated automatically from the ZIP codes I provided, which Tableau recognized automatically. In fact, Tableau will automatically recognize a wide range of geographic signifiers, including area codes, city names, counties, countries, and state names.
Once your visualization design is complete, you progress to create a “Dashboard,” which is essentially a snapshot of a Sheet designed to either be exported elsewhere or to be shown directly to a person of interest. “Stories” are a collection of Dashboards, intended to tell… well, a story. I generally found Sheets and Dashboards were all I needed.
To see these concepts in action, watch the demonstration video below.
Crash Course in I-O Technology: Tableau from TNTLab on Vimeo.
So Who Should Learn Tableau?
Tableau is really designed to help people draw conclusions from data who have limited expertise in statistics, and that does not describe I-O psychologists. Given this, I believe its value to people like us is limited to a few particular circumstances:
- Tableau is useful if you fully embrace the idea of exploratory analyses and would like to visually poke around a dataset quite quickly. If you think of yourself as a “visuals person,” you will find this much more enjoyable than SPSS analyses.
- Tableau is useful if you want to enable other stakeholders without significant statistical expertise (e.g., non-I-O members of your team, your clients) to explore complex data on their own, within parameters that you specify.
- Tableau really shines when you have geographical aspects to your data that are too complex to easily understand using simple statistics (e.g., area codes, ZIP codes, states). For example, if you’re pondering a 50-level one-way ANOVA to look at US state-by-state differences, you’d probably have a better experience and produce more useful conclusions skipping the ANOVA and interpreting your results using Tableau alone.
- Tableau is very convenient if you have access to live datasets (i.e., datasets that are being constantly updated in an automated fashion) and would like to see what your data look like as they are collected.
- Tableau is very effective at creating clean, attractive visualizations with very little effort. If you’re visualization-challenged, Tableau makes the creation of informative, interactive visualizations quite simple.
Gonzalo Ferro, an I-O psychologist working for the government, made a compelling case to me:
I think being able to do data visualizations has really helped me do my job. I have been able to get my message across to high level leaders (head of an agency) much faster, and in a more credible way. It makes story-telling data much easier to do to non-data people. You see a lot of slick presentations out there, but the content is bad. You see great content, but the presentation is really boring and painful to watch. When you can do valuable/informative graphics, with important content, you become a value-added asset. I feel data visualization is an important skill for I-O psychologists to have, because let’s be honest, talking about a coefficient matrix and p-values is not that interesting.
Implicit in his response, I think, is that Tableau enables you to create much more interesting and interactive visualizations than you would with SPSS or Excel alone. You don’t need to be a graphic artist anymore to create something quite impressive.
A downside to Tableau is that there is no free version of Tableau Desktop. If you want to try it out, you’re limited to a 14-day free trial. However, students and educators can currently request yearly licenses for free. There are alternatives to Tableau, many paid and a few free, but in my experience nothing is quite so user-friendly and versatile, which are clearly Tableau’s strengths. To get the same combination of features, you would likely need to combine the capabilities of several different programs. The closest you can likely get is that if you use R, as I’ve recommended already, you can use the library ggplot2 to create quite impressive data visualizations and Shiny if you want to make them interactive. If you already use Excel or SPSS, you can also create most of Tableau’s visualizations with those programs (excepting the geography-based ones), although it can be a bit unpleasant to do so given the clunkiness of both. You also need to know exactly what figure you want to create before navigating many-level-deep menus to figure out how, which is a challenge the Tableau user interface solves quite effectively.
A downside to data visualization programs in general, including Tableau, is that it’s quite easy to give too much power to a novice data explorer. As I-O psychologists, we all know the sampling error-related dangers of overinterpreting small subsets of data, but this is not obvious to someone who knows little about statistics. When you empower someone to drill deep into data, the personal data experience they have may not be one you want them to have, so a balancing act is required in designing data viz. Empower your viewers, but not too much; simplify the story told by the data, but not too much. There are no hard and fast rules for doing this (yet), so be cautious. If you’re planning to show your data viz to someone important for your career, consider “testing” it on novices (i.e., friendly non-I-O colleagues) first.
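The sampling error concern can be made concrete with a small calculation. The sketch below is a minimal illustration, not a rule: it reports the standard error of the mean for a subgroup and flags subgroups below a minimum size. The function name `summarize_subgroup`, the cutoff of 30, and the scores are all hypothetical choices made for the example.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_subgroup(scores, min_n=30):
    """Return n, mean, standard error, and a small-sample flag for one subgroup."""
    n = len(scores)
    avg = mean(scores) if n > 0 else None
    # Standard error of the mean: sample SD divided by the square root of n
    se = stdev(scores) / sqrt(n) if n > 1 else None
    return {"n": n, "mean": avg, "se": se, "small_sample": n < min_n}


# Hypothetical 5-point survey scores
whole_sample = [3, 4] * 50        # 100 respondents
drilled_down = [3, 4, 5, 4]       # 4 respondents left after heavy filtering

large = summarize_subgroup(whole_sample)
small = summarize_subgroup(drilled_down)
print(large)
print(small)
```

In this made-up example the standard error for the four-person subgroup is several times larger than for the full sample, which is the kind of information a novice viewer drilling into a dashboard typically does not see. One design option is to hide or gray out any cell that such a check flags.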
To Learn More
Here are a few steps to get started in Tableau:
- Download the free trial: http://www.tableau.com/products/trial
- Next, “connect” to a dataset that you’ve analyzed before. Tableau will open Excel, SPSS and R data files, among many others, in addition to live database connections.
- Click on Sheet 1 at the bottom of the screen.
- Play with it. Start small.
No really! Tableau’s primary selling point is that it’s a highly intuitive way to explore data. Before trying any tutorials, spend a few minutes simply poking around a dataset that you already know and see what visualizations you can create. Don’t be afraid if you specify something incorrectly; just use the undo button at the top, which looks like a left arrow. If you decide you want to learn beyond your own fiddling, try out Tableau’s own extensive tutorial video series, found here: http://www.tableau.com/learn/training. There are seven hours of tutorials, if you really want to dig deep into what Tableau is capable of. If you want a faster answer to a specific problem, check the Tableau discussion forums or ask a question tagged “tableau” on Stack Overflow.
After you get comfortable poking around Tableau, you might wonder just how impressive your visualizations can become. If you want to get inspired, I recommend watching this recording of the 2015 SIOP conference closing plenary session featuring Amanda Cox, data visualization expert at the New York Times. Almost every visualization she presents could be created in Tableau, assuming you have the data, of course.
As you explore your newfound skillset, I’ll leave you with two recommendations. First, take advantage of preexisting data sources. One of the most fundamental ideas from the big data movement is that data should be freely accessible and remixable to maximize its impact and value. Second, have fun. This is a great opportunity to get your data nerd on. For example, remember that impressive visualization of SIOP conference submitters by Evan Sinar that I mentioned earlier? Well, in just 5 minutes, I was able to extract the data Evan used from the website where he published it, import it into Tableau Public, and produce the data viz below (a dashboard) quantifying that same presentation list aggregated by first name. Why? Because from this, I can clearly tell that the Davids need to get working lest the Michaels run away with our conference. Those Testing/Assessment Michaels in particular. You’re welcome, SIOP. And if that’s not a compelling case for data viz, I don’t know what is.
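The aggregation behind a first-name dashboard of this kind is simple to reproduce outside of Tableau. The sketch below is only an illustration of the counting step, not the procedure used for the dashboard described above. The function name `first_name_counts` and the presenter list are hypothetical, and the code assumes names are stored as "First Last".

```python
from collections import Counter


def first_name_counts(presenters):
    """Count presenters by first name (the first whitespace-separated token)."""
    return Counter(name.split()[0] for name in presenters if name.strip())


# Hypothetical presenter list; a real one would come from the submission data
presenters = [
    "Michael A", "David B", "Michael C",
    "Jennifer D", "Michael E", "David F",
]

counts = first_name_counts(presenters)
for first_name, count in counts.most_common(3):
    print(first_name, count)
```

Real name data would need more cleaning than this (nicknames, initials, and "Last, First" formats), so treat the tokenizing rule as a placeholder.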
That’s it for the second edition of Crash Course! If you have any questions, suggestions, or recommendations about Tableau or Crash Course, I’d love to hear from you (firstname.lastname@example.org; @rnlanders).
Gelman, A., Pasarica, C., & Dodhia, R. (2002). Let's practice what we preach. The American Statistician, 56, 121-130. doi: 10.1198/000313002317572790
Landers, R. N., Fink, A., & Collmus, A. B. (in press). Using big data to enhance staffing: Vast untapped resources or tempting honeypot? In J. L. Farr & N. T. Tippins (Eds.), Handbook of employee selection. New York, NY: Routledge.
Tableau. (2016, August 18). Mission. Tableau Software. Retrieved from http://www.tableau.com/about/mission