Personnel Psychology in 75 Words (or Less): A Word Cloud Example
Thomas A. Stetz*
Hawaii Pacific University
Has anyone ever walked up to you on the street and asked you to describe personnel psychology in 75 words or less? I didn’t think so. After reading this article, however, you’ll be prepared, just in case.
The personnel psychology (PP) that I am talking about is the hardcore stuff. I am talking about the stuff that you find in Personnel Psychology (the journal). It’s one of I-O psychology’s most respected journals. It’s the one that was first published in 1948. It’s the one that was established by Erwin Taylor in collaboration with Frederic Kuder and Charles Mosier. Interestingly, the very first lines ever written in PP were:
Personnel are people! Psychology, as a body of Scientific findings and as a method is concerned with the study of people—the ways they act—what they can do—and why. Personnel Psychology has been founded to stimulate and report the application of psychological methods, understandings, techniques and findings to personnel problems (Taylor and Mosier, 1948).
That’s only 52 words. It still seems to apply today. I guess I’m done. Not quite. If my math is right, 1948 was uh . . . many years ago.
The current scope of PP reads:
Personnel Psychology publishes psychological research centered around people at work. Articles span the full range of human resource management and organizational behavior topics, including job analysis, selection and recruiting, training and development, performance appraisal and feedback, compensation and rewards, careers, strategic human resource management, work design, global and cross-cultural issues, organizational climate, work attitudes and behaviors, motivation, teams, and leadership. Research conducted at multiple levels of analysis, including individual, team, and organizational levels, are welcome. Published articles include original empirical research, theory development, meta-analytic reviews, and narrative literature reviews (Personnel Psychology, 2011).
Unfortunately, that is 89 words, and I only have 75. Also, as an I-O psychologist I am aware that there is often a difference between what people say and what they actually do. Not that I don’t trust PP, but I decided to do some simple text analysis and data visualization that would help me describe PP in 75 words (or less) should I ever be asked.
I downloaded the entire contents of Volume 63 of PP, which was published in 2010. That’s 1,087 pages of printed material. I would like to say that I read every single word on every single page, but that would be a lie. I would also like to say I used highly sophisticated software for this analysis, but again that would be a lie. All of my analyses could have been done using R (see Feinerer, Hornik, & Meyer, 2008; Fellows, 2011), but then again, what can’t R do? Text analysis and data visualization, however, do not need to be complicated. The tools I used are easily within everyone’s reach and could be applied to a variety of real-world projects that I-O psychologists face daily.
Rather than a complicated software package, I used a free word frequency counter I found on the Internet. I pasted the entire contents of Volume 63 of PP into a text box and clicked the analyze button. In about a minute I had the output.
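The article does not name the online counter, so the Python below is only a minimal sketch of what such a tool does, under stated assumptions: a “word” is taken to be a run of letters (internal apostrophes or hyphens allowed, digits skipped), case is ignored, and the result is listed alphabetically, as the author’s tool returned it. `word_frequencies` is a hypothetical helper name.

```python
import re
from collections import Counter


def word_frequencies(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs for every word in `text`, A to Z.

    Assumption: a "word" is a run of letters, optionally joined by an
    internal apostrophe or hyphen ("don't", "cross-cultural"); case is
    ignored, so "Performance" and "performance" are counted together.
    """
    words = re.findall(r"[a-z]+(?:['’-][a-z]+)*", text.lower())
    # Alphabetical order, like the free counter's raw output.
    return sorted(Counter(words).items())


if __name__ == "__main__":
    sample = "Personnel are people! Psychology is concerned with the study of people."
    for word, count in word_frequencies(sample):
        print(f"{word}\t{count}")
```

Pasting a whole volume’s text into `word_frequencies` gives the same kind of raw, unfiltered list the author then moved into Excel.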
Text analysis typically removes highly used words that contribute little to the semantic analysis of the text, words like “the,” “be,” “and,” and so on. In text analysis these are called stop words. (There is more to be said about stop word lists later.) The simple free application I used could not remove them; it returned the entire word list in alphabetical order. I copied and pasted the output into an Excel spreadsheet and sorted it by word frequency. I then read down the list, manually deleting words that did not contribute to my understanding of PP, until I had 75 words that described what was published in PP. I noticed that some words appeared in both singular and plural forms, such as “study” and “studies,” so I combined their counts, and I did the same for cases like “organization” and “organizational.” In text analysis this is called stemming: reducing different grammatical forms of the same word to a common stem so that their counts can be combined. Many if not most text analysis programs can do this automatically. The final word list with frequencies is shown in Table 1.
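The manual steps just described (dropping stop words, merging grammatical forms, sorting by frequency, and keeping the top 75) can be sketched in Python. This is an illustration under assumptions, not the author’s procedure: `STOP_WORDS` is a tiny hypothetical list (real stop word lists are much longer), and `crude_stem` is a deliberately simple suffix rule standing in for a real stemmer such as the Porter stemmer. Following the choice the author explains later in the article, “although,” “but,” and “because” are left off the stop list.

```python
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop word list. "although", "but", and
# "because" are intentionally NOT on it, matching the author's choice.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "be", "by", "for", "in", "is", "it",
    "of", "on", "that", "the", "this", "to", "was", "were", "with",
})


def crude_stem(word: str) -> str:
    """Very rough suffix stripping (a stand-in for a real stemmer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"          # studies -> study
    if word.endswith("ional"):
        return word[:-2]                # organizational -> organization
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "us", "is")):
        return word[:-1]                # teams -> team, but analysis is kept
    return word


def top_words(pairs, n=75, stop_words=STOP_WORDS):
    """Drop stop words, merge forms sharing a stem, keep the n most frequent.

    `pairs` is an iterable of (word, count). Each merged group is labeled
    with its most frequent surface form, and results come back sorted by
    descending frequency.
    """
    totals = Counter()
    forms = defaultdict(Counter)
    for word, count in pairs:
        if word in stop_words:
            continue
        stem = crude_stem(word)
        totals[stem] += count
        forms[stem][word] += count
    return [(forms[stem].most_common(1)[0][0], total)
            for stem, total in totals.most_common(n)]
```

A crude rule like this will mangle some words (for example, “series” becomes “sery”), which is one reason the author’s manual read-through, or a proper stemmer, is still worth doing.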
Looking at Table 1, you can see that “performance” was the most used word, appearing 2,085 times. That’s a lot. That’s 1.92 times per page! Now I know performance must be really important and PP is primarily about performance. (In other words, make sure you talk about performance a lot in any article you send there.) Although I-O psychologists really like long boring tables, the person on the street probably doesn’t. I had to come up with a more engaging way to display this information if I wanted to explain PP successfully. Two words came to mind: word clouds.
A word cloud, also known as a tag cloud (although there are differences related to the data behind them, I consider them interchangeable for this article), is a text data visualization method. At its most basic level, the printed size of each word depends on a weight assigned to that word. Many people credit their development to the photo-sharing website Flickr in 2002 as a way to show how users had tagged their photos. However, they have a much longer history. For example, consider the practice in cartography where the size of a city’s name is based on the city’s size. Viégas and Wattenberg (2008) credit psychologist Stanley Milgram with being one of the first to use the technique as a text visualization tool. Milgram and Jodelet (1976) asked people to name Parisian landmarks, then used differing font sizes to show how often each landmark was mentioned. Viégas and Wattenberg (2008) also state that word clouds first worked their way into the popular media in 2001, when Fortune magazine published an article that included a visual depiction of the 500 largest corporations in the world. The corporations were organized into circles for different countries, and the size of the circles and the corporate names were based upon revenue.1 Once tag clouds exploded onto the scene, they quickly became a prominent feature of Web 2.0 design.
Unfortunately, cloud representations have both good and bad data visualization characteristics (Hearst & Rosner, 2008). On the positive side, they are compact, and the eyes are drawn to the largest items first. They can also represent several pieces of information simultaneously: The words convey information, the spatial arrangement can convey information (clustered, circular, alphabetical, etc.), font size can be manipulated to convey information, and words can be color coded to convey even more. On the negative side, slight differences in word size are often difficult to discern, and word length is conflated with size, because a long word takes up more space than a short word of the same weight. In addition, similar words can end up very far apart, although layout options such as sequential, circular, and clustered arrangements partially overcome this difficulty.
The above concerns suggest that, despite their popularity, word clouds might not be such a great visualization tool after all. Coupled with the fact that very little empirical research has been conducted on them, this could be a real problem. So why did I choose a word cloud to visualize the text analysis results? I chose it because word clouds are not purely a data visualization tool. Compared with a boring table, they are often more esthetically pleasing, eye catching, and engaging, and these qualities are important if we are to communicate effectively with the person on the street. A well-constructed cloud can convey a lot of information while avoiding the math and numbers that so often frighten people. Thus, word clouds can be a supplemental tool to help practicing I-O psychologists communicate with clients.
Lohmann, Ziegler, and Tetzlaff (2009) concluded that there is no single best way to arrange a cloud; instead, the most effective design depends on the specific user goals and the intentions of the designer. Anyone choosing to use a word cloud should therefore have a basic understanding of the research to date, which will help them build the most effective cloud for their specific use. Rivadeneira, Gruen, Muller, and Millen (2007) suggested a basic methodology to evaluate tag clouds and identified four tasks that clouds can support: searching, browsing, impression forming or gisting, and recognition and matching. In the present case, impression forming or gisting seems the most relevant task, yet almost all of the research performed to date has focused on the other tasks.
Bateman, Gutwin, and Nacenta (2008) identified nine visual features that may influence the effectiveness of clouds: (a) font size, (b) font weight, (c) color, (d) intensity, (e) number of pixels, (f) tag width, (g) number of characters, (h) tag area, and (i) position. They also discussed font type, font alignment, text decoration (underline, italics, etc.), word spacing, and character width variability as important properties to consider when evaluating clouds. In addition, Rivadeneira et al. (2007) identified layout features that may influence the effectiveness of word clouds: how the words are sorted (alphabetically, randomly, or by frequency), clustering (words can be grouped semantically or by other user preferences), and spatial layout (words can be arranged sequentially or circularly). Given the number of factors identified above, a thorough evaluation of the method quickly becomes very complicated. Furthermore, any specific cloud’s performance will depend on how the combination of these factors aligns with the designer’s goal.
It should be obvious that font size could affect a word’s recall rate and how quickly it is found during a search task. The cloud research supports both of these intuitive observations (recall: Bateman et al., 2008; Rivadeneira et al., 2007; searching: Halvey & Keane, 2007; Lohmann et al., 2009). Bateman et al. (2008) further explored the effect of font characteristics on tag selection and found that the most important visual cues for selection were font size, font weight, and intensity. Much less important were the number of pixels in a word, tag width, and tag area. Finally, they suggest that color and position be used with care and that any decisions involving these characteristics be made on a case-by-case basis.
There are some other interesting findings that a word cloud designer should know about. For example, words in the upper left corner tend to be better recalled (Rivadeneira et al., 2007) and found more quickly (Halvey & Keane, 2007; Lohmann et al., 2009). It should be noted, however, that Lohmann et al. (2009) found that the upper left corner position performed best when a search task was for a specific tag, but a circular design was most effective for locating the most important tag. Their eye tracking data further showed that when the cloud was sequentially ordered or clustered, eye fixations were greatest in the upper left and lowest in the lower right. However, when a circular layout was used, eye fixations were strongly focused on the central part of the cloud.
Clearly, based on the above findings, word position is important. This may reflect Western left-to-right reading habits. The designer may wish to put high-impact words in the upper left to draw attention, or conversely, he or she may want to put smaller words there to balance the viewer’s attention to detail. One additional comment regarding reading style: Research suggests that viewers scan tag clouds rather than read them (Halvey & Keane, 2007; Lohmann et al., 2009; Rivadeneira et al., 2007). However, this finding applies to tag clouds as a search tool and not as a data visualization tool.
Using an information retrieval task, Sinclair and Cardew-Hall (2008) found that users expressed a greater preference for an ordered list over a tag cloud when the information retrieval task was for specific information. In contrast, they preferred tag clouds when the task was more general in nature. These results suggest that clouds are useful tools when browsing rather than searching for specific information. They suggest that under general browsing activity, using a cloud reduces the user’s cognitive effort. The findings of Lohmann et al. (2009) generally support Sinclair and Cardew-Hall’s conclusion. However, they also noted that participants partly preferred layouts that did not produce the best performance. This is an important finding reminding all of us that effective communication not only involves objective performance, but user preferences as well. Oosterman and Cockburn (2010) concluded as much when their evaluation of tag clouds revealed that clouds often perform worse than interactive tables for search tasks. In explaining the popularity of tag clouds, they ultimately concluded that clouds also serve an artistic purpose to communicate information in a visually appealing way. Thus, speed and accuracy may be irrelevant or secondary to the true purpose of user engagement. Perhaps this is why Viégas and Wattenberg (2008) declared that word clouds work in practice but not in theory.
Now that you know more about word cloud research, how can you create one? You could simply list the words in Word or Excel and manually change the font size based on the relative frequency of the words. Most word cloud algorithms use a log function to determine font size, but you can play with other possibilities (such as power functions or simple linear functions) to fit your particular dataset. Below is a simple Excel log formula that you can use.
=MinFontSize + (MaxFontSize - MinFontSize) * (LOG(WordCount) - LOG(MinWordOccurrence)) / (LOG(MaxWordOccurrence) - LOG(MinWordOccurrence))
where MinFontSize is the desired minimum font size,
MaxFontSize is the desired maximum font size,
WordCount is the count of the specific word,
MinWordOccurrence is the minimum word frequency in your list of words, and
MaxWordOccurrence is the maximum word frequency in your list of words. (In Excel, replace each name with the corresponding cell reference or define it as a named range.)
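For readers who prefer code to spreadsheets, here is the same log scaling in Python, together with the linear and power alternatives mentioned above. It is a sketch: the function names, the default sizes (10 to 72 points), and the 0.5 power exponent are arbitrary assumptions. The log base does not matter because it cancels in the ratio, so the natural log (`math.log`) gives the same sizes as Excel’s base-10 `LOG`. Unlike the Excel formula, the sketch also guards against every word having the same count, which would otherwise divide by zero.

```python
import math


def _scale(value, lo, hi, min_size, max_size):
    """Linearly map value from [lo, hi] onto [min_size, max_size]."""
    if hi == lo:                      # all words equally frequent: avoid 0/0
        return max_size
    share = (value - lo) / (hi - lo)  # 0.0 at the minimum, 1.0 at the maximum
    return min_size + (max_size - min_size) * share


def log_font_size(count, min_count, max_count, min_size=10.0, max_size=72.0):
    """Same calculation as the Excel formula, using natural logs."""
    return _scale(math.log(count), math.log(min_count), math.log(max_count),
                  min_size, max_size)


def linear_font_size(count, min_count, max_count, min_size=10.0, max_size=72.0):
    """Size directly proportional to the count's position in the range."""
    return _scale(count, min_count, max_count, min_size, max_size)


def power_font_size(count, min_count, max_count, min_size=10.0, max_size=72.0,
                    exponent=0.5):
    """Size based on count ** exponent (0.5 is a square-root scale)."""
    return _scale(count ** exponent, min_count ** exponent, max_count ** exponent,
                  min_size, max_size)
```

For example, with a hypothetical minimum count of 100 and the 2,085 occurrences of “performance” as the maximum, `log_font_size(2085, 100, 2085)` returns 72.0 and `log_font_size(100, 100, 2085)` returns 10.0. Any count in between gets a larger size under log scaling than under linear scaling, which keeps mid-frequency words legible next to a dominant word.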
Alternatively, you could use a simple free web application that automatically arranges the words into aesthetically pleasing formats. Regrettably, a free application will limit your control over key design aspects. The degree of control varies by website, so you may want to search around until you find one that meets your needs.
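If a free site’s limited control is a problem, a rough do-it-yourself alternative is to generate the cloud yourself. The Python sketch below writes a plain HTML cloud (black text, Lucida Sans, alphabetical order by default), sizing words with the same log scaling as the Excel formula. It is an assumption-laden illustration rather than a substitute for a real layout engine: `cloud_html` is a hypothetical name, words simply flow left to right in a sequential layout instead of being packed and rotated, and the demo counts are made up.

```python
import html
import math


def cloud_html(weights, min_size=12, max_size=64, order="alphabetical"):
    """Render {word: count} as a plain black-and-white HTML word cloud.

    Sizes use log scaling between the smallest and largest counts; words
    flow left to right (a sequential layout) in alphabetical or
    frequency order.
    """
    if not weights:
        return "<div></div>"
    lo, hi = min(weights.values()), max(weights.values())
    if order == "alphabetical":
        items = sorted(weights.items())
    elif order == "frequency":
        # Most frequent first, ties broken alphabetically.
        items = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    else:
        raise ValueError("order must be 'alphabetical' or 'frequency'")
    spans = []
    for word, count in items:
        share = 1.0 if hi == lo else (
            (math.log(count) - math.log(lo)) / (math.log(hi) - math.log(lo)))
        size = round(min_size + (max_size - min_size) * share)
        spans.append(f'<span style="font-size:{size}px">{html.escape(word)}</span>')
    return ("<div style=\"font-family:'Lucida Sans',sans-serif;color:black;"
            "text-align:center;line-height:1.2\">" + " ".join(spans) + "</div>")


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    demo = {"performance": 900, "job": 600, "although": 150, "coaching": 40}
    print(cloud_html(demo))
```

Saving the printed output as an .html file and opening it in a browser shows the cloud. Switching `order` to `"frequency"` puts the largest words first, that is, in the upper left, where the research reviewed above suggests they are recalled and found most readily.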
I investigated two popular sites, Tag Crowd (http://tagcrowd.com/) and Wordle (http://www.wordle.net). Tag Crowd allows you to upload documents up to 5 MB in size. That’s a lot of text.2 However, Wordle allows you to specify the relative weight of words, which I already had from Table 1. After a minute of reformatting Table 1, I was able to cut and paste it into Wordle and click Submit. Almost instantly a word cloud appeared. There are several layout options you can play with, such as word order, orientation, font type, and color. I chose alphabetical order, rounder edges, vertical word orientation, black and white, and the Lucida Sans font. The result is shown in Figure 1.
Figure 1: Word cloud of most frequently occurring meaningful words in Volume 63 of Personnel Psychology
Looking at the very top of the word cloud, you see “although.” I debated whether that word should be included and decided to keep it. My old professor (name withheld so I don’t make him angry, and he’s not really old) once said in class that he wanted to meet a one-armed I-O psychologist, because I-O psychologists are always saying “on the one hand . . . , but on the other hand . . .” For this reason I left “although” in the word list. Jumping back to stop words, most stop word lists remove the words “although,” “but,” and “because.” All these words made it into my list. I think these words are clearly important, as they show how I-O psychologists are always playing it safe, hedging our bets, making sure that we don’t overstate findings as we explain things like “behavior” and “performance.” This is probably the scientist in us. I wonder whether the same finding would emerge if I did the same analysis of popular HR or management writing (like Harvard Business Review).
The next word is “applied.” Even though PP does not explicitly say in its statement of scope that it is applied, I believe most of us would consider it an applied journal. Continuing to examine the cloud, you can see the importance of “performance” and “job.” Look at the prominence of the word “between.” We are often looking at effects (also in the list) between things like groups, employees, and teams, all words that made it into the list. Zoom in and you see topical words like “selection” and “coaching.” Spend some more time reading and thinking, and you will see that our entire approach to PP is captured. The word “table” appears because we present so many of our statistical analyses in tables (not figures). You can see other important statistical words like “variance” and “correlation,” and important research method words like “hypothesis” and “validity.”
I could read through the entire word list justifying and explaining each occurrence. However, a good visual display allows readers to explore and make sense of the information on their own.
A more thorough and traditional text analysis might have communicated all of this information in a long static table with a column for word and one for word frequency. Although this would have been entirely accurate, informative, and perhaps objectively more efficient, it would not have been as esthetically pleasing. It would only have interested the already interested viewers. It would not have pulled the marginally interested viewers into exploring the data. If we want to communicate our important findings to a large number of others, we need first to capture their attention and pull them in. Once we have their attention and interest, they may actually expend the cognitive effort to understand what we are saying.
I encourage all I-O psychologists to think more creatively about how we communicate with the person on the street. I presented some very simple text analysis tools and visual communication strategies that everyone could use.
Now if anyone walks up to you on the street and asks you to describe PP in 75 words (or less), you can simply show them a word cloud.
Bateman, S., Gutwin, C., & Nacenta, M. (2008). Seeing things in the clouds: The effect of visual features on tag cloud selections. Proceedings of the 19th ACM conference on hypertext and hypermedia (pp. 193–202). New York, NY: ACM Press.
Feinerer, I., Hornik, K., & Meyer, D. (2008). Text mining infrastructure in R. Journal of Statistical Software, 25, 1–54.
Fellows, I. (2011). Package “wordcloud.” Retrieved from http://cran.r-project.org/web/packages/wordcloud/wordcloud.pdf
Halvey, M. J., & Keane, M. T. (2007). An assessment of tag presentation techniques. Proceedings of the 16th international conference on World Wide Web 2007 (pp. 1313–1314). New York, NY: ACM Press.
Hearst, M. A., & Rosner, D. (2008). Tag clouds: Data analysis tool or social signaler? Proceedings of the 41st annual Hawaii international conference on system sciences (HICSS’08, 160).
Lohmann, S., Ziegler, J., & Tetzlaff, L. (2009). Comparison of tag cloud layouts: Task-related performance and visual exploration. Proceedings of the human-computer interaction – INTERACT 2009 (pp. 392–404). New York, NY: Springer.
Milgram, S., & Jodelet, D. (1976). Psychological maps of Paris. In W. I. H. Proshansky & L. Rivlin (Eds.) Environmental psychology (pp. 104–124). New York, NY: Holt, Rinehart, and Winston.
Oosterman, J., & Cockburn, A. (2010). An empirical comparison of tag clouds and tables. Proceedings of the 22nd conference of the computer-human interaction special interest group of Australia on computer-human interaction (pp. 288–295). New York, NY: ACM Press.
Personnel Psychology. (2011). Scope. Retrieved from http://onlinelibrary.wiley.com/
Rivadeneira, A. W., Gruen, D. M., Muller, M. J., & Millen, D. R. (2007). Getting our head in the clouds: Toward evaluation studies of tagclouds. Proceedings of the SIGCHI conference on human factors in computing systems (pp. 995–998). New York, NY: ACM Press.
Sinclair, J., & Cardew-Hall, M. (2008). The folksonomy tag cloud: When is it useful? Journal of Information Science, 6, 15–23.
Taylor, E. K., & Mosier, C. I. (1948). The methods of science applied to the problems of personnel. Personnel Psychology, 1, 1–6.
Viégas, F. B., & Wattenberg, M. (2008). Tag clouds and the case for vernacular visualization. Interactions, July-August, 49–52.