Informed Decisions:
Research-Based Practice Notes
Steven G. Rogelberg
Bowling Green State University
Recently,
I, along with Allan Church, Janine Waclawski, and Jeff Stanton,
wrote a chapter on the present state of organizational survey research for the
forthcoming Handbook of Research Methods in Industrial and Organizational
Psychology. In writing this chapter, it became apparent that not only is
survey research thriving in I-O psychology, but its future appears quite secure
given the utility and power of the Internet/Intranet for conducting survey
research. Over the past 30 years or so, researchers and practitioners have
designed and developed many excellent survey methods and practices. We have also
developed and designed some questionable ones as well. In this column, we turn
our attention to two survey practices that have become quite banal and yet can
be highly deleterious to a survey effort. If you have any comments/questions
concerning this column, please contact me at rogelbe@bgnet.bgsu.edu.
Problems and Potential Alternatives to
Two Common Survey Reporting Practices:
Normative Comparisons and Percent Favorables
Steven G. Rogelberg
Bowling Green State University
Allan H. Church
PepsiCo, Inc.
Janine Waclawski
PricewaterhouseCoopers, LLP
Jeffrey M. Stanton
Bowling Green State University
In general, applied organizational survey research typically proceeds
through five basic stages: (a) identification and documentation of survey
purpose and scope; (b) survey item and instrument construction; (c)
administration and data collection; (d) data analysis and interpretation; and
(e) the reporting of results.1 Throughout the planning and
implementation process, survey researchers and practitioners are often faced
with making a host of methodological and analytical decisions which can impact
the quality and utility of the results obtained. In this column, we review two
survey practices that occur in the latter stages of the implementation process
which, though very commonly employed, we feel are particularly problematic for a
variety of reasons. The two practices to be examined include the use of external
benchmarks or normative comparisons and data reporting via percent favorables.
1 From a practitioner perspective there is a
critical sixth step to the action research model which involves using the survey
results to drive organizational change and improvement. This issue is discussed
at length in other sources, including Church and Waclawski (2001), Folkman (1998),
and Kraut (1996).
Although there are many good in-depth survey texts and how-to
manuals, few have focused their attention specifically on issues inherent in
these two commonly used approaches to data interpretation. In the following
sections we will briefly review these two survey practices, express our issues
and concerns, and suggest potential alternatives for survey researchers and
practitioners.2
2 For those individuals interested in a more
comprehensive treatment of these issues see Rogelberg, Church, Waclawski, &
Stanton (in press).
Practice One: Normative Comparisons
It is fairly common in organizational survey research to see current data
compared with some internal or external normative database (often called a
benchmark) that contains information on how employees in other organizations,
groups, and/or internal units responded to the same set of questions. This
comparative process is thought to help individuals interpret and evaluate the
observed data against a greater context (e.g., How do we stack up? Are
our observed ratings high, low, or average in comparison to others? How do we
compare to the best in class in our industry?). While normative comparisons
may play an important role in total quality and business process reengineering
efforts, their use is more problematic in applied survey research. This is
particularly apparent when organizations focus more on their relative standing
vis-à-vis external norms or benchmarks than on their own internal strengths and
areas for improvement (Church & Waclawski, 1998).
One significant concern, for example, is the issue of data equivalence,
particularly when relying on external comparative data from other organizations.
Critics of norming argue that even if two organizations are very similar in
their basic composition (e.g., number of employees, type of industry), it is
still highly unlikely that they are equivalent across the full range of
demographic, geographic, and socioeconomic dimensions (Lees-Haley &
Lees-Haley, 1982). As a result, differences between normative databases and
observed survey data cannot easily and confidently be attributed to identifiable
organizational factors. Thus, interpreting gaps between the two databases
is suspect. Although internal norms (e.g., comparisons within the same
organization) do not suffer as much from this problem, in some companies the
differences between specific business units, divisions or regions do in fact
represent less-than-comparable situations and dynamics as well.
The second major argument against norming concerns conceptual
appropriateness. More specifically, some practitioners have argued that an
organization should not compare its own observed data to what other firms have
obtained, but instead to what is inherently meaningful, important, and plausible
(Church & Waclawski, 1998). For example, even if one's ratings are higher
than the norm on employee satisfaction, if the scores are low in general there
is little point in claiming that area as a strength. After all, dissatisfied
employees are still dissatisfied, regardless of whether their dissatisfaction is
consistent or not with external benchmarks. Comparative norms do not define
reality for the employees who completed the surveys.
Rather than calling for the discontinuation of norming, however, (which seems
impractical given its popularity in industry), we point the reader instead to
some factors to consider which can impact the validity and utility of such
efforts. First, a basic methodological rule in norming practice is to only
compare data across organizations when the data have been collected using
identical survey items. Despite the seemingly obvious nature of this rule, in
practice it is frequently ignored under the rubric of inference.
Although necessary, having a set of identical items alone is not sufficient
for an appropriate between-organization comparison. When planning to use
external norms, comparative analyses should only be conducted when the item
context has also been carefully controlled. Item context refers to the placement
and order of items found on the organizational survey instrument. Item context
can have a significant effect on individual response patterns.
Research by Hyman and Sheatsley (1950), for example, found that affirmative
responses in support of freedom of the press for Russian reporters in the United
States jumped from 36% to 73% depending on whether the question was prefaced or
not with a similar one regarding the appropriateness of American newspaper
reporters in Russia. More recently, Strack, Schwarz, and Gschneidinger (1985)
asked respondents to describe either three recent positive or three negative
life events. Not surprisingly, respondents who were instructed to recall
positive events subsequently reported higher happiness and life satisfaction
than those who had to recall negative ones. Research by Schwarz, Bless, Strack,
Klumpp, Rittenauer-Schatka, and Simons (1991) reported similar context effects
with respect to assertiveness. Subjects who were instructed to generate a total
of 12 different examples of their own assertive behavior rated themselves as
having lower assertiveness than
subjects who were asked to generate only 6 such examples (Schwarz et al.
suggested that the difficulty of generating 12 different examples may have led
the participants to believe that they must not be too assertive). Taken
together, these and numerous other studies have demonstrated that item order can
dramatically influence the survey responses given (see Schuman & Presser,
1996; Schwarz & Hippler, 1995; Tourangeau & Rasinski, 1988).
Clearly, the implications of item-context effects underscore the difficulty
of making comparisons across external data even when the item wording itself is
identical. The item context must be taken into consideration and held constant
prior to the comparison process. Without controlling for or understanding
item-context effects we cannot reliably interpret gaps or
similarities between a normative data set and an organizational data set.
Given these concerns, we suggest that practitioners and researchers follow
these guidelines in the survey design process when planning to use
their data for normative comparisons. Items should be (a) grouped together, (b)
listed in the same order, (c) presented with the same instructions, and (d)
placed as a block at the beginning of the survey prior to customized items not
being planned for normative analyses (although subsequent items can still cause
context effects, the effect sizes for subsequent item-context effects are much
smaller than the item-context effects for preceding questions; Schwarz &
Hippler, 1995).
Besides this design solution, however, we would like to offer two alternative
norming approaches for researchers and practitioners to consider: (a) expectation
norming and (b) goal norming. In expectation norming, key senior
leaders and survey sponsors complete their own copy of the survey instrument
based on how they truly believe their employees will respond. Actual
survey results are then compared to these expectation norms which provide
insight into how in sync the key stakeholders are with their employees'
perceptions of the organization. Alternatively, in goal norming, the leaders and
survey sponsors complete their own version of the survey based on how they
hope employees will respond. The discrepancies between this ideal state and
the actual responses obtained can be used to drive interest, energy, and action
planning around the survey results. Regardless of which alternative approach is
used, both of these norming methods have the added benefit of increasing
investment and interest on the part of the survey sponsors and senior leadership
in the outcome of the process prior to the delivery of results themselves.
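The comparison behind both approaches can be sketched in a few lines of Python. This is only an illustrative sketch: the item wording, the employee responses, and the leader-supplied values below are all hypothetical, and the function name `norm_gaps` is our own label rather than an established procedure.

```python
from statistics import mean

# Hypothetical employee responses on a 5-point scale (illustrative data only).
actual_responses = {
    "I understand the company strategy": [4, 3, 5, 2, 4, 3],
    "My manager gives useful feedback": [2, 3, 2, 4, 3, 2],
}

# Hypothetical leader ratings of how they BELIEVE employees will respond.
expectation_norm = {
    "I understand the company strategy": 4.0,
    "My manager gives useful feedback": 3.5,
}

# Hypothetical leader ratings of how they HOPE employees will respond.
goal_norm = {
    "I understand the company strategy": 4.5,
    "My manager gives useful feedback": 4.5,
}


def norm_gaps(actual, norm):
    """Return (actual item mean - leader-supplied norm) for each item."""
    return {item: round(mean(scores) - norm[item], 2)
            for item, scores in actual.items()}


expectation_gaps = norm_gaps(actual_responses, expectation_norm)
goal_gaps = norm_gaps(actual_responses, goal_norm)

for item in actual_responses:
    print(f"{item}: vs. expectation {expectation_gaps[item]:+.2f}, "
          f"vs. goal {goal_gaps[item]:+.2f}")
```

A negative gap against the expectation norm suggests leaders overestimated how employees would respond; a negative gap against the goal norm marks the distance from the desired state.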
Practice Two: Percent Favorables
The second major area of survey practice that concerns us is the
over-reliance on percent favorables. Despite the problems associated with
this approach, the reality is that reporting survey results in the form of a
collapsed set of percentages is one of the most frequently used methods of
summarizing data in organizational settings (Church & Waclawski, 1998;
Edwards, Thomas, Rosenfeld, & Booth-Kewley, 1997; Jones & Bearley, 1995).
In practice, this translates to the combination of two or more positive response
categories (e.g., adding a 4 = satisfied and a 5 = very satisfied together
on a 5-point satisfaction scale) and labeling that group as being the
favorable respondents. Typically, the same approach is applied to the
lower end of the response scale as well with the bottom two or three categories
(e.g., 1 = very dissatisfied and 2 = dissatisfied) being grouped
together to represent the unfavorable respondents. While collapsing data
is not inherently bad per se, (and is required for certain types of nominal data
such as categorical responses and demographic items), the problem is that
oftentimes the collapsed data is all that is reported. Thus, rather than
presenting a complete distribution of frequencies for all response options on a
given scale, many survey reports only display the findings using this
reductionistic approach, which clearly limits both the depth of information and
the level of interpretability of the findings provided. Moreover, in some survey
reports, only one of these categories might be displayed (typically only the
percent-favorable component).
While the use of a percent-favorable category clearly makes sense from a
simplicity and clarity-of-presentation perspective (Jones & Bearley, 1995),
from a methodological and measurement-based mindset this approach is quite
problematic. First, by collapsing a 5- or 7-point rating to what is essentially
a 3- (or even 1-) point format, one loses considerable information regarding
variability. Second, the inherent discrimination made among categories by
respondents when completing the survey is entirely lost, along with the
subtleties of response. Third, by collapsing a scale after it has been used, the
survey researcher is essentially imposing new psychometric restrictions on the
underlying structure of the data that were not present when it was initially
gathered. Finally, when the collapsed data are used for additional subgroup
analyses (which is quite common in practice), this tends to compound the impact
of this reductionistic method.
Aside from the decreased variability and subtlety of the data, and perhaps
more importantly for practitioners and survey sponsors, collapsing response
ratings can lead to significant misinterpretations of the data. Table 1 provides
an example of how the percent-favorable method might obscure data results in a
given survey effort.
Table 1: Different Examples of Percent Favorable

Percentage of sample reporting each scale value

| Scale value | Example one: Middle of the road | Example two: Top heavy | Example three: Well distributed |
|---|---|---|---|
| 7 (very satisfied) | 10 | 60 | 22 |
| 6 | 10 | 0 | 18 |
| 5 | 40 | 0 | 20 |
| 4 | 20 | 20 | 20 |
| 3 | 20 | 10 | 8 |
| 2 | 0 | 5 | 6 |
| 1 (very dissatisfied) | 0 | 5 | 6 |
Clearly, each of these three sample distributions is quite different from
one another, yet in each of the examples a 60% favorable score (collapsing the
top three response categories) and 20% unfavorable score (collapsing the bottom
three response categories) would be identified using this method of reporting.
In short, when applied to these data, the percent-favorable method of displaying
results would not yield an effective set of targeted interventions or follow-up
activities from the survey process. Thus, it is our contention that the
percent-favorable approach presented as the sole method of display is generally
an inappropriate and potentially unethical (if the researcher were to collapse
responses in an attempt to purposefully deceive the audience) way of
summarizing data.
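The point can be checked directly against the figures in Table 1. The short Python sketch below is an illustration rather than a prescribed reporting routine: it collapses each distribution into percent favorable and percent unfavorable, then computes the mean and standard deviation that the collapsed figures hide. The function name `summarize` and the use of a population (rather than sample) standard deviation are our own assumptions.

```python
from math import sqrt

# Percentage of respondents at each scale value
# (7 = very satisfied ... 1 = very dissatisfied), copied from Table 1.
TABLE_1 = {
    "Middle of the road": {7: 10, 6: 10, 5: 40, 4: 20, 3: 20, 2: 0, 1: 0},
    "Top heavy": {7: 60, 6: 0, 5: 0, 4: 20, 3: 10, 2: 5, 1: 5},
    "Well distributed": {7: 22, 6: 18, 5: 20, 4: 20, 3: 8, 2: 6, 1: 6},
}


def summarize(dist):
    """Collapse a 7-point distribution and also compute its mean and SD."""
    total = sum(dist.values())
    favorable = sum(pct for value, pct in dist.items() if value >= 5)
    unfavorable = sum(pct for value, pct in dist.items() if value <= 3)
    mean = sum(value * pct for value, pct in dist.items()) / total
    # Population variance, weighting each scale value by its percentage.
    variance = sum(pct * (value - mean) ** 2 for value, pct in dist.items()) / total
    return {
        "favorable": favorable,
        "unfavorable": unfavorable,
        "mean": round(mean, 2),
        "sd": round(sqrt(variance), 2),
    }


summaries = {name: summarize(dist) for name, dist in TABLE_1.items()}
for name, summary in summaries.items():
    print(name, summary)
```

All three examples collapse to the same 60% favorable and 20% unfavorable, while their means and standard deviations differ.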
Of course, there are several alternatives to this method. First and foremost,
as applied researchers at heart, we would strongly advocate the use of the mean
and standard deviation for survey reporting purposes. Both have useful
statistical properties and are simple yet powerful descriptive measures. Plus,
the mean and standard deviation are applicable to a wide variety of situations
and types of survey items. Although we recognize that there are some inherent
problems with the use of these measures as well (e.g., the impact of outliers
and/or bimodal or highly skewed response distributions), in general, given the
restricted range of standard 5-point or even 7-point rating scales, coupled with
the large sample sizes typically associated with most organizational survey
efforts, these do not present the same level of concern as noted above with
reliance on percent favorables.
Of course, the biggest barrier to using the mean and standard deviation in
applied organizational survey work, and probably part of the reason the
percentage-favorable method has grown so significantly in practice, is the issue
of interpretability. Many practitioners and researchers have found that mean
scores and standard deviations are not always readily interpretable by
nonstatistically trained individuals (or senior executives in particular). Given
these concerns, we offer two potential linear transformations that survey
researchers and practitioners might want to consider using to overcome this
barrier. Both afford the same level of psychometric robustness (remember that
linear transformations do not change the inherent properties of the data) while
potentially increasing the ease of understanding among those with low statistics
quotients.
The first option is what we call the Grade Point Transformation. In
this approach, survey data are transformed into a 0-to-4 scale using the
following formula:

$$\text{GPA} = \frac{(\text{observed score} - \text{minimum-possible scale value}) \times 4}{\text{maximum-possible scale value} - \text{minimum-possible scale value}}$$

For a typical 5-point scale then, a rating of 5 would be transformed into a
4.0 GPA, and a mean rating of 4.12 becomes a GPA of 3.12. For a 7-point scale, a
mean value of 5.67 becomes a GPA of 3.11, and a mean of 3.70 becomes a GPA of
1.80. This type of transformation could help the survey audience better
understand the reported results within a context with which they are very
familiar, that is, a grade point average. Given the grading systems typically
used in school in the United States, most executives and managers are likely to
be familiar and comfortable with assessing and interpreting GPAs. Because of
this, the transformed means are likely to have an intuitive appeal that may
promote clarity and understanding. For added effect, one could add letter grades
to the presentation to serve as scale anchors (particularly given executives'
propensity for displaying and grading various sources of information in
general).
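For readers who want to apply the Grade Point Transformation programmatically, a minimal Python sketch follows. The function name and argument names are our own; the arithmetic is simply the formula given above, and the example values are the ones worked through in the preceding paragraph.

```python
def grade_point_transform(score, scale_min, scale_max):
    """Linearly rescale a score from [scale_min, scale_max] onto a 0-4 range."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must be greater than scale_min")
    return (score - scale_min) * 4 / (scale_max - scale_min)


# Worked examples from the text: 5-point scale, then 7-point scale.
gpa_5pt_top = grade_point_transform(5, 1, 5)
gpa_5pt_mean = grade_point_transform(4.12, 1, 5)
gpa_7pt_high = grade_point_transform(5.67, 1, 7)
gpa_7pt_low = grade_point_transform(3.70, 1, 7)

for label, value in [
    ("5-point, rating 5", gpa_5pt_top),
    ("5-point, mean 4.12", gpa_5pt_mean),
    ("7-point, mean 5.67", gpa_7pt_high),
    ("7-point, mean 3.70", gpa_7pt_low),
]:
    print(f"{label}: GPA {value:.2f}")
```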
The second alternative to the mean is what we call the Test Score
Transformation. Here, survey data are converted to a more familiar 0-to-100
scale. This linear transformation is accomplished as follows:

$$\text{Test score} = \frac{(\text{observed score} - \text{minimum-possible scale value}) \times 100}{\text{maximum-possible scale value} - \text{minimum-possible scale value}}$$

Again, in the case of a standard 5-point scale, a survey rating of 5 would be
transformed into a score of 100, while a mean rating of 4.12 becomes a score of
78. For a 7-point scale, a mean value of 5.67 becomes a score of 77.8, and a
mean of 3.70 yields a scored value of 45. As with the GPA approach, this
transformation presents the survey results in a more familiar context (for
example, a test score). Given the prevalence of testing in educational settings
and its connotations with performance, it too represents a familiar and
more easily interpretable solution for presenting survey findings to those who have
difficulty with standard mean scores. Moreover, it can still be reported as a
mean (preferably with a standard deviation) without sacrificing clarity or
interpretability. Table 2 provides two examples of how the above transformations
might be applied to reporting survey findings.
Table 2: Examples of Linear Transformation Methods

Percentage of sample reporting each scale value

| Scale value | Example one: Very centered | Example two: Well distributed |
|---|---|---|
| 7 (very satisfied) | 0 | 40 |
| 6 | 5 | 25 |
| 5 | 70 | 10 |
| 4 | 20 | 10 |
| 3 | 3 | 5 |
| 2 | 2 | 5 |
| 1 (very dissatisfied) | 0 | 5 |
| Mean score | 4.73 | 5.5 |
| Test score transformation | 62.2 (out of 100) | 75.0 (out of 100) |
| GPA transformation | 2.5 (GPA) | 3.0 (GPA) |
| Percent favorable | 75% | 75% |
In sum, although quite simple, these two transformations may provide the key
to helping managers, executives, and other organization members understand,
interpret, accept, and ultimately make better use of their organizational survey
results. Moreover, since the display adjustment is made after the data have been
collected and does not affect analyses, it is virtually transparent to the end
users. As a final note, however, it is important to remember to always report
standard deviations (whether adjusted or otherwise) when reporting mean scores
of any type.
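As an illustration of reporting transformed means together with their standard deviations, the Python sketch below recomputes the Table 2 summary rows from the raw percentages. It is a sketch under stated assumptions: the helper names are ours, a population standard deviation is used, and the adjusted standard deviation is obtained by applying only the scale factor of the linear transformation (the shift does not affect spread).

```python
from math import sqrt

# Percentage of respondents at each scale value, copied from Table 2.
TABLE_2 = {
    "Very centered": {7: 0, 6: 5, 5: 70, 4: 20, 3: 3, 2: 2, 1: 0},
    "Well distributed": {7: 40, 6: 25, 5: 10, 4: 10, 3: 5, 2: 5, 1: 5},
}
SCALE_MIN, SCALE_MAX = 1, 7


def mean_and_sd(dist):
    """Mean and population SD of a distribution given as {value: percent}."""
    total = sum(dist.values())
    mean = sum(value * pct for value, pct in dist.items()) / total
    variance = sum(pct * (value - mean) ** 2 for value, pct in dist.items()) / total
    return mean, sqrt(variance)


def rescale_mean(mean, new_max):
    """Map a mean from the original scale onto a 0-to-new_max scale."""
    return (mean - SCALE_MIN) * new_max / (SCALE_MAX - SCALE_MIN)


def rescale_sd(sd, new_max):
    """Only the multiplicative factor applies to a standard deviation."""
    return sd * new_max / (SCALE_MAX - SCALE_MIN)


def report(dist):
    mean, sd = mean_and_sd(dist)
    return {
        "mean": round(mean, 2),
        "sd": round(sd, 2),
        "test_score": round(rescale_mean(mean, 100), 1),
        "test_score_sd": round(rescale_sd(sd, 100), 1),
        "gpa": round(rescale_mean(mean, 4), 1),
        "gpa_sd": round(rescale_sd(sd, 4), 2),
    }


reports = {name: report(dist) for name, dist in TABLE_2.items()}
for name, row in reports.items():
    print(name, row)
```

Both examples share a 75% favorable score, yet the transformed means and their standard deviations separate them clearly.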
Conclusion
Clearly, the process of reporting organizational survey research results
is an important one, and yet it is easily susceptible to obfuscation. As we have
tried to demonstrate here, survey researchers need to move away from a reliance
on data-collapsing approaches such as the percent favorable, and more into the
use of transformed means and standard deviations. In addition, we must be highly
sensitive to the pitfalls and methodological impediments to meaningful normative
comparisons.
References
Church, A. H., & Waclawski, J. (1998). Designing and using
organizational surveys. Aldershot, England: Gower.
Church, A. H., & Waclawski, J. (2001). Designing and using
organizational surveys: A seven-step process. San Francisco, CA: Jossey-Bass.
Edwards, J. E., Thomas, M. D., Rosenfeld, P., & Booth-Kewley, S. (1997). How
to conduct organizational surveys: A step-by-step guide. Thousand Oaks, CA:
Sage.
Folkman, J. (1998). Employee surveys that make a difference: Using
customized feedback tools to transform your organization. Provo, UT:
Executive Excellence.
Hyman, H. H., & Sheatsley, P. B. (1950). The current status of American
public opinion. In J. C. Payne (Ed.), The Teaching of Contemporary Affairs
(pp. 11-34). New York: National Education Association.
Jones, J. E., & Bearley, W. K. (1995). Surveying employees: A
practical guidebook. Amherst, MA: HRD Press.
Kraut, A. I. (Ed.). (1996). Organizational surveys: Tools for assessment
and change. San Francisco, CA: Jossey-Bass.
Lees-Haley, P. R., & Lees-Haley, C. E. (1982, October). Attitude survey
norms: A dangerous ally. Personnel Administrator, 89, 51-53.
Rogelberg, S. G., Church, A. H., Waclawski, J., & Stanton, J. M. (in
press). Organizational survey research: Overview, the Internet/Intranet and
present practices of concern. In Rogelberg, S. G. (Ed.), Handbook of Research
Methods in Industrial and Organizational Psychology. England: Blackwell.
Schuman, H., & Presser, S. (1996). Questions and answers in attitude
surveys: Experiments on question form, wording, and context. Thousand Oaks,
CA: Sage.
Schwarz, N., Bless, H., Strack, F., Klumpp, G., Rittenauer-Schatka, H., &
Simons, A. (1991). Ease of retrieval as information: Another look at the
availability heuristic. Journal of Personality and Social Psychology, 61,
195-202.
Schwarz, N., & Hippler, H. J. (1995). Subsequent questions may influence
answers to preceding questions in mail surveys. Public Opinion Quarterly, 59,
93-97.
Strack, F., Schwarz, N., & Gschneidinger, E. (1985). Happiness and
reminiscing: The role of time perspective, mood, and mode of thinking. Journal
of Personality and Social Psychology, 49, 1460-1469.
Tourangeau, R., & Rasinski, K. A. (1988). Cognitive processes underlying
context effects in attitude measurement. Psychological Bulletin, 103,
299-314.