Traveling in Cyberspace:
Psychology of Software Design, Part II
Usability Evaluation
J. Philip Craiger
University of Nebraska at Omaha
In my last column, I presented a description of the software[1]
design process and showed how psychology plays an important role in building
successful interactive software. In this column I will discuss the second part
of the design process, usability evaluation. Usability evaluation is a
form of testing that is applied to the design of computer software, in
particular, the interface with which users interact. Essentially, it allows the
design team to determine the extent to which an interface will support users in
doing whatever they need to do (whether work or play).
[1] I use the terms "software" and
"interface" interchangeably throughout this column. As I discussed in
my last column, the interface is the part of the software with which the user
interacts, and therefore, to the user, the interface IS the software.
The terms "usability" and "user-friendly" (a term I
assume most of you have heard) are loosely interchangeable. Usability has been
defined more precisely as:
the extent to which a product can be used by specified users to achieve
specified goals with effectiveness, efficiency, and satisfaction in a
specified context of use (Karat, 1997, p. 691).
There are numerous usability evaluation methods, and they differ in terms of
cost of implementation, who works on the evaluation team (actual users,
usability experts, or a combination thereof), the effectiveness in finding
usability problems, and so on. Simpler methods are called discount usability
methods because they are low-cost and relatively easy to use. At the
opposite end of the complexity and effort spectrum is user testing. Full-blown
user testing involves real end users of the software performing real tasks
with a fairly complete and functional high-fidelity software system in a
laboratory setting.
Why is usability important? Software that is difficult to use has many
ramifications ranging from personal dissatisfaction and lower productivity to
increased training costs. "The cost of less-than-user-friendly software can
be astonishingly high, the combined result of unnecessarily high training and
customer support costs, unnecessarily low productivity, and lost market
share" (Mayhew, 1999, p. x).
Following the user-centered design principles I described in my last column
does not guarantee a product's usability. Rather, one might say that
user-centered design is a necessary but insufficient condition for usability.
Interface design is not a one-shot deal. It requires numerous iterations to get
it right (i.e., to ensure a software product's usability). It may be helpful to
think of interface design as similar to the process of writing. An author can
start out with an understanding of the intended audience, a clear theme, an outline,
thorough background research, and so forth, but that doesn't guarantee that the
first draft will be perfect. It usually requires numerous iterations of writing,
reading, revising, reading, revising, and so forth. This is analogous to
software design: develop the concept, generate the design, evaluate it,
redesign, evaluate, redesign, and so on, until it is right.
Aspects of Usability
Usability is actually a multidimensional construct whose meaning is derived
from the aspects of the software that affect its purpose and use. That is,
criteria used to evaluate software usability will vary depending upon who will
use the system, and the characteristics of the tasks for which the system is
used. The most common usability criteria (Nielsen, 1993; 1994) include:
Productivity and efficiency: Software should be designed
so that users can perform tasks quickly and efficiently.
Minimize errors: Software should be designed to prevent
errors.
User control: Software should allow users the freedom
and control to complete a task as they see fit. It should not force users to
follow one path through a task.
Ease of learning: Software should be easy to learn. If
software is used infrequently, it should be easy to remember how to use it.
User satisfaction: Users should enjoy using the software
to do their job.
In the best of all worlds, all software would reflect each of these criteria:
be easy to learn; promote fast and efficient work; minimize errors; allow users
flexibility in sequencing tasks; and achieve a high level of satisfaction with
all users. This, of course, is not a realistic situation. Designing to achieve
one criterion may have an adverse effect on another usability criterion. To
illustrate, say you have two usability goals: to maximize ease of learning and
to provide user control and flexibility. Making the software easy to learn may
require limiting the number of choices, such as menus, buttons, and so forth,
available to a user (otherwise, it would be too confusing for novice users). Or,
to make it easy to learn, the design could force users to complete steps in a
strictly linear fashion (e.g., the typical "wizards" you encounter if
you have ever installed any software). These design decisions have the effect of
limiting the flexibility that a user has in completing the task. Certainly,
expert users would like more choices, as well as the freedom to complete the
same task in different ways (to alleviate boredom, to accommodate their personal
work style or mood, and so on). Notice that the reverse may hold. Providing the
user the freedom to accomplish a task in multiple and varied ways may make the
software more difficult to learn.
Note also that there are both positive and negative relationships among the
usability criteria. For instance, we expect positive relationships between user
satisfaction and the remaining criteria (easy-to-learn, fast, and flexible
software is more satisfying to users). Other criteria, clearly, are negatively
associated. The more errors that are committed by a human, the less efficient
and productive he or she is.
The usability criteria are not selected arbitrarily by the designers. Rather,
the selection of these criteria is determined by the types of users who will be
using the software, and the task for which they are using the software. For
example, if most users are novices (due to high turnover, or seasonal work), and
reduced training costs are important, then ease of learning may be the most
critical criterion. For telephone assistant operators, speed of execution and
minimal errors would be important. For a child's computer game, ease of learning
may be more important than speed of execution and efficiency.
Now let us turn to a description of three usability evaluation methods. I
will start with heuristic evaluation, a fairly simple discount usability method.
Heuristic Evaluation
Heuristic evaluation is a method of evaluating an interface based on several
simple heuristics. It involves usability experts providing independent
evaluations of a prototype to identify potential violations of the heuristics.
Heuristic evaluation is considered a discount method because it is relatively
inexpensive: it typically does not involve actual end users of the software,
nor does it necessarily consider the full range of actual tasks that users
will perform.
Nielsen (1994) identified 10 heuristics that account for the majority of
usability problems. For example, heuristics of good design include:
Visibility of system status: The software should inform users
about what is happening through appropriate feedback.
Match between system and the real world: Language and concepts
should be familiar to the user. Information should appear in a natural and
logical order.
Consistency and standards: Users should not have to wonder
whether different words, situations, or actions mean the same thing. Platform
conventions should be followed.
Error prevention: The design should prevent errors from
occurring.
Recognition rather than recall: Objects and actions should be
visible and apparent to the user. Users should not have to recall information
when it could be provided by the software.
The reader is referred to Nielsen (1994) for a description of the remaining
heuristics. A heuristic evaluation is conducted by a group of evaluators,
working independently, applying the set of heuristics to a prototype. These
independent evaluations are aggregated, and the violations which occur most
frequently across the evaluations indicate a problem with the design.
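The aggregation step can be sketched in a few lines of Python. This is only an illustration: the evaluator names, the findings, and the `aggregate_violations` function are hypothetical, and the column does not prescribe any particular tallying rule beyond counting how often a violation recurs across the independent evaluations.

```python
from collections import Counter

MATCH = "Match between system and the real world"
RECOGNITION = "Recognition rather than recall"
CONSISTENCY = "Consistency and standards"

# Hypothetical findings: each evaluator independently records
# (interface element, heuristic violated) pairs for the same prototype.
evaluations = {
    "evaluator_a": [("field order", MATCH), ("no login button", RECOGNITION)],
    "evaluator_b": [("field order", MATCH), ("username label", CONSISTENCY)],
    "evaluator_c": [("field order", MATCH), ("no login button", RECOGNITION),
                    ("username label", CONSISTENCY)],
}

def aggregate_violations(evaluations):
    """Return (violation, number of evaluators reporting it), most reported first."""
    counts = Counter()
    for findings in evaluations.values():
        counts.update(set(findings))  # set(): at most one vote per evaluator
    # Sort by descending count, then alphabetically so ties are stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ranked = aggregate_violations(evaluations)
for (element, heuristic), votes in ranked:
    print(f"{votes}/{len(evaluations)} evaluators: {element} ({heuristic})")
```

Violations reported by most or all evaluators rise to the top of the list, which is where the design team would look first.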
Figures 1 and 2 below illustrate examples of bad and good design,
respectively. The graphic is a simple login screen (I assume that most of you
have had to log in to a computer system at least once). I have applied the
heuristics to the first screen (Figure 1), and below I've listed the violations
of the heuristics. Figure 2 illustrates an alternative design which has
corrected the violations.
Figure 1. A login screen with several design violations
The violations are as follows:
The sequence of input fields (based on a vertical task flow) does not match
a traditional login sequence. Users are typically asked for their username
first, and then their password. Here we have the reverse, which will undoubtedly
cause problems. For example, more experienced users may not pay attention to the
labels (relying instead on their expectations of the sequence) and mistakenly type
their username in the password field. Heuristics violated: Match between system
and the real world, consistency and standards, and error prevention.
Improper labeling of the "username" field. Novice users may be
confused and may type in their real name ("Filo J. Farnsworth")
instead of their username ("ffarnsworth"). Heuristics violated:
Consistency and standards, and error prevention.
There is no visible way to complete the login sequence. That is, after
completing the two fields, what does the user actually do to log in? Typically, a
button is provided that the user presses to complete the login procedure.
Here, although it is not obvious (because it isn't visible), the user has to
press a special key (e.g., F7) to complete the procedure. Novice users would
have difficulty knowing what to do. Moreover, because this is a nonstandard way to
complete the sequence, intermittent users (e.g., someone using the software once a
month) would probably forget what the special key is. Heuristics violated:
Recognition rather than recall, and consistency and standards.
Figure 2 demonstrates an alternative design that corrects the aforementioned
violations.
Figure 2. A login screen with violations corrected.
Cognitive Walkthrough
The cognitive walkthrough procedure was developed to evaluate the
learnability of a software system, and allows designers to answer the question:
"How easy would it be for a particular set of users to learn this
software?" As with heuristic evaluation, and unlike user testing,
actual end users are not required to conduct a walkthrough. Rather, designers,
usability specialists, software engineers, developers, and the like, are used as
surrogates for users.
Two essential components of the walkthrough are a prototype of the
system and an explicit understanding of the intended users, including their
prior knowledge and training, their experience with similar software, and any
other characteristics that would affect their ability to learn the
software (Lewis & Rieman, 1993).
The walkthrough proceeds by examining each step the user would take to
accomplish a task, and trying to tell a believable story as to why the
users would choose a particular action (Wharton, Rieman, Lewis, & Polson,
1994). For example, if the first step in a particular task requires the user to
press a button labeled "update," then a believable story would explain
why a user would be likely to accomplish that step. Note that believable
stories are based on assumptions about the user's background knowledge and what
they are trying to accomplish, and on an understanding of the elements of the
software that would enable a user to determine the appropriate action. This is
why it is critical that these assumptions be expressed explicitly at the outset.
As described in Wharton et al. (1994), for each step required to accomplish a
task, the evaluators ask the following four questions:
Will the users understand that there may be a subgoal to complete
before they begin? For instance, if the user's task is to print a
document, will the user know that they must select a printer first?
Will the user notice that the correct action is available? That
is, is there something that is visible to the user, such as a button, a menu, a
text field, that gives the user a clue as to what to do next?
Will the user associate the correct action with the effect to be
achieved? This question speaks to the association between the
goal of the user and the visible parts of the interface. For example, in Figure
1 above, would the intended users know that they had to press the F7 key to
complete the login process? They wouldn't, because nothing on the screen tells
them so, and the answer to this
question would be "no." This would indicate a usability problem, and
the software would need to be revised such that the evaluation team could answer
"yes" to this question (e.g., by modifying the screen to resemble
Figure 2).
If the correct action is performed, will the user see that progress is
being made toward their goal? This question relates to feedback provided
by the software. For example, if the user presses a button to print a document,
would there be some type of feedback that would let the user know that the
document is being printed?
At each step of the sequence required to perform a task, the evaluation team
will try to construct a believable story that defines success (i.e., answering
"yes" to each of the questions above). If any question receives an answer
of "no," then the evaluators have found a problem with the software, and an
alternative design should be considered.
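The bookkeeping for a walkthrough can be sketched as follows. This is a minimal illustration, not a tool described by Wharton et al.: the `Step` class, the `find_problems` function, and the answers recorded for each step are hypothetical. Only the "no" to the third question for the F7 step comes from the discussion above; the other answers are assumed for the sake of the example.

```python
from dataclasses import dataclass

# The four walkthrough questions, asked at every step of a task.
QUESTIONS = (
    "Will the users understand that there may be a subgoal to complete?",
    "Will the user notice that the correct action is available?",
    "Will the user associate the correct action with the effect to be achieved?",
    "Will the user see that progress is being made toward their goal?",
)

@dataclass
class Step:
    action: str
    answers: tuple  # four booleans, one per question; True means "yes"

def find_problems(steps):
    """Return (action, question) pairs for every question answered 'no'."""
    problems = []
    for step in steps:
        for question, answered_yes in zip(QUESTIONS, step.answers):
            if not answered_yes:
                problems.append((step.action, question))
    return problems

# Hypothetical walkthrough of the login task on the Figure 1 screen.
login_task = [
    Step("Type username", (True, True, True, True)),
    Step("Type password", (True, True, True, True)),
    # Assumed: the hidden F7 key fails both "notice" and "associate".
    Step("Press F7 to complete login", (True, False, False, True)),
]

problems = find_problems(login_task)
for action, question in problems:
    print(f"Problem at '{action}': {question}")
```

An empty result would mean the team could tell a believable success story for every step; any entry marks a step whose design should be reconsidered.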
User Testing
The most comprehensive evaluation technique is user testing. User testing
involves a set of actual end users conducting a set of real tasks using a
high-fidelity and functional prototype in a laboratory setting. It is the most
costly and effortful usability evaluation method; however, it is also the method
most likely to identify the most comprehensive list of usability problems.
Because of its expense, user testing is often limited to larger companies with
the money and experienced personnel to conduct such a test. Even then, user
testing may be delayed until the end of the design process because of the need
to have a fairly complete and functional prototype.
User testing involves actual users working through tasks on the prototype
while members of the evaluation team record information relevant to usability,
such as the number and types of errors made, time to complete a task, and so on.
These sessions may be video- or audio-taped for analysis at a later time. A
technique called thinking aloud is often used to gather additional
feedback. Here users are asked to "think out loud" regarding what they
are doing or thinking as they are working through a task. Data from thinking
aloud provides valuable information that designers would otherwise be unable to
gather, such as the users' goals and intentions as they are working through a
task, and reasons why they chose a certain option or course of action.
Finally, the actions that users perform can be captured by the computer and
saved to a file for examination by the evaluation team after the user test.
Logging users' actions provides very detailed information on not only what the
users did during the task (what buttons they pushed, menus they accessed, etc.),
but also how long it took them to complete various aspects of the task.
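As a rough sketch of what such logging might look like, the Python below records timestamped actions and summarizes time on task and error counts. The `ActionLogger` class, the `summarize` function, the JSON file format, and the sample events are all hypothetical; a real test harness would hook into the interface toolkit's own event system.

```python
import json
import time

class ActionLogger:
    """Records timestamped user actions during a test session."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so timings can be scripted
        self.events = []

    def log(self, task, action, error=False):
        self.events.append(
            {"t": self._clock(), "task": task, "action": action, "error": error}
        )

    def save(self, path):
        """Write the event log to a JSON file for later examination."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.events, f, indent=2)

def summarize(events):
    """Per task: seconds between first and last logged event, and error count."""
    spans = {}
    for e in events:
        s = spans.setdefault(e["task"], {"start": e["t"], "end": e["t"], "errors": 0})
        s["end"] = e["t"]
        s["errors"] += int(e["error"])
    return {
        task: {"seconds": s["end"] - s["start"], "errors": s["errors"]}
        for task, s in spans.items()
    }

# Example session with a scripted clock so the timings are reproducible.
ticks = iter([0.0, 2.5, 4.0])
logger = ActionLogger(clock=lambda: next(ticks))
logger.log("login", "typed username in password field", error=True)
logger.log("login", "typed username in username field")
logger.log("login", "pressed Login button")
report = summarize(logger.events)
print(report)
```

Passing the clock in as a parameter is a design choice made here for repeatable examples; in an actual session the default monotonic clock would supply the timestamps.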
Conclusion
The three evaluation methods described above are not necessarily mutually
exclusive. For example, a design team may apply the cognitive walkthrough
procedure very early in the design process, and once a more functional prototype
is available, move to user testing. Or a design team may decide to use all three
methods at one point or another. Either way, usability evaluation is essential
in determining the usefulness of interactive software products, and some form of
evaluation is better than none.
Computer systems are playing an increasingly important role in our lives.
More and more jobs require some computer usage. Many of our home appliances,
once purely mechanical, are now driven by a tiny computer chip. Although some
computer systems are "embedded" (i.e., not for direct human use), most
computer systems are made for direct human interaction. Consequently, it is
important that humans are able to use the system to do what they need to do, and
usability evaluation plays a key role in determining a system's usefulness. For
those of you who are interested in reading more about usability evaluation, I've
provided some additional references below.
Further Reading
Hix, D., & Hartson, H. R. (1993). Developing user interfaces: Ensuring
usability through product and process. New York: John Wiley & Sons.
Karat, J. (1997). User-centered software evaluation methods. In M.
Helander, T. K. Landauer, & P. V. Prabhu (Eds.), Handbook of human-computer
interaction. Amsterdam: North-Holland.
Lewis, C., & Rieman, J. (1993). Task-centered user interface design: A
practical introduction. A shareware book published by the authors. Original
files for the book are available by FTP from ftp.cs.colorado.edu.
Mayhew, D. (1999). The usability engineering lifecycle: A practitioner's
handbook for user interface design. San Francisco: Morgan Kaufmann.
Nielsen, J. (1993). Usability engineering. Boston, MA: Academic Press.
Nielsen, J. (1994). Heuristic evaluation. In J. Nielsen & R. L. Mack
(Eds.), Usability inspection methods. New York: John Wiley & Sons.
Nielsen, J., & Mack, R. L. (Eds.). (1994). Usability inspection methods. New
York: John Wiley & Sons.
Preece, J., Benyon, D., Davies, G., & Keller, L. (1993). A guide to
usability: Human factors in computing. New York: Addison Wesley.
Rubin, J. (1994). Handbook of usability testing: How to plan, design, and
conduct effective tests. New York: Wiley Technical Communication Library.
Spool, J. M., Scanlon, T., Schroeder, W., Snyder, C., & DeAngelo, T.
(1998). Web site usability: A designer's guide. San Francisco: Morgan
Kaufmann.
Wharton, C., Rieman, J., Lewis, C., & Polson, P. (1994). The cognitive
walkthrough: A practitioner's guide. In J. Nielsen & R. L. Mack (Eds.), Usability
inspection methods. New York: John Wiley & Sons.
January 2000