Loren J. Naidoo, California State University, Northridge

The other night I was up late working while staying at a vacation home—that’s a sad sentence! I was trying to write and not getting anywhere. I got tired of staring at my laptop screen, so I turned on the TV to distract me from my lack of progress (generally, a terrible idea!). I don’t usually do this because I don’t have cable TV at home. I ended up watching most of a movie called “Doctor Strange,” a visually stunning and borderline incoherent Marvel superhero movie. As far as I could determine (and sure, I wasn’t watching too carefully), the titular hero, his robed associates, and the generic bad guys could open portals to other dimensions and alter reality in various kaleidoscopic and visually confusing ways. Anyway, the basic idea seemed to be that our experienced reality is limited and largely determined by our beliefs.

All of this made me think about grading (sad sentence #2). So, using Dr. Strange as a theme, I’d like to present some contrarian ideas about grading. We are going to open sparkly portals to other dimensions of grading, peer inside, and hopefully go on a spiritual journey from which we will return forever changed! It’s important to say that I’m not the first person to talk about these ideas. However, aligned with our Dr. Strange theme, we aren’t going to burden this column with evidence, theories, or any kind of logical sequencing of events—the goal is to be entertaining! Finally, to fully duct tape this theme together, I will organize the topics in terms of different dimensions of reality. So, “forget everything you think you know,” ready your hands dramatically, flail at the air in front of you, and open the portal to…

1. The Percentage Dimension

Here’s what the percentage dimension looks like. Most class assignments are graded out of 100, and for those that are not, both the grade (e.g., out of 20) and the percentage are reported. Multiple choice exams are beloved in this dimension! If a student gets 20 out of 20 multiple choice questions correct, then a grade of 100% has intuitive meaning to students. Similarly, a zero grade for getting every question incorrect makes sense. Outside of this dimension, there aren’t too many contexts in the practice of I-O psychology where there is a single correct answer among a set of incorrect answers to a given problem. There are other approaches to multiple choice tests, including ones in which more than one answer can be correct, and where incorrect answers receive negative scores to discourage guessing, but these are less frequently used and don’t substantially change the interpretation of percentage scores. Outside of the context of multiple choice tests, it becomes progressively less clear how to interpret a percentage grade. Take, for example, a paper assignment. What does a 100% grade on a paper mean? That the student did exactly what you wanted them to do? That there was a fixed domain of knowledge being assessed and they demonstrated mastery of the entire domain? Does a 90% grade mean the student needs to exert 10% more effort next time, that 10% of the required content was missing, or that 10% of what they wrote was incorrect? I wonder how students interpret percentage grades beyond a vague sense of goodness versus badness, or their equivalent in letter grades (about which we could ask the same kinds of questions). What about assignments that involve creativity, in which, presumably, we want students to develop new answers and ideas that the instructor has not thought of? What do percentage grades mean in that context? 
Of course, a good grading rubric provides students with a verbal description of each performance level, but I suspect that the real reason we dwell in this dimension so much is so that we can get to the next dimension…

2. The Average Dimension

Here’s what calculating grades at the end of the semester might look like in the average dimension. There’s a big Excel spreadsheet. Students are in rows, assignments in columns. Each column has a cell that specifies what the assignment was out of and another cell that specifies the weighting of the assignment relative to the total grade. Students’ scores are divided by the “out of” cell, multiplied by the weight cell, and then summed to form a final percentage grade. There may be aspects of the grading scheme that slightly deviate from this weighted average approach: bonus assignments, a set of assignments from which the top X scores are counted, etc. Regardless, final grades are fundamentally quantitative. Students usually aren’t given any kind of overall qualitative evaluation of their performance over the entire class. The principal advantages of the weighted average approach are that (a) it is a relatively objective, unbiased way to achieve an overall assessment of student learning, (b) it is relatively easy to do with a computer or calculator, and (c) arriving at a single numerical grade for each student for each class makes it easy to then numerically calculate a weighted average that represents their overall academic performance, which then lets us compare students to each other and to external standards.
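For readers who live outside of Excel, the spreadsheet arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the assignment names, point totals, weights, and scores are made up, the function name `weighted_final_grade` is hypothetical, and the weights are assumed to sum to 1.

```python
def weighted_final_grade(scores, out_of, weights):
    """Return a final percentage grade from raw assignment scores.

    scores, out_of, and weights are dicts keyed by assignment name.
    Assumption: weights are proportions that sum to 1.0.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    total = 0.0
    for name, weight in weights.items():
        # Score divided by the "out of" cell, multiplied by the weight cell.
        total += (scores[name] / out_of[name]) * weight
    return 100 * total


# Hypothetical grading scheme and student (not from any real class).
out_of = {"paper": 20, "midterm": 50, "final": 100}
weights = {"paper": 0.2, "midterm": 0.3, "final": 0.5}
scores = {"paper": 18, "midterm": 40, "final": 70}

final_pct = weighted_final_grade(scores, out_of, weights)
print(round(final_pct, 1))  # 77.0
```

Here the student earns 90%, 80%, and 70% on the three assignments, and the weights turn those into a single 77%. Notice that nothing in the calculation records the order in which the scores were earned.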

Why do we spend so much time in the average dimension? The average is but one indicator of central tendency, and one that we know is not ideal in all circumstances (e.g., when there are outliers, such as when a student bombs one test but does well on everything else). Why not, for example, weight assignments equally and use the median of the scores? The use of averages is so ingrained that students are generally resistant to the idea of using anything else. I can remember raising this issue early on in an introductory statistics class. I asked my students if we should use their median daily participation grade rather than their average. Students felt that using the median would in some sense mean they weren’t getting credit for the work that resulted in grades above the median. Ultimately, the more time we spend in the average dimension, the easier it is for us to travel to the evaluative dimension, and the harder it is for us to access the development dimension (see Dimensions #5 and #6).
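To make the outlier point concrete, here is a minimal sketch comparing the two indicators of central tendency. The six scores are invented for illustration (five solid assignments and one bombed test), and the assignments are assumed to be equally weighted.

```python
from statistics import mean, median

# Hypothetical percentage scores: five solid assignments and one bombed test.
scores = [88, 90, 85, 92, 87, 30]

avg = mean(scores)    # pulled down by the single low score
med = median(scores)  # barely moved by it

print(round(avg, 1))  # 78.7
print(med)            # 87.5
```

With these made-up numbers, the one bad test costs the student almost nine points under the average and close to nothing under the median. That gap is the trade-off my students were reacting to, only seen from the other side.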

On another note, in taking the average we do not consider the trajectory of performance. For example, in I-O psychology practice, an employee might receive performance feedback indicating that they are below standard for a particular competency. They might then work to develop that competency, maybe with some successes and some failures, eventually reaching the point where they have met or maybe exceeded the standard. From a practical standpoint, this is a trajectory you would want to see in your employees. To which of two employees with the same average score on customer service over the past 5 years would you assign the task of working to solve a sensitive issue with a disgruntled client: the one who has been consistently mediocre or the one who started out below standard but is now above standard? The failure of the average dimension to consider trajectory may be partly due to our tendency to dwell in the next dimension…

3. The One-and-Done Dimension

In this dimension we only give an assignment once in a class. Students are graded and provided no opportunity to resubmit their assignments. Classes may be divided up according to textbook chapter, with each chapter reflecting a different domain of knowledge. Assessments target discrete domains of knowledge, there is little carryover of knowledge from assessment to assessment, and no general expectation that students’ scores should go up over time. I’ve been in this dimension, uncertain how to respond to a student who asked me to regrade the paper that they had revised based on my original grade and feedback. Many courses avoid this dimension by having cumulative final exams which reassess material that may already have been part of midterm exams or some other assignment.

4. The Incentive Dimension

In the incentive dimension, grades are magically transformed into carrots and sticks. Here grades are used to incentivize behaviors desired by the instructor (e.g., attending class), and to disincentivize undesired behaviors (e.g., showing up late to class). These grades may have little or nothing to do with student learning, knowledge, or performance. Particularly interesting examples of practices in the incentive dimension include punishing students for late submissions of assignments and rewarding students for perfect attendance. For the former, longtime incentive dimension dwellers might argue that when students enter the workforce, they will need to be able to meet deadlines and won’t have the luxury of asking for an extension, and so they must learn to “be responsible.” This is nonsense, of course. Yes, sometimes deadlines are impossible to shift, but probably more often, asking one’s boss for more time on a project is completely reasonable and acceptable. In fact, the more responsible employee may be the one who asks for extra time and/or resources when needed rather than the one who submits poor-quality work. Is our rigid adherence to deadlines really about preparing students for “the real world,” or is it more about convenience for us as instructors? Another incentive dimension argument might be something like “allowing some students to hand in work late without penalty is unfair to the students who completed the work on time.” This is true only if this privilege is distributed unequally. I have shifted away from the incentive dimension view of lateness, giving due dates that are flexible early in the semester, and ones that are fixed later in the semester (when grades are due). This allows students more leeway in managing their workloads (at least early in the semester), which students tend to greatly appreciate, and which might actually be more consistent with most employees’ experiences with time management at work.

5. The Evaluative Dimension and 6. The Development Dimension

As we know from theory and research on performance appraisal in general, and 360-degree feedback in particular, the ostensible goal of the performance-appraisal system, specifically whether the feedback is used for performance evaluation or employee-development purposes, is an important determinant of how that system operates. Similarly, research on mindset and goal orientation has shown that an excessive focus on performance and grades can undermine depth of processing and learning. The more time we spend in the evaluative dimension, where instructors warn students that their exams are difficult and students are more concerned with not failing than they are with learning, the less time we spend in the development dimension, where instructors emphasize learning and improvement, and students pursue learning for its own sake. From a practical standpoint, there seems to be only limited need for the evaluative dimension, mainly in cases where GPA might figure into applications to graduate school, financial support, and students’ first job out of college. Clearly, that need is not important enough to stop universities like Brown from declining to assign traditional grades in undergraduate classes. Still, the evaluative dimension of grading is pervasive and difficult to escape…

I hope you’ve enjoyed this mystical journey through the grading multiverse.

Let me finish with an inscrutable quote from Dr. Strange, the movie:

Bad guy: “How long have you been at Kamar-Taj, Mister…?”

Dr. Strange: “Doctor”

Bad guy: “Mister Doctor.”

Dr. Strange: “It’s Strange.”

Bad guy: “Maybe. Who am I to judge?”

As always, dear readers, please email me with your questions, comments, and feedback: Loren.Naidoo@csun.edu

