Evaluating Evaluations

Last month a friend in the history department passed along a notice from the American Historical Association entitled “AHA Signs onto ASA Statement on Teaching Evaluations.” This ASA is the American Sociological Association, and their statement is a devastating takedown of using Student Evaluations of Teaching (SETs) as a tool for evaluating faculty performance. Just to give a taste:

Despite the ubiquity of SETs, a growing body of evidence suggests that their use in personnel decisions is problematic. SETs are weakly related to other measures of teaching effectiveness and student learning (Boring, Ottoboni, and Stark 2016; Uttl, White, and Gonzalez 2017); they are used in statistically problematic ways (e.g., categorical measures are treated as interval, response rates are ignored, small differences are given undue weight, and distributions are not reported) (Boysen 2015; Stark and Freishtat 2014); and they can be influenced by course characteristics like time of day, subject, class size, and whether the course is required, all of which are unrelated to teaching effectiveness.

While some schools are doing away with SETs voluntarily, many faculty still depend on their end-of-semester numbers to keep their jobs. The last time I was on the job market, some schools required evaluations as part of their application process. I will certainly be submitting mine next fall when I compile my tenure dossier. We all jump through these hoops despite the well-documented fact that these evaluations are strongly influenced by all the factors above and, even worse, by how white and/or male the professor happens to be.

Those looking for more specifics on the unreliability of student evaluations are welcome to look at the extensive reference list of the ASA statement, or this lovely post by Anna Haensch on the AMS Blog on Math Blogs, or the other posts linked below. What I’d like to talk about is this: what do we, the untenured, do about it?!

The ASA statement gives lists of suggestions, though most of them are at the institution level. They recommend everything from renaming the evaluations themselves, to tweaking the language of the questions, to implementing a much broader and more holistic process of evaluating teaching effectiveness. They even go so far as to recommend that individuals not be compared to campus or even departmental averages.

My school has not (yet) done any of that. I could technically refuse to provide my evaluations in my tenure dossier. Emphasis on technically. The faculty promotion committee would definitely view it as possibly (probably?) troublesome without a boatload of documentation on my reasons for omitting them, and I’d certainly need to provide a lot of other evidence of teaching effectiveness. Who knows what the Provost and Trustees would think. I don’t think anybody’s ever risked it. I certainly won’t.

What I have done is cite relevant research on the unreliability of SETs during workshops with the faculty on the promotion committee, and at least publicly they’ve all agreed that student evaluations are at best problematic and at worst useless. But, of course, they still want to see them. There’s probably room at my institution to begin a more formal transition away from SETs, but I don’t think anybody in a position to make change has the energy for that particular fight right now. Official statements from professional societies, like the ASA’s, will certainly provide some useful ammunition. Maybe after I get tenure…

What I should do in the meantime is follow Jacqueline Dewar’s advice from the On Teaching and Learning Mathematics blog on interpreting your evaluations, which she calls ratings (“I refer to them as student ratings, not evaluations, because ‘evaluation’ indicates that a judgment of value or worth…while ‘ratings’ denote data that need interpretation.”). She recommends possible tweaks to the language of the questions your school uses, and also gives good targets for response rates (75-80%) and advice on how to compare your course to others. The “When Good Teaching is the Average” section really jumped out at me. I’ve had friends (not at my school) who’ve scored 4.0 out of 5 when the department average is a 4.3, and been made to feel like they wouldn’t pass their mid-tenure review because of it.

I can also follow Adriana Salerno’s advice from a workshop she attended on interpreting evaluations without losing your mind. That is, if I can get up the guts to look at them before it’s time to put them in my dossier.


