First, let’s start with the classic article “*How to Improve Your Teaching Evaluations Without Improving Your Teaching*” by Ian Neath from the mid-’90s, in which 20 tips are furnished for gaming your end-of-semester evaluations. Despite the funny title and sort of gimmicky conceit (and, at this point, somewhat out-of-date research), it is a serious paper in a serious academic journal. We know more now than we knew then, but a lot of the broad strokes are still the same. Class size, instructor maleness, quality of students, and other non-pedagogical factors often play a larger role in Student Evaluation of Teaching (SET) scores than teaching and student outcomes do. But there’s more.

In a recent slam-dunk of a meta-analysis by Uttl et al. in 2017, the authors provide strong evidence that, when controlled for prior knowledge and sample size, “student evaluation of teaching ratings and student learning are not related.” Yup, that’s it: there is no correlation between learning and SET scores. Based on this, the authors suggest that “institutions focused on student learning and career success may want to abandon SET ratings as a measure of faculty’s teaching effectiveness.” In a post for the *Berkeley Blog*, the statistician Philip Stark talks about some of the “statistical considerations” of SET scores.

So we can be reasonably convinced that instructors who get very good evaluations aren’t necessarily bringing better learning outcomes to bear. But maybe they are bringing…something else?

In “Availability of cookies during an academic course session affects evaluation of teaching,” published by Hessler et al. in 2018, the authors find just that. All else being equal, the presence of cookies leads to higher SET scores, or as the authors so succinctly put it, “the provision of chocolate cookies had a significant effect on course evaluation.” And again, they conclude that it might be unwise to use SETs in important promotion and tenure decisions.

Since then, research about Student Evaluations of Teaching has continued to roll out and has continued to undercut my confidence in the system. Most recently, “Gender Bias in Teaching Evaluations,” a study by Mengel et al., really lit up the internet. This study analyzes almost 20,000 student evaluations and makes some important observations about the presence of bias in SETs.

A really nuanced discussion of Mengel et al. appears in a post on the *Rice University Center for Teaching Excellence Blog*. The tables in the original paper are a bit hard to digest, but this post distills some major ideas into an easy-to-read infographic and gives good bulleted summaries of the main points. Some takeaways are that bias is more apparent in math than in other subjects, that junior women are subject to more bias than senior women, and that bias in evaluations follows some in-group patterns: men tend to rate men more favorably, and women tend to rate women more favorably. The most appreciable loss is dealt to female PhD students teaching classes of predominantly men, who score 0.26 points lower on a 5-point scale than their male counterparts. This number isn’t huge, but it is still troubling when you consider the particular importance of SETs for young people just beginning their careers.

I recently learned that SETs at Villanova this year will also allow students to comment on instructor bias in the classroom. You can read about it in a *Wall Street Journal* editorial (sorry, paywall), or in this Twitter thread from Jeffrey Sachs.

Many universities have started to move away from using SETs as tools in determining promotion and tenure cases. In the US, the University of Southern California caused a stir in the spring of 2018 when it announced that it would no longer use SETs in promotion and tenure decisions. Since then, others have also begun to opt out, and still others have begun to offer training on how to correctly interpret the scores once they’ve been collected. Jacqueline Dewar wrote a comprehensive blog post for the AMS blog *On Teaching and Learning Mathematics* about how we might interpret our SETs.

A thing that really frustrates me about all of this is that women, POC, and other underrepresented groups who get lower SET scores through no fault of their teaching are fooled into thinking they are “bad teachers” when they’re really perfectly good. Consequently, they redirect the energy they would otherwise have spent on research into trying to fix their teaching, thereby increasing the likelihood that they will be viewed as less serious researchers. This, in a word, sucks.

The end of the semester is barreling towards us, which means SETs will be dropping soon. Has your institution had the talk about SETs? What will you do to prepare your students? Are you bringing cookies on SET day? Do you love SETs? Tell me everything over on Twitter @extremefriday.

I’m thinking that SET day is usually pretty close to the end of the term, so students are feeling, in various ways and to various extents, nervous and negative, more so than at the term’s beginning or middle. That can affect how they evaluate the course; in particular, a course taken by students who tend towards nervousness and negativity might be negatively evaluated. Of course, teachers are all in the same boat wrt this, but it is still one of the many variables (other than quality of teaching or how much the students have learned) that affect the scores.