### Measuring Teacher Effectiveness

Something I've been thinking a lot about lately is the idea of linking test scores to teacher evaluation. It's a topic that's everywhere this summer:

- The EFF school report card makes the point.
- It's also the heart of Arne Duncan's remarks at the NEA Representative Assembly.
- The ASCD put out a newsletter recently on union/management collaboration; tying test scores to evaluation was a big part of it.
- And of course, there's the state-level hijinx going on vis-a-vis HB2261, the State Board of Education, and the PESB. Wheeee!

Last year, for one of my Master's classes, I dug into testing data I had on hand for the first grade team in my building. These are real numbers and real averages with real kids behind them; the test in question is the Measures of Academic Progress, from the Northwest Evaluation Association.

Teacher A: In the fall, her class had an average score of 162.5 on the MAP. In the spring the class average rose to 184.3, an average gain of 21.8 points.

Teacher B: Her fall average was 164.7; her spring average, 183.85, for an increase of 19.15 points.

Teacher C: 169.05 in the fall, 189.35 in the spring, so an average gain of 20.3 points.

Teacher D: An average score of 155.30 points in the fall and 174.85 in the spring. Her fall-to-spring gain, then, was 19.55 points.

With this data, then, you could argue the case for two different teachers as the "winners" in the group. If you look at the average gain, **Teacher A is your champion**:

- Teacher A: 21.8 points
- Teacher C: 20.3 points
- Teacher D: 19.55 points
- Teacher B: 19.15 points

If you look at the spring class average instead, **Teacher C is far and away your winner**:

- Teacher C: 189.35
- Teacher A: 184.3
- Teacher B: 183.85
- Teacher D: 174.85
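For anyone who wants to check the arithmetic, here is a minimal Python sketch that rebuilds both rankings from the four class averages quoted above. The variable names are mine, and the only inputs are the fall and spring averages from this post; it says nothing about individual students.

```python
# Class averages as reported in the post: (fall, spring) MAP scores.
averages = {
    "A": (162.5, 184.3),
    "B": (164.7, 183.85),
    "C": (169.05, 189.35),
    "D": (155.30, 174.85),
}

# Fall-to-spring gain per class, rounded to hide floating-point noise.
gains = {t: round(spring - fall, 2) for t, (fall, spring) in averages.items()}

# Two different yardsticks, two different orderings.
by_gain = sorted(averages, key=lambda t: gains[t], reverse=True)
by_spring = sorted(averages, key=lambda t: averages[t][1], reverse=True)

for t in by_gain:
    print(f"Teacher {t}: gain of {gains[t]} points")
for t in by_spring:
    print(f"Teacher {t}: spring average of {averages[t][1]}")
```

Same four classrooms, same eight numbers, and the order changes depending on which column you sort by.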

But we have to dig even deeper before making a statement about teacher quality, because here the raw numbers aren't telling the whole story.

In the fall, the average score for this test is 164 points. In the spring, the average score is 178. Knowing that, here's some new data to chew on.

In Teacher A's room in the fall, 10 kids scored in the below average range. In the spring, 6 kids scored below average.

In Teacher B's room, 7 kids were below average in the fall, while 3 were below average in the spring.

In Teacher C's room, 6 kids were below average in the fall, and 3 in the spring.

In Teacher D's room, 16 kids were below average in the fall, and 6 tested below average in the spring.

With this new information, you can make two new arguments. First, **Teacher B is your best teacher** because she had more of her kids cross the finish line (the goal score, 178) than the other teachers did.

You could also argue that **Teacher D is your best teacher** because she lowered her percentage of kids who were below standard more than any of the other teachers did.
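The below-average counts can be compared the same way. This is a rough Python sketch using only the fall and spring counts listed above. Class sizes aren't given in this post, so it can't compute the percentage of each whole class; as a stand-in, it looks at the net drop in below-average kids and at that drop as a share of the fall below-average group. Both measures are my assumption about how to frame the comparison, and a net drop doesn't prove it was the same kids who moved up.

```python
# Below-average counts as reported in the post: (fall, spring).
below_average = {"A": (10, 6), "B": (7, 3), "C": (6, 3), "D": (16, 6)}

# Net drop in the number of below-average kids from fall to spring.
net_drop = {t: fall - spring for t, (fall, spring) in below_average.items()}

# That drop as a share of the kids who started the year below average.
share_drop = {t: (fall - spring) / fall for t, (fall, spring) in below_average.items()}

# Order the teachers by the relative drop, largest first.
by_share = sorted(below_average, key=lambda t: share_drop[t], reverse=True)

for t in by_share:
    print(f"Teacher {t}: {net_drop[t]} fewer below average "
          f"({share_drop[t]:.1%} of the fall below-average group)")
```

On these counts Teacher D comes out ahead on both measures. The case for Teacher B rests on how many kids reached the goal score relative to class size, which these counts alone can't confirm.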

So, who is your Most Valuable Teacher?

Is it Teacher A, who added the most value to her class over the course of the year?

Is it Teacher B, who had more of her kids meet the year-end goal?

Is it Teacher C, whose class scored the highest in the spring?

Is it Teacher D, who turned around more failing kids than any of the others?

"Value" is a homonym; there's the value signified by the numbers, but there are also the values of the school, the district, and the state, which have to be superimposed atop any effort to link the data to the teacher. If the incentive pay/merit pay/whatever pay in this case goes to only one of the four teachers, you're making a statement about the value of the work the other three did, and it's a pretty lousy thing to tell three teachers who also made progress that their success didn't matter as much.

Similarly, can we countenance a system where every one of these teachers is given the bonus money, indicating that they all did a good job? In the eyes of some reformers I could see that being too close to what we do now, where every teacher is assumed to be a good teacher. If a merit pay system is intended to have winners and losers, and to inspire the "less-capable" teachers to emulate the "better" teachers, can we really have a 4-way tie?

These are the questions that have to be answered going forward.

If you'd like to see the raw scores presented in a spreadsheet, you can find them here.

## 2 Comments:

This was an outstanding analysis of a controversial topic! I just found your blog through edweek.org. I hope to read more!

Indeed, very well laid out.

As a data wonk of sorts, I fully understand how simple concepts get infinitely complex quite quickly.

What if teacher E had flat gains in the course of her school year by any or all measures, would it be documentation of ineffectiveness? Would that documentation be allowed to result in some form of intervention?

Unfortunately, when we deny ourselves the belief in common values and we deny ourselves the ability to make judgments (to discern/discriminate based upon quality), then we are left with a bizarre, sterile methodology for creating tests, using numbers and assigning artificial values based upon absurdities.

It is why folks like me keep falling back to trusting the goodwill and common sense of parents to have choices, and of the public school managers to manage personnel.

