Last week, the Court essentially laughed away a challenge to the law. The law’s reforms addressed a number of different facets of education, and the suit was based upon that very diversity. Having already warned an obstinate lower court judge that the law should not be taken to violate the constitution’s single-object standard for legislation, only to have him stubbornly insist that it did, the Court unanimously and conclusively decided otherwise. The legal tactic by the plaintiffs, a motley collection of American Federation of Teachers state and local units, was a cover for trying to cancel policy they didn’t like, most controversially for them making teacher tenure more difficult to attain and basing that decision, for about 35 percent of all teachers, on a value-added measure anchored in standardized testing.
In the short term, this changes nothing. Given the wrenching changes from implementation of, and the ongoing debate about, the Common Core State Standards (CCSS), the use of a VAM, defined as changes in student scores from one year to another on standardized tests, as an input into tenure and retention decisions was delayed for last school year and this one. Even as CCSS is expected to raise the profile of standardized testing and expand it to cover more teachers (assuming state policy-makers don’t pull the plug on it, as some will attempt next year), education leaders, including Louisiana’s, are wondering whether testing needs to be applied more selectively.
There is considerable truth to that sentiment, for the more testing there is, the less time is available for learning. Standardized tests figure into both teacher and school ratings; for teachers, where a test exists in the subject area, it determines half of the evaluation, with the other half coming from observation. The logic for this use is compelling: research demonstrates that change observed over time in a student’s academic progress is significantly attributable to the effort of a teacher. Still, during the next legislative session, in the few months before the rule suspension period ends, this system can be tweaked to improve the quality of the evaluation.
Some changes can address the inherent strengths and weaknesses of the VAM. Best practices note some imprecision in measuring different kinds of students, perhaps most notoriously that growth potential is far higher among the lowest-performing students than the highest, simply because the latter already are at a level where incremental gains become less likely and more difficult to elicit. Some versions of a VAM factor in all sorts of presumably intervening conditions, such as gender and poverty level, making assessment by VAM a multivariate exercise rather than a simple univariate one with teaching quality as the cause and change in scores as the effect. Whether demographic factors such as those are needed is debatable: if teachers are graded more leniently for those kinds of students, the assumption is that less growth should be expected from them, in essence thrusting a soft bigotry of lower expectations on them.
Also problematic is that some subjects, such as art and music, simply are not amenable to standardized testing. The current practice is to work out a set of goals for a teacher and then evaluate whether they have been reached. That creates one set of teachers whose evaluations are half objective and half subjective while the other set’s are almost entirely subjective, which raises questions of fairness. Fairness also may be an issue with the weighting the objective part receives for those in eligible disciplines, as research shows the VAM contributes useful information but that its moderate degree of validity and reliability means it should be used only to a moderate degree in an overall evaluation. Finally, research has revealed that a pretest/posttest regime measuring at the beginning of the year and at its end has greater validity than using the previous year’s posttest as the current year’s pretest, as long absences from the classroom, such as those experienced during summer, have idiosyncratic effects on children’s retention.
Thus, to produce an evaluation process maximizing validity and reliability, some changes should be made in Louisiana’s system starting academic year 2016. For every grade level and every testable subject (the American College Test requirement for seniors aside), there should be just three standardized exams given each year: one at the very beginning of the school year, as a pretest and so that teachers understand where their students stand and where they need to take them; one at the beginning of the calendar year, to measure how far along the class as a whole has gotten and what needs to be done for successful completion; and one at the end of the year. Further, in making comparisons of scores from the beginning to the end of the academic year, an algorithm needs to be worked out in which change from the pretest among higher performers is weighted more heavily than change among lower performers.
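As a rough illustration of what such an algorithm could look like, the Python sketch below averages pretest-to-posttest gains while giving more weight to students who started higher. Everything specific in it is an assumption made for illustration, not anything in Louisiana’s rules: the 0-to-100 score scale, the linear weight formula, the `alpha` parameter, and the function names are all hypothetical.

```python
from typing import Iterable, Tuple


def gain_weight(pretest: float, max_score: float = 100.0, alpha: float = 1.0) -> float:
    """Hypothetical weight: 1 for a pretest of 0, rising linearly to 1 + alpha at max_score."""
    return 1.0 + alpha * (pretest / max_score)


def weighted_class_gain(scores: Iterable[Tuple[float, float]],
                        max_score: float = 100.0,
                        alpha: float = 1.0) -> float:
    """Weighted average of (posttest - pretest) gains for one class.

    Each student's gain is weighted by gain_weight(pretest), so gains made by
    students who started with higher scores count for more (an assumed scheme).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for pretest, posttest in scores:
        w = gain_weight(pretest, max_score, alpha)
        weighted_sum += w * (posttest - pretest)
        total_weight += w
    if total_weight == 0.0:
        raise ValueError("no scores supplied")
    return weighted_sum / total_weight
```

With `alpha=0` this reduces to a plain average of gains; larger values make a point gained by a high scorer count for more than a point gained by a low scorer. How large that adjustment should be is a question the state’s own testing data would have to answer.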
Some grades may have children too young for standardized testing to be meaningful, so for teachers of those grades the goals system should be employed instead, as it should for those disciplines where standardized testing can’t be done. Where possible, the tests used should be the same ones being used nationally so that additional tests beyond three a year are minimized.
Also, instead of each counting for half of the evaluation score, the VAM (or goals system) and observation should each be cut to one quarter, in keeping both with the relative power of the VAM to validly measure student growth and with the subjectivity of observation. To take up the remaining half, two other components, widely agreed upon by researchers as indicative of quality in teaching, should be introduced. One quarter should be devoted to a subject knowledge test taken each year by teachers, as done already in many states, because teachers with a poor grasp of the subject material they teach can only retard student progress, while knowledgeable teachers have a far greater opportunity to help students excel. The other quarter should be based on a best practices rubric assessing items such as class policies, syllabus construction, assessment methods, and exam construction; in other words, matters other than the class management that observation would cover.
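The arithmetic of this weighting is simple enough to sketch in a few lines of Python. The sketch assumes, purely for illustration, that each component has already been converted to a common 0-to-100 scale; the component names and the function are hypothetical, not part of any existing system.

```python
# Proposed weights: four components at one quarter each.
PROPOSED_WEIGHTS = {
    "growth": 0.25,             # VAM, or the goals system where no test exists
    "observation": 0.25,
    "subject_knowledge": 0.25,  # annual subject knowledge test
    "best_practices": 0.25,     # rubric on policies, syllabus, exams, etc.
}

# Current weights as described in this post: half growth, half observation.
CURRENT_WEIGHTS = {"growth": 0.5, "observation": 0.5}


def evaluation_score(components: dict, weights: dict) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical teacher with made-up component scores.
teacher = {
    "growth": 80.0,
    "observation": 60.0,
    "subject_knowledge": 90.0,
    "best_practices": 70.0,
}
proposed = evaluation_score(teacher, PROPOSED_WEIGHTS)
current = evaluation_score(teacher, CURRENT_WEIGHTS)
```

Under these made-up numbers the two new components pull the overall score in a direction that neither test scores nor observation alone would reveal, which is the point of spreading the weight across four measures.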
The end result produces more fairness in the evaluation of teachers across all disciplines relative to each other, brings greater objectivity to measuring teacher capacity, and minimizes potential validity and reliability concerns surrounding the measurement of teacher performance. These alterations might take more than a few months of transition and planning after passage of the enabling law, so they could be delayed until AY 2017 with another year’s suspension of the current law thrown in, giving teachers another year of a trial run to adjust to a system that, unlike the one prior to the reforms, asks for genuine accountability through objective measures.
The system now in place, if not yet implemented as far as consequences are concerned, is a vast improvement over the previous one, under which more tenured teachers statewide died each year than were fired for incompetence. But it can be made even better by these kinds of improvements.