If the Educational Testing Service and its competitors are right, the lives of English teachers across the country will get a lot easier this fall thanks to various automated scoring programs that can grade student essays in a fraction of the time teachers spend. Artificial intelligence techniques can now assess anywhere from 50 to 100 features of student writing, with more likely to come (“Automated Essay Scoring Remains An Empty Dream,” Forbes, Jul. 2).
But whether they can identify good writing is an entirely different story, which is why English teachers remain skeptical. They don’t dispute the ability of robo-graders to rapidly scan student essays for basics like spelling, grammar, vocabulary, and sentence structure. In fact, they’re grateful to be relieved of that drudgery. Evaluating creativity, however, remains beyond robo-graders’ capability.
That’s the essence of the controversy. All art forms are by their very nature idiosyncratic. If they weren’t distinctive, they wouldn’t receive critical acclaim. AI is a whiz at carrying out formulaic tasks. When it comes to assessing originality, however, it fails. If that were not so, editors at every major newspaper would rely on robo-graders to determine which op-eds to publish. Think of the cost savings that would accrue if computers could do a better job.
High school English teachers are unique among their colleagues because of the heavy paper load they carry. Composition classes eventually require students to engage in what is called a creative response, rather than a selected response. Teachers of other subjects routinely use multiple-choice and true-or-false questions to determine what their students have learned. These tests can be machine-scored rapidly, cheaply, and objectively.
Essays that are designed to probe the ability of students to make an argument, for example, take time and thought to evaluate. If all we want are students who can demonstrate their basic knowledge by writing a two- or three-sentence paragraph, then robo-graders are indeed a godsend.
Yet taxpayers demand more of schools than just that. They want evidence of critical thinking. Utah, Ohio, and several other states already use AI to score their standardized tests. Massachusetts, which is known for the quality of its public schools, is considering using robo-graders on its statewide Massachusetts Comprehensive Assessment System tests. Whether education officials there will be satisfied is unclear.
The National Council of Teachers of English correctly predicts that students will eventually learn how to game the system. There’s already evidence that this is happening. Essays making little sense can receive a high score as long as they tick off all the boxes that robo-graders seek. Length, for example, impresses the computer. If Lincoln’s 272-word Gettysburg Address were submitted, it would be marked down for its brevity compared with, say, a typical State of the Union address.
Lofty words and key phrases also rate high in the mind of a computer. But these alone do not constitute graceful writing. In fact, they can detract from it, as Ernest Hemingway’s famously spare prose demonstrated. Vendors will argue that they can minimize such shortcomings by tweaking their products. I doubt that.
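To see how easily surface features can be gamed, consider a deliberately naive sketch of a scorer that rewards nothing but length and a made-up list of “lofty” words. This is my own toy illustration, not ETS’s engine or any vendor’s actual algorithm, but it shows why a padded, keyword-stuffed essay can outrank a concise, well-crafted one.

    # A naive, hypothetical surface-feature scorer -- not any vendor's real product.
    LOFTY_WORDS = {"paradigm", "multifaceted", "plethora", "juxtaposition", "dichotomy"}

    def naive_score(essay):
        words = essay.lower().split()
        length_points = min(len(words) / 500, 1.0) * 70            # up to 70 points just for length
        lofty_hits = sum(w.strip(".,;:!?") in LOFTY_WORDS for w in words)
        return round(length_points + min(lofty_hits * 6, 30), 1)   # 100-point scale

    concise = ("Four score and seven years ago our fathers brought forth "
               "on this continent a new nation conceived in liberty")      # stand-in for a short, polished essay
    padded = ("The multifaceted paradigm of modern governance presents a "
              "plethora of considerations within a complex dichotomy ") * 60  # long, keyword-stuffed filler

    print(naive_score(concise))   # low score: short and free of buzzwords
    print(naive_score(padded))    # high score: long and full of "lofty" words

A real scoring engine weighs many more features, but the underlying incentive is the same: write more and sprinkle in impressive vocabulary, and the number goes up regardless of whether the essay says anything.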