Is robotic marking a threat to the profession?

Although it has moved away from its plan to introduce automated scoring of NAPLAN persuasive writing tests this year, the Australian Curriculum, Assessment and Reporting Authority (ACARA) remains in favour of automation in the future. While the IEUA acknowledges that automating simple tests such as multiple choice may be helpful, it has grave concerns about the introduction of robotic marking for more complex written responses, writes journalist Sue Osborne.

Of particular concern is the push for automation on cost grounds, and the associated growth of corporate influence over the education sector by companies such as Pearson Education and Pearson Knowledge Technologies.

In 2015 ACARA produced a report, An Evaluation of Automated Scoring of NAPLAN Persuasive Writing, which explained that “automated scoring of writing uses computer algorithms designed to emulate human scoring. This is achieved by extracting linguistic features from essays and then using machine learning and modelling to establish a correspondence between these features and essay scores based on a sample of essays that have been scored by human markers”.
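The approach the report describes can be sketched in a few lines. The features, essays and scores below are invented for illustration only; commercial systems extract many more features and use far richer models, but the principle — fit a model mapping measurable features of the text to the scores human markers gave a training sample — is the same.

```python
# Toy sketch of automated essay scoring: extract linguistic features,
# then fit a least-squares line to human-assigned scores. Everything
# here is hypothetical and purely illustrative.

def features(essay):
    """Two toy linguistic features: word count and vocabulary richness."""
    words = essay.lower().split()
    return len(words), len(set(words)) / max(len(words), 1)

def fit(essays, human_scores):
    # Collapse the features into one value per essay (roughly the
    # distinct-word count) and fit a simple regression to the scores
    # human markers gave the training sample.
    xs = [n * richness for n, richness in map(features, essays)]
    mx = sum(xs) / len(xs)
    my = sum(human_scores) / len(human_scores)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, human_scores)) \
            / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx          # (slope, intercept)

def machine_score(essay, slope, intercept):
    n, richness = features(essay)
    return slope * n * richness + intercept
```

Once fitted, the model scores any new essay from its features alone — no human ever reads it, which is exactly the property critics of the scheme object to.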

University of Melbourne Professor and Australian Institute for Teaching and School Leadership (AITSL) board chair John Hattie told ABC Fact Check that automated marking was “stunningly successful”, with computers five or six times more accurate than humans, and cheaper.

The report concluded that the marking system “provided satisfactory (equivalent or better) results relative to human marking” and that the “transition to online delivery will provide a better targeted assessment, more precise measurements and a faster turnaround of results to students and schools”.


Human markers

ACARA had planned to use ‘human markers’ to back up the automated marking this year, with the aim of full automation by 2020.

Four companies were engaged to score NAPLAN persuasive essays: Measurement Incorporated, Pearson, Pacific Metrics and MetaMetrics.

The algorithms used are the companies’ intellectual property, and not available for public examination.

According to the BBC, Pearson is testing the use of a robot teacher called ‘Jill Watson’ and looking at a digital education project that combines an interactive textbook, online course and automated tutor.

IEUA NSW/ACT Branch Secretary John Quessy said the IEUA was concerned about the growing commercial influence of companies such as Pearson Australia on education, and the domination of NAPLAN on educational planning.

He said teachers have never been consulted about automated marking.

The NSW Teachers Federation commissioned its own report by MIT Professor Les Perelman, who described the ACARA report as “so methodologically flawed and so massively incomplete that it cannot justify any uses of automated essay scoring in scoring of NAPLAN essays”.

Criticism ignored

Dr Perelman reported in Automated Essay Scoring and NAPLAN: A Summary Report, that the ACARA report ignored a “significant body of scholarship critical of various applications of automated essay scoring”.

He wrote “because automated essay scoring is solely mathematical, it cannot assess the most important elements of a text”. He quoted Pearson’s own report which said “assessment of creativity, poetry, irony or other more artistic uses of writing is beyond such systems. They are also not good at assessing rhetorical voice, the logic of an argument, the extent to which particular concepts are accurately described, or whether specific ideas presented in the essay are well founded. Some of these limitations arise from the fact that human scoring of complex processes like essay writing depend, in part, on ‘holistic’ judgements involving multivariate and highly interacting factors”.

Dr Perelman also reported that automated essay scoring could be biased against certain ethnic groups.

For instance, a study found African Americans, particularly males, were given significantly lower marks by an e-rater than they were by a human marker. This may be because distinctive verb constructions in their writing are easily identified by the machine and weighted more heavily against them than they would be by a human marker.

He concluded that automated marking “cannot assess high level traits such as quality and clarity of ideas”. Dr Perelman said “teachers, to protect themselves and their schools, could spend significant time teaching students ways to game the machine with strategies that will improve their scores but make their writing less effective”.
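Perelman's "gaming" concern is easy to demonstrate with a hypothetical scorer. The snippet below is an invented, deliberately crude example of a surface-feature scorer: it rewards length and "impressive" vocabulary, so an essay padded with empty verbiage outranks a concise, clearer one.

```python
# Hypothetical illustration of the "gaming the machine" problem: a scorer
# that rewards surface features (length and rare-word count) ranks a
# padded, less readable essay above a concise one. The word list and
# scoring rule are invented for this sketch.
RARE_WORDS = {"plethora", "myriad", "quintessential", "paradigm"}

def surface_score(essay):
    words = essay.lower().split()
    return len(words) + 5 * sum(w in RARE_WORDS for w in words)

concise = "School uniforms should be optional because they cost families money."
padded = (concise + " A plethora of myriad quintessential paradigm reasons "
          "demonstrate a plethora of myriad quintessential paradigm truths.")
```

Here `surface_score(padded)` exceeds `surface_score(concise)` even though the padding adds nothing — a student taught to exploit such features would raise their mark while making their writing worse.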

Expert teachers

Quessy said “students and parents would have a reasonable expectation that a trained human – a teacher – would mark their written responses”.

He said the current NSW HSC marking system was a model for how all marking should be done. The HSC has been developed over 50 years, and every script is double marked by expert teachers.

“ACARA is undermining tried and tested educational practices which rely on the experience, judgment and professionalism of trained teachers, in favour of machine marking,” Quessy said.

In Future Frontiers, Education for an AI World, University College London Professor of Learning with Digital Technologies Rose Luckin writes that “no AI has the human capability for metacognitive awareness. We must ensure that we use AI wisely to do what it does best: the routine cognitive and mechanical skills . . . The full spectrum of skills and abilities required of teachers is broad and complex . . . AI is not (yet) able to fulfill the entire role of a human teacher.”

In a joint statement late last year, parent, principal and teacher organisations called on the NSW Minister for Education to be informed by a unanimous and firmly held position:

“That the implementation of NAPLAN Online be delayed until at least 2020 so that the issues and concerns identified by parents, principals and teachers may be addressed over the next two years. 

“That robot marking of student writing in NAPLAN not be implemented, either solely or in conjunction with teacher marking, in either a whole NAPLAN assessment or as part of a trial or partial NAPLAN assessment. 

“Due to the inequities and irregularities that arise from running two systems of NAPLAN testing it is proposed that the opt in provision for NAPLAN Online not be proceeded with as the results cannot be regarded as valid or reliable”. 

In January, the Chair of the Education Council, Susan Close MP, said: “Over the past few years education systems have been working with students, teachers and school communities to transition NAPLAN from a pen and paper test to an online environment.

“Education Ministers are committed to this transition over the coming years and welcome the many benefits that online testing can deliver to students, their parents and teachers through improved diagnostics and the faster turnaround of NAPLAN results.

“In transitioning to NAPLAN Online, education systems have been considering the appropriateness of utilising certain technologies, including automated essay scoring. Automated essay scoring allows for writing scripts to be assessed using sophisticated computer programming.

“In December 2017, the Education Council determined that automated essay scoring will not be used for the marking of NAPLAN writing scripts. Any change to this position in the future will be informed by further research into automated essay scoring, and be made as a decision of the Education Council.”

The Union will remain vigilant in ensuring that robust marking is used appropriately and technology is not used in lieu of teacher judgement.


Leslie Loble, Tish Creenaune, Jacki Hayes (eds), Future Frontiers: Education for an AI World, Melbourne University Press, 2017

ACARA report:

Perelman report:

ABC Fact Check