Study finds anti-Asian racial bias in AI grading

A new study by the Educational Testing Service (ETS) has raised concerns about potential racial bias in AI essay grading systems. In the study, shared with the nonprofit newsroom The Hechinger Report, researchers compared human grader scores with those of GPT-4o, the OpenAI model that powers ChatGPT, on more than 13,000 anonymized essays written between 2015 and 2019.
  • Discrepancy found: GPT-4o consistently scored essays lower than human graders, averaging 2.8 compared with the 3.7 awarded by humans. The discrepancy was most pronounced for Asian American students, who received an average of 3.2 from GPT-4o versus 4.3 from human graders, a gap of 1.1 points. The gap for white, Black and Hispanic students was smaller, averaging 0.9 points.
  • Cause of bias unknown: Researchers are unsure why the AI model exhibited this racial bias; the complexity of the model’s algorithms makes the discrepancy difficult to explain. ETS researcher Mo Zhang called for caution in using AI grading systems in classrooms, noting that “there are methods for doing this and you don’t want to take people who specialize in educational measurement out of the equation.”