Evaluating LLM Performance in Essay Assessment: A Comparative Analysis of AI Grading and Feedback Systems for University English Courses
Stasuik, Noah Carter
Abstract
With artificial intelligence rapidly transforming industries globally, its integration into higher education appears increasingly inevitable. This thesis explores the potential of using Large Language Models (LLMs) to grade students’ essays and provide feedback on their writing in 100-level university English courses. Because grading consumes a significant portion of professors’ and TAs’ time, there often remains insufficient opportunity to engage directly with students. In this study, several LLMs and assessment strategies were implemented to evaluate the quality of feedback and the accuracy of grades delivered by AI systems in comparison to the original human graders. All participating students consented to be evaluated by both locally hosted models (Llama 3.1 and 3.2) and OpenAI’s commercial offerings (GPT-4o-mini, o1, and o3-mini). The findings indicate that while AI currently lacks the consistency necessary to fully replace human assessment, newer and more powerful LLMs demonstrate progressively better performance in both grading accuracy and feedback quality. Furthermore, when these models are combined with specialized assessment methodologies, the results show even greater accuracy in both grading and feedback semantic similarity. Although the results confirm that AI cannot completely substitute for human grading expertise, they strongly suggest that these technologies could serve as valuable assistive tools in the assessment process. The AI-generated feedback showed particular promise for helping students improve their work, with semantic similarity metrics achieving acceptable scores when compared to human-provided guidance.
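The record does not include the thesis's evaluation code, but the core comparison the abstract describes (asking an LLM for a grade and written feedback, then scoring that feedback against the human grader's comments with a semantic-similarity metric) can be sketched roughly as below. This is an illustrative outline only: the prompt wording, the `grade_essay` and `feedback_similarity` helpers, the `gpt-4o-mini` call, and the `all-MiniLM-L6-v2` embedding model are assumptions made for the example, not the study's actual configuration.

```python
# Minimal sketch of an LLM essay-grading comparison, under assumed settings.
from openai import OpenAI
from sentence_transformers import SentenceTransformer, util

client = OpenAI()  # requires OPENAI_API_KEY in the environment
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical embedding model

def grade_essay(essay: str, rubric: str, model: str = "gpt-4o-mini") -> str:
    """Ask the model for a mark and short written feedback on one essay."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system",
             "content": "You are a teaching assistant grading a 100-level English essay."},
            {"role": "user",
             "content": f"Rubric:\n{rubric}\n\nEssay:\n{essay}\n\n"
                        "Return a grade out of 100 and two sentences of feedback."},
        ],
    )
    return response.choices[0].message.content

def feedback_similarity(ai_feedback: str, human_feedback: str) -> float:
    """Cosine similarity between sentence embeddings of the two comments."""
    emb = encoder.encode([ai_feedback, human_feedback], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()
```

In a setup like this, a higher cosine-similarity score would indicate that the AI's feedback touches on roughly the same points as the human grader's comments, which is the kind of "feedback semantic similarity" the abstract reports.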
Item Metadata
Title | Evaluating LLM Performance in Essay Assessment: A Comparative Analysis of AI Grading and Feedback Systems for University English Courses
Creator | Stasuik, Noah Carter
Date Issued | 2025-04-28
Genre |
Type |
Language | eng
Series |
Date Available | 2025-05-12
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0448868
URI |
Affiliation |
Peer Review Status | Unreviewed
Scholarly Level | Undergraduate
Rights URI |
Aggregated Source Repository | DSpace