February 21, 2022

University of Birmingham - Graide Efficacy Research

Higher Education

Manjinder Kainth, PhD

Abstract

Graide by 6 Bit Education is an AI-powered assessment and feedback platform. It learns how educators give feedback so they do not have to grade the same answer twice. This research compared handwritten grading with grading in the Graide platform for 10 questions with 172 submissions per question. Median grading time fell by 74%, and the number of words of feedback given increased by a factor of 7.2. We estimate that a university with 3500 STEM students could save over £240,000 a year by using Graide.


Introduction

It is generally accepted that the highest quality of teaching occurs one-on-one, because it allows personalised feedback on the approach each student takes. Unfortunately, this does not scale; we cannot teach everyone one-on-one. A common replacement is to set assignments for classes and then give high-quality feedback on those assignments. Sadly, this too has issues: it takes a lot of time and, where that time is paid for directly, it is costly. The resulting workload has led disillusioned staff to leave education and has detrimental effects on their well-being.

In recent years, technology has increasingly permeated the education sector, mostly through automated solutions. The simplest example is multiple choice, which is easy to create and offers fast feedback, although students are frequently able to reverse engineer the answers. Another example is automated mathematics grading, which requires educators to laboriously program questions to remove the potential for reverse engineering. Question programming requires a competency that is often outside educators' skill sets, and reaching proficiency takes a significant investment of time. Additionally, both of these approaches share one serious issue: they only grade the final answer. When providing formative assessment, the approach students take is just as important as the final answer, if not more so. Knowing where students go wrong allows educators to pinpoint exactly what they need to know to improve.

6 Bit Education has developed a platform that approaches adding technology to education in a novel way. Instead of complete automation, we use technology to assist educators and increase their productivity. This is done in two ways: workflow optimisation and artificial intelligence (AI) learning. The workflow is optimised by digitising the process, which removes administrative overheads such as collation. The AI learns how educators give feedback so they do not have to grade the same method twice. This is called 'Replay Grading'.

In the rest of this report, we will outline the technology behind Replay Grading, followed by discussing how Graide is used. Next, we will go over the method for this research and our findings. Finally, we will conclude the report.


Replay Grading

Replay Grading is a technology that uses an AI to learn how educators give feedback so they do not have to grade the same method twice. When grading, each submission is handled one at a time. It is analysed to see if a similar response has been seen before. If no other submissions are similar it is given to an educator to grade and provide feedback on. This is then learned by the system so if a similar response appears in the future, it can automatically be graded and given high quality feedback. Additionally, if a portion of a submission has been seen, it can be partially graded automatically. This means each approach is only ever graded once. This workflow is shown in the figure below.
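As a rough illustration of the workflow described above, the sketch below remembers the feedback an educator gives for each step and replays it when the same step appears again. It is not Graide's implementation: the `ReplayGrader` class, the `normalise` helper, and the `ask_educator` callback are hypothetical names, and exact matching on whitespace- and case-normalised text stands in for whatever similarity measure the AI actually uses.

```python
from dataclasses import dataclass, field


def normalise(step: str) -> str:
    """Crude stand-in for similarity: ignore case and whitespace (assumption)."""
    return "".join(step.lower().split())


@dataclass
class ReplayGrader:
    # Maps a normalised step to the feedback an educator gave for it.
    memory: dict = field(default_factory=dict)

    def grade(self, steps, ask_educator):
        """Return feedback for every step and the number of steps a human graded."""
        feedback, manual = [], 0
        for step in steps:
            key = normalise(step)
            if key not in self.memory:
                # Unseen step: the educator grades it once and it is remembered.
                self.memory[key] = ask_educator(step)
                manual += 1
            # Seen step (or just learned): replay the stored feedback.
            feedback.append(self.memory[key])
        return feedback, manual


grader = ReplayGrader()
educator = lambda step: f"feedback on: {step}"  # hypothetical human input

_, manual_first = grader.grade(["F = ma", "a = F/m"], educator)
_, manual_second = grader.grade(["f = m a", "a = F / m", "a = 2"], educator)
```

In this toy run the second submission shares two steps with the first, so only its final step needs a human, which mirrors the partial automatic grading described above.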

To highlight the effectiveness of this approach, let us demonstrate this with an example. We can break down a student's response into a series of steps they took and represent it as a tree; the root of the tree is the question and the branches the steps the student took. Students taking the same step form part of the same branch. A single instance of this tree can be seen in the figure below, while a "complete" tree of 172 responses can be seen in the figure after.

An example breakdown of a single student response with all their method.

A collated tree of 172 student responses with each node representing a step in their method. Nodes are coloured green if a teacher has graded this step as correct and red if the step has been graded as incorrect.

Although the tree has been collated as much as possible, and some lines are thicker than others, it is still very broad and rather deep. After running this data through Replay Grading, the tree in the figure above becomes the tree below.

The full tree once it has been through Graide's workflow. Each node represents a step in a method. Nodes are coloured green if a teacher has graded this step as correct and red if the step has been graded as incorrect. Thicker lines correspond to more students taking the approach.

The reduction is immediately apparent. It is now possible to identify common paths and different clusters, all while giving unique responses the feedback they require. In this case, the tree was reduced by 84%.
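The tree representation can be sketched as a prefix tree in which responses that share the same opening steps share the same branch. The example below is only an illustration with made-up responses: `build_tree` and `count_nodes` are hypothetical helpers, identical step text is the only notion of "same step", and it does not reproduce how Replay Grading groups similar steps or how the 84% figure was measured.

```python
def build_tree(responses):
    """Merge responses into a nested dict keyed by step (a prefix tree)."""
    root = {}
    for steps in responses:
        node = root
        for step in steps:
            # Reuse the branch if another student already took this step here.
            node = node.setdefault(step, {})
    return root


def count_nodes(tree):
    """Count the steps (nodes) below the root question."""
    return sum(1 + count_nodes(child) for child in tree.values())


# Made-up responses, each a list of steps from question to final answer.
responses = [
    ["F = ma", "a = F/m", "a = 2"],
    ["F = ma", "a = F/m", "a = 2"],
    ["F = ma", "a = m/F", "a = 0.5"],
]

total_steps = sum(len(r) for r in responses)        # steps written by students
unique_steps = count_nodes(build_tree(responses))   # steps left after merging
reduction = 1 - unique_steps / total_steps
```

Here the three students wrote nine steps between them, but the merged tree holds only five distinct nodes, so an educator would need to look at five steps rather than nine.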

An important thing to note is that the Replay Grading workflow means the entire response receives high quality feedback. In the next section we will be looking at Graide and the interface used in this research.


Graide

Graide is the platform developed by 6 Bit Education which interfaces with Replay Grading to reduce educator workload. The name originates from the idea of 'putting the AI in grading' to be an 'aide' for educators.

The platform is designed with simplicity in mind: creating content does not require programming and grading is as simple as clicking or drawing regions and typing the relevant feedback.

Student Entry

The student experience is vital in any digital assessment and feedback platform. Graide offers four methods of answer entry to make this as simple as possible: a visual editor, a markdown editor, stylus entry, and optical character recognition (OCR).

Graide's visual editor allows for students to input work into a rich text editor which many are familiar with. This allows for the inclusion of images and an intuitive inline maths editor, depicted in the figure below.

A visual "what you see is what you get" editor. It allows for formatting such as bold and italics. It also allows for the visual input of mathematics and the addition of images for diagrams.

As individuals in the science and engineering community are familiar with markdown and LaTeX, Graide's markdown editor allows them to use this format. The markdown editor is depicted in the figure below.

A markdown editor. It allows for formatting such as bold and italics with the use of appropriate delimiters. Additionally, it allows for the input of mathematics via LaTeX.

Increasing numbers of students have access to large touchscreen devices. This allows for students to input their answers using on screen handwriting similar to how many already submit work on paper. The handwriting interface Graide offers is shown in the following figure.

Handwriting input, allowing for students to use a stylus or touch sensitive device such as a tablet.

Finally, as many students still prefer to work on paper, Graide supports optical character recognition (OCR) on handwritten work which can then be edited in the markdown editor. This uses a neural network to convert handwritten work, and is depicted in the figure below.

Optical character recognition interface for handwriting. A student uploads an image, which is processed by a neural network and converted into LaTeX and markdown. This can then be edited.

Grading Interface

There are two modes of grading and giving feedback in Graide: digital and PDF. Both have the same core and functionalities, with only minor user interface differences.

The digital grading interface is shown in the figure below. A rubric is on the left and the student response is on the right. To add feedback, simply click on the relevant region in the student response and then on the desired feedback in the rubric. Once this is submitted, the feedback is processed and the progress bar is updated: regions move between red (to do), amber (partially completed), and green (completed). If the feedback is suggested by Replay Grading, it is presented with a confidence percentage. You can remove feedback by pressing the "x" button to the right of the feedback.

The digital grading interface allows for feedback to be given to steps of the student response. This is done by clicking on the response (on the right) followed by the feedback (on the left). Once submitted, it is processed by Graide's AI. The progress bar (top) is updated, changing from red to green to highlight the level of automated work. Suggestions are provided by the platform with a certain confidence level (shown as a percentage).

The PDF grading interface is shown in the figure below. It is very similar to the digital grading interface, with a few differences: instead of clicking on regions, you draw them. The system cannot currently Replay Grade PDFs, but this is being developed.

The PDF grading interface allows for feedback to be given on scans of students' work by drawing relevant regions (right), followed by clicking on the feedback (on the left). Grades are totalled automatically. Editing the feedback in the table edits it for all students with the same feedback, ensuring consistency.


Method

The research was conducted in the School of Physics and Astronomy at the University of Birmingham. An assignment was originally graded on iPads, where scans of the scripts were marked manually with digital handwriting. This was timed and the feedback was analysed. Approximately 18 months later, the scripts were digitised, uploaded, and graded in the Graide platform.

The following metrics were measured:

Median grading time (original and Graide)
Serial grading time (Graide)
Words of feedback given (original and Graide)
Typing reduction (Graide)
Percentage of automatic grading (Graide)
Words per minute of feedback given (original and Graide)

These metrics were chosen for their relevance in the grading and feedback space. If we focused on time to grade alone, then we would be ignoring the student's role in the education loop. Similarly, if we focused on feedback alone, we would be ignoring the teacher's role in the education loop.

There were 10 questions in this assignment with 172 submissions to each question. The questions were:


Results

Graide showed an overwhelming benefit in all the measured metrics. An overview of the results can be seen in the table below. From here onward, "original" grading corresponds to grading performed by digital handwriting on iPads.

Average (Median) Grading Time

The original grading time averaged to 11.2 minutes per script. Through Graide the average grading time per script was 2.8 minutes. This corresponds to a reduction of 74%.

Serial Grading Time

The figure below shows the serial grading time when using Graide. There are three things to note. First, the first script took 5.6 minutes to grade; compared to the 11.2 minutes of the original grading, this corresponds to an immediate reduction of 50%. Second, there is a rapid reduction in grading times: by the 20th script the grading time has dropped by a factor of almost 7. Third, by script 142 all the grading is automated, but each script is still given high-quality feedback.

Graph showing the scaling of time required to grade assignments (10 questions). Light line is the raw data, and the thick line is a line of best fit. At submission 142 every question is automatically graded resulting in no time grading, but the same quality feedback to the responses.

Words of Feedback Given

Originally, the average amount of feedback given per script was 23 words. Through Graide, this increased to an average of 166 words, an increase by a factor of 7.2.

Typing Reduction

This is calculated by taking the number of words of feedback given to students and dividing it by the number of words of feedback actually typed by the markers; expressed as a reduction, it is the share of delivered words that the markers did not have to type. The extra words are due to feedback being re-used, whether automatically by Replay Grading or manually by the markers. It is important to note that this number depends on the number of scripts to be graded: as the number of scripts increases, so too does the typing reduction. We found that there was an 86% reduction in typing.
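One reading of this metric that is consistent with a percentage reduction is one minus the ratio of words typed to words delivered. The sketch below assumes that definition; `typing_reduction` is a hypothetical helper and the counts in the example are made up, not taken from the study.

```python
def typing_reduction(words_given: int, words_typed: int) -> float:
    """Fraction of delivered feedback words that markers did not have to type."""
    if words_given <= 0:
        raise ValueError("words_given must be positive")
    return 1 - words_typed / words_given


# Made-up counts: 1000 words reach students, 140 of them were typed by hand.
example = typing_reduction(words_given=1000, words_typed=140)
```

Under this definition, an 86% reduction means that roughly one word in seven of the delivered feedback was typed, with the rest re-used.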

Percentage of Automatic Grading

Of the 1,720 question responses that were graded (10 questions with 172 submissions each), 506 were graded completely automatically. This corresponds to 29%.

Words per Minute of Feedback Given

To give a sense of the quantity of feedback and the time taken together, we can calculate the words per minute (WPM) of feedback given. Originally, the average amount of feedback given was 23 words in 11.2 minutes, which corresponds to 2 WPM. Through Graide, the same calculation gives 59 WPM, which is a 29-fold increase. This is about 50% higher than an average typing speed of 40 WPM.
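The calculation can be reproduced from the per-script averages reported in the earlier sections (23 words in 11.2 minutes originally, 166 words in 2.8 minutes through Graide). The helper name `words_per_minute` is only for illustration.

```python
def words_per_minute(words: float, minutes: float) -> float:
    """Average words of feedback delivered per minute of grading."""
    return words / minutes


original_wpm = words_per_minute(23, 11.2)   # original handwritten grading
graide_wpm = words_per_minute(166, 2.8)     # grading through Graide
increase = graide_wpm / original_wpm        # how many times faster
```

Rounded to whole numbers, these give the 2 WPM, 59 WPM, and 29-fold figures quoted above.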

Conclusion

In this report, we introduced Replay Grading, along with the research that was performed using Graide. Graide offers the benefit of a flexible workflow for students, while optimising the grading process for educators. Comparing an assignment graded in Graide to handwritten feedback showed a reduction in grading times and an increase in the feedback students receive.

Assuming a student does 44 assignments in a year, the average time to grade work is 11 minutes, and educators are paid £17 an hour, Graide would save a department £71 per student per year. For a university with 3,400 STEM students, Graide could save over £240,000 a year.

Future work will study how this performance compares through different subjects and levels of education.