🤔 That’s questionable

Designing and deploying effective models for generating multiple versions of auto-marked questions

2024-12-02

Scan for slides

or go to link.lizabolton.com



Presented at NZSA 2024 @ Te Herenga Waka Victoria University of Wellington

Funding

This project has received funding from the Faculty of Science Scholarship of Teaching and Learning Fund.

Ethics

This study was approved by the University of Auckland Human Participants Ethics Committee (ref: UAHPEC27494).

Course context

  • STATS 101/108 is an introductory statistics course
    • an introduction to using data to learn, identify and solve problems, make decisions, and communicate
  • ~1,600–2,000 students in Semesters 1 and 2, plus a summer school offering
  • Required for programmes in business and psychology
  • 👩🏻‍🏫 Teaching teams of 4–6 lecturers, 8–10 help room tutors and 15–20 markers
  • Major redesign in 2023

Redesign components

Adapted from Fergusson, A. (2024, November 19). Getting the best of both worlds: Integrating human and automated assistance to support student learning via an online question-answering platform. University of Auckland STELA group meeting.

Quizzes

  • 11 total chapter quizzes, for the first 11 (of 12) chapters in the coursebook
  • Unlimited attempts
  • No time limit
  • Low stakes assessment: each worth 1% of the final grade
  • Students are encouraged to ask questions and get help with the quizzes in the drop-in help sessions with tutors and peers

An example

Question generating models

  • There are existing tools like the 📦 exams (e.g., Zeileis et al., 2014) package in R to generate versioned assessments.
  • HOW to design the models that generate these questions is a much less developed area.
  • As with any model, we need to consider our inputs, outputs and assumptions.
    • Quite different vs similar?
    • Visual skew vs calculated kurtosis?
    • The right answer shouldn’t always be in the middle 🤦‍♀️

Zeileis A, Umlauf N, Leisch F (2014). “Flexible Generation of E-Learning Exams in R: Moodle Quizzes, OLAT Assessments, and Beyond.” Journal of Statistical Software, 58(1), 1–36. doi:10.18637/jss.v058.i01.
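As a minimal sketch of the idea (illustrative only — `make_question` is a hypothetical function, not the course's actual model), a question-generating model in R can be a function from a seed to one question version: sampled data, a computed answer, distractors, and a shuffled option order so the right answer isn't always in the middle:

```r
# Illustrative sketch of a question-generating model (hypothetical, not the
# course's actual model): seed in, one question version out.
make_question <- function(seed) {
  set.seed(seed)
  # Version-specific data: sample size fixed, mean and sd drawn per version
  x <- round(rnorm(30, mean = sample(40:60, 1), sd = sample(5:15, 1)), 1)
  answer <- round(mean(x), 1)
  # Distractors offset from the answer; offsets chosen so none equals it
  distractors <- answer + sample(c(-4, -2, 2, 4), 3)
  opts <- sample(c(answer, distractors))  # shuffle so the key's position varies
  list(
    stem    = paste0("What is the mean of: ", paste(x, collapse = ", "), "?"),
    options = opts,
    key     = which(opts == answer)
  )
}

versions <- lapply(1:40, make_question)   # one model, 40 question versions
```

Because each version is a deterministic function of its seed, the same seed always reproduces the same question — handy for marking queries and auditing.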

More examples

Link to iNZight

Words can be hard

Results

From Semester 1, 2024

1,984 enrolled students

Data shown is for submissions by the due date

Quiz   Mean   Median   Upper 90%
  1     4.8      4         9
  2     4.7      4         9
  3     4.1      3         8
  4     5.8      5        12
  5     5.2      4        11
  6     3.6      3         7
  7     3.9      3         8
  8     5.5      4        11
  9     4.2      3         8
 10     4.0      3         8
 11     3.3      3         6

Attempts

Grades

  • You still have a small number of students who don’t do any quizzes. 👻
  • Lower quartile: 76%
  • Median: 90.5%
  • Mean: 82.5%

Reflections

  • Students use these quizzes for revision quite effectively.

  • Some students will brute-force it to get their 10/10 without fully understanding what they’re supposed to be doing.

    • BUT some of these students later use the questions to guide the help they ask for during revision, making them keenly aware of what they are still stumped on.
  • Writing portions of the tests and exams is easier (for the teaching team) as we can draw on familiar question styles with new data contexts.

Conclusions

  • Automation is not really lessening workload here, but it is enabling us to create useful learning tasks with some future-proofing.
    • Someone posts answers to all 40 versions online? Rerun the question-generating model and get 40 new questions.
  • Almost certainly want to iterate and have some human checking during development.
  • 🤑 Putting your “marks where your mouth is” in this way results in student engagement and practice.
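The future-proofing point above can be sketched in a few lines: if the model is a deterministic function of a seed, regenerating the bank after a leak is just a rerun with fresh seeds. Here `gen` is a hypothetical stand-in for any question-generating model, not the course's actual one:

```r
# Hypothetical stand-in for a question-generating model: seed in, version out.
gen <- function(seed) {
  set.seed(seed)
  data <- sample(40:60, 10, replace = TRUE)  # version-specific dataset
  list(data = data, answer = mean(data))
}

bank_v1 <- lapply(1:40, gen)     # the 40 deployed versions
# Answers posted online? Rerun with fresh seeds for 40 new versions:
bank_v2 <- lapply(101:140, gen)
```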

Next steps

  • 📦 Working towards a package that would make it easier to write these question-generating models in R and then set up the components appropriately for Canvas, Inspera, HTML, etc.

  • 👥 Understanding student interaction profiles.

  • ❓✅❌ Developing tools for assessing the difficulty of these types of questions and quizzes.

Thank you!

+ Thank you and credit to our summer research assistants who are working on the profile and difficulty work as we speak: Jingdi Sun, Brittany Alexander, Cris Escandor & Fergus Lee.

Aims summary

This talk had three aims:

  1. to explore design principles that support pedagogy-first approaches to creating question-generating models,
  2. to share considerations and opportunities with respect to having students analyse data (with iNZight Lite) to answer quiz questions, and
  3. to report on how students are actually using quizzes with multiple versions in a large introductory statistics course, including findings based on data about quiz attempts, as well as reflections from the teaching team.

FAQ

Does this ruin your class average because they all get 10s? Nope.

Do you do this for the test and exam? Nope. The current tech-support drama of one testing platform is more than enough for high-stakes assessments, so we don’t have students use iNZight in the test and exam.

Slides: link.lizabolton.com