Grading Infrastructure for Large Data Science Courses


Data Science and Computer Science courses with large enrollments face grading challenges due to the size and nature of their assignments. In our university's Introductory Data Science course, each student submission is split into auto-graded code sections and manually graded free-response sections. These are graded separately and then merged to produce a final grade. To productionize this process, our university's Department of Data Science has developed a grading pipeline, along with auto-grading software and a backend for grading students' Jupyter Notebook assignments.
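To illustrate the merge step described above, here is a minimal Python sketch. It is not the department's actual pipeline. It assumes each grading source (for example, an autograder export and a manual-grading export) has already been reduced to a mapping from a student ID to points earned. The function name `merge_grades`, the student IDs, and the policy of counting a missing section as zero while flagging it for staff review are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def merge_grades(
    autograded: Dict[str, float],
    manual: Dict[str, float],
) -> Tuple[Dict[str, float], List[str]]:
    """Combine auto-graded and manually graded scores per student (hypothetical sketch).

    Returns (totals, incomplete):
      - totals maps each student ID to the sum of both sections, where a
        missing section counts as 0 points (an assumed policy);
      - incomplete lists IDs missing either section, so staff can review
        them before grades are published.
    """
    # Union of IDs seen in either source, sorted for stable output.
    all_ids = sorted(autograded.keys() | manual.keys())
    totals = {sid: autograded.get(sid, 0.0) + manual.get(sid, 0.0) for sid in all_ids}
    incomplete = [sid for sid in all_ids if sid not in autograded or sid not in manual]
    return totals, incomplete


if __name__ == "__main__":
    # Made-up example scores, for illustration only.
    auto = {"s001": 42.0, "s002": 38.5, "s003": 45.0}
    hand = {"s001": 18.0, "s002": 20.0}
    totals, incomplete = merge_grades(auto, hand)
    print(totals)      # {'s001': 60.0, 's002': 58.5, 's003': 45.0}
    print(incomplete)  # ['s003']
```

Flagging incomplete records instead of silently dropping them reflects a common concern when merging grades from separate systems. A production pipeline would also need to reconcile ID formats and late or regraded submissions.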

This session will walk through the grading process for student assignments in a large introductory data science course, including how the pipeline incorporates existing grading platforms such as Gradescope and Canvas, along with other tools. We will also discuss the challenges of building auto-grading infrastructure for large in-person courses and for MOOCs with tens of thousands of students, both of which our department offers.

Deploying auto-grading solutions, especially at large scale, is an infrastructure challenge, and many universities have expressed interest in adopting such solutions. We hope to use this opportunity to present our solution and to better package it for other educational institutions going forward.


Previous Knowledge

Nothing mandatory. However, familiarity with Jupyter Notebooks, JupyterHub, and existing grading platforms may be useful. 

Software Installation Expectation


Session skill level
Session Track
Enabling Teaching and Learning