This work aims to improve the assessment of creative problem-solving in science education by employing language technologies and computational-statistical machine learning methods to grade students' natural language responses automatically. To assess constructs such as creative problem-solving with validity, open-ended questions that elicit students' constructed responses are beneficial; however, the high cost of manually grading constructed responses can be an obstacle to using open-ended questions. In this study, automated grading schemes were developed and evaluated in the context of secondary Earth science education. Empirical evaluations showed that the automated grading schemes can reliably identify domain concepts embedded in students' natural language responses, achieving satisfactory inter-coder agreement with human coding on two sub-tasks of the test (Cohen's Kappa = .65-.72). When a single holistic score was computed for each student, machine-generated scores achieved high inter-rater reliability against human grading (Pearson's r = .92). The reliable performance in automatic concept identification and numeric grading demonstrates the potential of automated grading to support the use of open-ended questions in science assessments and to enable new technologies for science learning.
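The two reliability statistics reported above can be illustrated with a minimal sketch. The following is not the authors' evaluation code; it is a self-contained Python illustration of how Cohen's Kappa (agreement on categorical concept labels, corrected for chance) and Pearson's r (correlation between numeric scores) are computed, using hypothetical toy rating data.

```python
from collections import Counter
import math

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters' categorical labels:
    (observed agreement - chance agreement) / (1 - chance agreement)."""
    n = len(rater_a)
    # Observed proportion of agreement.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[k] * counts_b[k]
              for k in set(rater_a) | set(rater_b)) / (n * n)
    return (p_o - p_e) / (1 - p_e)

def pearson_r(x, y):
    """Pearson correlation between two numeric score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: 1/0 = concept present/absent per response.
human_codes   = [1, 0, 1, 1, 0, 1]
machine_codes = [1, 0, 1, 0, 0, 1]
print(round(cohens_kappa(human_codes, machine_codes), 2))

# Hypothetical holistic scores for the same students.
human_scores   = [3.0, 1.0, 4.0, 2.0, 1.5, 3.5]
machine_scores = [2.5, 1.0, 4.0, 2.5, 1.0, 3.0]
print(round(pearson_r(human_scores, machine_scores), 2))
```

Kappa discounts the agreement two raters would reach by guessing from their label frequencies alone, which is why it is preferred over raw percent agreement for categorical coding tasks like concept identification.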
Computers & Education, vol. 51, no. 4, pp. 1450-1466