EAAST shared research on how the British Council independently evaluated the quality of an AI-driven pronunciation rater using accessible research approaches. The research shows how feedback from test developers can be key to refining such technologies so that AI-driven feedback encourages positive learning outcomes.

There has been a proliferation of AI-driven learning and assessment solutions providing automated scores and feedback on learner pronunciation over the past few years. Whilst testing authorities have commissioned research on the validity and reliability of automated scoring systems for summative assessment (Bernstein et al., 2010), there are fewer studies on automated bespoke feedback processes that could, when delivered effectively, have positive consequences for individualised learning. The presentation outlined research into the validation of an AI-driven evaluation engine that provides bespoke pronunciation feedback.

Two main research questions were posed:

1. Is the AI engine sufficiently reliable to consistently identify samples of strong and weak pronunciation performance?
2. What is the impact of adopting the findings of this study on the reliability of the feedback engine?

To address the first question, we presented a study analysing the correlation between human and machine ratings for a sample of 300 word-level audio files, together with the implications for the automated selection of formative feedback examples for learners. To address the second question, we outlined how we presented actionable findings to the technology provider and reported the impact on engine reliability of adopting our recommendations. The presentation concluded by showing how this research approach can be applied more generally by language institutions to explore the validity and reliability of the growing number of auto-rated pronunciation tools used by language learners.
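As an illustration of the kind of human-machine comparison described above, the sketch below computes correlation statistics over word-level ratings. It is a minimal, hypothetical example: the file name, the column names (human_score, machine_score), and the choice of Pearson and Spearman coefficients are assumptions for illustration, not details reported in the study.

```python
# Minimal sketch of a human-machine rating comparison, assuming a CSV with
# hypothetical columns "human_score" and "machine_score", one row per
# word-level audio file (e.g. the 300 files in the study described above).
import pandas as pd
from scipy import stats

ratings = pd.read_csv("word_level_ratings.csv")  # hypothetical file name

# Pearson treats the scores as interval data; Spearman is a rank-based
# alternative if the rating scales are better viewed as ordinal.
pearson_r, pearson_p = stats.pearsonr(ratings["human_score"], ratings["machine_score"])
spearman_rho, spearman_p = stats.spearmanr(ratings["human_score"], ratings["machine_score"])

print(f"Pearson r = {pearson_r:.2f} (p = {pearson_p:.3f})")
print(f"Spearman rho = {spearman_rho:.2f} (p = {spearman_p:.3f})")

# Flag files where human and machine ratings diverge most sharply; these are
# the cases that matter when the engine selects examples of strong or weak
# pronunciation to feed back to learners.
ratings["abs_diff"] = (ratings["human_score"] - ratings["machine_score"]).abs()
print(ratings.nlargest(10, "abs_diff"))
```

In practice, an analysis of this kind would also inspect the flagged divergent files qualitatively before passing recommendations to the technology provider; the correlation coefficient alone does not show where or why the engine misclassifies performances.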

Reference

Bernstein, J., Van Moere, A., & Cheng, J. (2010). Validating automated speaking tests. Language Testing, 27(3), 355–377.