Two Stern professors implemented an artificial intelligence-led oral exam in the fall semester to check that students weren’t relying on technologies such as ChatGPT to comprehend course materials.
In the Artificial Intelligence and Machine Learning Product Management class, professors Panagiotis G. Ipeirotis and Konstantinos Rizakos built the exam using ElevenLabs’ AI voice generator model. After students completed the 25-minute oral exam, Claude, Gemini and an OpenAI model graded the students’ work by evaluating the generated voice transcripts. Ipeirotis said that the AI evaluations made the scores “more consistent” than his own manual grading.
“We encourage people to use AI,” Ipeirotis told WSN. “We just noticed that not everyone had the right level of maturity to engage with the material that they were getting back from the AI LLMs — they were just taking it at face value and submitting without really understanding what was going on.”
Over the course of nine days, the ElevenLabs AI agent conducted 36 exams. The two-part exams first asked students about their group capstone projects, followed by an assessment based on course material. The AI agent then asked specific follow-up questions based on the students’ responses, according to Ipeirotis’ blog post last month.
In emails sent to Ipeirotis, students reported that the “intensity” of the AI agent’s voice heightened their anxiety and affected their performance. Students also said that the AI agent asked multiple questions at a time, making the prompts difficult to follow. Ipeirotis said the problem could be fixed by instructing the AI agent to ask only one question at a time, though changes to address concerns about the agent’s voice are still being tested.
Although 70% of students said in a post-exam survey that the new test format better assessed their comprehension of classroom materials, only 13% said they prefer it over written exams, and 83% said they found the oral AI exam more stressful.
“People were excited to try something like that,” Ipeirotis said. “There were a few problems — but people found it very fitting for the class where we’re studying AI product management that they’re going to be tested in this way.”
He added that using AI agents to grade the exams drastically lowered grading costs to about $0.45 per student, compared with paying teaching assistants $25 an hour for roughly 30 hours to grade the entire class.
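As a rough check on those figures, the comparison can be worked out in a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the class had 36 students (inferred from the 36 exams the agent conducted) and treats the quoted rates as exact, and the constant and function names are illustrative rather than taken from Ipeirotis’ blog post.

```python
# Back-of-the-envelope comparison of grading costs, kept in cents to
# avoid floating-point rounding on currency values.
# Assumption: 36 students, inferred from the 36 exams conducted.
STUDENTS = 36
AI_COST_CENTS_PER_STUDENT = 45   # about $0.45 per student
TA_RATE_CENTS_PER_HOUR = 2500    # $25 an hour
TA_HOURS = 30                    # roughly 30 hours for the whole class


def grading_costs_cents(students: int = STUDENTS) -> tuple[int, int]:
    """Return (AI grading total, TA grading total) in cents."""
    ai_total = students * AI_COST_CENTS_PER_STUDENT
    ta_total = TA_RATE_CENTS_PER_HOUR * TA_HOURS
    return ai_total, ta_total


ai_total, ta_total = grading_costs_cents()
print(f"AI grading: ${ai_total / 100:.2f}")
print(f"TA grading: ${ta_total / 100:.2f}")
print(f"TA grading costs roughly {ta_total / ai_total:.0f} times as much")
```

Under those assumptions, AI grading for the whole class comes to a little over $16, against $750 for teaching assistants.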
Ipeirotis said he will continue to implement the oral exam in his class as it tests students’ knowledge in real time. In the spring semester, students will take their final exams using the same AI agent, but this time designed by students themselves.
“This was not about cheating,” Ipeirotis said. “It was really about encouraging students to learn and dive deeper in the material — they used AI, but I was worried that they were not using it properly to their benefit. That was the main motivation.”
Contact Selin Kemiktarak at [email protected].