Stern researchers found that 23 artificial intelligence models passed the highest level of the Chartered Financial Analyst exam, which is used globally to evaluate candidates’ readiness for the profession. They said the findings from last month could make financial advising more accessible to companies with smaller budgets.
The researchers collaborated with the AI-driven wealth platform GoodFin to examine the “high-stakes analytical reasoning” abilities of large language models such as ChatGPT and Gemini. The tested models passed a mock version of the exam’s third level, which contains multiple-choice and essay questions that require synthesizing complex financial concepts.
Stern professor Srikanth Jagabathula told WSN that he began the research unsure whether AI could play a role in financial advising because it is such a high-stakes industry.
“You need to be careful about what you deploy in practice,” Jagabathula said. “These studies clearly demarcate areas where machines might be good and humans might be good, and points to a future where we could have both work together to offer the best advice.”
Previous studies have found that LLMs can pass the first two levels of the CFA exam, which are composed of simple multiple-choice questions. In the new study, researchers saw “frontier” LLMs surpass the predicted passing threshold of 63% on the overall exam, with ChatGPT’s o4-mini model scoring 79.1% and Gemini scoring 75.9%.
The new findings also showed that reasoning versions of the LLMs, which are designed to spend more time analyzing prompts, outperform non-reasoning models by an average of 19% when handling essay questions. LLMs also respond best when they receive a precise description of the steps to take in an analysis, which researchers said is a first step toward relying on AI in higher-stakes situations.
“Because the space that we’re in is heavily regulated and we’re dealing with people’s money, and usually professionals in this space will have credentials like the CFA Level III, it’s really important that we have this understanding of how performant these models are,” GoodFin founder and CEO Anna Joo Fee said in an interview with WSN.
The study also found that the more detail LLMs need for accurate analysis, the more computing power they require, which raises costs for users: a 7.8% increase in multiple-choice accuracy would raise costs by three to 11 times.
To balance cost and efficiency, the researchers suggest that financial advising firms could assign smaller, faster models to simpler prompts and reserve more advanced models for more complex ones. However, Fee said that companies will still need to ease into AI usage.
“Some might argue that financial advising is more relationship-based than finance-based,” Fee said. “It’s no longer going to be enough to do the more basic things, but really understanding your client, being able to empathize and building those relationships — that’s going to become paramount.”
Contact Mia Shou at [email protected].