Notes on: ‘The use of generative AI in statistical data analysis and its impact on teaching statistics at universities of applied sciences’
generative AI
Source
Schwarz, J. (2025). The use of generative AI in statistical data analysis and its impact on teaching statistics at universities of applied sciences. Teaching Statistics, 47(2), 118-128.
Summary
- Study population: Students who are not training to become data analysts or data scientists, for example those studying social sciences or business administration. In line with the GAISE recommendations, these students are introduced to foundational theoretical concepts and learn to apply them to real-world data sets.
- Research question(s):
- Can generative AI enable statistical data analysis even for people with little or no knowledge of statistics, at least for problems with a simpler structure?
- If generative AI can perform a substantial portion of data analysis, what should be the future focus of statistics education? That is, what priorities should guide the design of statistics curricula?
- Methods:
- The author focuses on analyses that can be carried out with basic methods: t-tests, linear regression, and ANOVA.
- Four data sets were deliberately constructed to include minor violations of test assumptions, for example unequal variances across factor levels (violating the ANOVA equal-variance assumption), outliers, or missing data.
- Prompts to ChatGPT Data Analyst were phrased as if the user had no statistical training, for example: “Please investigate statistically whether [y] differs for different levels of [x],” or “I have no statistical knowledge, can you please explain the results to me?”
- The analyses were repeated 2–3 times for each data set to check the stability of the results.
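The notes do not reproduce the four data sets, so the following is a hypothetical sketch of how a one-factor data set with the kinds of minor violations described above could be simulated. The group labels, means, standard deviations, outlier value, and missing-value position are all illustrative assumptions, not values from the paper. A fixed seed means that repeated analyses (as in the 2–3 runs per data set) see identical data.

```python
import random


def make_group_data(seed: int = 1, n: int = 30) -> dict:
    """Simulate a hypothetical one-factor data set with deliberate minor violations.

    All parameters (group means, SDs, outlier value, missing position) are
    illustrative assumptions, not values taken from Schwarz (2025).
    """
    rng = random.Random(seed)  # fixed seed: every call returns identical data
    data = {
        "A": [rng.gauss(50, 5) for _ in range(n)],
        "B": [rng.gauss(52, 5) for _ in range(n)],
        "C": [rng.gauss(55, 15) for _ in range(n)],  # 3x the SD: unequal variances across levels
    }
    data["B"][0] = 120.0  # inject a single extreme outlier
    data["C"][3] = None   # inject a single missing value
    return data


if __name__ == "__main__":
    for group, values in make_group_data().items():
        observed = [v for v in values if v is not None]
        print(group, len(values), "values,", len(values) - len(observed), "missing")
```

Feeding such a data set to a chat-based analyst and checking whether it mentions the variance difference, the outlier, and the missing value mirrors the study's design in miniature.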
- Results:
- ChatGPT Data Analyst carried out the analyses end to end without requiring domain-specific knowledge from the user, explaining each step before carrying it out.
- However, the analyses were often incomplete: issues such as heterogeneous group variances, outliers, sample size, and missing data were not consistently addressed.
- In the linear regression case, ChatGPT mentioned the need to check assumptions in only one of the three repeated analyses, and even then did not actually perform the checks.
- The author was able to reproduce the results by running the Python code generated by ChatGPT.
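The checks that were not consistently addressed can be written out explicitly. The following is a minimal standard-library Python sketch, not the code ChatGPT generated in the study: it counts missing values, flags outliers with the common 1.5×IQR rule, computes the mean-centred Levene statistic W for equality of variances, and computes Welch's t statistic with Welch–Satterthwaite degrees of freedom, which does not assume equal variances. The function names and the 1.5×IQR cut-off are illustrative choices. p-values are omitted to avoid a SciPy dependency; `scipy.stats.levene` (with `center='mean'`, since its default is median centring) and `scipy.stats.ttest_ind(..., equal_var=False)` provide them.

```python
import statistics
from math import sqrt

# Illustrative assumption checks; hypothetical helper names, not from the paper.


def missing_count(values):
    """Number of missing (None) entries."""
    return sum(v is None for v in values)


def iqr_outliers(values, k=1.5):
    """Observed values outside [Q1 - k*IQR, Q3 + k*IQR]; None entries are ignored."""
    xs = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(xs, n=4)  # quartile cut points (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in xs if v < low or v > high]


def levene_w(groups):
    """Mean-centred Levene statistic W for k groups; large W suggests unequal variances."""
    # Absolute deviations of each observation from its own group mean
    z = []
    for g in groups:
        m = statistics.fmean(g)
        z.append([abs(x - m) for x in g])
    n = [len(g) for g in groups]
    total_n, k = sum(n), len(groups)
    group_means = [statistics.fmean(zi) for zi in z]
    grand_mean = sum(sum(zi) for zi in z) / total_n
    between = sum(ni * (m - grand_mean) ** 2 for ni, m in zip(n, group_means))
    within = sum((zij - m) ** 2 for zi, m in zip(z, group_means) for zij in zi)
    return (total_n - k) / (k - 1) * between / within


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.fmean(a) - statistics.fmean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

For example, `welch_t([1, 2, 3], [4, 5, 6])` returns roughly `(-3.674, 4.0)`. A user who knows to ask for these checks can then judge whether a pooled t-test or a standard ANOVA was an appropriate choice.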
- Conclusions:
- Reliable use of ChatGPT for data analysis requires some statistical knowledge to formulate effective prompts and identify missing steps.
- Less emphasis may be needed on programming, as AI can support code generation; instead, teaching can focus on using AI tools effectively.
- Core statistical concepts should still be taught to help students assess the completeness of results.
- Clear guidelines are needed on whether and how ChatGPT may be used in assessments.
Key Quotes
“The selection of suitable statistical test procedure can often be based on rules and checking assumptions and can therefore be completely taken over by generative AI. The simpler the structure of the data set to be analyzed, the truer this is.”
“Although the examination of application assumptions is a standard part of test selection, ChatGPT failed to perform this step consistently, or at all in some cases.”
Reflection
- ChatGPT was inconsistent even on routine analyses. The repeated runs show that it is capable of checking application requirements, but it did not do so reliably across iterations. This is a problem for users with limited statistical background, who may not be able to recognize what is missing from the output.
- The suggestion that “teachers can incorporate exercises that encourage students to code more efficiently and accurately with the assistance of AI” is compelling. It would be valuable to see concrete examples or sample activities that illustrate how this idea could be implemented in practice.
- The author acknowledges as a limitation that the prompts assumed users with no statistical knowledge, and notes that it is unclear whether this reflects real-world usage. Because the examples were artificially constructed, future work could explore how students or analysts actually use ChatGPT and what their completed analyses look like in authentic settings.