Notes on: ‘The use of generative AI in statistical data analysis and its impact on teaching statistics at universities of applied sciences’
generative AI
Source
Schwarz, J. (2025). The use of generative AI in statistical data analysis and its impact on teaching statistics at universities of applied sciences. Teaching Statistics, 47(2), 118-128.
Summary
- Target population: Students who are not pursuing careers as data analysts or data scientists, for example those studying social sciences or business administration. In line with the GAISE (Guidelines for Assessment and Instruction in Statistics Education) recommendations, these students learn foundational theoretical concepts and how to apply them to real-world data sets.
- Research question(s):
- Can generative AI enable statistical data analysis even for people with little or no knowledge of statistics, at least for problems with a simpler structure?
- If generative AI can perform a substantial portion of data analysis, what should be the future focus of statistics education? That is, what priorities should guide the design of statistics curricula?
- Methods:
- The author focuses on analyses that can be performed with basic methods: t-tests, linear regression, and ANOVA.
- Four data sets were intentionally designed to include minor violations of statistical test assumptions, such as unequal variances across factor levels (violating an ANOVA assumption), outliers, and missing data.
- Prompts were given to ChatGPT Data Analyst as if the user had no statistical training. For example: “Please investigate statistically whether [y] differs for different levels of [x],” or “I have no statistical knowledge, can you please explain the results to me?”
- The analyses were repeated 2–3 times for each data set to check the stability of the results.
- Results:
- ChatGPT Data Analyst completed full analyses without requiring domain-specific knowledge from the user, explaining each step before implementation.
- However, the analyses were often incomplete: issues such as heterogeneous group variances, outliers, sample size, and missing data were not consistently addressed.
- In only one of the three repeated linear regression analyses did ChatGPT acknowledge the need to check assumptions, and even then it did not follow through.
- The author successfully reproduced the results using the Python code generated by ChatGPT.
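The checks that were skipped (group variances, outliers, sample size, missing data) are exactly what a user would need to recognize as missing. As an illustration only, here is a minimal standard-library Python sketch of such a pre-analysis checklist for a t-test or one-way ANOVA. It is not code from the paper or from ChatGPT: the function name `assumption_checklist` is hypothetical, and the thresholds (largest-to-smallest standard deviation ratio above 2, fewer than 10 observations per group, values more than 1.5 × IQR beyond the quartiles) are assumed rules of thumb. It screens for problems rather than running formal tests.

```python
import statistics
from typing import Optional


def assumption_checklist(
    groups: dict[str, list[Optional[float]]],
    sd_ratio_limit: float = 2.0,
    min_n: int = 10,
) -> dict:
    """Screen grouped data for common problems before a t-test or one-way ANOVA.

    All thresholds are illustrative rules of thumb (assumptions), not fixed standards.
    `None` entries are treated as missing values.
    """
    report = {"missing": {}, "n": {}, "outliers": {}, "small_groups": []}
    sds = {}
    for name, values in groups.items():
        observed = [v for v in values if v is not None]
        # Count missing values and the remaining sample size per group.
        report["missing"][name] = len(values) - len(observed)
        report["n"][name] = len(observed)
        if len(observed) < min_n:
            report["small_groups"].append(name)
        if len(observed) >= 2:
            sds[name] = statistics.stdev(observed)
        # Tukey-style fences: flag values more than 1.5 * IQR beyond the quartiles.
        if len(observed) >= 4:
            q1, _, q3 = statistics.quantiles(observed, n=4)
            iqr = q3 - q1
            low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
            report["outliers"][name] = [v for v in observed if v < low or v > high]
        else:
            report["outliers"][name] = []
    # Largest-to-smallest SD ratio as a crude screen for heterogeneous variances.
    if len(sds) >= 2 and min(sds.values()) > 0:
        report["sd_ratio"] = max(sds.values()) / min(sds.values())
    else:
        report["sd_ratio"] = None
    report["unequal_variances"] = (
        report["sd_ratio"] is not None and report["sd_ratio"] > sd_ratio_limit
    )
    return report


if __name__ == "__main__":
    # Hypothetical data: group B has a larger spread and one extreme value.
    data = {
        "A": [5.1, 4.9, 5.3, 5.0, 5.2, None, 4.8, 5.1, 5.0, 5.2, 5.1],
        "B": [6.0, 3.1, 8.2, 4.5, 7.7, 2.9, 6.6, 5.4, 9.0, 4.1, 30.0],
    }
    for key, value in assumption_checklist(data).items():
        print(f"{key}: {value}")
```

A flagged `unequal_variances` would, for instance, point toward a Welch-type test instead of the classical pooled-variance t-test or ANOVA; a formal check could use a test such as Levene's.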
- Conclusions:
- Reliable use of ChatGPT for data analysis requires some statistical knowledge to formulate effective prompts and identify missing steps.
- Less emphasis may be needed on programming, as AI can support code generation; instead, teaching can focus on using AI tools effectively.
- Core statistical concepts should still be taught to help students assess the completeness of results.
- Clear guidelines are needed on whether and how ChatGPT may be used in assessments.
Key Quotes
“The selection of suitable statistical test procedure can often be based on rules and checking assumptions and can therefore be completely taken over by generative AI. The simpler the structure of the data set to be analyzed, the truer this is.”
“Although the examination of application assumptions is a standard part of test selection, ChatGPT failed to perform this step consistently, or at all in some cases.”
Reflection
- It is surprising that ChatGPT is inconsistent in completing routine analyses. The repeated runs show that it can check application assumptions, yet it does not do so consistently across iterations. This is a problem for users with little knowledge of statistics, who would not be able to recognize the gaps in the results.
- The suggestion that “teachers can incorporate exercises that encourage students to code more efficiently and accurately with the assistance of AI” is interesting. I would like to see concrete examples of how this can be implemented.
- The author notes as a limitation that the prompts were designed assuming users had no statistical knowledge, and acknowledges that it is unclear whether this reflects how real users interact with AI. Additionally, since the data sets and prompts were artificially constructed, it remains to be studied how students or analysts actually use ChatGPT and what their final analyses look like.