
Surge in AI-Generated Papers Using Public Data Raises Alarms in Academia

Published on May 16, 2025

The number of research papers produced from public datasets with the help of artificial intelligence (AI) tools has surged in recent years, prompting concern within the academic community. Editors at journals such as Scientific Reports have reported a flood of nearly identical submissions based on data from the U.S. National Health and Nutrition Examination Survey (NHANES). Between 2014 and 2021, only about four such papers were published each year. Since 2022, that number has climbed sharply, reaching 190 by October 2024 and far outpacing overall publication growth in the health sciences.

These studies often follow a formulaic approach: select a health condition, a potential risk factor, and a demographic group, then substitute variables to generate "new" findings. Experts warn that this trend reflects a broader misuse of public datasets, effectively turning scientific research into a "fill-in-the-blanks" exercise. Similar patterns have emerged in fields such as genetics and bibliometrics. Widely available generative AI tools like ChatGPT may be enabling researchers to rephrase identical conclusions so that they evade plagiarism detection. So-called "paper mills" are also believed to be compounding the issue.

Analyses reveal that many of these papers selectively mine NHANES data to achieve statistically significant results, often ignoring the high risk of false positives that comes from running many tests. For instance, of 28 studies on depression, only 13 corrected for multiple testing. More broadly, the number of papers using NHANES data jumped from 4,926 in 2023 to 7,876 in 2024. Other major datasets, such as the Global Burden of Disease Study, may face similar risks.
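To make the false-positive risk concrete, here is a minimal Python sketch of how quickly chance findings accumulate when many associations are tested, and how two standard corrections (Bonferroni and Benjamini-Hochberg) tighten the bar. It is an illustration under stated assumptions, not a reconstruction of any study's analysis: the p-values are hypothetical rather than drawn from NHANES, the tests are treated as independent, and the article does not say which correction the 13 studies applied.

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values that survive a Bonferroni correction (alpha / m)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def benjamini_hochberg_significant(p_values, alpha=0.05):
    """Flag p-values that survive the Benjamini-Hochberg procedure,
    which controls the false discovery rate rather than any single error."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at most (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Everything ranked at or below the cutoff is declared significant.
    flags = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            flags[idx] = True
    return flags


# Hypothetical p-values from six exposure/outcome pairings (not real data).
example_p_values = [0.001, 0.012, 0.03, 0.04, 0.2, 0.6]

uncorrected_hits = sum(p <= 0.05 for p in example_p_values)
bonferroni_hits = sum(bonferroni_significant(example_p_values))
bh_hits = sum(benjamini_hochberg_significant(example_p_values))

print(f"Chance of a false positive in 20 tests: {familywise_error_rate(20):.2f}")
print(f"Hits: uncorrected={uncorrected_hits}, "
      f"Bonferroni={bonferroni_hits}, Benjamini-Hochberg={bh_hits}")
```

With these made-up inputs, four of the six results clear the usual 0.05 threshold before correction, but only one or two survive afterwards, which is the gap a paper that skips correction would leave unreported.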

This phenomenon highlights systemic flaws in scientific publishing and academic evaluation. Open-access journals are criticized for accepting low-quality work in exchange for high publication fees, while researchers under career pressure prioritize quantity over quality. Scholars warn that unless the incentive structure is fundamentally reformed, the problem will worsen and ultimately undermine the credibility of science.
