Science prides itself on being self-correcting. While scientific fakery has always been a problem, cases of fraud have been isolated, and a combination of scepticism and scrutiny has up to now generally worked to highlight published papers that are unreliable.
The 30-minute paper
However, the world of research and publishing is changing. The introduction of agentic artificial intelligence (AI) allows an automated assembly line of research tasks without any human checkpoints. We recently demonstrated that writing a plausible and entirely ‘new’ paper has become a 30-minute task, with no need for human agency in creating a research hypothesis or interpreting results.
The consequences of these developments are troubling, as this technology in the wrong hands allows scientific fraudsters to massively outproduce careful researchers. As recognition and reward systems often favour quantity over quality, engaging in corrupt behaviour can be a winning strategy. You do not need to look far in academia to see perverse reward systems driving unethical publishing practices: university league tables are already prone to manipulation; medical students bolster their credentials with conference submissions; and researchers win fellowships based on the length, rather than the quality, of their CVs.
Open data mining
The last two years have also seen a parallel explosion in the use of generative AI to exploit Open Data to produce superficially plausible but misleading papers. This mining of Open Data pollutes the academic record. In many cases, these mined papers have likely been produced by paper mills, shadowy organisations that create bogus papers using pre-made templates and AI. These manuscripts and authorship slots can sell for serious money, sometimes thousands of dollars.
Paper mills operate as industrial-scale extractors, churning out thousands of formulaic analyses by cycling through the almost endless combinations of variables in open resources like the UK Biobank or the US CDC National Health and Nutrition Examination Survey. These ‘papers’ produce large amounts of noise and very little signal. They are a digital pollution that could overwhelm good science and further exhaust the precious resource of peer review.
For some datasets, we estimate that mass-manufactured research now outnumbers legitimate research by ten to one. These trends were already a major concern, but a ‘30-minute paper’ – written without any human-driven hypothesis – should create a whole new level of concern for anyone interested in maintaining scientific quality.
The corruption of open research
Scientific misconduct is an adversarial process between bad actors on one side, and integrity officers at journals and integrity researchers on the other. In the past, much fraud has been seen in duplicated images or plagiarism. The push for more Open Data (and Open Science in general) was partly intended to counter fraud by increasing transparency. However, fraudsters have now co-opted the very assets that were intended to signal ‘good research’ for their paper mines.
Some publishers are starting to act, though their countermeasures also impose time costs on researchers: several have begun automatically rejecting submissions that use large Open Data sets. While necessary, these “mining bans” are a second-best solution compared with better control of Open Data in the first place; for example, asking researchers using Open Data to publicly register a protocol before they are given data access. However, the paper mines are unlikely to be easily closed, and more adversarial adaptations are likely.
A rising tide of scientific slop
Large language models are already being used to mass-produce redundant publications; their combination with agentic AI can further compromise research integrity. In short, it enables fast-churn research, produced for selfish purposes and without any guiding human-generated research question. Agentic AI has no regard for the quality, ethicality or suitability of the data it exploits; it is all ultimately raw material to be ingested. As more data become available online and machine readable, we are creating more opportunities for exploitation.
A key solution is for a range of stakeholders to reduce the demand for tick-box papers. Where they are not needed, institutions should stop requiring students to produce a paper to qualify. Universities should disengage from league tables that are driving the hyper-inflation in paper and citation numbers. Funders should focus on a researcher’s top five papers and wider impacts rather than basing decisions on the number of publications.
Researchers may also have to change their practices to differentiate themselves from fraudsters. We encourage all researchers to leave a paper trail that demonstrates their hard work, including publishing protocols and releasing any analytical code. Better journals would also do well to require evidence that the work was not instantly created.
Of course, in an arms race every measure prompts a countermeasure. For example, introducing pre-registration could lead to fraudsters pre-registering thousands of potential analyses. Without fundamentally changing a system that rewards quantity over quality, the only option is escalation.
(Link to original LSE Impact blog.)