
Artificial intelligence has become deeply embedded in scientific peer review, with a 2025 global survey revealing that 53% of researchers now use AI tools when evaluating manuscripts. This quiet transformation, moving faster than formal policies can keep up, presents both opportunities and challenges for scientific integrity. While AI helps reduce reviewer workload and levels the playing field for non-native English speakers, most usage remains superficial (drafting reports and polishing language) rather than enhancing rigor through statistical validation or methodology checks. A parallel Cornell University study shows that scientists using AI post substantially more papers (up to roughly 60% more on some preprint servers), yet many AI-polished manuscripts fail peer review despite impressive language, meaning traditional quality signals no longer reliably indicate scientific merit. Adoption varies dramatically by region and career stage (77% in China and 66% in Africa versus lower rates in Europe and North America; 87% among early-career researchers), while 57% of scientists say they would be unhappy if AI wrote reviews of their own work, revealing a deep trust paradox at the heart of modern science.
In research institutes and laboratories around the world, a subtle but consequential transformation is taking place. Peer review (the process by which scientists scrutinize each other’s work before publication) has long been the cornerstone of scientific credibility. Rooted in human expertise, judgment and collegial critique, it has endured for centuries with remarkably little structural change.
That era is now quietly giving way to a new reality: artificial intelligence is becoming embedded in the mechanics of scholarly evaluation.
This shift is laid bare by a first-of-its-kind global survey conducted by Frontiers, which found that 53% of peer reviewers now use AI tools in their work. The study, based on responses from 1,645 active researchers across 111 countries collected in May and June 2025, represents the first large-scale examination of AI adoption, trust, training, and governance within scholarly evaluation.
“AI is transforming how science is written and reviewed, opening new possibilities for quality, collaboration, and global participation,” said Kamila Markram, CEO and Co-Founder of Frontiers. “Our 2025 global survey reveals that while AI is steadily finding its place in peer review, its full potential is currently untapped.”
She noted that AI has already redefined the authoring process for scientific research, but it is now quietly reshaping how it is reviewed. “Today, it is often used for surface tasks, like polishing language, drafting text, or handling administration, rather than for deeper analytical and methodological work where it could truly elevate rigor, reproducibility, and scientific discovery,” she added.
The Transformation Is Already Here
For decades, peer review has functioned as science’s quality control system, with experts volunteering their time to test assumptions, interrogate methods and assess conclusions before research enters the public domain. But rising submission volumes, reviewer fatigue and long publication timelines have strained the system.
AI has entered this space not through formal redesign, but through quiet, individual adoption. According to the Frontiers analysis, more than half of reviewers now integrate AI into their workflow, with nearly a quarter reporting that their usage has increased significantly in the past year. The pace of change suggests a cultural shift rather than a temporary experiment, particularly as generative AI tools become cheaper, faster and more accessible.
Yet most usage remains limited in scope. Reviewers primarily rely on AI for drafting reports, improving clarity and summarizing manuscripts. Only about 19% use AI to evaluate methodology, statistical validity or experimental design, areas traditionally considered the intellectual core of peer review.
“There is still some way to go in rethinking how science should work in the 21st century and how it informs policy,” commented Jean-Claude Burgelman, Editor in Chief at Frontiers.
This cautious pattern aligns with broader observations in scientific publishing. A recent investigation by Nature found that while many researchers are comfortable using AI for language support, they remain wary of deploying it for substantive scientific judgment, especially given unresolved concerns around confidentiality and data ownership.
The Productivity Paradox
The restrained use of AI reveals both promise and limitation. On one hand, AI demonstrably reduces workload, improves clarity and helps non-native English speakers participate more fully in global science. On the other, its most transformative potential (strengthening rigor, reproducibility and fraud detection) remains largely untapped.
A separate Cornell University study published in Science reveals the double-edged nature of this transformation. Scientists who use large language models like ChatGPT are posting dramatically more papers: up to 33% more on arXiv, over 50% on bioRxiv, and nearly 60% on social science preprint servers.
“It is a very widespread pattern, across different fields of science, from physical and computer sciences to biological and social sciences,” said Yian Yin, assistant professor of information science at Cornell. “There’s a big shift in our current ecosystem that warrants a very serious look, especially for those who make decisions about what science we should support and fund.”
The biggest beneficiaries are researchers whose first language isn’t English. Scientists from Asian institutions, for example, posted between 43% and 89% more papers after adopting AI tools, depending on the preprint platform. The benefit is so substantial that Yin predicts a global shift in scientific productivity toward regions previously held back by language barriers.
But here’s the catch: many of these AI-polished papers fail to deliver real scientific value. Across all three preprint sites analyzed, papers that scored high on writing-complexity tests and were likely written by humans were the most likely to be accepted by scientific journals, while comparably high-scoring papers probably written with AI assistance were accepted less often, suggesting that despite convincing language, reviewers judged many to have little scientific merit.
“People using LLMs are connecting to more diverse knowledge, which might be driving more creative ideas,” said Keigo Kusumegi, first author of the Cornell study and a doctoral student in information science. But the disconnect between polished prose and scientific substance is creating new challenges for everyone from journal editors to funding agencies trying to evaluate research quality.
A World Divided on AI
The Frontiers survey reveals stark regional differences in adoption. Researchers in China report the highest usage levels, with roughly three-quarters (77%) of reviewers using AI tools, and African researchers follow closely at 66%; there, AI is often seen as an equalizer in multilingual and resource-constrained research environments.
By contrast, adoption rates remain significantly lower in Europe and North America, where reviewers express greater concern about algorithmic bias, opacity and the absence of enforceable governance.
The generational divide is even more pronounced. Among early-career researchers (those with five years of experience or less), 87% report using AI regularly; for many, AI feels like a normal part of the toolkit. Junior researchers are also more likely to view AI’s impact on peer review positively (48%) than their senior counterparts (34%).
“When we analyzed responses by career stage, we found that more junior researchers tend to have a more positive view of the impact of generative AI compared with their more senior colleagues,” notes an IOP Publishing analysis. This generational difference may stem from younger researchers being “digital natives,” or from their more limited sense of what rigorous peer review demands.
Perhaps most revealing is what researchers call the trust paradox. While many scientists agree that AI can improve manuscript quality, 57% say they would be unhappy if a reviewer used AI to write peer review reports on their own manuscripts. That number drops to 42% when AI is used merely to augment reports, but the discomfort remains substantial.
“AI feels acceptable as an assistant, but troubling as an unseen co-author,” noted one UK-based biomedical researcher quoted in the Frontiers report. Without shared rules, the boundary between support and substitution becomes increasingly blurred.
The Detection Challenge
Adding to the complexity, 72% of respondents believe they could accurately detect an AI-written peer review report on a manuscript they had authored. But research suggests this confidence may be misplaced.
An IOP Publishing study examining actual peer review reports found that AI-generated reviews often use generic language, lack subject depth, employ mechanical or unnatural tone, display excessive or unusual punctuation, and include verbose, overly elaborate language with “a large number of dashes” used throughout.
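Because these tells are stylistic, they can in principle be screened for mechanically. As a rough illustration only (this is not the IOP study’s method, and not a validated detector), a naive screen might look like the sketch below; the phrase list and thresholds are invented for this example.

```python
import re

# Thresholds and phrases below are invented for this sketch -- they are not
# settings from the IOP Publishing study, just illustrative heuristics.
DASHES_PER_100_WORDS = 1.5
GENERIC_PHRASES = (
    "the authors should consider",
    "this paper addresses an important",
    "overall, the manuscript is well written",
)

def stylistic_flags(review: str) -> list[str]:
    """Return crude stylistic signals associated with AI-drafted review text."""
    words = review.split()
    lowered = review.lower()
    flags = []

    # Tell 1: an unusually high density of en/em dashes.
    dash_count = len(re.findall("[\u2013\u2014]", review))
    if words and (dash_count / len(words)) * 100 > DASHES_PER_100_WORDS:
        flags.append("high dash density")

    # Tell 2: generic, templated phrasing with little subject depth.
    if any(phrase in lowered for phrase in GENERIC_PHRASES):
        flags.append("generic boilerplate phrasing")

    # Tell 3: uniformly long sentences, a rough proxy for verbose,
    # mechanical tone.
    sentences = [s.split() for s in re.split(r"[.!?]+", review) if s.strip()]
    if sentences and sum(map(len, sentences)) / len(sentences) > 30:
        flags.append("verbose, overly elaborate sentences")

    return flags

# A templated snippet trips the phrasing check; a real screen would combine
# many such weak signals and still expect plenty of false positives.
print(stylistic_flags(
    "Overall, the manuscript is well written. The authors should consider "
    "expanding the discussion of limitations."
))
```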
Research on AI conference peer reviews found that between 7% and 17% of review reports submitted in 2023 and 2024 contained signs they had been substantially modified by large language models, meaning changes beyond simple spelling and grammar corrections.
What AI Can (and Can’t) Do
Despite rapid uptake, the Frontiers report identifies a clear absence of formal governance. Few institutions or publishers provide structured training on ethical AI use in peer review, leaving researchers to self-educate through informal experimentation. Policies vary widely across journals, creating uncertainty about what constitutes acceptable practice.
“AI is already improving efficiency and clarity in peer review, but its greatest value lies ahead,” said Elena Vicario, Director of Research Integrity at Frontiers. “With the right governance, transparency, and training, AI can become a powerful partner in strengthening research quality and increasing trust in the scientific record.”
Some publishers are developing specialized tools to harness AI’s potential while maintaining human oversight. A startup called Grounded AI has developed a tool called Veracity, which checks whether cited papers exist and then uses an LLM to analyze whether the cited work corresponds to the author’s claims. It functions like “the workflow that a motivated, rigorous human fact checker would go through if they had all the time in the world,” says co-founder Nick Morley.
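The article describes Veracity’s workflow only at this high level, so the snippet below is a minimal sketch of the general two-step pattern (does the citation exist, and does it support the claim?), not Grounded AI’s implementation. It assumes references carry DOIs and uses the public Crossref REST API for the existence check; the language-model comparison step is left as a stub.

```python
from typing import Optional

import requests  # third-party: pip install requests

CROSSREF_WORKS = "https://api.crossref.org/works/"

def fetch_citation_record(doi: str) -> Optional[dict]:
    """Step 1: check whether the cited DOI resolves to a real Crossref record."""
    resp = requests.get(CROSSREF_WORKS + doi, timeout=10)
    if resp.status_code != 200:
        return None  # nothing registered under this DOI: flag the citation
    return resp.json()["message"]  # title, authors, venue, year, ...

def claim_matches_source(claim: str, record: dict) -> bool:
    """Step 2 (stub): ask a language model whether the cited work actually
    supports the author's claim. A real tool would send the claim together
    with the retrieved abstract or full text; model and prompt are omitted.
    """
    raise NotImplementedError("plug in an LLM call here")

# Example: checking one citation. The DOI below is the Nature paper on NumPy.
record = fetch_citation_record("10.1038/s41586-020-2649-2")
if record is None:
    print("Cited work not found: possible fabricated reference.")
else:
    print("Found:", record["title"][0], "- now compare it against the claim.")
```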
Meanwhile, in a groundbreaking experiment, a paper produced entirely by The AI Scientist-v2 system passed peer review at a workshop during the ICLR 2025 conference, receiving acceptance scores of 6, 7, and 6 (clearly above the acceptance threshold). The paper was withdrawn before publication as part of the experimental protocol, but the result demonstrates AI’s evolving capabilities.
The Unethical Use Problem
At the same time, the unethical use of AI to mass-produce low-quality or fabricated research poses new challenges for peer review. Markram observed that these risks only sharpen the imperative to harness AI as a force to combat fraud, one that scales quality and strengthens integrity across the entire research cycle.
“With the right leadership and guardrails, AI can act as an elevator for research quality, rather than a shortcut around it,” she noted.
Frontiers’ research integrity team (one of the largest in publishing, established in 2016) filters out approximately 35% of manuscript submissions before they even reach editorial boards and peer review, while the overall rejection rate across all Frontiers journals stands at 58%, reaching over 80% in some journals or regions.
A Call for Coordinated Action
The Frontiers whitepaper calls for coordinated action across the research ecosystem. Publishers are urged to embed transparency, disclosure and human oversight into editorial workflows. Universities and research institutions are encouraged to integrate AI literacy into formal training. Funders and policymakers are asked to harmonize standards internationally, while AI developers are pressed to prioritize auditability and safety by design.
Frontiers’ position is that “clear boundaries, human accountability and well-governed, secure tools are more effective than blanket prohibitions in protecting and strengthening research integrity.” The company notes that “the greater risk to peer review quality comes from unregulated, opaque or undisclosed AI use, which is already occurring across the research ecosystem.”
European Commission guidelines from 2024 take a firmer line, stating that researchers should “refrain from using generative AI tools substantially in sensitive activities, for example peer review.” The guidelines add that using AI to find background information is not substantial use, but “delegating the evaluation or the assessment of a paper is a substantial use.”
The Future
As Yin’s team at Cornell plans a symposium for March 3-5, 2026, examining how generative AI is transforming research and how scientists and policymakers can best shape these changes, one thing is clear: the question is no longer whether AI will be part of peer review.
“Already now, the question is not, ‘Have you used AI?’ The question is, ‘How exactly have you used AI and whether it’s helpful or not,’” said Yin.
The quiet revolution inside peer review is no longer theoretical. It is already reshaping how science is evaluated, paper by paper, reviewer by reviewer. Whether it strengthens scientific integrity or weakens public trust will depend on whether the global research community can govern AI with the same rigor it demands of evidence itself.