Mistakes in published gene research could be more widespread than previously thought, according to an analysis of cancer-genetics papers in two high-impact journals.
By combing through the supplementary information for hundreds of papers, a team led by cancer researcher Jennifer Byrne at the University of Sydney in Australia has identified some highly cited studies that contain errors in the DNA or RNA sequences of reagents. Scientists use these reagents for various reasons — for example, to study the function of a given gene or genetic sequence in a disease — and if the sequences are wrongly reported it could affect the reproducibility of the research.
It is unclear whether the errors are accidental or indicate misconduct. The study, published on the preprint server bioRxiv on 3 February (ref. 1), has not been peer reviewed. Some researchers also question the extent to which errors at the level of individual nucleotides might affect the papers’ conclusions. However, most agree that the presence of such mistakes in the scientific literature is worrying.
“It is certainly disheartening to see these sorts of errors,” says Jeremy Wilusz, a molecular biologist at Baylor College of Medicine in Houston, Texas. “If it’s just a reporting issue or something bigger — that, I don’t know — but it shouldn’t be happening.”
Searching for errors
Byrne and her team have been scouring the scientific literature for mistakes in genetic sequences since finding a handful of papers with this problem in 2015. In 2021, Byrne and her colleagues analysed almost 12,000 papers in the journals Gene and Oncology Reports using Seek & Blastn, a software tool that extracts short nucleotide sequences mentioned in papers, spots potential errors, and cross-checks those against a public database of nucleotides known as BLASTn. They found more than 700 papers reporting experimental reagents that had errors in their RNA or DNA sequences.
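The extract-and-cross-check idea described above can be illustrated with a minimal Python sketch. This is not the Seek & Blastn implementation: the 15-letter length threshold, the function names and the tiny in-memory `reference` dictionary are all assumptions made for illustration, and an exact substring test stands in for a real nucleotide-database search.

```python
import re

# Runs of 15 or more nucleotide letters; the threshold is an assumption.
SEQ_PATTERN = re.compile(r"\b[ACGTU]{15,}\b", re.IGNORECASE)


def extract_sequences(text):
    """Return candidate nucleotide sequences found in free text, upper-cased."""
    return [match.group(0).upper() for match in SEQ_PATTERN.finditer(text)]


def flag_unmatched(sequences, reference):
    """Return the sequences that occur in none of the reference entries.

    `reference` maps a hypothetical gene name to a hypothetical sequence.
    A real check would query a nucleotide database rather than rely on
    exact substring matching.
    """
    return [
        seq
        for seq in sequences
        if not any(seq in ref_seq for ref_seq in reference.values())
    ]


if __name__ == "__main__":
    # Made-up example text and reference, for illustration only.
    text = (
        "Forward primer 5'-ACGTACGTACGTACGTAC-3' and "
        "reverse primer 5'-TTTTGGGGCCCCAAAATT-3' were used."
    )
    reference = {"GENE_X": "GG" + "ACGTACGTACGTACGTAC" + "GG"}
    candidates = extract_sequences(text)
    print(candidates)
    print(flag_unmatched(candidates, reference))
```

A sequence flagged by a check like this is only a candidate for closer inspection. It could reflect a typographical slip, an incomplete reference or a genuine mismatch with the claimed target.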
In the latest study, the team wanted to investigate journals with a relatively high impact factor — a measure, based on citations, that is used by some as a proxy for a journal’s reach and prestige. “Arguably, it’s the higher-impact-factor literature that people pay more attention to,” Byrne says.
The researchers focused on cancer-genetics papers published in two journals: Molecular Cancer — in which they had found a few papers with incorrect sequences during previous analyses — and Oncogene. (Both journals are published by Springer Nature — Nature’s news team is independent of its publisher.)
Byrne and her colleagues manually screened reagents claiming to target unmodified human genes or genomic sequences in 334 Molecular Cancer papers published in 2014, 2016, 2018 and 2020. (Because the nucleotide-sequence reagents in the papers were reported in the supplementary files, rather than the main text, the team couldn’t use the Seek & Blastn tool.)
The team found errors in 253 (3.8%) of the 6,647 nucleotide sequences analysed. The mistakes were spread across 92 of the 334 manuscripts, and the median number of problematic sequences was 2 per paper. The proportion of papers with nucleotide-sequence errors ranged from 10% of papers in 2016 to 38% in 2020. “The biggest surprise for us was the proportion of papers with errors in 2020,” Byrne says. “We didn’t expect that.”
For papers in Oncogene, the team did a more targeted search, pinpointing papers published in 2020 containing mentions of circular RNA or microRNA — terms that were associated with the papers the researchers had identified in Molecular Cancer. Of the 1,165 sequences they screened, 50 contained errors. These were found in 21 of the 42 papers included in the analysis.
“We all know this happens, but I was surprised by the extent of the problem,” says Jo Vandesompele, a cancer researcher at Ghent University in Belgium. He adds that there are known issues in circular-RNA research, such as incomplete and inconsistent databases of circular-RNA sequences, that might make it easier for these types of issue to slip through.
The papers flagged by Byrne’s team had been highly cited: together, the 92 Molecular Cancer papers had been cited 8,048 times and the 21 Oncogene papers 878 times. Some of the papers had been cited more than 100 times.
Byrne acknowledges that some of the errors flagged in the analysis could be unintentional. Other researchers note that it is also possible some errors will have little effect on the overall conclusions of a paper. Some told Nature that one or two nucleotide changes to a sequence needn’t render a reagent completely dysfunctional — under certain conditions, it might still work with a commonly used laboratory technique such as the polymerase chain reaction (PCR).
Bernd Pulverer, head of scientific publications at the European Molecular Biology Organization in Heidelberg, Germany, says that, regardless of the errors’ origins, their presence in the published literature is a problem. “Mistakes are damaging, because people cannot rely on these papers to base future research on,” he says.
Byrne and her team say that the nature of many of the errors makes them seem suspicious. They found that some of the reagents that purported to target human genes or genomic sequences had no identifiable targets in the human genome, and that some targeted sequences in other species, such as rodents, plants and fungi.
“It’s definitely very concerning that they don’t have a perfect match to the intended target,” Vandesompele says. He adds that although researchers might be able to amplify sequences with PCR using reagents that have one or two errors, such mistakes could compromise the reagents’ selectivity or specificity, and that “it’s just common sense” not to design a reagent with mismatches.
Another factor to consider, Byrne says, is that around one-third of the Molecular Cancer papers and around one-quarter of the Oncogene papers identified as having errors have also been flagged on the post-publication peer-review platform PubPeer, mostly for separate, image-integrity issues.
“The editors-in-chief of both journals, and Springer Nature, agree with Professor Byrne that ensuring the integrity of the publication record is of the utmost importance, and we take concerns raised regarding the papers published in our journals very seriously,” says Chris Graf, research-integrity director at Springer Nature. “We requested details of these concerns, so that we could investigate them and act where appropriate, over a year ago, but they have only just been made available. Now that we do have them, we are able to start a full investigation.” He adds that, “If concerns prove to be well founded, we will take action.”
Graf says that two Molecular Cancer papers flagged in the 2021 analysis have been investigated, and although one was corrected after typographical errors in two reagent sequences were verified, the ‘errors’ flagged in the second paper turned out to match the intended gene and species. Byrne asserts that they remain incorrect.