ORI Grant Funds Automated Tool to Detect Potential Fraud in Scientific Papers

The Office of Research Integrity in the U.S. Department of Health and Human Services has awarded funding to a School of Information Studies (iSchool) professor to further automate the detection of fraudulent material in scientific papers.

Daniel Acuna, an assistant professor, has been awarded a grant of $149,310 for his project, “Methods and Tools for Scalable Figure Reuse Detection with Statistical Certainty Reporting.” The project aims to advance the detection process by developing tools and systems for integrity investigators, including scalable software and infrastructure and statistical feedback.

Acuna plans to develop a search tool that increases the scale at which articles are automatically searched for figure reuse, so that cases of potential inauthenticity and inappropriate reuse can be found much more quickly and across broader repositories of information.

Until recently, the Office of Research Integrity could act on charges of inappropriate reuse of material only on an ad hoc basis, Acuna says. Investigators reviewed cases individually using painstaking methods, sometimes only after whistleblowers reported them, and could operate only at a limited scale.

Fraudulent reuse of images in articles presents a significant problem within scientific research, and it is an issue that is important to the general public, too, Acuna contends. “Fraudulent reuse is becoming an increasingly common problem that damages the public perception of science. Science has long-term consequences on health policy, drug development and disease prevention. Treating the public with procedures that are based on fraudulent science would be perhaps one of the most unethical and scary outcomes of this practice. Everybody would like to deter fraudsters before this happens,” he explains.

Acuna previously worked with researchers Paul Brookes (University of Rochester) and Konrad Kording (University of Pennsylvania) using machine learning techniques to review scientific materials to detect fraudulent reuses.

In an early 2018 study, “Bioscience-Scale Automated Detection of Figure Element Reuse,” they describe scanning PubMed Open Access articles and finding that fraudulent reuses could be detected automatically at a significantly larger scale than before. They also determined that, of the group of articles investigated, around 0.6 percent were very likely to be fraudulent. “Adapting machine learning techniques and developing useful user interfaces went a long way. We were able to detect cases at a very high pace,” Acuna reports.

The next step is to dramatically scale automated detection of figure reuse across bodies of articles. Acuna proposes developing statistical methods to support conclusions about figure reuse and, eventually, tools and techniques to help make reuse detection standard practice. He will collaborate with the Office of Research Integrity’s researchers and others in the field during the project.