For academic journal editors and research integrity officers at post-secondary institutions, detecting the reuse of images and illustrations in academic papers can be a time-consuming, if not impossible, task. While tools for detecting similarity and plagiarism in text submissions have been in use for several years, until now there has been no technological solution for finding duplicate images across the research literature.
That may soon change, thanks to work done by School of Information Studies (iSchool) Assistant Professor Daniel Acuna.
In a paper posted on the bioRxiv preprint server and reported in Nature, Acuna and his collaborators, Paul Brookes at the University of Rochester and Konrad Kording at the University of Pennsylvania, outline how they used an algorithm to scan nearly 800,000 biomedical papers and 2 million images and successfully detect duplicate imagery.
“This research shows that it is feasible to use machine learning to conduct advanced analysis of science with big data,” Acuna explained. “If editors and research integrity officers were to adopt this method, it would make it easier for them to screen and evaluate images in scientific papers before publication – something that currently requires considerable effort, isn’t widely undertaken, and is prone to errors.”
Acuna and his colleagues have found that editors and research integrity officers see image reuse as a problem, but one for which they lack an easy solution. “They have cases sitting on their desks, but it’s hard to check for this manually, as they’d need to take each of the figures and then analyze them by hand,” Acuna said. “With the algorithm, it goes through all the data and finds the duplicated figures, even if they’re rotated or skewed in some way.”
Because the tool can rapidly detect image reuse at scale, Acuna believes it will soon be able to help ensure scientific integrity across a broad range of disciplines.
“I think that a great deal of scientific fraud will be, sooner or later, detectable by automatic methods,” Acuna remarked.