The Society for Neuroscience conference (SfN) is the largest and oldest neuroscience conference in the world, and one of the largest conferences across science. More than 50,000 scientists travel from all over the world to learn about the latest research, presented in more than 15,000 posters.
While many things have changed since the first days of the conference in 1971, one thing remains the same: every attendee spends a long time figuring out which posters to visit out of those 15,000+ options. SfN is not unique: the annual conference of the American Library Association has more than 16,000 attendees and thousands of presentations.
Organizers try to help attendees by clustering posters into hand-curated sessions or letting poster presenters provide searchable keywords. One shortcoming of these approaches is that topics and keywords are not fine-grained enough or, worse, fail to capture the latest trends in research.
How can we help scientists better understand the space of ideas close to their interests in such large events?
My colleagues and I applied data science to the problem, developing a method to model the content of posters and creating a recommendation system based on those contents. This method is called Science Concierge and we have successfully applied it to many conferences—small and large—since its creation.
How Science Concierge works
Science Concierge uses Natural Language Processing (NLP) with data science to suggest posters close to attendees’ preferences and far from attendees’ dislikes. The technique transforms the contents of each poster (e.g., its author list, title, abstract) into a mathematical fingerprint. Then it takes attendees’ likes and dislikes to produce an average preference vector from those fingerprints and make recommendations based on this vector.
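The steps just described can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the Science Concierge implementation: the fingerprint here is a plain word-count vector, the function names (`fingerprint`, `preference_vector`, `recommend`) and the `dislike_weight` parameter are hypothetical, and the poster titles are made up for the example.

```python
import math
import re
from collections import Counter


def fingerprint(text):
    """Turn poster text into a sparse word-count vector (a stand-in fingerprint)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def preference_vector(fingerprints, liked, disliked, dislike_weight=0.5):
    """Average the liked fingerprints, then subtract a weighted average of the disliked ones."""
    pref = {}
    for idx in liked:
        for word, count in fingerprints[idx].items():
            pref[word] = pref.get(word, 0.0) + count / len(liked)
    for idx in disliked:
        for word, count in fingerprints[idx].items():
            pref[word] = pref.get(word, 0.0) - dislike_weight * count / len(disliked)
    return pref


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(posters, liked, disliked=(), top_n=3, dislike_weight=0.5):
    """Rank unseen posters by similarity to the attendee's preference vector."""
    fps = [fingerprint(p) for p in posters]
    pref = preference_vector(fps, liked, disliked, dislike_weight)
    seen = set(liked) | set(disliked)
    candidates = [i for i in range(len(posters)) if i not in seen]
    return sorted(candidates, key=lambda i: cosine(pref, fps[i]), reverse=True)[:top_n]


# Made-up poster titles, for illustration only.
posters = [
    "Dopamine neurons encode reward prediction error",
    "Reward learning and dopamine release in striatum",
    "Visual cortex responses to oriented gratings",
    "Orientation tuning in primary visual cortex",
    "Dopamine and reward in decision making",
]

# The attendee likes poster 0 and dislikes poster 2.
print(recommend(posters, liked=[0], disliked=[2], top_n=2))
```

With these toy titles, the two dopamine posters rank first, and the remaining visual cortex poster is pushed to the bottom because it shares words with the disliked one. How strongly dislikes should count (`dislike_weight` in this sketch) is exactly the kind of parameter that has to be tuned on data.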
The method has many parameters, and we use cross-validation to find the best values. Using data from previous years, we aim to mimic what humans would have done. We apply this idea to two key parameters.
First, we use past human-curated poster classifications as a signal to learn the proper combination of fingerprints from liked and disliked posters. For example, how averse would attendees have been to recommendations close to disliked posters?
Second, we use these human-curated classifications to learn the proper complexity of fingerprints and preference vectors. For example, would small preference vectors capture large scientific fields with few numbers but risk being too coarse? Cross-validation finds the combination of parameters that recommends posters that are nearby in human topic distance. As an example, for the fingerprint complexity parameter, the figure below shows that around 100 numerical components give the closest topic distances for recommendations.
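To make the selection idea concrete, here is a minimal sketch of a parameter sweep scored against human-curated topics. It is not the procedure from the article. It makes several assumptions: fingerprints are built by hashing words into `n_components` buckets (a stand-in for a learned reduction), topic labels are hierarchical strings such as `A.01`, topic distance counts the hierarchy levels that differ, and all function names and the toy data are hypothetical. Because hashing involves no fitting, the sketch scores each poster's nearest neighbour directly. A learned model would need to be fit on training folds only.

```python
import math
import re
import zlib


def hashed_fingerprint(text, n_components):
    """Hash each word into one of n_components buckets (assumed stand-in fingerprint)."""
    vec = [0.0] * n_components
    for word in re.findall(r"[a-z]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % n_components] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def topic_distance(topic_a, topic_b):
    """Count the hierarchy levels that differ between labels such as 'A.01' and 'A.02'."""
    a, b = topic_a.split("."), topic_b.split(".")
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return max(len(a), len(b)) - shared


def mean_topic_distance(posters, topics, n_components):
    """Treat each poster as the only 'like' and score its nearest neighbour by topic distance."""
    fps = [hashed_fingerprint(p, n_components) for p in posters]
    total = 0.0
    for i in range(len(posters)):
        others = [j for j in range(len(posters)) if j != i]
        best = max(others, key=lambda j: cosine(fps[i], fps[j]))
        total += topic_distance(topics[i], topics[best])
    return total / len(posters)


def select_n_components(posters, topics, candidates):
    """Return the candidate size with the lowest mean topic distance, plus all scores."""
    scores = {n: mean_topic_distance(posters, topics, n) for n in candidates}
    return min(scores, key=scores.get), scores


# Made-up posters and hand-assigned topic labels, for illustration only.
posters = [
    "Dopamine neurons encode reward prediction error",
    "Reward learning and dopamine release in striatum",
    "Visual cortex responses to oriented gratings",
    "Orientation tuning in primary visual cortex",
]
topics = ["A.01", "A.02", "B.01", "B.01"]

best, scores = select_n_components(posters, topics, candidates=[2, 8, 64])
print(best, scores)
```

On a toy corpus like this the winning value is not meaningful. The point is only the shape of the loop: generate recommendations for each candidate setting, measure how far they land from the human-assigned topics, and keep the setting with the smallest distance.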
Of course, there are many more details about the method and you can read the full article online. However, the general data science principle is the same: If you want to automate a task, get training data, find the appropriate Machine Learning technique, and properly evaluate the performance of your method.
We made the method available as a Python software package that can be downloaded from GitHub.
This work has many applications beyond finding interesting posters at conferences. We have applied the same principle to automatically suggest which reviewer should review which article at conferences. We have also built Scholarfy, a large-scale recommendation system that suggests scientific articles.
Under a recent grant, “EAGER: Improving scientific innovation by linking funding and scholarly literature”, my students and I are building a large recommendation system that would cross-suggest publications and grant opportunities, helping both young scientists who are new to grantsmanship and program officers. This shows that recommendation systems and data science can significantly contribute to many tasks that scientists do manually today.
I am a strong believer in Open Science and Reproducibility, so I try my best to share the datasets, analysis, and code needed to reproduce results. You can visit my GitHub account for the code of my projects, follow me on Twitter @daniel_akuna, or simply drop me a line at email@example.com if you would like to collaborate!