On the 28th of October 2022, Data Science Day was held at Syracuse University to provide a platform for data science practitioners, newcomers, students, and professors to discuss various topics in the field. The School of Information Studies organized the day around three main sessions: early morning, late morning, and afternoon. I was mainly interested in machine learning and AI; thus, I attended the morning session.
Overall, the session was well organized and, in my opinion, succeeded in helping attendees see the connection between theory and practice. In particular, I was able to see possible data science applications in the real world.
The session was mainly divided into the following three talks:
(a) Jeff Saltz: What is Data Science, and how it relates to Machine Learning and AI
(b) Raj Dewan: How Data Science is used and the Skills needed to “Do” Data Science
(c) Sevgi Erdogan: Smart Cities and Data Science.
In the following paragraphs, I will share my thoughts, key takeaways, and the insights I gained from the session.
Great insights sometimes hide within the data we already have
Foodlink is a not-for-profit organization that collects food from restaurants, groceries, and farms and gives it to people in need for free or at very low prices. During the session, Professor Dewan spoke about how Foodlink used data science to understand the distribution of need across different suburbs in New York. What astonished me in their story was how the data yielded completely unexpected insights using only simple descriptive analytics and visualization techniques.
The analysis surprisingly revealed real food shortages not only in poor suburbs but also in some wealthy ones. This insight led the management to consider distributing food in areas they would otherwise never have considered.
My key takeaway from this story is that data can bring about transformative changes in people's lives. Furthermore, as data scientists, we sometimes focus too intensively on the technical part. We do not necessarily need to build complicated models to create great insights; sometimes, great insights hide within the data we already have.
Utilizing data from multiple sources and visualization can work miracles
As Professor Dewan emphasized, combining multiple data sources can produce remarkable insights. One of the main reasons behind the Foodlink project’s success was its ability to retrieve data such as the organization’s historical sales records and the number of students who receive free lunch at each school, and then combine these sources for analysis.
Thus, as data scientists, we should consider incorporating various data sources as needed in our projects and learn how to do so using scripting techniques. Visualization is also necessary to deliver the insights we create to our stakeholders.
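To make the idea of combining sources a little more concrete, here is a minimal Python sketch that uses only the standard library. Everything in it is hypothetical and invented for illustration: the ZIP codes, the numbers, the field names, and the `combine_sources` and `meals_per_student` helpers. It is not Foodlink's actual data or method, just one way two small tables could be joined on a shared key.

```python
# Hypothetical records keyed by ZIP code; all values are made up.
sales_records = [
    {"zip": "14601", "meals_distributed": 1200},
    {"zip": "14602", "meals_distributed": 300},
    {"zip": "14603", "meals_distributed": 50},
]
free_lunch_records = [
    {"zip": "14601", "free_lunch_students": 800},
    {"zip": "14602", "free_lunch_students": 900},
    {"zip": "14604", "free_lunch_students": 400},
]


def combine_sources(left, right, key):
    """Outer-join two lists of dicts on `key`; missing fields stay absent."""
    combined = {}
    for record in left + right:
        combined.setdefault(record[key], {key: record[key]}).update(record)
    return [combined[k] for k in sorted(combined)]


def meals_per_student(row):
    """Simple descriptive ratio; None when either source lacks the area."""
    meals = row.get("meals_distributed")
    students = row.get("free_lunch_students")
    if meals is None or not students:
        return None
    return meals / students


merged = combine_sources(sales_records, free_lunch_records, "zip")
for row in merged:
    print(row["zip"], meals_per_student(row))
```

Areas that appear in only one source surface as gaps in the merged table, and in a real project those gaps are often where a closer look pays off.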
Ethical concerns about data science
Prof. Jeffrey Saltz talked about the ethical concerns surrounding AI and data science projects and why we, as data scientists, should be aware of the implications and consequences of the models we build. He gave deepfakes as an example of a powerful AI technique that can be misused. Deepfakes are extremely convincing fake images and videos created with artificial intelligence.
Creating a believable deepfake once required many images of someone, a lot of time, and a fair degree of both coding skill and special-effects know-how, but it is becoming increasingly easy. My key takeaway from his talk was that data science and AI have changed and will continue to change our world. However, it is essential to understand that with this great power comes a significant risk of misuse that requires accountability.
Model bias is another problem that we might encounter. Models can sometimes be biased toward a specific group in the dataset; most of the time, these biases are unintentional. Thus, we should be more cautious about, and take responsibility for, the models we create and ensure that they provide fair predictions, which is indeed a difficult task. Finally, I want to end this paragraph with a quote from Prof. Saltz's talk: “Fairness is hard because understanding what to do is often hard, as well as fairness is hard because building a model that we want to behave the way we think it should is also challenging.”
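To check for model bias of the kind described above, one simple starting point is to compare how often a model predicts the favorable outcome for each group. The sketch below is my own illustration and was not shown in the talk: the groups, the predictions, and the `positive_rate_by_group` helper are hypothetical, and the gap it reports (often called the demographic parity difference) is only one of several possible fairness measures.

```python
from collections import defaultdict

# Hypothetical model outputs: (group label, predicted label), 1 = favorable.
predictions = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1),
    ("B", 1), ("B", 0), ("B", 0), ("B", 0),
]


def positive_rate_by_group(pairs):
    """Return the share of favorable predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, predicted in pairs:
        totals[group] += 1
        positives[group] += predicted
    return {g: positives[g] / totals[g] for g in totals}


rates = positive_rate_by_group(predictions)
# Gap between the most and least favored groups; 0 would mean equal rates.
parity_gap = max(rates.values()) - min(rates.values())
print(rates, parity_gap)
```

A large gap does not prove a model is unfair on its own, but it is a cheap signal that the predictions deserve a closer look before they reach stakeholders.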
Communication skills for data scientists
Almost all speakers emphasized the importance of communication skills for data scientists.
For example, Prof. Raj Dewan said that we could build the greatest model on the planet; however, it would be meaningless and useless if we failed to communicate its insights to the stakeholders. Thus, using visualization tools such as Tableau to tell a story can be an excellent means of communication.
He also suggested different ways and resources to learn from, such as studying New York Times visualizations and trying to recreate them using publicly available Tableau resources online. Additionally, I found his Foodlink example interesting, especially the use of visualization to communicate the insights.
In conclusion, data science is an interdisciplinary field that requires a varied set of skills. The speakers successfully highlighted those skills, in addition to explaining the best practices for carrying out data science projects.