Over the summer, I worked on the air quality citizen-sensing project. People all over the world talk about air quality and its importance, and the primary objective of the project was to collect data from places where citizens were discussing it. Working with Ph.D. student Ehsan Sabaghian under Professor Murali Venkatesh, I spent the summer studying the AirPi Forum, where citizens from around the world take part in discussions on air quality using data gathered from AirPi kits. I loved the idea of exploring different citizens’ takes on air quality.

Reasons for Doing Air Quality Research

I wanted to do this research not only to learn the technical aspects of data analysis, but also to understand how research is done to extract the required data. Research lays the foundation for data analysis, which makes it crucial. Collecting a good dataset is critical before beginning any kind of analysis. I also wanted to gain experience in taking on the responsibility of handling a project.

I began working on this project in May 2017, and it has provided a great learning experience. I started with a search for information dealing with air quality. Finding relevant information proved difficult because of the sheer volume of research available.

After researching air quality information, the next step was to organize the data in a way that would be easy to analyze. I discovered how to clean data and how important it is to organize data before starting to work on it.

Importance of Utilizing Excel

Excel provides a set of built-in functions, along with several other features, that help clean data. I became comfortable with these while working with the air quality dataset. Metadata is data that describes the original data. Essentially, it helps you understand the given dataset correctly, particularly when you are working on your own. Creating metadata helped me understand the dataset thoroughly.

With a good foundation of knowledge about the dataset, I went on to analyze the data. Performing descriptive analysis helped me understand the details in the dataset and provided me with a quantitative summary of it.

In addition, I learned how data can be summarized using measures of frequency, central tendency, dispersion, and position. Excel makes summarizing quantitative data easy by providing these options through a tool named “Data Analysis”. Excel can also chart quantitative data, which provides a visual aid that makes the data easier to comprehend.
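As a rough illustration of the four kinds of measures mentioned above, here is a minimal Python sketch. It is not the Excel workflow used in the project. The numbers in `posts_per_thread` and the `describe` helper are made up for this example and do not come from the AirPi dataset.

```python
import statistics
from collections import Counter

# Hypothetical sample values (NOT from the AirPi dataset),
# e.g. the number of posts in each of ten forum threads.
posts_per_thread = [2, 3, 3, 5, 8, 3, 13, 5, 2, 6]


def describe(values):
    """Summarize a list of numbers with one or more measures of each kind."""
    # Quartile cut points: Q1, Q2 (the median), and Q3.
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return {
        # Measure of frequency: how often each value occurs.
        "frequency": Counter(values),
        # Measures of central tendency.
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Measures of dispersion.
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),
        # Measure of position.
        "quartiles": (q1, q2, q3),
    }


summary = describe(posts_per_thread)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Each entry in the summary corresponds to one of the measures listed above, so the output can be read side by side with a descriptive statistics table from a spreadsheet.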

I sometimes struggled when learning new tools. Performing network analysis in Gephi, for example, took a lot of time because I first had to understand and familiarize myself with the tool. In the initial stages, researching the data was not exactly a struggle, but finding quality information required extensive research, and that was not easy.

Differences in Various Data

While working on this research, I got a clear picture of the difference between quantitative data and qualitative data, as well as the different analysis techniques used for each. Quantitative data is numerical data. Descriptive analysis is performed on it to understand what the data represents.

On the other hand, qualitative data is typically data that is not numeric. Qualitative data analysis identifies patterns in the dataset and helps you grasp the characteristics of the data.

Network analysis is the study of the relationships between the different actors in a dataset. It helped me understand which users talk to which other users, and how many times they talk to one another. It also gave me a picture of which users drive the conversation on the AirPi Forum and how these users influence the overall network.

To implement network analysis, I learned two tools: NodeXL and Gephi. NodeXL is an Excel add-in and a very easy-to-use tool for visualizing networks. Gephi is a popular network visualization tool that is well suited to larger graphs.
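To make the questions behind network analysis more concrete (who talks to whom, how often, and who drives the conversation), here is a small Python sketch. It is only an illustration and not the NodeXL or Gephi workflow used in the project. The user names, the `replies` list, and both helper functions are hypothetical.

```python
from collections import Counter

# Hypothetical reply log (NOT real AirPi Forum data):
# each pair is (user who wrote the reply, user being replied to).
replies = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
    ("alice", "carol"),
]


def build_network(reply_pairs):
    """Turn a reply log into weighted directed edges plus per-user totals."""
    # Edge weight = number of times one user replied to another.
    edge_weights = Counter(reply_pairs)
    sent = Counter()      # replies written by each user
    received = Counter()  # replies addressed to each user
    for (source, target), weight in edge_weights.items():
        sent[source] += weight
        received[target] += weight
    return edge_weights, sent, received


def rank_by_activity(sent, received):
    """Order users by total replies sent plus received, most active first."""
    users = set(sent) | set(received)
    # Ties are broken alphabetically so the order is deterministic.
    return sorted(users, key=lambda user: (-(sent[user] + received[user]), user))


edge_weights, sent, received = build_network(replies)
ranking = rank_by_activity(sent, received)

for (source, target), weight in sorted(edge_weights.items()):
    print(f"{source} -> {target}: {weight}")
print("Most active first:", ranking)
```

Counting replies sent and received is only one simple way to estimate who drives a conversation. Tools such as Gephi and NodeXL offer richer measures and draw the network as a graph.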

Tasks Accomplished

Over the course of a fruitful summer, I accomplished the following tasks while working on this project:

  • Researched air quality to gather all the data required for the project while mapping all the required data into a standard Excel template
  • Cleaned a large dataset using Excel
  • Created metadata for the dataset
  • Performed statistical descriptive analysis on the data using Excel
  • Conducted qualitative analysis using Excel and understood the differences from quantitative analysis
  • Implemented social network analysis using tools like Gephi and NodeXL

Benefits of Doing Air Quality Research

In conclusion, this form of data analysis can be of great help to researchers working on global air quality. Researching and organizing the data helps raise awareness about the quality of the air. Furthermore, this research has greatly benefited me by strengthening my understanding of data analysis concepts. It also gave me hands-on experience working with a large dataset and real outputs.


Hitarthi’s Key Terms

AirPi: Raspberry Pi shield kit, capable of recording and uploading information about temperature, humidity, air pressure, light levels, UV levels, carbon monoxide, nitrogen dioxide and smoke level to the internet. (source: airpi.es)

AirPi Forum: A forum where people from all over the world are able to take part in the discussion of air quality using data gathered from AirPis

Qualitative Data: Non-numeric data. Used to identify patterns and problems

Quantitative Data: Numerical data

NodeXL: Excel add-in for creating visual representations of networks

Gephi: Network visualizer suited to larger graphs