Are you fresh out of college, or a young professional seeking a career in data science, but don’t know how to get there? Getting your foot in the door for a coveted data science position may not be as hard as you think.
What skills should you learn?
If you’re just getting started, you’ll want to figure out which data science skills are essential for you to pick up.
As a research analyst working in data science, I would recommend learning Python and R programming, along with foundations in databases and information visualization. This combination of skills will open the doors to an entry-level data science position that can be the beginning of a fun and challenging data science career.
Why do I suggest learning these skills in particular?
Data Storage and Retrieval
Data can come in all sizes and all formats, but in most cases, data is collected and stored in databases. Having a strong understanding of databases and the ability to run queries is essential for data scientists. Without it, you won’t be able to access your data.
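To make "running a query" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The survey table, its columns, and the sample rows are made up for illustration; a real project would connect to an existing database instead of creating one in memory.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented purely for this illustration.
conn.execute(
    "CREATE TABLE survey (respondent TEXT, department TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO survey VALUES (?, ?, ?)",
    [("a", "sales", 4), ("b", "sales", 2), ("c", "support", 5)],
)

# The query itself: average score per department.
rows = conn.execute(
    "SELECT department, AVG(score) FROM survey "
    "GROUP BY department ORDER BY department"
).fetchall()
conn.close()

print(rows)
```

The SELECT statement is the part that carries over to other database systems; the setup around it will differ depending on where your data lives.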
Now, you’ll likely want to do something with the data in your database: manipulate it, look for patterns, or analyze it. It only takes a few lines of Python code to process or run natural language processing (NLP) analysis on a large amount of text data. I recommend starting with Python because it is easy to learn, the syntax is readable and there are tons of resources available online.
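As a rough illustration of the "few lines of Python" idea, the sketch below counts the most frequent words in a piece of text using only the standard library. The function name top_words and the sample sentence are invented for this example, and real NLP work would usually involve more careful tokenization than a simple regular expression.

```python
import re
from collections import Counter


def top_words(text, n=3):
    """Return the n most frequent words in text (hypothetical helper)."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Made-up sample text for illustration.
sample = "Data science is fun. Data science is challenging. Data is everywhere."
print(top_words(sample, 2))
```

The same function works unchanged on a much larger body of text, for example the contents of a file read into a string.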
R is a programming language for statistical computing and graphics. It focuses on user-friendly data analysis, statistics and graphical models. Learning even some basic R will save you time creating graphics and other visual representations of your data. You can spend hours creating and refining visualizations by hand, or you can write a few lines of R that both find hidden patterns in your data and create graphs to visualize them.
A good data scientist will be able to retrieve and analyze data, but a great data scientist will also be able to communicate their findings to a broader audience, such as a boss, a board of executives, or the general public.
How should you learn these skills?
Now you have an idea of what skills you want to learn. How will you choose to spend time learning these skills?
If you are brand new to data science, you will probably want to start with the basics. An Introduction to Data Science, written by iSchool Interim Dean Jeffrey Stanton, is a good start. It is an eBook and is available as a free download. In the book, you will be introduced to R programming and you can work through some of the basic problems that data scientists address on a regular basis.
Python.org has the resources for you to get started with Python programming. Once you feel you’ve got a good handle on Python, try to familiarize yourself with the following tools:
- Natural Language Toolkit (more commonly known as NLTK), which provides text processing libraries
- scikit-learn, which is a machine learning toolkit
- Weka, which is a data mining toolkit (written in Java rather than Python)
Going the Extra Mile: Data Science Classes
If you want to push your skills further, consider taking a class in NLP or data mining. In NLP and data mining classes, you can learn from professors who are working data scientists and data researchers to better understand a variety of data science problems, and the different scientific approaches for solving those problems.
While knowing Python will help you implement a solution to a data science problem, knowing the basics of NLP and data mining will help you understand the problem and find that solution in the first place. In my experience, these skills are best learned hands-on: playing around and experimenting with real or close-to-real datasets, while also studying the basic concepts.
Each semester, graduate and undergraduate iSchool students present their final information visualization projects.
Here at the iSchool, our data science classes offer a blend of classroom and lab sessions so you can get hands-on experience playing with datasets. The iSchool offers a couple of ways to get classroom experience in data science:
Certificate of Advanced Study (CAS) in Data Science
The CAS in Data Science is a streamlined track of classes that provides students with an opportunity to gain foundational data science knowledge. You don’t have to be enrolled in a master’s program to enroll in the CAS in Data Science, but you can add a CAS in Data Science to a full master’s program.
Master of Science in Information Management (IM)
If you’re serious about studying data science, the iSchool also offers a full, two-year master’s program, the MS in Information Management, which gives students the flexibility to take data science classes and apply those credits toward their master’s degree. Current IM students use the program’s elective credits to take as many data science classes as they can (usually to get a CAS in Data Science), or customize their program to another specialization.
Can’t make it to class, but still want a challenge?
Are you a data scientist or data science student? What other resources or advice would you recommend to new data scientists looking to build a foundation in the field? Share your suggestions in the comments below.