Applied Data Science with Python UM Specialization
Dylan | May 18, 2020
I completed the Applied Data Science with Python Specialization by the University of Michigan through Coursera. Coursera is a giant online learning platform offering massive open online courses (MOOCs) and “Specializations”. Many major companies and universities have courses available on the platform, such as IBM, Microsoft, Harvard, and Yale.
Each Coursera course is broken down into 4-10 weeks, each composed of video lectures, quizzes, assignments, projects, and assessments. Coursera “Specializations” can be obtained by completing a group of courses related to a specific skill, in this case data science with Python.
To receive the Applied Data Science with Python Specialization, you must complete the following 5 courses offered by the University of Michigan:
- Introduction to Data Science in Python
- Applied Plotting, Charting & Data Representation in Python
- Applied Machine Learning in Python
- Applied Text Mining in Python
- Applied Social Network Analysis in Python
Each course is broken down into 4 weeks, meaning the entire specialization is expected to take 20 weeks, or roughly 5 months, to complete. Just looking at the course names, it certainly felt like 5 months’ worth of material! Now let’s explore each course in more detail.
Course 1: Introduction to Data Science in Python
Given my prior experience with Python, I expected to breeze through the first course. Perhaps I was slightly misled by the word “Introduction”, but they really expect you to be quite comfortable with Python before you begin.
This course dives headfirst into NumPy and Pandas, two essential Python toolkits for data cleaning and processing. You’ll learn the basics of Pandas, such as how to store data in DataFrames and query it, and later you’ll learn more advanced topics such as merging, grouping data, and manipulating dates. You’ll also be introduced to the Jupyter Notebook, which you’ll use to complete your weekly assignments.
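To give a flavor of those Pandas topics, here is a minimal sketch of my own, not taken from the course materials. The `orders` and `customers` tables and all of their values are hypothetical, invented purely for illustration.

```python
import pandas as pd

# Hypothetical data, invented purely for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "amount": [25.0, 40.0, 15.0, 60.0],
    "date": ["2020-01-15", "2020-01-20", "2020-02-03", "2020-02-10"],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "name": ["Ana", "Ben", "Cy"],
})

# Manipulating dates: parse the strings into datetimes, then pull out the month.
orders["date"] = pd.to_datetime(orders["date"])
orders["month"] = orders["date"].dt.month

# Querying: keep only the rows where the boolean mask is True.
large_orders = orders[orders["amount"] > 20]

# Merging: attach each customer's name to their orders.
merged = orders.merge(customers, on="customer_id", how="left")

# Grouping: total amount spent per customer.
total_by_customer = merged.groupby("name")["amount"].sum()
print(total_by_customer)
```

The weekly assignments work with larger and messier datasets, but most of them boil down to chaining these same few operations.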
This course is perfect for intermediate Python users who are looking to learn more about NumPy and Pandas. The weekly assignments can be very challenging at times, so be prepared to Google and scan through Stack Overflow.
Course 2: Applied Plotting, Charting & Data Representation in Python
Building upon the tools for manipulating and storing data from Course 1, this course moves into the exciting world of data visualization. There’s a lot of theory covered in this course and I thoroughly enjoyed reading all of the supplementary material provided. In addition to theory, you’ll be introduced to the matplotlib and Seaborn libraries.
This course shines a light on the importance of designing graphs and charts that won’t accidentally deceive viewers. After completing this course, my eyes have been opened to the way people use data visualizations to mislead (intentionally or not).
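One classic way a chart can mislead is a truncated y-axis. The sketch below is my own illustration rather than a course exercise, and the two product values are made up. It draws the same two bars twice with matplotlib, once with the axis cut off just below the data and once starting at zero.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical values, invented for illustration: a 2% difference.
labels = ["Product A", "Product B"]
values = [98, 100]

fig, (ax_truncated, ax_honest) = plt.subplots(1, 2, figsize=(8, 3))

# Truncated axis: the small difference looks dramatic.
ax_truncated.bar(labels, values)
ax_truncated.set_ylim(97, 100.5)
ax_truncated.set_title("Truncated axis")

# Axis starting at zero: the bars look nearly identical, as they should.
ax_honest.bar(labels, values)
ax_honest.set_ylim(0, 110)
ax_honest.set_title("Axis starts at zero")

fig.tight_layout()
```

In a Jupyter Notebook you can drop the `matplotlib.use("Agg")` line and the figure will render inline; in a script, `fig.savefig(...)` writes it to a file.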
The final project is really interesting and forces you to venture out and explore a question interesting to you. You will have to find your own datasets and produce a meaningful visualization that peer learners will review and provide feedback on.
Course 3: Applied Machine Learning in Python
This is a crash course in supervised machine learning methods. You’ll learn basic machine learning concepts, tasks, and workflow with the essential scikit-learn machine learning library. You’ll learn the most popular supervised methods for both classification and regression tasks.
The course also touches on the connection between model complexity and performance, proper feature scaling, and overfitting. Some of the models you’ll put to use in this course are linear models (least-squares, ridge, lasso, polynomial, and logistic regression), support vector machines, k-nearest neighbors, ensembles such as random forests and gradient boosted trees, and finally a touch of neural networks.
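As a rough sketch of how scaling and model complexity fit together in scikit-learn (my own example on synthetic data, not a course assignment), the snippet below scales the features inside a pipeline and compares a few ridge regularization strengths with cross-validation. The dataset sizes and `alpha` values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data, generated only for this illustration.
X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)

scores_by_alpha = {}
for alpha in [0.01, 1.0, 100.0]:
    # Scaling inside the pipeline means the scaler is fit on each training
    # fold only, so no information leaks from the held-out fold.
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    # Larger alpha means stronger regularization, i.e. a simpler model.
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    scores_by_alpha[alpha] = scores.mean()

for alpha, score in scores_by_alpha.items():
    print(f"alpha={alpha}: mean cross-validated R^2 = {score:.3f}")
```

Comparing the cross-validated scores across `alpha` values is one simple way to see where a model starts to underfit or overfit.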
There’s an entire week devoted to model evaluation and model selection methods that was very insightful. You can use this insight to understand and optimize the performance of your models. This was probably the most exciting course in the series.
Course 4: Applied Text Mining in Python
After this course, you’ll be well equipped to handle a variety of text mining and text manipulation tasks. You’ll learn how Python handles text and how to use the important NLTK framework for manipulating it.
Being personally interested in NLP and how computers understand text and language in their own special way, I enjoyed this course quite a bit. You’ll be introduced to regular expressions and how to clean and prepare text for the machine learning process. At the end of the course, you’ll get to play with topic modeling: detecting topics and grouping texts by similarity.
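Here is a small sketch of the kind of regular-expression cleanup that comes before any machine learning step. It is my own example and uses only the Python standard library rather than NLTK; the `clean_and_tokenize` helper and the sample sentence are hypothetical.

```python
import re
from collections import Counter

def clean_and_tokenize(text):
    """Lowercase the text, drop URLs and non-letters, and split on whitespace."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # keep only letters and whitespace
    return text.split()

sample = "Check https://example.com NOW!!! Text mining, text cleaning & more."
tokens = clean_and_tokenize(sample)
print(tokens)

# Word counts are a common first feature for text models.
counts = Counter(tokens)
print(counts.most_common(1))
```

Real-world text usually needs more care (contractions, accented characters, stop words), which is where a library like NLTK earns its keep.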
Course 5: Applied Social Network Analysis in Python
For me, this was certainly the most challenging course in the specialization. You’ll be taught the basics of network analysis through the NetworkX library. There are many phenomena that are well-suited to be modeled as networks, something I rarely considered before completing this course.
You’ll understand the concept of network connectivity and network robustness. The course also dives into different ways to measure the centrality of nodes in a network and finally how to observe and predict the evolution of networks over time.
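To make connectivity and centrality a bit more concrete, here is a minimal NetworkX sketch of my own. The five-node graph is a toy example invented for illustration, not data from the course.

```python
import networkx as nx

# A toy undirected network: A is a hub, and D bridges A to E.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")])

# Connectivity: can every node reach every other node?
connected = nx.is_connected(G)

# Robustness: the minimum number of nodes whose removal disconnects the graph.
min_nodes_to_disconnect = nx.node_connectivity(G)

# Centrality: two different ways to measure how "important" each node is.
degree = nx.degree_centrality(G)            # share of other nodes each node touches
betweenness = nx.betweenness_centrality(G)  # share of shortest paths through each node

most_central = max(degree, key=degree.get)
print(connected, min_nodes_to_disconnect, most_central)
```

Here the hub `A` comes out on top for both measures, but on larger networks different centrality measures often disagree, which is part of what makes the topic interesting.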
Final Thoughts
I would definitely recommend this Specialization to intermediate Python users who are interested in learning more about Data Science or Machine Learning in Python. There’s a strong focus on completing projects in this specialization, so you can immediately apply what you’ve learned in the lectures and supplemental reading materials.
This specialization is a large time commitment, but you’ll certainly learn a lot and afterwards be well-equipped to begin answering interesting questions using data and machine learning.
If you’ve taken this course or another on Coursera, let us know your thoughts below! If you’re considering enrolling in this course and have additional questions, I’ll be happy to answer them in the comments below too. Thanks for reading and happy coding from Nimble Coding!