In this course, students are introduced to the sequence of steps needed to carry out Internet-scale data analytics, from hypothesis formation and data collection to methods of analysis and visualization. Students will become proficient in data collection and storage strategies that support later analysis. Script-based programming techniques are used to automate data collection from a variety of third-party resources, such as application programming interfaces (APIs). Methodologies for constructing representative samples, storing raw data, merging disparate data sets, cleaning inconsistent entries, and constructing derivative data sets are reviewed. Students are also introduced to two classes of basic analysis of gathered data: descriptive statistics and data visualization. These are used to validate gathered data sets and improve their accuracy, which is a prerequisite for more advanced analysis.
Data Wrangling, Accessing APIs, Data Collection Design and Implementation, Synthesize Concepts in a Capstone Project