Students in the program complete 33.5 credits, which include 30 credits of coursework, a 2-credit capstone project and a 1.5-credit immersion experience that will take place at SMU. Students can earn the Master of Science in Data Science in 20-28 months.
Students in this course receive an overview of statistical methods from an experimental design perspective. Students will review statistical sampling, T-tests, Analysis of Variance, Linear Regression and other skills. The course focuses on interpreting, analyzing and communicating results, and on the ethics of statistical analysis, rather than on calculations.
Experimental Design, Statistical Sampling, T-tests, Analysis of Variance, Linear Regression, Diagnostics and Checks for Statistical Methods, Interpretation and Communication of Results (both oral and written), Ethics of Statistical Analysis
A project-based course that brings together methods, concepts and current practices in the growing field of data science, including statistical inference, financial modeling, data visualization, social networks and data engineering. Emphasis is on the ethical dilemmas involved in gathering, storing, analyzing and disseminating information from large databases.
Statistical Thinking in the Age of Big Data, Exploratory Data Analysis (EDA), Kernel Density Estimation, Advanced Regression, Social Networks and Data Journalism, Financial Modeling, Reproducible Research and Sharing Your Work, Ethics and Privacy
This course builds on Statistical Foundations for Data Science with attention to the analysis of multivariate data. Basic machine learning methods, such as linear discriminant analysis, logistic regression and principal components analysis, are discussed. Emphasis is on interpretation of the analysis rather than calculations.
Multiple Linear Regression and Variable Selection, Multivariate Analysis of Variance (MANOVA), Linear and Quadratic Discriminant Analysis, Unsupervised Learning (Clustering), Methods for Categorical Variables (Explanatory and Response), Autoregressive Models for Time Series Data, Basic Bootstrap
This course surveys current database approaches and systems, as well as the principles of design and the use of these systems. Students learn database query language design and implementation constraints as well as applications of large databases, including a survey of file structures and access techniques, such as NoSQL databases. Students will use a relational database management system to implement a database design project.
This course introduces the processes of managing, exploring, visualizing and acting on large amounts of data. It provides an introduction to data mining techniques such as classification, regression, association rules, cluster analysis and recommendation systems. All material covered is reinforced through hands-on experience using state-of-the-art tools to design and execute data mining processes. Class examples come from Python and R.
Machine learning, Association Mining, Cluster Analysis, Recommender Systems
This course covers principles of planning and conducting surveys, including both probability and nonprobability sample design and analysis, sample size determination and how to use auxiliary sources of data external to the sample to improve estimation. Methods for using information from both samples and “found” (big) data together are discussed.
Probability Sampling, Complex Sample Designs, Analysis of Survey Data
This course introduces data visualization and creative coding using the Processing programming language. Students explore visual and information design principles through code examples. Class activities incorporate 2-D and 3-D computer graphics, interactivity and data input. Procedural and object-oriented programming approaches to data visualization will be covered, as well as an overview of leading-edge data visualization libraries and APIs, including web-based approaches.
Data Visualization, Creative Coding, Visual and Information Design, Programming
This course introduces the sequence of steps needed to carry out Internet-scale data analytics, from hypothesis formation and data collection to methods of analysis and visualization. Students will become proficient in data collection and storage strategies that can be used in later analysis. Script-based programming techniques are used to automate collection from a variety of third-party resources, such as application programming interfaces (APIs). Methodologies for constructing representative samples, storing raw data, merging disparate data sets, cleaning inconsistent entries and constructing derivative data sets are reviewed. Students are introduced to two classes of basic analysis of gathered data – descriptive statistics and data visualization – which are used to validate and improve the accuracy of gathered data sets, a prerequisite to more advanced analysis.
Data Wrangling, Accessing APIs, Data Collection Design and Implementation, Synthesize Concepts in a Capstone Project
This course introduces students to the growing assortment of cloud computing technologies, with an emphasis on fundamental cloud topics such as virtualization, IaaS, PaaS, and DevOps. The course is intended to be hands-on, with students working with current technologies that make the cloud possible. They learn about the top cloud service providers, the “as a service” deployment model, and selected big data tools. Students will also get a high-level overview of NoSQL and of big data topics such as Hadoop, MapReduce, Pig, Hive, and Spark.
This class introduces machine learning and the data preparation workflow. The machine learning tasks covered include multivariate non-linear non-parametric regression, supervised classification, unsupervised classification and deep learning. For each of these tasks, the course shows how to assess model quality and how to perform error estimation and feature engineering. All material covered is reinforced through hands-on experience using state-of-the-art tools to design and execute data mining processes. Class examples come from Matlab, Python and R.
Multivariate Non-Linear Non-Parametric Regression, Supervised Classification, Unsupervised Classification, Deep Learning
Matlab, Python, R
Visualization of Information and Creative Coding II
This course focuses on the fundamental concepts, mechanisms and protocols for data and network security. Symmetric key cryptography is the foundation of many security and authentication protocols; their use, security and vulnerabilities are discussed in detail. Students learn public key cryptography; algorithms such as AES, DES and hash algorithms; and the protocols and applications built on those fundamental building blocks, such as message authentication, digital signatures and digital certificates. Students also learn about network security principles, access control and user authentication, privacy and the ethics of security.
Ciphers, Hash Algorithms, Secure Communication Protocols
The immersion is designed to offer additional learning, networking and relationship-building opportunities. Taking place on the SMU campus in Texas, the immersion is a 3- to 4-day experience. Students will attend a conference and have the chance to meet in person with classmates, faculty, industry leaders and employers for collaborative, hands-on workshops, panels, lectures and informational sessions. Students are required to attend two immersions during their time in the program. Hotel accommodations are included.
Students will spend the first of two consecutive full terms working on a collaborative group project. They will begin work on the project during this term and are expected to complete at least half of it by the end of the term. Students will develop and work on their projects under faculty supervision.
Students will spend the second full term working on their collaborative group project from Capstone 1A. Students will then be required to present their completed projects during the on-campus immersion, typically held near week 11 of the term.