DataScience@SMU FAQs Answered by Professor Monnie McGee

DataScience@SMU recently hosted a curriculum webinar that featured a Q&A portion with Professor Monnie McGee and prospective students. Below, Dr. McGee provides prospective students with an overview of what to expect in the program, the types of skills they will acquire through the program, and how data science compares to traditional statistics and analytics.

How does a data science degree compare to a traditional statistics degree?

Data science is something of a hybrid of statistics and computer science. A data scientist knows more about computer science than a statistician and more about statistics than a computer scientist. I used to think that data science was just a different term for statistics, but there is a distinction between the two. Statistics is just as much about solving real-world problems as data science is. The difference is in the approach. A statistician is trained to find an equation or a distribution function that explains relationships in the data. This function will have parameters that need to be estimated from the data, usually with an estimation method such as maximum likelihood, which is typically carried out with an optimization algorithm. A data scientist will more likely proceed with a machine learning approach, where parameters are used to “tune” an algorithm. That isn’t to say that data scientists don’t use statistical models or that statisticians don’t use machine learning. Both types of scientists should and do use both types of methods. The difference is mainly in the methods emphasized in training.
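
To make the contrast concrete, here is a minimal Python sketch that is not taken from the program's course materials. The first half estimates the rate parameter of an exponential distribution by maximum likelihood, once with the closed-form answer and once with a simple numerical optimizer. The second half "tunes" the number of neighbors in a small nearest-neighbor predictor by checking error on held-out data. The data are simulated, and every name here (waiting_times, mle_rate_numeric, tune_k, and so on) is a hypothetical chosen for illustration.

```python
import math
import random

# --- Statistical approach: estimate a distribution parameter by maximum likelihood ---

def exponential_log_likelihood(rate, data):
    """Log-likelihood of i.i.d. exponential observations for a given rate."""
    return len(data) * math.log(rate) - rate * sum(data)

def mle_rate_numeric(data, lo=1e-6, hi=100.0, iters=200):
    """Maximize the log-likelihood over the rate with golden-section search."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        # Keep the sub-interval that still contains the maximum.
        if exponential_log_likelihood(c, data) > exponential_log_likelihood(d, data):
            b = d
        else:
            a = c
    return (a + b) / 2

# --- Machine learning approach: tune an algorithm's parameter on held-out data ---

def knn_predict(train, x, k):
    """Average the y-values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_error(train, valid, k):
    """Mean squared prediction error on the held-out points."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in valid) / len(valid)

def tune_k(train, valid, candidates):
    """Pick the candidate k with the lowest held-out error."""
    return min(candidates, key=lambda k: validation_error(train, valid, k))

random.seed(0)  # simulated data only, so the run is repeatable

# Hypothetical waiting times drawn from an exponential distribution with rate 2.
waiting_times = [random.expovariate(2.0) for _ in range(500)]
rate_closed_form = len(waiting_times) / sum(waiting_times)  # known closed-form MLE
rate_numeric = mle_rate_numeric(waiting_times)

# Hypothetical noisy curve, split into training and validation points.
points = [(x / 10, math.sin(x / 10) + random.gauss(0, 0.2)) for x in range(100)]
random.shuffle(points)
train, valid = points[:70], points[70:]
candidate_ks = [1, 3, 5, 9, 15]
best_k = tune_k(train, valid, candidate_ks)

print("MLE rate (closed form):", round(rate_closed_form, 4))
print("MLE rate (numerical):  ", round(rate_numeric, 4))
print("k chosen by validation:", best_k)
```

In the first half the parameter has a meaning inside an assumed distribution, and the data determine its value. In the second half the parameter k describes no distribution at all; it is chosen only because it predicts held-out data best.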

For someone with a non-computer science background, how steep a learning curve will there be in terms of the coding languages? Can students catch up by taking courses beforehand and teaching themselves (e.g., Coursera, Codecademy)?

Yes, it is possible to learn the computer science skills needed to succeed in the data science program through self-directed coursework. In particular, a course in basic programming skills (i.e., how to talk to a computer) would be helpful, regardless of the programming language used. Students who are not familiar with basic programming will struggle when we apply programming skills to particular languages.

What does the learning curve look like for students who haven’t taken statistics?

Any current student who has a strong computer science background and a weak statistics background will probably tell you that the statistics courses are challenging. We send new students an intro to stats textbook, and we are also developing a bridge course to help close that gap.

Is machine learning taught in one of the program courses? If so, which one?

First, we need to define machine learning. By machine learning, I mean the use of iterative algorithms that update a model or a process on the basis of what is learned in the previous step of the algorithm. To me, machine learning is synonymous with data mining. The data mining course covers machine learning, and it also focuses on using visualization to display results, which is not typically covered in a traditional data mining course. The Doing Data Science course covers some machine learning algorithms at a very high level. The second semester of statistics gives a more detailed overview of linear discriminant analysis and logistic regression as machine learning algorithms.
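
As a small illustration of that definition, the following Python sketch fits a logistic regression by gradient descent: each pass through the loop updates the model using the prediction errors of the model from the previous step. This is one common way to fit the model, not necessarily the one used in the program's courses. The data set (hours studied versus pass/fail) is invented, and the function names and default settings are assumptions made for this example.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by repeated gradient updates."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors of the current model (what was "learned" so far).
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        # Update the model based on those errors, then repeat.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Invented data: hours studied and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(hours, passed)
prob_low = sigmoid(w * 0.5 + b)   # estimated chance of passing after 0.5 hours
prob_high = sigmoid(w * 5.0 + b)  # estimated chance of passing after 5 hours

print("slope:", round(w, 3), "intercept:", round(b, 3))
print("P(pass | 0.5 hours):", round(prob_low, 3))
print("P(pass | 5.0 hours):", round(prob_high, 3))
```

The same model can also be fit with the statistician's maximum likelihood machinery. The gradient steps above are simply a numerical way of climbing the same likelihood, which is one reason logistic regression sits comfortably in both camps.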

Some students with a strong mathematical background, especially in fundamentals, may expect data science to include heavy amounts of linear algebra and theory. How would those skills be applicable in the program?

We do not cover the theoretical aspects of linear algebra and other mathematical tools in our program. In data mining, students will learn the optimization algorithms required for machine learning. Little linear algebra or calculus is used. The main idea is not to build too heavily on the math, but to give students enough theory to understand the methods. For example, in the statistics course, some linear algebra is introduced in the context of multivariate methods so that students can follow the logic behind them. Mathematical theory isn’t required, but students who already have it will have a better grasp of the material.

What’s the difference between this program and business analytics?

A large percentage of the students currently in the program have a background in business. Many of the students in the program require data science skills for their current jobs. Possessing the skills to make effective decisions based on comparative analytics is becoming essential across all industries. Business analytics is a more applied degree, while data science is a broader degree intended to give students the background to go into any field, or even to pursue a research path.

Will there be any courses on coding languages such as Java, C, or C++?

No, the courses currently in the DataScience@SMU curriculum use SAS, Python and R.

How were the courses/curriculum designed for this program?

DataScience@SMU is truly an interdisciplinary program, with faculty from three SMU schools — Dedman College of Humanities and Sciences, Lyle School of Engineering, and Meadows School of the Arts — working together to develop coursework in computer science, statistics, strategic behavior and data visualization. The program was designed to meet these key objectives: ask relevant questions about data, retrieve pertinent data, visualize data, analyze data, interpret results, communicate findings, and understand the ethical responsibility and legal requirements of data, such as those regarding data security.
