Long before data analysis, data mining and data visualization were popular, there were pioneers carving a path through the nascent field of data science – helping to create a vision for what data could do when used in new and creative ways. Here, we take a look at five of the top data scientists who launched the field and what they have contributed along the way.
DJ Patil earned his bachelor’s degree in mathematics from the University of California, San Diego, and has a Ph.D. in applied mathematics from the University of Maryland, College Park. There, first as a doctoral student and later as a faculty member, he used open datasets published by the National Oceanic and Atmospheric Administration (NOAA) to help improve numerical weather forecasting. He is the author of numerous articles and books that have influenced the field of data science.
Patil’s various positions have included vice president of product at RelateIQ – which was acquired by Salesforce – as well as roles at LinkedIn, Greylock Partners, Skype, PayPal and eBay. He also worked at the U.S. Department of Defense, where he led efforts to bridge computational and social sciences to help anticipate emerging threats to the country.
Credited with helping to coin the term “data scientist,” Patil was named the first U.S. Chief Data Scientist earlier this year. His official title also includes Deputy Chief Technology Officer for Data Policy, and he is working on the administration’s Precision Medicine Initiative.
Jeff Hammerbacher struggled at Harvard, dropped out to work in a General Motors factory – then returned to complete his degree. After a short stint as a quantitative analyst at Bear Stearns, he got a job in 2006 with a little company called Facebook – where he began building software tools for data analysis. After three years, he left to become a co-founder of Cloudera – a maker of data management software.
When he recovered from a 2010 health crisis, Hammerbacher decided to put his data science skills to work in a new field: medicine. Recruited by Dr. Eric Schadt, a computational biologist at Mount Sinai Hospital in New York, Hammerbacher was deeply affected by Dr. Schadt’s view of chronic diseases as “complex networked disorders” involving a variety of factors that can be captured as data and modeled.
Still holding the title of chief scientist at Cloudera, Hammerbacher is mostly focused on leading a research team of data scientists at Mount Sinai whose expertise includes machine learning, data visualization, statistics and programming. Together with health researchers, they are working toward making personalized cancer treatments more “automated, practical and affordable.”
A graduate of Stanford University with bachelor’s and master’s degrees in statistics, Edward Tufte earned a Ph.D. in political science from Yale University, where he is a professor emeritus of political science, statistics and computer science. He has written extensively on information design and is a pioneer in the field of data visualization. After teaming up with renowned statistician John Tukey to teach a series of seminars on statistical graphics, he used the content to develop his first book on information design, which became a classic: The Visual Display of Quantitative Information.
His work focuses on the presentation of informational graphics and the dire consequences that can occur when data is visually misrepresented. He has also written, designed and self-published three other important works on data visualization: Envisioning Information, Visual Explanations, and Beautiful Evidence.
He is currently writing his next book/film, The Thinking Eye, and constructing a 234-acre tree farm and sculpture park in northwest Connecticut. He still runs ET Notebooks, a moderated forum where he answers questions about information design.
Born in 1915, John Tukey earned bachelor’s and master’s degrees in chemistry from Brown University and then completed a Ph.D. in mathematics at Princeton University – where he spent the rest of his academic career and became the founding chairman of the statistics department. He is viewed as one of the most influential statisticians of the last half century and is credited with coining the term “software.”
Tukey also worked for the U.S. government, was awarded the National Medal of Science by President Nixon and was a researcher at AT&T’s Bell Laboratories – where he coined the term “bit,” a contraction of “binary digit,” the basic unit of information in computing.
Much of his most important work focused on the statistical field of robust analysis. In 1977, he published Exploratory Data Analysis, providing mathematicians with new methods for data analysis and clarity in data presentation. He was awarded the IEEE Medal of Honor for his contributions to spectral analysis of random processes and his work with James Cooley to develop the fast Fourier transform (FFT) algorithm.
John Tukey died in 2000 at the age of 85.
A leading expert in business analytics, data mining and data science, Gregory Piatetsky-Shapiro holds both a master’s degree and a Ph.D. in computer science from New York University. He’s a co-founder of the KDD (Knowledge Discovery and Data Mining) conferences and of SIGKDD, a professional organization for knowledge discovery and data mining.
Piatetsky-Shapiro has led data mining and consulting groups at GTE Laboratories, Knowledge Stream Partners and Xchange. His background includes extensive experience in developing customer relationship management (CRM) systems and other data-driven business applications for an array of industries, as well as data analysis for a number of biotech and pharmaceutical companies.
Currently, Piatetsky-Shapiro is the president of KDnuggets and a consultant in the areas of business analytics, data mining, data science and knowledge discovery. He’s also the editor and publisher of KDnuggets News and KDnuggets.com – both of which provide extensive resources related to the field. He has more than 60 publications, including two best-selling books and several edited collections on topics related to data mining and knowledge discovery.
Thanks to pioneers such as these, data science has come a long way since its beginnings – and has a promising and dynamic future ahead.