As recruiters in the analytical space for almost 40 years, we met Data Science with an initial “eh”: just another name for statistical analysis. We were wrong. Data Science requires a set of skills that includes statistics but isn’t limited to statistical analysis. The Silicon Valley question-and-answer website Quora.com had some interesting exchanges on this topic.
Statisticians are widely seen as distancing themselves from the “nitty-gritty details of real world data,” says Alex Blocker, a Harvard PhD in Statistics. Another PhD in Statistics, Michael Hochster from Stanford, says, “Statisticians have a well-deserved reputation for focusing on issues no one cares about, for encouraging ritualistic application of our methods, and for being terrible communicators.” Xiao-Li Meng, former chair of the Harvard Statistics Department, is “uncomfortable saying that we need data science without ever mentioning statistics.” Dr. Meng grants that “we need to adapt to the new data environment in a proactive manner,” but then asks, “can we make this into an information-preserving transformation? How do we continue thinking about the big picture of statistics, such as bias variance tradeoff, minimizing information loss, conditioning, choosing the right replications, while revamping the parts of statistics inappropriate for modern applications?”
Michael Hochster, the Stanford PhD in Statistics quoted above, offers some perspective on what statistics does contribute in this data science world:
- “The habit of thinking about [the] model’s underlying data. In many real data sets, the records are not even close to independent. Values may be censored or truncated or missing non-randomly. Data is often sampled in biased ways that need to be taken into account. In these situations, black-box prediction won’t be enough, you actually have to think about what things mean. Statisticians get this, data scientists not so much.
- Collecting data efficiently. Statisticians put more thought into survey sampling methods and experimental design than most others.
- Appropriate skepticism about storytelling. Statisticians know that if you fish through data you will find something. It takes discipline not to do this in a pernicious way, and data scientists from other backgrounds haven’t internalized this as well.
- Awareness of the pitfalls of seemingly simple analysis. Simpson’s Paradox is not just a sidebar in a textbook, it comes up over and over in the real world. Statisticians are often bitten by it, but at least they know what is biting them.” (For the uninitiated, Simpson’s paradox is a phenomenon in probability and statistics in which a trend appears in several different groups of data but disappears or reverses when these groups are combined.)
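To make the definition of Simpson’s paradox concrete, here is a minimal Python sketch using illustrative success/trial counts for two hypothetical treatments, A and B, split by a grouping variable (case size, labeled “small” and “large”). The counts mirror a commonly cited kidney-stone example, but treat them as illustrative rather than as data from this article; the names `counts`, `group_rates`, and `pooled_rates` are hypothetical. Treatment A has the higher success rate within each group, yet B has the higher rate once the groups are pooled, because A was applied mostly to the harder “large” cases and a pooled rate is a size-weighted average of the group rates.

```python
from fractions import Fraction

# Illustrative (successes, trials) counts, split by a grouping variable.
# Hypothetical example data: A wins inside each group, B wins when pooled.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    """Success rate as an exact fraction (no floating-point rounding)."""
    return Fraction(successes, trials)


def group_rates(data):
    """Success rate for each treatment within each group."""
    return {t: {g: rate(*c) for g, c in groups.items()} for t, groups in data.items()}


def pooled_rates(data):
    """Success rate for each treatment after combining all groups."""
    pooled = {}
    for t, groups in data.items():
        successes = sum(s for s, _ in groups.values())
        trials = sum(n for _, n in groups.values())
        pooled[t] = rate(successes, trials)
    return pooled


by_group = group_rates(counts)
pooled = pooled_rates(counts)

# Compare within each group, then compare the pooled totals.
for g in ("small", "large"):
    print(f"{g:>6}: A={float(by_group['A'][g]):.0%}  B={float(by_group['B'][g]):.0%}")
print(f"pooled: A={float(pooled['A']):.0%}  B={float(pooled['B']):.0%}")
```

The sketch uses exact `Fraction` arithmetic so the comparisons are not affected by floating-point rounding; percentages are formatted only for display.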
So what does all this say about the need to rebrand Statistics? As a foundational tool within Data Science, statistical expertise is critical to doing informed, informative research. The data scientist must, however, carry more skills in their toolbox, including experience with large-scale engineering systems and strong interpretive and presentation skills, both graphical and verbal. Many practitioners have grasped the need to expand their skills beyond statistics, and they understand, too, the need to present themselves accurately to the marketplace, both for clarity and for their own marketability.