In today’s hyper-competitive marketplace, data science plays an indispensable role for organizations that seek to personalize experiences and create value from their data. Until recently, analyzing large amounts of unstructured data without preset rules for scoping the analysis would have seemed far-fetched. Now, and in the future, applying data science to understand unique user needs and desires will form a key basis of competitive advantage. This will unleash new waves of innovation and productivity for businesses. To enable data-driven transformation, organizations must construct the right ecosystem and put proper enablers in place. A few years ago, Harvard Business Review proclaimed, presciently:
“Data Scientist is the sexiest job of the 21st century”
No wonder both the general business and technical media are obsessed with data science and the role of the data scientist. So, what exactly is data science, and why all the hype around data scientists?
Frankly speaking, multiple job descriptions and conflicting explanations of the same role make it harder for businesses to understand what a data scientist is and does. This, in turn, makes it harder for business leaders to gauge the ROI they can expect when investing in them.
To me, data science involves mining actionable, sensible insights from data in multiple formats by applying mathematics, statistics, machine learning and related techniques. Data scientists typically analyze data sets or data repositories maintained within an organization, data scraped from publicly available sources, or both. They look at business challenges both upstream and downstream of the value chain. They’re armed with relevant statistical models to help analyze large volumes of data and optimize decision making. They’ve mastered rigorous hypothesis testing and experimental design best practices. Leading data scientists across the globe have gotten their hands dirty with real-world data to draw actionable insights.
“140,000 to 190,000 unfilled data scientist positions by 2018” – McKinsey
Data science is said to be a “science” for good reason — it needs all the diligence, perseverance and intellect required of a scientist. It also requires the orchestration of talent, tools and techniques. Having said that, it’s not rocket science. A data scientist is one who’s able to cut through thorny business and technical complexities and provide a clear, effective way for business leaders to deploy insights for value generation.
Having all these skills in one brain is what makes a professional data scientist such a hot commodity.
For years, business leaders such as CMOs and CSOs on one side and IT organizations on the other have engaged in a pseudo cold war. On the one hand, IT struggles with round-the-clock technological shifts, an inability to communicate its true value to the business, a persistent skills shortage, and many other unforeseen challenges. On the other, business leaders must maintain focus on achieving business goals while being constrained by time, budget and market factors. Data scientists, in many ways, bridge the differences by extracting data from one side and delivering relevant, contextualized insights to the other.
“Expect a shortage of over 100,000 data scientists by 2020” – Gartner, 2012
A data modeler/analyst and a data scientist are not nearly the same thing
A modeler typically works within a defined scope, on data they have already acquired and are analyzing. Typical terminology associated with this work includes linear regression, logistic regression, known distributions, confidence intervals, predictor variables and goodness of fit. In contrast, data scientists are primarily driven by an essential tendency of human nature: our insatiable curiosity and the need to find answers to the hardest of problems. Data scientists are inquisitive. They have a knack for asking questions that may not seem intuitive to the business at first, they do extensive “what if” analysis, and they question existing underlying assumptions and business-as-usual processes. Armed with data and analytical results, a top-tier data scientist will then communicate informed conclusions and recommendations. For them, the world produces data in a black box (an outlook generally associated with algorithmic modeling), and their vocabulary often includes machine learning, AI, neural networks, random forests, SVMs, unknown multivariate distributions, iterative analysis and predictive accuracy.
A data analyst may view data in a silo, from a single defined source (e.g., a CRM survey). A data scientist typically operates differently and primarily examines data from numerous disparate sources. They are expected to sift through all the incoming data with the intent of uncovering hidden insights that may add tremendous value to the business. A data scientist looks at the data through a different lens, seeks to provide insights beyond the usual reporting, and contextualizes those insights in a form that business users can readily apply. In a nutshell, they possess strong business acumen, communicate well with both the IT and business sides, simplify complex concepts into understandable nuggets of information, are experts in analytics and in modeling large volumes of data, and are seen as part analyst, part artist.
A typical day in the life of a data scientist involves reviewing and preparing historical data (missing value estimation, outlier detection, descriptive statistics), followed by data partitioning (training and validation sets) and variable selection (checking multicollinearity, selecting important variables). Once the data has been prepared, the next key step is to build models (logistic regression, random forests, decision trees, K-means clustering, sequence mining, text analytics) and review the results to refine them (model diagnostics review). Model finalization is the last piece of the puzzle, after which the usual questions pertaining to propensity to buy, customer churn, channel optimization, customer lifetime value, etc. can be answered.
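To make that workflow a little more concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not a production recipe: the function names, the toy churn data, the median imputation, the Tukey-fence clipping and the hand-rolled gradient descent are all my own choices for the example, and variable selection is left out because the toy data has a single feature.

```python
import math
import random
import statistics


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values that fall outside the Tukey fences (quartiles -/+ k * IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]


def train_validation_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(model, x):
    """Return the predicted probability of the positive class for features x."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(rows, epochs=500, lr=0.1):
    """Fit logistic regression by batch gradient descent; rows are (features, label)."""
    n_features = len(rows[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in rows:
            error = predict((weights, bias), x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def accuracy(model, rows):
    """Share of rows whose predicted class (threshold 0.5) matches the label."""
    hits = sum((predict(model, x) >= 0.5) == bool(y) for x, y in rows)
    return hits / len(rows)


if __name__ == "__main__":
    # Made-up toy data: monthly support tickets per customer and a churn flag.
    tickets = [0, 1, 1, 2, 1, 7, None, 6, 9, 60, 7, 8]
    churned = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
    feature = standardize(clip_outliers(impute_missing(tickets)))
    rows = [([f], y) for f, y in zip(feature, churned)]
    train, validation = train_validation_split(rows)
    model = fit_logistic(train)
    print("validation accuracy:", accuracy(model, validation))
```

The sketch follows the order described above (prepare, split, fit, review). On a real project you would more likely reach for an established library, and you would usually compute the imputation and scaling statistics on the training set only, so that nothing from the validation set leaks into the model.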
Let’s look at an example of how a data scientist could add value beyond visible boundaries and across the value chain of any business. Imagine an online retailer that wants to build a recommendation engine that delivers a new customer experience and promotes specific products based on current trends, browsing behavior, past purchase history and sentiment analysis. A typical solution is expected to increase both the conversion ratio and the average basket size. In most cases, a data scientist would go beyond this usual role. They would also explore data available in the public domain, and they might deliver insights on the supply or procurement side: how to avoid inventory stockouts, which pricing strategies suit a certain segment of customers, where to place certain SKUs, and so on. They may come up with specifics on how the retailer can identify previously undiscovered products for cross-selling opportunities. Or they may look for new revenue streams to offset declining revenues from certain channels. From a vendor management standpoint, the question might be which vendors can service a given online order with minimal turnaround time and optimal cost (based on the delivery window committed to the customer: the next three hours, one to two days, or three to four days).
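As a rough illustration of the cross-selling idea, the sketch below counts how often products are bought together and ranks candidates for a given product. This is a deliberately simple co-purchase count, not a full recommendation engine; the function names and the sample baskets are hypothetical, and a real system would also draw on the browsing, trend and sentiment signals mentioned above.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how often each unordered pair of products shares a basket."""
    counts = Counter()
    for basket in baskets:
        # Sort and de-duplicate so each pair is counted once per basket.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return counts


def recommend(product, baskets, top_n=3):
    """Rank other products by how often they were bought with `product`."""
    scores = Counter()
    for (a, b), n in co_purchase_counts(baskets).items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [item for item, _ in scores.most_common(top_n)]


if __name__ == "__main__":
    # Made-up order history for an outdoor retailer.
    baskets = [
        ["tent", "stove", "lamp"],
        ["tent", "stove"],
        ["tent", "lamp"],
        ["stove", "mug"],
        ["tent", "stove"],
    ]
    print(recommend("tent", baskets))
```

Raw counts favor products that are popular in general, so a next step would be to normalize them (for example by each product's overall purchase frequency) before using them to drive promotions.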
Such waves of change have just begun to ripple out. There are innumerable exciting applications of data science, limited only by our imagination. I’m eager to hear your thoughts on this fascinating topic, so do comment below or write to me.