Doing Data Science Right

Writing on O’Reilly.com, Michael Li and Matt Maccaux used two terms that perfectly describe the struggle firms face in building a data science function and monetizing that work:

1. Organizational Maturity
2. Scalability Limitations

As recruiters of data scientists, we see many clients who want to do advanced artificial intelligence or machine learning work before they have even organized their data or hired their first data scientist. Improved efficiency, increased revenue, or reduced costs can only be realized once the right tools and people are in place.

Data scientists’ number one complaint is “the data is a mess.” This is not just about missing values or a lack of consistency between databases. The sheer volume of available data can be overwhelming and too complex to grasp at first glance, even for experienced data scientists. Working with data that has already been aggregated into useful variables is far easier than working with a table of every action a user has ever taken on a site.
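To make that contrast concrete, here is a minimal Python sketch, assuming a hypothetical event log in which each row records a user, an action and an amount. The function name `aggregate_user_features` and the three summary features are illustrative choices, not a standard; the point is how an event-level table collapses into one row of useful variables per user.

```python
from collections import defaultdict

# Hypothetical raw event log: one row per action a user took on a site.
events = [
    {"user": "u1", "action": "view", "amount": 0.0},
    {"user": "u1", "action": "purchase", "amount": 25.0},
    {"user": "u2", "action": "view", "amount": 0.0},
    {"user": "u1", "action": "view", "amount": 0.0},
    {"user": "u2", "action": "purchase", "amount": 10.0},
    {"user": "u2", "action": "purchase", "amount": 5.0},
]


def aggregate_user_features(rows):
    """Collapse an event-level table into one row of summary features per user."""
    features = defaultdict(lambda: {"n_events": 0, "n_purchases": 0, "total_spend": 0.0})
    for row in rows:
        f = features[row["user"]]
        f["n_events"] += 1  # every action counts as an event
        if row["action"] == "purchase":
            f["n_purchases"] += 1
            f["total_spend"] += row["amount"]
    return dict(features)


if __name__ == "__main__":
    for user, f in sorted(aggregate_user_features(events).items()):
        print(user, f)
```

A model can then be trained on the per-user rows directly, instead of every analyst re-deriving these variables from the raw log.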

Another complaint, often from senior management, is: “We have a lot of data and we are not doing anything with it.” This happens when data scientists spend all their time organizing and processing the data, trying different models, and fine-tuning those models, rather than formulating questions that address business problems.

In a Harvard Business Review article, Kalyan Veeramachaneni, Principal Research Scientist at MIT, offered four principles for creating true impact from data science:

Simple Models

In their research at MIT, Veeramachaneni and his colleagues found that simple models, such as logistic regression or models based on random forests or decision trees, were sufficient for most problems. Keeping models simple shortened the time between data acquisition and the development of the first predictive model.
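For readers who want to see how small a “simple model” can be, here is a minimal Python sketch of a decision stump, a decision tree with a single split, fit on one numeric feature. The data (hypothetical purchase counts paired with a made-up renewed/did-not-renew label) and the names `fit_stump` and `predict_stump` are illustrative assumptions; this is a toy stand-in for the logistic regression and tree-based models mentioned above, not the MIT team's method.

```python
# Hypothetical training data: purchases in the last 30 days (feature)
# and whether the customer renewed (1) or not (0).
xs = [0, 1, 1, 2, 5, 6, 7, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]


def fit_stump(xs, ys):
    """Fit a one-split decision tree on a single numeric feature.

    Tries every observed value as a threshold and both label orientations,
    keeping the (threshold, label_above) pair with the fewest training errors.
    Labels are assumed to be 0/1.
    """
    best = None
    for t in sorted(set(xs)):
        for label_above in (0, 1):
            errors = sum(
                (label_above if x > t else 1 - label_above) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, t, label_above)
    _, threshold, label_above = best
    return threshold, label_above


def predict_stump(model, x):
    """Predict label_above when x exceeds the threshold, the other label otherwise."""
    threshold, label_above = model
    return label_above if x > threshold else 1 - label_above


if __name__ == "__main__":
    model = fit_stump(xs, ys)
    print("threshold, label above:", model)
    print("prediction for 4 purchases:", predict_stump(model, 4))
```

A model like this takes seconds to build, which is the point: a first answer arrives quickly, and more complexity is added only if the problem proves worth it.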

Exploring More Problems

“Instead of exploring one business problem with an incredibly sophisticated machine learning model, companies should be exploring dozens, building a simple predictive model for each one and assessing their value proposition.”

Use Sample Data

“Circumventing the use of massive computing resources, will enable the exploration of more hypotheses.”
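A minimal Python sketch of the idea, assuming a hypothetical dataset of 100,000 synthetic order values: draw a reproducible random sample with the standard library's `random.Random.sample` and explore on that instead of the full table. The helper name `explore_on_sample` and the 1% fraction are illustrative choices.

```python
import random
import statistics


def explore_on_sample(rows, fraction, seed=0):
    """Return a reproducible simple random sample (without replacement) of rows."""
    k = max(1, int(len(rows) * fraction))
    rng = random.Random(seed)  # seeded so the same sample can be re-drawn later
    return rng.sample(rows, k)


# Hypothetical full dataset: 100,000 synthetic order values between 5 and 50.
_gen = random.Random(42)
full = [_gen.uniform(5, 50) for _ in range(100_000)]

if __name__ == "__main__":
    sample = explore_on_sample(full, 0.01, seed=7)
    print(f"sample size: {len(sample)}")
    print(f"mean order value, full data: {statistics.mean(full):.2f}")
    print(f"mean order value, 1% sample: {statistics.mean(sample):.2f}")
```

Because the seed is fixed, a hypothesis that looks promising on the sample can be re-checked on exactly the same rows before it is run against the full data.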

Focus on Automation

“To achieve both reduced time to first model and increased rate of exploration, companies must automate processes that are normally done manually. Over and over across different data problems, we found ourselves applying similar data processing techniques…streamline these, and develop algorithms and software systems that do them automatically.”
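One way to picture this kind of automation is a small, reusable cleaning step that every new problem runs through unchanged. The Python sketch below is a hypothetical example, not the MIT team's software: `impute_median`, `min_max_scale` and `clean_column` are names invented for illustration, and median imputation followed by min-max scaling is just one common choice of steps.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing values (None) with the median of the observed values.

    Assumes at least one value is present.
    """
    observed = [v for v in values if v is not None]
    med = statistics.median(observed)
    return [med if v is None else v for v in values]


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values to the range [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def clean_column(values, steps=(impute_median, min_max_scale)):
    """Apply the same ordered cleaning steps to any numeric column."""
    for step in steps:
        values = step(values)
    return values


if __name__ == "__main__":
    # Two hypothetical columns from two different business problems.
    print(clean_column([3, None, 9, 6]))        # e.g. monthly logins, churn problem
    print(clean_column([100.0, 250.0, None]))   # e.g. order amount, fraud problem
```

Adding a step means writing one more function and appending it to `steps`, so the same pipeline can be reused across data problems instead of being rewritten by hand each time.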

Anthony Deighton of the Forbes Technology Council said, “The value of big data is not limited to finding answers to your questions. Often it is about finding new questions to ask.” 

Interested in hiring the right person to ask the questions for your Data Science group? Contact Smith Hanley Associates’ Data Science and Analytics Recruiter, Paul Chatlos, at pchatlos@smithhanley.com or 203.319.4304.
