The internet started to offer significant bandwidth in the nineties, which allowed for the regular use of cloud computing by the general public. Major milestones in cloud computing development include the launch of Salesforce in 1999, Amazon Web Services in 2002, LinkedIn in 2003, Facebook in 2004, Twitter in 2006 and Dropbox in 2008. Google introduced its cloud in 2009 and Apple’s iCloud came out in 2011. New cloud-based applications continue to be developed at a breakneck pace. How has cloud computing changed the way data scientists work?
The growing number of computer and mobile users, and the data they use and create, dramatically increased the need for data storage and analysis. The old model, in which each company ran its own on-site servers with IT staff to support data storage, access and processing speed, became too expensive for most small and mid-size firms to manage. Limited computing resources, supported by too few experienced workers, meant slow and often poor performance as well as security issues.
The advent of cloud computing gave smaller businesses access to the same tools as large corporations. It created a level playing field across companies of all sizes. Many large companies have adopted it as well. Cloud computing provides lower IT infrastructure and computing costs, improved performance and scalability, fewer maintenance issues, instant software updates, improved compatibility between operating systems, backup and recovery, increased storage capacity and increased data safety.
Cloud computing has changed the way data scientists work in several ways:
- The ability to store massive amounts of data affordably.
- On-demand, elastic capacity available as a resource.
- Easy access to and sharing of software, libraries, code and models across cross-functional teams.
- Ability to track experiments for comparison and reproduction through shared model metric tracking software.
- Probably the biggest recent impact on a data scientist’s work comes from the portability and programming-language support that cloud platforms offer for deploying models to production, particularly in the booming field of machine learning.
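To make the experiment-tracking point a little more concrete, here is a minimal sketch in Python. It is not how any particular cloud tracking product works. It assumes a single log file that the whole team can reach, and the helper names (`log_run`, `load_runs`, `best_run`), the run names and the metric values are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def log_run(log_path, run_name, params, metrics):
    """Append one experiment run as a JSON line to a shared log file."""
    record = {"run": run_name, "params": params, "metrics": metrics}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_runs(log_path):
    """Read every logged run back as a list of dictionaries."""
    with open(log_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def best_run(runs, metric):
    """Return the run with the highest value for the given metric."""
    return max(runs, key=lambda r: r["metrics"][metric])


# Hypothetical runs: the names and numbers are illustrative only.
# A temporary directory stands in for shared cloud storage.
log_file = Path(tempfile.mkdtemp()) / "runs.jsonl"
log_run(log_file, "baseline", {"learning_rate": 0.1}, {"accuracy": 0.81})
log_run(log_file, "tuned", {"learning_rate": 0.01}, {"accuracy": 0.86})

winner = best_run(load_runs(log_file), "accuracy")
print(winner["run"], winner["params"])
```

Because every run records its parameters next to its metrics, a teammate can compare runs and rerun the best one with the same settings. Hosted tracking tools add things like a user interface, access control and artifact storage on top of this basic idea.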
A more recent development in cloud computing is DaaS, or data as a service. Generic cloud computing services were not initially designed to handle massive data workloads; they were designed for application hosting and basic data storage. Data scientists needed fast data integration, analytics and processing, and the cloud computing platforms have responded by standardizing data as a service, much as they did with SaaS (software as a service), IaaS (infrastructure as a service) and PaaS (platform as a service).