Ask any data scientist why Python, the language named after Monty Python’s Flying Circus, has become the industry standard, and the answer will be its extensive libraries. The Python Package Index holds 70,000 libraries, with NumPy, SciPy and Pandas leading the way in usage. But there are a number of other reasons why Python has become the de facto data science language.
Wait a Minute! De Facto?
According to ZDNet.com, 8.2 million developers were using Python in April of 2019, versus 7 million in September of 2018. That is 17% growth in about seven months. ZDNet.com goes on to say 69% of all machine learning developers and data scientists use Python, versus 24% using R. A survey by TechRepublic.com found Python to be the “fastest growing major programming language.” Julia Silge, Data Scientist at Stack Overflow, said, “We have not seen a technology that large grow so fast ever in the history of Stack Overflow.”
Fintan Ryan, Gartner Research Director, said Python “offered a very clean syntax. You could enforce this in other languages, but Python enforced it automatically.” With such a concise and expressive language, the same operation takes less time, less effort and fewer lines of code than in other languages. While Python runs slower than Java, its programs are 3 to 5 times shorter than their Java equivalents. It is easy to learn because its English-like commands make the code readable. Barry Warsaw, Developer and Steering Council Member, says, “Python came on the scene. I was like ‘Wow, this is making programming fun again.’” A language that is simple to use and keeps developers happy means a more productive workforce.
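As a rough illustration of that conciseness, here is a minimal sketch that counts word frequencies in a handful of lines. The sample sentence and the function name word_counts are invented for this example; only the standard library is used.

```python
from collections import Counter


def word_counts(text):
    """Return a mapping of lowercase word -> number of occurrences."""
    return Counter(text.lower().split())


# Sample sentence invented for this example.
counts = word_counts("the quick brown fox jumps over the lazy dog the end")
print(counts.most_common(1))  # [('the', 3)]
```

The equivalent program in a more verbose language would typically need explicit type declarations, a map container and a loop before reaching the same result.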
Extensive Standard Library
Python has been around for 28 years and supports multiple paradigms, including functional, object-oriented, structured and procedural programming. With 70,000 libraries in the Package Index, there is a rich set of frameworks available. Take, for example, the NumPy library. NumPy is widely used and is derived from Numeric, the library that provided the first array object built for Python. Jim Hugunin, the software programmer who wrote Numeric, ended up doing so because, “Writing code in Python felt like writing the sort of natural informal code that developers would use when they wanted to quickly share ideas. It was executable pseudo-code.” The development of Numeric/NumPy gave Python a first-mover advantage: the first entrant garners a competitive advantage through the control of resources.
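To make the multi-paradigm point concrete, the sketch below solves one small problem, summing the squares of the even numbers in a list, in three styles. The data and all function and class names are hypothetical and chosen only for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5, 6]  # sample data invented for this example


# Procedural / structured: an explicit loop that updates a running total.
def sum_even_squares_procedural(values):
    total = 0
    for value in values:
        if value % 2 == 0:
            total += value * value
    return total


# Functional: compose filter, map and reduce without mutating state.
def sum_even_squares_functional(values):
    evens = filter(lambda v: v % 2 == 0, values)
    squares = map(lambda v: v * v, evens)
    return reduce(lambda acc, v: acc + v, squares, 0)


# Object-oriented: wrap the data and the behaviour in a class.
class NumberSeries:
    def __init__(self, values):
        self.values = list(values)

    def sum_even_squares(self):
        return sum(v * v for v in self.values if v % 2 == 0)


print(sum_even_squares_procedural(numbers))      # 56
print(sum_even_squares_functional(numbers))      # 56
print(NumberSeries(numbers).sum_even_squares())  # 56
```

All three versions give the same answer, so the choice between them is largely a matter of readability and how the surrounding code is organized.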
Large, Engaged Community
Open source has led to the growth of a large, dedicated and insightful user community. TechRepublic.com said Python is the second most loved language, meaning the language developers most enjoy working with and want to continue to use. This dedicated community has made Python a more stable product: regular updates and releases improve and expand the software, and that maintenance succeeds because so many users participate.
The concern most cited by users is the lack of support for mobile and new web platforms. “We don’t have a story about how you can use Python on these devices,” says Russell Keith-Magee, BeeWare co-founder. “What happens to Python when laptops are niche devices?”
Hakon Hapnes Strand, a data scientist writing on Quora, details the most important Python skills to have for data analysis and machine learning, in descending order of importance:
- Good understanding of the built-in data types. In particular, lists, dictionaries, tuples and sets.
- Mastery of N-dimensional NumPy arrays.
- Mastery of Pandas dataframes.
- Ability to perform element-wise vector and matrix operations on NumPy arrays.
- Knowing how to use the Anaconda distribution and the conda package manager.
- Familiarity with scikit-learn.
- Ability to write efficient list comprehensions instead of traditional for loops.
- Ability to write small, clean functions.
- Knowing how to profile the performance of a script and how to optimize bottlenecks.
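A few of the skills on this list can be sketched in one short script: the built-in data types, a list comprehension next to a traditional for loop, small functions, and a simple timing comparison. This is a minimal sketch that uses only the standard library (timeit) so it runs without NumPy or Pandas installed; all values and function names are invented for illustration.

```python
import timeit

# Built-in data types (sample values invented for this example).
readings = [3.2, 4.8, 5.1, 4.8]               # list: ordered, mutable
point = (59.9, 10.7)                          # tuple: ordered, immutable
unique_readings = set(readings)               # set: duplicates removed
units = {"temperature": "C", "speed": "m/s"}  # dict: key -> value


def squares_loop(n):
    """Build a list of squares with a traditional for loop."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Build the same list with a list comprehension."""
    return [i * i for i in range(n)]


# Both approaches must agree before their speed is compared.
assert squares_loop(1000) == squares_comprehension(1000)

# Simple profiling: time each function over repeated calls.
loop_time = timeit.timeit(lambda: squares_loop(1000), number=200)
comp_time = timeit.timeit(lambda: squares_comprehension(1000), number=200)
print(f"for loop: {loop_time:.4f}s, comprehension: {comp_time:.4f}s")
```

The timings vary from machine to machine, so it is worth measuring on your own data before deciding that one form is the bottleneck.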
Interested in a career as a data scientist? Contact the Data Science and Analytics Recruiters at Smith Hanley Associates.