Funding year 2021 / Project Call #16 / Project ID: 5899 / Project: PrototAIp
Our last blog post addressed the technical requirements and software architecture of PrototAIp. We also announced that we had started developing the PrototAIp software, which we now discuss in detail.
We created a web-based interactive programming platform based on Jupyter Notebook. In the final version of PrototAIp, users will access the project through a hub, which assigns each user one of these notebooks. If you want to try it out and host the solution yourself, it is available on our GitHub page and can also be downloaded as a container from Docker Hub.
True to our principle of "batteries included", we preload our notebooks with the most important libraries for data science, so users can get started straight away without any waiting time. To find out which libraries are the most popular, we systematically searched the internet for websites and scientific publications that deal with Python libraries for data science. In the end, we identified 11 websites (the full list is at the end of this post) and 2 papers (Nagpal & Gabrani, 2019; Stančin & Jović, 2019) covering the topic.
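A quick way to confirm that a notebook really has a given set of libraries preloaded is to ask Python whether each module can be found. The following is only a minimal sketch: the helper name `check_preloaded` is hypothetical, and the module names in the example call are an illustration, since the post does not state exactly which packages ship with the notebooks.

```python
from importlib.util import find_spec


def check_preloaded(modules):
    """Return {module name: True/False} depending on whether it can be imported."""
    # find_spec returns None when a top-level module is not installed,
    # so nothing is actually imported during the check.
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    # Illustrative module names only (an assumption, not the project's real list).
    for name, available in check_preloaded(["numpy", "pandas", "matplotlib"]).items():
        print(f"{name}: {'available' if available else 'missing'}")
```

Run in the first cell of a notebook, this shows at a glance whether anything still needs to be installed.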
The result was a long list of exciting libraries: 43 in total. The following list shows the top 10, ordered by the number of sources that mention them.
- NumPy: 13
- Pandas: 13
- Matplotlib: 12
- scikit-learn: 12
- TensorFlow: 11
- SciPy: 10
- Keras: 9
- Seaborn: 9
- Plotly: 7
- PyTorch: 5
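To put these counts into perspective, they can be related to the 13 sources we reviewed (11 websites and 2 papers). The sketch below uses the numbers from the list above; the function name `coverage` is our own choice for this illustration.

```python
TOTAL_SOURCES = 13  # 11 websites + 2 papers

# Mention counts taken from the top 10 list above.
MENTIONS = {
    "NumPy": 13,
    "Pandas": 13,
    "Matplotlib": 12,
    "scikit-learn": 12,
    "TensorFlow": 11,
    "SciPy": 10,
    "Keras": 9,
    "Seaborn": 9,
    "Plotly": 7,
    "PyTorch": 5,
}


def coverage(mentions, total):
    """Return (name, count, share of sources) tuples, most-mentioned first."""
    # sorted() is stable, so libraries with equal counts keep their input order.
    ranked = sorted(mentions.items(), key=lambda item: item[1], reverse=True)
    return [(name, count, count / total) for name, count in ranked]


for name, count, share in coverage(MENTIONS, TOTAL_SOURCES):
    print(f"{name:<14}{count:>3}  {share:.0%}")
```

The share column makes the gap visible: the top entries appear in all sources, while PyTorch appears in fewer than half of them.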
NumPy and Pandas were featured on every website and in every publication. This is not surprising, since they form the basis for most data processing and analysis in Python. These two libraries are closely followed by Matplotlib and scikit-learn, with just one mention fewer. Matplotlib is the de facto standard library for data visualization in Python (alongside the plotting features built into Pandas). Scikit-learn integrates well with all the previously mentioned libraries and is an all-in-one toolbox for machine learning, packed with algorithms for classification, regression and clustering. In fifth place is TensorFlow, a machine learning library that can be used for various applications but focuses mainly on deep learning.
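As a small illustration of how these libraries fit together, the following sketch creates data with NumPy, wraps it in a Pandas DataFrame and fits a scikit-learn model on it. The data is invented for this example (a noise-free line, y = 2x + 1), and the snippet assumes that NumPy, Pandas and scikit-learn are installed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data, invented for this illustration: y = 2x + 1 without noise.
x = np.arange(10, dtype=float)
df = pd.DataFrame({"x": x, "y": 2.0 * x + 1.0})

# scikit-learn accepts Pandas objects directly as features and targets.
model = LinearRegression()
model.fit(df[["x"]], df["y"])

slope = float(model.coef_[0])
intercept = float(model.intercept_)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

Because the data lies exactly on a line, the fitted slope and intercept recover the values used to generate it.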
The remaining libraries cover a wide range of data science topics. There are other toolboxes for machine learning and data visualization, but also highly specialized libraries. We recommend taking a closer look at all of them to get the best possible overview:
Literature
- Nagpal, A., & Gabrani, G. (2019, February). Python for data analytics, scientific and technical applications. In 2019 Amity international conference on artificial intelligence (AICAI) (pp. 140-145). IEEE.
- Stančin, I., & Jović, A. (2019, May). An overview and comparison of free Python libraries for data mining and big data analysis. In 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO) (pp. 977-982). IEEE.
Web resources
- https://www.simplilearn.com/top-python-libraries-for-data-science-artic…
- https://www.dataquest.io/blog/15-python-libraries-for-data-science/
- https://analyticsindiamag.com/best-python-libraries-for-data-science-in…
- https://bigdata-madesimple.com/top-20-python-libraries-for-data-science/
- https://www.analyticsvidhya.com/blog/2020/11/top-13-python-libraries-ev…
- https://towardsdatascience.com/top-10-python-libraries-every-data-scien…
- https://www.geeksforgeeks.org/top-10-python-libraries-for-data-science-…
- https://deeptechbytes.com/most-popular-data-science-python-libraries-fo…
- https://databrio.com/blog/10-essential-python-libraries-for-data-scienc…
- https://www.makeuseof.com/data-science-libraries-for-python/
- https://hackr.io/blog/top-data-science-python-libraries