Selecting an Enterprise Platform for Python and Open Source: A Checklist for Buyers

Updated June 30, 2023



Introduction


Marc Andreessen famously opened an August 2011 blog article with this provocative sentence: “Software is eating the world.” His prediction was that software development would disrupt traditional industries. Indeed, companies like Airbnb, Netflix, and Uber emerged as just a few of many winners in the “on-demand” economy that disrupted industries like travel, entertainment, and shopping in significant and lasting ways.

About a year later, in October 2012, Harvard Business Review reported that data scientist was the “sexiest job of the 21st century,” promising professionals who could “coax treasure out of unstructured data.” And the race to structured data began, with organizations taking a closer look at their messy data and finding ways to make it more consumable by machines.

Fast-forward seven years to October 2019, and McKinsey Global Institute offered exciting and cautionary words about the “coming of AI spring.” Their research showed hundreds of business cases that, combined, had the potential to create between $3.5 trillion and $5.8 trillion in value annually. As organizations applied artificial intelligence, they found it could yield outsized business value. Data science capabilities emerged as a prerequisite for high-performing AI, so organizations increased their investments in technologies, data science teams, and techniques like machine learning and deep learning.

In August 2022, Stable Diffusion rocked the visual arts world with its text-to-image model built with Python and deep learning that could generate detailed images based on text prompts. It stoked the world’s fascination with AI and unleashed its next wave: generative AI.

Finally, three months later, in November 2022, OpenAI released ChatGPT, a chatbot that uses OpenAI’s large language models (LLMs), initially GPT-3.5 and later GPT-4, to generate text based on prompts from the user. In a short time, LLMs have taken many industries by storm, with new products and capabilities that make it possible for programmers to write and debug code alongside a machine and for writers to work with AI to produce content, to name just two of a growing number of use cases.

Finding an Enterprise Platform

With all of this progress, it would seem that every organization should somehow incorporate these new technologies into their research, products, and operations. However, applying these techniques to their fullest potential requires a set of fully featured tools, clean and structured data, expert teams, and the power of open-source software, backed by an engaged community of makers and maintainers.

Leveraging the power of open-source software across an enterprise organization requires capabilities for building and deploying secure Python solutions. A burgeoning number of options are available for stitching together tools that can enable teams to collaborate and build powerful applications with data science and machine learning. But bolting together tools to deploy predictive models into production is not the best way to create a platform that your organization can rely on to deliver excellent outcomes. And building your own can be expensive and complex, because you’ll need to maintain the platform you create.

Finding an enterprise platform that can provide the open-source packages you need, the managed environments that allow you to reproduce and scale models in production, and the security tools to protect your organization from bad code and bad actors can be a tough challenge. That’s what this guide is all about—exploring what to consider when you are selecting an enterprise platform to use with Python and open-source software to achieve your organization’s development goals. 

What makes a platform?

One popular description of a platform comes from Microsoft co-founder Bill Gates, as paraphrased by Chamath Palihapitiya: “A platform is when the economic value of everybody that uses it, exceeds the value of the company that creates it.” As you evaluate platforms, consider these basic characteristics that will help you leverage the innovation of the community as your teams develop and deploy applications using open source with Python:

  • Number of individual users: The more users, the more opportunities there are to discover new techniques shared by others, identify and address security risks faster, and benefit from a rich community of software makers and maintainers.
  • Number of enterprise users: The more enterprise users, the more the platform has been tested at scale. Number of users may be expressed as a percentage of a total group of organizations or businesses, such as the Fortune 500.
  • Years of experience: The longer an organization has been working to develop its platform, the more expertise its team is likely to have across tools, techniques, and use cases.
  • Cross-industry customers: The more industries in which a platform has been applied, the more integrations, use cases, and data types the platform and supporting team have likely encountered.

Python, Open Source, Data Science, and the Enterprise

Data science has revolutionized the way businesses operate. Today, it seems that everyone is working with data in some capacity, whether it’s analyzing customer behavior, building predictive models, or creating generative models. As the demand for data-driven insights continues to grow, Python has emerged as the go-to language for data science work.

In fact, Python has long been the gold standard for data science work, thanks in large part to its simplicity and versatility. Compared with many other languages, Python makes it easy to manipulate and analyze data, making it an ideal choice for everything from data visualization to machine learning. Additionally, the availability of numerous open-source libraries and frameworks ensures that Python remains a popular choice for data scientists.

Open-source software provides developers access to a global network of contributors who are constantly updating and improving code, making it possible for companies to create applications much faster and more efficiently than ever before. The vast majority (96%) of code bases contain open-source software, according to the Synopsys 2023 Open Source Security and Risk Analysis (OSSRA) Report.

The widespread application of OSS makes sense; open source not only helps companies save on licensing costs, but also allows them to leverage the collective knowledge of the open-source community to create customized solutions to meet their specific business needs. As a result, open source has become an essential strategic tool for organizations looking to stay ahead in the fast-paced world of technology.

However, managing Python development in enterprise organizations has become more complex and difficult over the past few years. This is due in part to the rapid pace of development within the Python community, which has led to the release of new tools and technologies on a regular basis. Some of these tools are proprietary, and some are open source. While this is ultimately a positive development for teams that work with data, it can make it challenging to keep up with the newest techniques and best practices.

Despite these challenges, Python remains one of the most powerful and versatile tools available for data science work. As the industry continues to evolve, Python will remain a critical component of any successful data science team’s toolbox.

Enterprise Python Challenges


At Anaconda, we speak with organizations around the world who are working with Python. We find that most of these teams are experiencing similar challenges, and they are attempting to solve them in similar ways.

1. Package Management and Build Environments

Figure: Common Python challenges for enterprise teams include package management, collaboration and deployment, and governance and security.

For busy enterprise teams, managing packages and build environments is a significant challenge. Many teams manage packages manually, which has the advantage of giving them control over each package and the customization of environments. However, this is time-consuming and error-prone. It can also lead to inconsistent environments and a lack of oversight for data protection and governance of resources.

Other teams use proprietary third-party package management tools, which can streamline package management and provide off-the-shelf functionality. However, these tools are often not well suited to Python workflows. They offer limited customization and force you to rely on vendors to build out the tool to meet your business needs.
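To make the environment-consistency problem concrete, here is a minimal sketch of a drift check that compares a team's pinned package versions with what is actually installed. The function names `installed_versions` and `find_drift` and the pinned versions shown are hypothetical, chosen only for illustration; the sketch relies solely on the standard-library `importlib.metadata` module and is not a description of how any particular platform works.

```python
from importlib import metadata


def installed_versions():
    """Return {package name (lowercase): version} for the current environment."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing metadata
            versions[name.lower()] = dist.version
    return versions


def find_drift(pinned, installed=None):
    """Compare pinned versions with installed ones.

    Returns {name: (expected, found)} for every mismatch; `found` is None
    when the package is not installed at all.
    """
    if installed is None:
        installed = installed_versions()
    drift = {}
    for name, expected in pinned.items():
        found = installed.get(name.lower())
        if found != expected:
            drift[name] = (expected, found)
    return drift


if __name__ == "__main__":
    # Hypothetical pins and a hypothetical snapshot of one machine.
    pinned = {"numpy": "1.24.3", "pandas": "2.0.2"}
    snapshot = {"numpy": "1.24.3", "pandas": "1.5.3"}
    print(find_drift(pinned, snapshot))
```

Passing an explicit snapshot keeps the check deterministic; calling `find_drift(pinned)` with no second argument inspects the environment the script runs in.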

2. Collaboration and Deployment

Project collaboration is an important part of building and scaling great models, but it depends on reproducibility, which is a formidable challenge, especially for large teams. Most teams collaborate in a fractured way, with models on individual machines, leading to the often-heard phrase among data scientists and data engineers: “It works on my machine.”
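One lightweight way to spot the “works on my machine” problem is to reduce each environment to a single fingerprint that teammates can compare. The sketch below is an illustration under stated assumptions, not a feature of any particular platform; the function name `environment_fingerprint` and the package versions are hypothetical.

```python
import hashlib


def environment_fingerprint(packages):
    """Return a SHA-256 hex digest of a {name: version} mapping.

    Names are lowercased and entries are sorted, so two environments with
    the same packages and versions produce the same fingerprint regardless
    of the order in which the packages were listed.
    """
    lines = sorted(f"{name.lower()}=={version}" for name, version in packages.items())
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical package lists from two teammates' machines.
    laptop = {"numpy": "1.24.3", "pandas": "2.0.2"}
    server = {"pandas": "1.5.3", "numpy": "1.24.3"}
    same = environment_fingerprint(laptop) == environment_fingerprint(server)
    print("environments match" if same else "environments differ")
```

A fingerprint only says whether two environments differ, not where; a per-package comparison is still needed to find the mismatch.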

When it comes to deployment, manual processes give you more control over your pipeline but, like manual package management, are time-consuming and prone to errors and scalability issues. Building your own infrastructure for deployment allows you to customize and also gives you more control, but you may see lower return on investment due to high development and maintenance costs.

There are easy-to-use machine learning platforms with off-the-shelf functionality and some support, but these can be highly restrictive compared to open-source software, with limited customization options. They also can be quite expensive.

3. Governance and Securing the Open-Source Pipeline

A trusted source for your open-source packages has never been more important. The March 2023 U.S. National Cybersecurity Strategy and frameworks from the National Institute of Standards and Technology (NIST) show that the burden of security is shifting to organizations and individuals who develop software.

Manual security audits can help you meet minimum regulatory requirements and identify some security risks. However, they, too, are time-consuming and resource-intensive, and they put your organization in a reactive position. In-house security training can increase awareness and promote good practices, but it is insufficient on its own.

Third-party scanning tools are often easy to use and, like some machine learning platforms, offer off-the-shelf functionality and some support. However, these tools are often not well suited to Python workflows: they can produce a high rate of false positives and can mishandle compiled packages.
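As one small building block of a trusted package pipeline, a team can check that a downloaded artifact matches a checksum published by a source it trusts. The sketch below shows SHA-256 checksum verification only, which is simpler than full signature verification and does not replace vulnerability scanning; the function names and the idea of a separately published digest are assumptions made for illustration.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA-256 digest equals the expected hex digest.

    `expected_hex` is assumed to come from a trusted, separately published
    source; the comparison is case-insensitive and constant-time.
    """
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())
```

A pipeline would typically refuse to install any artifact for which `matches_expected` returns False.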

The Top Features to Look for in an Enterprise Python Platform: A Buyer’s Checklist


An enterprise platform should be flexible enough to meet your needs today and powerful enough to withstand the demands of your future workloads and projects. You can use this checklist as you evaluate enterprise platforms for Python and open-source software.

FUNDAMENTAL CAPABILITIES

1. Data Integration

Capability | Anaconda: integration is possible | Other vendor(s): integration is possible | Other vendor(s): integration is not possible
Code repositories (Git, Bitbucket) | ✅ | |
Data lake support | ✅ | |
Filesystems | ✅ | |
Hadoop (Cloudera, Hortonworks, EMR) | ✅ | |
IoT/sensor data | ✅ | |
Monitoring solutions (log shipping) | ✅ | |
NoSQL | ✅ | |
Proprietary databases (SAS, Teradata) | ✅ | |
SQL | ✅ | |
Web data integration | ✅ | |

2. Infrastructure and Hardware

Infrastructure | Anaconda: supported, and air gapped is an option | Other vendor(s): supported, and air gapped is an option | Other vendor(s): supported but not air gapped | Other vendor(s): not supported
AWS SageMaker | ✅ | | |
Azure | ✅ | | |
Domino Data Lab MLOps | ✅ | | |
Google | ✅ | | |
Microsoft Azure | ✅ | | |
Oracle Cloud Infrastructure (OCI) | ✅ | | |
Snowpark for Python | ✅ | | |
On premises (vSphere) | ✅ | | |
On premises (bare metal) | ✅ | | |
Air gapped | ✅ | | |
GPU and CPU support | ✅ | | |

3. Machine Learning Capabilities 

Capability | Anaconda: supported | Other vendor(s): supported | Other vendor(s): not supported
Classification & regression | ✅ | |
Deep learning | ✅ | |
Generative adversarial networks (GANs) | ✅ | |
Pre-trained large language models (LLMs) | ✅ | |
Reinforcement learning | ✅ | |
Support vector machines (SVMs) | ✅ | |
Testing strategies (A/B, multi-armed bandit, sensitivity analysis) | ✅ | |
Text-to-image models | ✅ | |
Text & image analytics and processing | ✅ | |
Time-series analysis | ✅ | |

4. Collaboration and Deployment

Capability | Anaconda: available | Other vendor(s): available | Other vendor(s): not available
Centralized project hub | ✅ | |
Deploy with one click | ✅ | |
Deploy REST API | ✅ | |
Deploy webapp | ✅ | |
Governance controls for collaboration and deployment | ✅ | |
Job scheduler / automation | ✅ | |
Version control | ✅ | |
Visualizations and dashboards | ✅ | |

5. Support

Support offering | Anaconda: available | Other vendor(s): available | Other vendor(s): not available
Dedicated support contacts | ✅ | |
Guaranteed uptime SLA | ✅ | |
Access tokens | ✅ | |
Advanced troubleshooting support | ✅ | |
Assistance with Anaconda package management | ✅ | |
Custom conda package builds | ✅ | |
Custom installer builds | ✅ | |
Environment management issues | ✅ | |
Learning: Live and on-demand | ✅ | |
Repository access during high demand | ✅ | |
Severity response: Level 1 | 12 hours, standard; 1 hour, premium | |
Severity response: Level 2 | 24 hours, standard; 12 hours, premium | |
Technical support | ✅ | |

6. Security and Governance

Capability | Anaconda: included | Other vendor(s): included | Other vendor(s): can integrate | Other vendor(s): not possible
Administrative monitoring (track users, projects, deployments) | ✅ | | |
Audit logs | ✅ | | |
Cloud-native security controls | ✅ | | |
Disaster recovery | ✅ | | |
End-to-end encryption | ✅ | | |
Package signature verification | ✅ | | |
Publishing permissions | ✅ | | |
Role-based user access controls | ✅ | | |
Scanning for common vulnerabilities and exposures (CVEs) | ✅ | | |
Secure package repository | ✅ | | |
Software bill of materials (SBOM) | ✅ | | |

COLLABORATION AND TOOLS

1. Notebooks and Integrated Development Environments (IDEs)

Tool | Purpose | Anaconda: included | Other vendor(s): included | Other vendor(s): not included
Jupyter Notebook | Creating and sharing computational documents | ✅ | |
JupyterLab | Web-based interface for Jupyter | ✅ | |
PyCharm | IDE for programming in Python | ✅ | |
RStudio | IDE tools for Python and R | ✅ | |
Spyder | Scientific Python development environment for scientific programming | ✅ | |
Visual Studio Code (VS Code) | Source-code editor for debugging, snippets, code refactoring, and more | ✅ | |

2. Data Visualization Capabilities

Capability | Anaconda: supported | Other vendor(s): supported | Other vendor(s): not supported
Allows users to choose their favorite plotting library (e.g., Bokeh, hvPlot, Matplotlib, Plotly) | ✅ | |
Supports fully interactive visualizations | ✅ | |
Supports visualizing very large (i.e., petabyte) datasets | ✅ | |
Supports visualization in Jupyter or as stand-alone applications | ✅ | |

3. Data Science and Machine Learning Libraries

Anaconda gives you access to thousands of libraries. We name just a few of the most common libraries below to help you compare your options.

Library | Purpose | Anaconda: accessible | Other vendor(s): accessible | Other vendor(s): not accessible
Dask | Parallel and distributed computing | ✅ | |
Django | Python web framework for design | ✅ | |
Flask | Model deployment | ✅ | |
Keras | Deep-learning framework (API for TensorFlow) | ✅ | |
Kubeflow | ML workflows on Kubernetes | ✅ | |
MLflow | Experiment tracking | ✅ | |
NumPy | Mathematical operations on arrays | ✅ | |
Pandas | Work with data sets: analyzing, cleaning, exploring, and manipulating data | ✅ | |
Prophet | Time-series forecasting in Python | ✅ | |
PyTorch | Develop and train deep learning models | ✅ | |
SciPy | Scientific and technical computing (built on NumPy) | ✅ | |
Scikit-learn | ML library for classification, regression, and clustering algorithms | ✅ | |
TensorFlow | Develop and train ML models | ✅ | |
Theano | Mathematical expressions involving multi-dimensional arrays (built on NumPy) | ✅ | |
XGBoost | Distributed gradient boosting library | ✅ | |

4. Model Deployment and Management

Capability | Anaconda: yes | Other vendor(s): yes | Other vendor(s): no
Deployment from QA | ✅ | |
Deployment to production | ✅ | |
One-click deployment to pre-provisioned resources | ✅ | |
Refine models in production | ✅ | |
Reproducibility: rollback to older models | ✅ | |
Centralized administration of deployed apps | ✅ | |

Anaconda’s Platform Makes Innovation Possible


For more than a decade, industry leaders have been using Anaconda’s platform to build some of the world’s most innovative predictions, products, and experiences. Data science and machine learning teams count on our trusted packages and capabilities to centralize open-source software access and empower consistent, reproducible workflows.

Figure: Anaconda’s platform has capabilities that address three major challenges when working with Python in the enterprise. Build: provide trusted packages, centralize your distribution process, and enable consistent, reproducible environments. Deploy: enable project collaboration, centralize workflows, and make deployment easy. Secure: enable IT visibility and access controls, and keep license and vulnerability risks out of your software supply chain.

Enterprise practitioners use our platform to collaborate across users and teams, centralize workflows for better reproducibility and scalability, and deploy models into production with just one click.

IT administrators and security teams choose Anaconda because it is the only platform in the Python ecosystem with access to thousands of packages that—unlike those from community package providers—are privately hosted, built from source, and free from malicious packages.

Finally, you can deploy Anaconda in the cloud or on premises—with private cloud, managed hosting, and air-gapped options—making Anaconda the platform of choice for those working in highly regulated industries and/or with sensitive or protected data. With Anaconda, peace of mind evolves from fantasy to reality. 

Ready to learn more about how Anaconda can help your teams build and deploy secure Python solutions, faster? Book time with one of our experts to discuss your organization’s requirements.

Reviewed and maintained by:

Christian Capdeville, Director of Product Marketing, Anaconda
Saundra Monroe, Director of Product Management, Anaconda
Jim Bednar, Director of Custom Services, Anaconda