Substantial and impactful open-source innovation is at the heart of Anaconda’s efforts to provide tooling for developing and deploying secure Python solutions, faster. With the goal of capturing and communicating our teams’ many ongoing contributions to a wide variety of open-source projects, we are now providing regular roundups of related news items on our blog.
Highlights by Dev Group
Anaconda has many different teams working on open source, and each performs a wide variety of tasks. Below I will cover some of our core efforts and recent milestones. Please note that the grouping by project area is merely for readability; in practice, many of us work across these divisions.
Dask and Data Access
Find, read, and write data in any format or location → here
- Dask-awkward was released for the first time late last year, bringing together parallel, distributed big-data processing and the convenience of awkward array processing of irregular, variable-length data. Since then, we have been working hard on optimization, so that only the data actually needed for the requested operations is loaded into memory. This will save a lot of processing time and resources as you churn through terabytes of nested JSON or Parquet data.
- Intake has been undergoing a refresh of its documentation and GUI, in preparation for new publicly curated catalogs on Anaconda Nucleus and a richer plugin ecosystem. This is in advance of our proposed tutorial at the SciPy conference in Austin this July, where we would crowd-source interesting datasets and drivers.
- fastparquet received a big boost to writing performance, whereas previous optimization efforts focused mostly on reading.
- Python-graphblas reflects ongoing work in the graph community, and we are involved in a supporting capacity. Recent efforts include speeding up NetworkX and a binary sparse standard for storing graphs efficiently.
Jupyter
In-browser interactive development environment for Python and other languages → here
- JupyterCon 2023 is coming up, and our team will attend. We’ll be giving a talk about the past, present, and future of the Jupyter Notebook. Don’t miss it!
- We’ve been working on helping to test and update JupyterLab extensions to confirm compatibility with the upcoming JupyterLab 4.
- The “classic” Jupyter Notebook is displaying a banner directing users to learn more about the upcoming release of Notebook version 7 (based on JupyterLab components), which may have an impact on users with custom extensions to Notebook version 6. Learn more in the migration guide.
HoloViz
Interactive visualization and plotting for PyData → here
- Panel is sprinting towards its 1.0 release. Highlights include:
  - Significant performance improvements, thanks to Bokeh 3
  - Virtuous feedback loop: Panel benefits greatly from the new Bokeh, and Bokeh benefits from bug reporting by Panel maintainers, as bugs can promptly be fixed by our Mateusz Paprocki
  - Aging and somewhat messy user guides are being turned into more self-contained, well-organized how-to guides and background pages
  - A new design system that can be applied independently of templates
- HoloViews: Simon Hansen has assumed the role of lead maintainer, and work continues on two major new features:
  - A general concept and API for annotating data
  - Support for multiple y-axes
- Datashader is improving its inspection capabilities by adding new reduction functions, which will ultimately benefit all HoloViews inspection tools.
- Lumen’s first official release (0.5.0) took place on the 14th of December, 2022 and featured a complete documentation rewrite based on the Diátaxis documentation framework.
Conda
The Python (and everything) package manager → here
- We will soon be releasing conda-project, a tool for encapsulating, running, and reproducing projects with conda environments. It allows you to package code, the environment that it needs, and expected commands together with a simple prescription, making distributed workflows easier.
- Conda itself was released twice this quarter, with a long list of improvements and fixes.
BeeWare
Run Python apps on any platform, including mobile → here
- Toga 0.3.0 (the GUI toolkit) is finally out, the first “non-dev” release in a long while. It incorporates new testing frameworks on macOS, iOS, and GTK, and much better coverage on Android.
- There’s been a new release of Chaquopy (deployment on Android), with much improved build times.
- There’s been a new release of Briefcase (top-level distribution management), with many fixes, speed improvements, and a new “testing mode.”
- View the latest roadmap here.
Numba
Accelerate your Python numerical code with JIT compilation → here
- A large refactor and upgrade of Numba is proposed and discussed here, with talk of showcasing the new architecture by the end of the year. The compiler toolkit will gain new components to ease the ongoing maintenance and development of the package across Python and compiler versions, which has become necessary now that major Python releases are more frequent.
- Support for Python 3.11, LLVM 14, and NumPy 1.24 has been merged to the main development branch in preparation for a near-term release candidate.
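For readers who have not tried Numba yet, here is a minimal sketch of what JIT compilation of numerical code looks like in practice. The function name `sum_of_squares` is purely illustrative and not taken from any Numba release, and the `ImportError` fallback is an assumption added only so the snippet still runs where Numba is not installed.

```python
try:
    from numba import njit  # compiles the decorated function to machine code
except ImportError:
    # Assumption for illustration only: without Numba, run as plain Python.
    def njit(func):
        return func


@njit
def sum_of_squares(n):
    # A tight numerical loop, the kind of code that tends to benefit from JIT.
    total = 0
    for i in range(n):
        total += i * i
    return total


# The first call triggers compilation; later calls reuse the compiled code.
print(sum_of_squares(10))
```

Compilation happens on the first call for a given argument type, so any speedup shows up on repeated calls or on large inputs rather than on a one-off small computation.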
Stay in Touch!
We appreciate any feedback on whether a more frequently updated Anaconda OSS newspage would be useful or interesting to the community, so please don’t hesitate to get in touch with us via your preferred project channel, social media, or Anaconda Nucleus.
You may also be interested in some of our other recent software activities, covered on our blog:
See you next quarter!
About the Author
Martin Durant is a former astrophysicist with several years of scientific research experience. He has also worked in medical imaging, building AI/ML pipelines and a research platform. After a brief stint as a data scientist in ad-tech, Martin moved to Anaconda to work on PyData education. He now leads a number of open-source PyData projects, focussing on data access, formats, and parallel processing.