Data Science Platforms News Roundup, June 2018
On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cloudera, Databricks, Dataiku, Datawatch, Domino, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta.
At Spark + AI, Databricks announced new capabilities for its Unified Analytics Platform, including MLFlow and Databricks Runtime for ML. MLFlow is an open source multi-cloud framework intended to standardize and simplify machine learning workflows to ensure machine learning gets put into production. Databricks Runtime for ML scales deep learning with new GPU support for AWS and Azure, along with preconfigured environments for the most popular machine learning frameworks such as scikit-learn, Tensorflow, Keras, and XGBoost. The net result of these new capabilities is that Databricks users will be able to get their machine learning work done faster.
H2O.ai and IBM announced a partnership in early June that permits the use of H2O’s Driverless AI on IBM PowerSystems, Driverless AI has automated machine learning capabilities, while PowerAI is a machine learning and deep learning toolkit, and the combination will permit significantly faster processing overall. This builds on the pre-existing integration H2O’s open source libraries into IBM’s Data Science Experience analytics solution, though when this announcement was made, IBM had not yet debuted PowerAI Enterprise, so the availability of Driverless AI on PowerAI Enterprise remains TBD.
This month, IBM announced the release of PowerAI Enterprise, which runs on IBM PowerSystems. It’s an expansion of IBM’s PowerAI applied AI offering that extends its coverage to include the entire data science workflow. IBM continues to cover its bases by diversifying its data science offerings, adding PowerAI to their existing Data Science Experience and Watson Studio offerings, but this also creates confusion as companies seek to determine which data science platform product suits their needs. We look forward to covering and clarifying this in greater detail.
At Alteryx Inspire, Alteryx announced the latest release (2018.2) of the Alteryx Analytics Platform, with improvements such as making common analytic tasks even easier via templates, extending community search across the entire platform, and enhanced onboarding for new users. I detail the new features in my earlier post, Alter(yx)ing Everything at Inspire 2018; the upshot is that Alteryx continues to focus on ease of use for analytics end users.
Anaconda released Dask, a new Python-based tool for processing large datasets. Python libraries like NumPy, pandas, and scikit-learn are designed to work with data in-memory on a single core; Dask will let data scientists process large datasets in parallel, even on a single computer, without needing to use Spark or another distributed computing framework. This expedites machine learning workflows on large datasets in Python, with the added convenience of being able to remain in your Python work environment.
Finally, I’m also working on a Vendor SmartList for the Data Science Platforms space this summer. If you’d like to learn more about this research initiative, or set up a briefing with Amalgam Insights for potential inclusion, please email me at firstname.lastname@example.org.