Data Science and Machine Learning News Roundup, March 2019
On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, Google, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.
Dataiku released version 5.1 of its software platform. This release includes a GDPR framework for governance and control, along with user-experience upgrades: the ability to copy and reuse analytic workflows in new projects, support for coders to work in their preferred development environment from within Dataiku, and easier navigation of complex analytics projects where data sources may number in the hundreds.
Being able to document when sensitive data is being used, and to prevent inappropriate use of that data, is key for companies that must comply with GDPR and similar laws and want to avoid costly penalties for violating them. Dataiku's inclusion of a governance component within its data science platform distinguishes it from competitors, many of which lack such a component natively, and makes Dataiku more attractive as a data science platform.
Domino Data Lab Platform Enhancements Improve Productivity of Data Science Teams Across the Entire Model Lifecycle
Domino announced three new capabilities for its data science platform. Datasets is a high-performance data store that makes it easier for data scientists to find, share, and reuse large data resources across multiple projects, saving time spent searching for data. Experiment Manager gives data science teams a system of record for ongoing experiments, making it easier to avoid unnecessary duplicate work. Activity Feed gives data science leads a view of changes within each project, which helps when they are tracking multiple projects at once. Together, these three collaboration capabilities enhance Domino users' ability to do data science in a documented, repeatable, and mature fashion.
SAS announced a $1B investment in AI across three key areas: research and development, education initiatives, and a Center of Excellence. The goals are to enable SAS users to apply AI to some degree even without a strong baseline of AI skills, to help SAS users build those skills through training, and to help organizations using SAS bring AI projects into production more quickly with AI experts acting as consultants. A significant percentage of SAS users aren't currently using SAS for complex machine learning and artificial intelligence tasks; helping these users get SAS-based AI projects into production strengthens SAS' ability to sell its AI software.
- H2O.ai Accelerates Automatic Machine Learning with New NVIDIA-Powered Data Science Workstations and NVIDIA RAPIDS
- SAS partners with NVIDIA on deep learning and computer vision
H2O.ai and SAS both announced partnerships with NVIDIA this month. H2O.ai's Driverless AI and H2O4GPU are now optimized for NVIDIA's Data Science Workstations, and NVIDIA RAPIDS will be integrated into H2O as well. SAS disclosed plans to expand NVIDIA GPU support across SAS Viya and to use these GPUs and the CUDA-X AI acceleration libraries to support its AI software. Both H2O.ai and SAS are using NVIDIA's GPUs and CUDA-X to make certain types of machine learning algorithms run faster and more efficiently.
These follow earlier announcements of NVIDIA partnerships with IBM, Oracle, Anaconda, and MathWorks, reflecting NVIDIA's importance in machine learning. With NVIDIA holding an estimated 70% share of the worldwide GPU market, data science and machine learning software and platforms need to work well on what has become the de facto default GPU.