On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.
Dataiku announced the release of Dataiku 5. New capabilities include full containerization with Docker and Kubernetes, Keras support for deep learning, and enhanced documentation and collaboration features. Deploying containerized Docker images to Kubernetes provides more flexibility in compute resource usage, making it easier to scale those resources up and down. Through example “recipes” provided by Dataiku, Keras support extends even to users unfamiliar with constructing deep learning models. Finally, adding documentation infrastructure to AI projects in the form of discussion boards and wikis provides needed human-friendly, natural-language context around those projects.
H2O.ai announced the availability of a new version of Driverless AI that includes Natural Language Processing (NLP) capabilities. Typically, unstructured text blocks need to be converted into a more structured form before analysis. With the new NLP capabilities, Driverless AI no longer requires such conversion, which opens it up to use cases such as sentiment analysis and document classification. H2O.ai has also integrated NLP with TensorFlow, enhancing Driverless AI’s deep learning capabilities.
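For readers unfamiliar with the conversion step described above, here is a minimal sketch of one common way to turn free text into a structured form: a bag-of-words count matrix. This is a generic illustration, not H2O.ai's method; the function names `tokenize` and `bag_of_words` and the sample reviews are assumptions made up for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(documents):
    """Return (vocabulary, rows): one row of term counts per document."""
    tokenized = [tokenize(doc) for doc in documents]
    # The vocabulary is the sorted set of every token seen in any document.
    vocabulary = sorted({tok for doc in tokenized for tok in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        # Counter returns 0 for terms that do not occur in this document.
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical sample data, for illustration only.
reviews = ["Great product, great support", "Poor support"]
vocabulary, matrix = bag_of_words(reviews)
```

Each row of `matrix` lines up with `vocabulary`, so the text is now a numeric table that a conventional model can consume. Tools that handle text natively take this kind of step off the analyst's plate.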
IBM debuted Trust and Transparency, a software service that provides visibility into the decisions made by AI systems. The service explains how a given model determined its recommendations and the “why” behind that assessment process, allowing for a better understanding of the reasoning. It also detects bias in outcomes and recommends ways to mitigate that bias with data. Trust and Transparency works with models from a wide variety of machine learning frameworks, and is available on the IBM Cloud. Amalgam Insights’ Hyoun Park wrote more about why IBM’s Trust and Transparency capabilities matter.
MathWorks Expands Deep Learning Capabilities in Release 2018b of the MATLAB and Simulink Product Families
MathWorks announced Release 2018b of its MATLAB and Simulink product families. Among the new features are critical improvements to deep learning capabilities in the form of the Deep Learning Toolbox, a framework to design and implement deep neural networks. This new framework simplifies the design process for deep learning models in use cases such as image processing and computer vision. The Deep Learning Toolbox replaces the earlier Neural Network Toolbox. MATLAB also has a new ONNX converter that allows users to import models from, and export models to, supported frameworks such as PyTorch, MXNet, and TensorFlow, along with additional import capabilities for Caffe and Keras-TensorFlow models.
Among the numerous product updates announced by Microsoft at Ignite 2018 was the addition of automated machine learning to Azure Machine Learning. Microsoft’s Automated ML focuses on automatically determining the best machine learning pipeline for a given dataset, with the aim of optimizing model performance and sparing users the time spent manually determining the most efficient algorithm to use. Automated ML is currently in preview, available through the Azure Machine Learning service; Microsoft is also working on making Automated ML available in Power BI.
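To make the idea of automatically picking a pipeline concrete, here is a toy sketch of model selection by cross-validation: fit each candidate, score it on held-out data, and keep the one with the lowest error. This is not Microsoft's algorithm or the Azure Machine Learning API; the two candidate models, the helper names, and the sample data are all assumptions chosen for illustration.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Candidate: ordinary least-squares fit of a straight line."""
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / denom
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def cross_val_mse(fit, xs, ys, folds=3):
    """Mean squared error over held-out points, using interleaved folds."""
    pairs = list(zip(xs, ys))
    errors = []
    for k in range(folds):
        train = [p for i, p in enumerate(pairs) if i % folds != k]
        test = [p for i, p in enumerate(pairs) if i % folds == k]
        model = fit([x for x, _ in train], [y for _, y in train])
        errors.extend((model(x) - y) ** 2 for x, y in test)
    return mean(errors)


def select_best(candidates, xs, ys):
    """Score every candidate and return the name with the lowest error."""
    scores = {name: cross_val_mse(fit, xs, ys) for name, fit in candidates.items()}
    return min(scores, key=scores.get), scores


# Hypothetical data that follows a roughly linear trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
best, scores = select_best({"mean_baseline": fit_mean, "linear": fit_line}, xs, ys)
```

On this data the linear candidate wins. A production automated ML service searches a far larger space of preprocessing steps, algorithms, and hyperparameters, but the select-by-validation-score loop is the same basic shape.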
SAP announced the latest version of SAP Analytics Cloud. The new version provides data analysts with additional machine-learning-powered features such as anomaly detection, risk and correlation detection, and automatic dashboard creation. In addition, time-series forecasting is now in beta.
TIBCO announced that its new data science offering, TIBCO Data Science, will be available exclusively on the Amazon Web Services Marketplace. Data preparation and machine learning computations can be executed directly in native AWS resources such as EMR and Redshift, as well as Hadoop and Spark clusters and databases. TIBCO Data Science is aimed at the “citizen data scientist,” with emphasis on code-free workflows and support for Jupyter notebooks.