Four Key Announcements from H2O World San Francisco

Last week at H2O World San Francisco, H2O.ai announced a number of improvements to Driverless AI, H2O, Sparkling Water, and AutoML, along with several new partnerships for Driverless AI. The product updates are incremental improvements across the platform, while the partnerships show H2O.ai expanding its audience and capabilities. This piece is intended to provide guidance to data analysts, data scientists, and analytics professionals working to include machine learning in their workflows.

Announcements

H2O.ai has integrated H2O Driverless AI with Alteryx Designer; the connector is available for download in the Alteryx Analytics Gallery. This will let Alteryx users incorporate more advanced, automated machine learning algorithms into their analytic workflows in Designer, and perform automatic feature engineering for their machine learning models. In addition, Driverless AI models can be deployed to Alteryx Promote for model management and monitoring, reducing time to deployment. Together, these capabilities give business analysts and citizen data scientists who use Alteryx more direct and expanded access to machine learning via H2O.ai.

H2O.ai is integrating Kx’s time-series database, kdb+, into Driverless AI. This will extend Driverless AI’s ability to process large datasets, resulting in faster identification of better-performing predictive models. Kx users will be able to perform feature engineering for machine learning models on their time-series datasets within Driverless AI, and to create time-series-specific queries.

H2O.ai also announced a collaboration with Intel focused on accelerating H2O.ai technology on Intel platforms, including the Intel Xeon Scalable processor, H2O.ai’s implementation of XGBoost, and Driverless AI running on Intel globally. Accelerating H2O on Intel will help establish Intel’s credibility in machine learning and artificial intelligence for heavy compute loads. Other aspects of the collaboration will include expanding the reach of data science and machine learning by supporting efforts to integrate AI into analytics workflows, and using Intel’s AI Academy to teach relevant skills. The details of the technical projects will remain under wraps until spring.

Finally, H2O.ai announced numerous improvements to both Driverless AI and its open-source H2O, Sparkling Water, and AutoML, mostly focused on supporting more algorithms and heavier workloads across its product suite. The improvement that caught my eye was the new ability to thoroughly inspect trees for all of the tree-based algorithms the open-source H2O platform supports. Given concerns about “black-box” models, and the lack of insight into how a given model reaches its results and why, an API for tree inspection is a practical step toward making the logic behind model output more transparent, at least for tree-based models.
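To make the idea of tree inspection concrete, here is a minimal Python sketch of the kind of question such an API lets you answer: which splits did a single record pass through to reach its prediction? The `InspectableTree` class, its parallel-array layout, and the example features (`age`, `income`) are hypothetical illustrations, not H2O’s actual API; consult the H2O documentation for the real tree-inspection interface.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InspectableTree:
    # Hypothetical parallel-array layout (not H2O's API): index i describes node i.
    left_children: List[int]            # -1 marks a leaf
    right_children: List[int]           # -1 marks a leaf
    features: List[Optional[str]]       # split feature; None for leaves
    thresholds: List[Optional[float]]   # split threshold; None for leaves
    predictions: List[Optional[float]]  # leaf value; None for internal nodes

    def decision_path(self, row: dict) -> List[str]:
        """Return a human-readable trace of the splits a row follows."""
        steps = []
        node = 0  # start at the root
        while self.left_children[node] != -1:
            feature = self.features[node]
            threshold = self.thresholds[node]
            value = row[feature]
            if value < threshold:
                steps.append(f"{feature} = {value} < {threshold}: go left")
                node = self.left_children[node]
            else:
                steps.append(f"{feature} = {value} >= {threshold}: go right")
                node = self.right_children[node]
        steps.append(f"leaf prediction = {self.predictions[node]}")
        return steps


# A hypothetical two-level tree: split on age, then on income.
tree = InspectableTree(
    left_children=[1, -1, 3, -1, -1],
    right_children=[2, -1, 4, -1, -1],
    features=["age", None, "income", None, None],
    thresholds=[30.0, None, 50000.0, None, None],
    predictions=[None, 0.2, None, 0.5, 0.9],
)

for step in tree.decision_path({"age": 42, "income": 65000}):
    print(step)
```

A trace like this is what turns a single prediction from an opaque number into a short, auditable chain of rules, which is why tree inspection matters for the governance concerns discussed in the recommendations below.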

Recommendations

Alteryx users seeking to incorporate machine learning models into analytic workflows should take advantage of increased access to H2O Driverless AI. Giving business analysts and citizen data scientists more machine learning capabilities expands what their analytics workflows can do; Driverless AI’s existing AutoDoc capability will be particularly useful for ensuring Alteryx users understand the results of the more advanced techniques they now have access to.

If your organization collects time-series data but has not yet analyzed it with machine learning, consider trialing Kx’s kdb+ and H2O.ai’s Driverless AI. With this integration, Driverless AI will be able to quickly and automatically process time-series data stored in kdb+, speeding the identification of well-performing models and predictive capabilities.

If your organization is considering significant investments in computing assets for heavy machine learning loads in the medium term, keep an eye on the work Intel will be doing to design chips for specific types of machine learning workloads. NVIDIA has its GPUs and Google its TPUs; by partnering with H2O.ai, Intel is declaring its intention to remain relevant in this market.

If your organization is concerned about the effects of “black box” machine learning models, the new tree-inspection capability in H2O and the AutoDoc functionality in Driverless AI are starting to make the logic behind machine learning models in H2O more transparent. The ability to inspect tree-based models is a key step toward more thorough governance of the results of machine learning efforts.

Data Science and Machine Learning News, November 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, SnapLogic, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.


Data Science Platforms News Roundup, September 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.


Data Science Platforms News Roundup, July 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta.


Data Science Platforms News Roundup, June 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include:


Market Milestone: Oracle Builds Data Science Gravity By Purchasing DataScience.com

Bridging the Gap

Industry: Data Science Platforms

Key Stakeholders: IT managers, data scientists, data analysts, database administrators, application developers, enterprise statisticians, machine learning directors and managers, current DataScience.com customers, current Oracle customers

Why It Matters: Oracle released a number of AI tools in Q4 2017, but until now, it lacked a data science platform to support complete data science workflows. With this acquisition, Oracle now has an end-to-end platform to manage these workflows and support collaboration among teams of data scientists and business users, and it joins other major enterprise software companies in being able to operationalize data science.

Top Takeaways: Oracle acquired DataScience.com to keep customers with data science needs in-house rather than risk losing their data science business to competitors. However, Oracle has not yet defined a timeline for rolling out the unified data science platform, or for its availability on the Oracle Cloud.

Oracle Acquires DataScience.com

On May 16, 2018, Oracle announced that it had agreed to acquire DataScience.com, an enterprise data science platform that Oracle expects to add to the Oracle Cloud environment. Coming after Oracle’s debut of a number of AI tools last fall, this acquisition telegraphs Oracle’s intent to accelerate its entry into the data science platform market by buying its way in.

Oracle is reviewing DataScience.com’s existing product roadmap and will supply guidance in the future, but it intends to provide a single unified data science platform in concert with Oracle Cloud Infrastructure and its existing SaaS and PaaS offerings, giving customers a broader suite of machine learning tools and a complete workflow.


Lynne Baer: Clarifying Data Science Platforms for Business

Word cloud of data science software and terms

My name is Lynne Baer, and I’ll be covering the world of data science software for Amalgam Insights. I’ll investigate data science platforms and apps to solve the puzzle of getting the right tools to the right people and organizations.

“Data science” is on the tip of every executive’s tongue right now. The idea that new business initiatives (and improvements to existing ones) can be found in the data a company is already collecting is compelling. Perhaps your organization has already dipped its toes in the data discovery and analysis waters – your employees may be managing your company’s data in Informatica, or performing statistical analysis in Statistica, or experimenting with Tableau to transform data into visualizations.

But what is a Data Science Platform? Right now, if you’re looking to buy software for your company to do data science-related tasks, it’s difficult to know which applications will actually suit your needs. Do you already have a data workflow you’d like to build on, or are you looking to the structure of an end-to-end platform to set your data science initiative up for success? How do you coordinate a team of data scientists to take better advantage of resources they’ve already created? Do you have coders in-house who can work with a platform designed for people writing in Python, R, Scala, or Julia? Are there more user-friendly tools your company can use if you don’t? What do you do if some of your data requires tighter security protocols? Or if some of your data models themselves are proprietary or confidential?

All of these questions are part and parcel of the big one: How can companies tell what makes a good data science platform for their needs before investing time and money? Are traditional enterprise software vendors like IBM, Microsoft, SAP, and SAS dependable in this space? What about companies like Alteryx, H2O.ai, KNIME, and RapidMiner? Other popular platforms worth considering include Anaconda, Angoss (recently acquired by Datawatch), Domino, Databricks, Dataiku, MapR, MathWorks, Teradata, and TIBCO. And then there are new startups like Sentenai, focused on streaming sensor data, and more established companies like Cloudera looking to expand beyond their existing offerings.

Over the next several months, I’ll be digging deep to answer these questions, speaking with vendors, users, and investors in the data science market. I would love to speak with you, and I look forward to continuing this discussion. And if you’ll be at Alteryx Inspire in June, I’ll see you there.