Four Key Announcements from H2O World San Francisco

Last week at H2O World San Francisco, H2O.ai announced a number of improvements to Driverless AI, H2O, Sparkling Water, and AutoML, as well as several new partnerships for Driverless AI. The improvements are incremental advances across the platform, while the partnerships reflect H2O.ai expanding their audience and capabilities. This piece is intended to provide guidance to data analysts, data scientists, and analytic professionals working on including machine learning in their workflows.

Announcements

H2O.ai has integrated H2O Driverless AI with Alteryx Designer; the connector is available for download in the Alteryx Analytics Gallery. This will permit Alteryx users to implement more advanced and automatic machine learning algorithms in analytic workflows in Designer, and to perform automatic feature engineering for their machine learning models. In addition, Driverless AI models can be deployed to Alteryx Promote for model management and monitoring, reducing time to deployment. Both of these new capabilities give Alteryx-using business analysts and citizen data scientists more direct and expanded access to machine learning via H2O.ai.

H2O.ai is integrating Kx’s time-series database, kdb+, into Driverless AI. This will extend Driverless AI’s ability to process large datasets, resulting in faster identification of more performant predictive capabilities and machine learning models. Kx users will be able to perform feature engineering for machine learning models on their time series datasets within Driverless AI, and create time-series specific queries.

H2O.ai also announced a collaboration with Intel that will focus on accelerating H2O.ai technology on Intel platforms, including the Intel Xeon Scalable processor and H2O.ai’s implementation of XGBoost. Driverless AI on Intel, globally. Accelerating H2O on Intel will help establish Intel’s credibility in machine learning and artificial intelligence for heavy compute loads. Other aspects of this collaboration will include expanding the reach of data science and machine learning by supporting efforts to integrate AI into analytics workflows and using Intel’s AI Academy to teach relevant skills. The details of the technical projects will remain under wraps until spring.

Finally, H2O.ai announced numerous improvements to both Driverless AI and their open-source H2O, Sparkling Water, and AutoML, mostly focused on expanding support for more algorithms and heavier workloads across their product suite. Among the improvements that caught my eye was the new ability to inspect trees thoroughly for all of the tree-based algorithms that the open-source H2O platform supports. Given the concern about “black-box” models, and the lack of insight into how a given model performs its analysis and why it yields the results it does for any given experiment, providing an API for tree inspection is a practical step towards making the logic behind model performance and output more transparent for at least some machine learning models.
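To make concrete what "inspecting a tree" buys you, here is a minimal sketch in plain Python. It does not use H2O's tree inspection API; the Node structure, the explain function, and the age and income features are all hypothetical, chosen only to show how walking a tree's splits turns a prediction into a readable chain of decisions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a toy binary decision tree (hypothetical, not H2O's structure)."""
    feature: Optional[str] = None       # feature tested here; None for a leaf
    threshold: Optional[float] = None   # go left when the value is below this
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    prediction: Optional[float] = None  # set only on leaves


def explain(node: Node, row: Dict[str, float]) -> List[str]:
    """Walk the tree for one row and record every split taken along the way."""
    steps = []
    while node.prediction is None:
        value = row[node.feature]
        if value < node.threshold:
            steps.append(f"{node.feature}={value} < {node.threshold}")
            node = node.left
        else:
            steps.append(f"{node.feature}={value} >= {node.threshold}")
            node = node.right
    steps.append(f"predict {node.prediction}")
    return steps


# A made-up two-level tree: split on age first, then on income.
tree = Node("age", 40.0,
            left=Node(prediction=0.2),
            right=Node("income", 50000.0,
                       left=Node(prediction=0.5),
                       right=Node(prediction=0.9)))

path = explain(tree, {"age": 52.0, "income": 61000.0})
print(" -> ".join(path))
# age=52.0 >= 40.0 -> income=61000.0 >= 50000.0 -> predict 0.9
```

The point of the sketch is that each prediction comes with the exact thresholds that produced it, which is the kind of visibility a tree inspection API exposes for real models.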

Recommendations

Alteryx users seeking to implement machine learning models into analytic workflows should take advantage of increased access to H2O Driverless AI. Providing more machine learning capabilities to business analysts and citizen data scientists enhances the capabilities available to their data analytics workflows; Driverless AI’s existing AutoDoc capability will be particularly useful for ensuring Alteryx users understand the results of the more advanced techniques they now have access to.

If your organization collects time-series data but has not yet pursued analytics of this data with machine learning, consider trialing Kx’s kdb+ and H2O’s Driverless AI. With this integration, Driverless AI will be able to quickly and automatically process time-series data stored in kdb+, allowing swift identification of performant models and predictive capabilities.

If your organization is considering making significant investments in heavy-duty computing assets for heavy machine learning loads in the medium-term future, keep an eye on the work Intel will be doing to design chips for specific types of machine learning workloads. NVIDIA has its GPUs and Google its TPUs; by partnering with H2O, Intel is declaring its intentions to remain relevant in this market.

If your organization is concerned about the effects of “black box” machine learning models, the ability to inspect tree-based models in H2O, along with the AutoDoc functionality in Driverless AI, is starting to make the logic behind machine learning models in H2O more transparent. This new ability to inspect tree-based algorithms is a key step towards more thorough governance surrounding the results of machine learning endeavors.

Data Science and Machine Learning News Roundup, January 2019

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, Google, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.

Cloudera and Hortonworks Complete Planned Merger

In early January, Cloudera and Hortonworks completed their planned merger. With this, Cloudera becomes the default machine learning ecosystem for Hadoop-based data, while giving Hortonworks customers an easy pathway for expanding into machine learning and analytics capabilities.

Study: 89 Percent of Finance Teams Yet to Embrace Artificial Intelligence

A study conducted by the Association of International Certified Professional Accountants (AICPA) and Oracle revealed that 89% of organizations have not deployed AI to their finance groups. Although a correlation exists between companies with revenue growth and companies that are using AI, the key takeaway is that artificial intelligence is still in the early adopter phase for most organizations.

Gartner Magic Quadrant for Data Science and Machine Learning Platforms

In late January, Gartner released its Magic Quadrant for Data Science and Machine Learning Platforms. New to the Data Science and Machine Learning MQ this year are both DataRobot and Google – two machine learning offerings with completely different audiences and scope. DataRobot offers an automated machine learning service targeted towards “citizen data scientists,” while Google’s machine learning tools, though part of Google Cloud Platform, are more of a DIY data pipeline targeted towards developers. By contrast, I find it curious that Amazon’s SageMaker machine learning platform – and its own collection of task-specific machine learning tools, despite their similarity to Google’s – failed to make the quadrant, given this quadrant’s large umbrella.

While data science and machine learning are still emerging markets, the contrasting demands made of these technologies by citizen data scientists and by cutting-edge developers warrant splitting the next Data Science and Machine Learning Magic Quadrant into separate reports targeted to the considerations of each of these audiences. In particular, the continued growth of automated machine learning technologies will likely drive such a split, as citizen data scientists pursue a “good enough” solution that provides quick results.

Data Science and Machine Learning News, November 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, SnapLogic, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.


Data Science and Machine Learning News, October 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.


Data Science Platforms News Roundup, September 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.


Why It Matters that IBM Announced Trust and Transparency Capabilities for AI


Note: This blog is a followup to Amalgam Insights’ visit to the “Change the Game” event held by IBM in New York City.

On September 19th, IBM announced its launch of a portfolio of AI trust and transparency capabilities. This announcement got Amalgam Insights’ attention because of IBM’s relevance and focus in the enterprise AI market throughout this decade. To understand why IBM’s specific launch matters, take a step back and consider IBM’s considerable role in building out the current state of the enterprise AI market.

IBM AI in Context

Since IBM’s public launch of IBM Watson on Jeopardy! in 2011, IBM has been a market leader in enterprise artificial intelligence and spent billions of dollars in establishing both IBM Watson and AI. This has been a challenging path to travel as IBM has had to balance this market-leading innovation with the financial demands of supporting a company that brought in $107 billion in revenue in 2011 and has since seen this number shrink by almost 30%.

In addition, IBM had to balance its role as an enterprise technology company focused on the world’s largest workloads and IT challenges with launching an emerging product better suited for highly innovative startups and experimental enterprises. And IBM also faced the “cloudification” of enterprise IT in general, where the traditional top-down purchase of multi-million dollar IT portfolios is being replaced by piecemeal and business-driven purchases and consumption of best-in-breed technologies.

Seven years later, the jury is still out on how AI will ultimately end up transforming enterprises. What we do know is that a variety of branches of AI are emerging, including


Data Science Platforms News Roundup, August 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta.


Code-Free to Code-Based: The Power Spectrum of Data Science Platforms

Codeless to Code-Based

The spectrum of code-centricity on data science platforms ranges from “code-free” to “code-based.” Data science platforms frequently boast that they provide environments that require no coding, and that are code-friendly as well. Where a given platform falls along this spectrum affects who can successfully use a given data science platform, and what tasks they are…


Oracle GraphPipe: Expediting and Standardizing Model Deployment and Querying

On August 15, 2018, Oracle announced the availability of GraphPipe, a network protocol designed to transmit machine learning data between remote processes in a standardized manner, with the goal of simplifying the machine learning model deployment process. The spec is now available on Oracle’s GitHub, along with clients and servers that have implemented the spec for Python and Go (with a Java client soon to come); and a TensorFlow plugin that allows remote models to be included inside TensorFlow graphs.

Oracle’s goal with GraphPipe is to standardize the process of model deployment regardless of the frameworks utilized in the model creation stage.


Growing Your Data Science Team: Diversifying Beyond Unicorns

A herd of cloned data scientist unicorns

If your organization already has a data scientist, but your data science workload has grown beyond their capacity, you’re probably thinking about hiring another data scientist. Perhaps even a team of them. But cloning your existing data scientist isn’t the best way to grow your organization’s capacity for doing data science.

Why not simply hire more data scientists? First, so many of the tasks listed above are actually well outside the core competency of data scientists’ statistical work, and other roles (some of whom likely already exist in your organization) can perform these tasks much more efficiently. Second, data scientists who can perform all of these tasks well are a rare find; hoping to find their clones in sufficient numbers on the open market is a losing proposition. Third, though your organization’s data science practice continues to expand, the amount of time your original domain expert is able to spend with the data scientist on a growing pool of data science projects does not; it’s time to start delegating some tasks to operational specialists.
