Data Science Platforms News Roundup, June 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cloudera, Databricks, Dataiku, Datawatch, Domino, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta.

Databricks Conquers AI Dilemma with Unified Analytics

At Spark + AI, Databricks announced new capabilities for its Unified Analytics Platform, including MLflow and Databricks Runtime for ML. MLflow is an open source multi-cloud framework intended to standardize and simplify machine learning workflows so that machine learning gets put into production. Databricks Runtime for ML scales deep learning with new GPU support for AWS and Azure, along with preconfigured environments for popular machine learning frameworks such as scikit-learn, TensorFlow, Keras, and XGBoost. The net result of these new capabilities is that Databricks users will be able to get their machine learning work done faster.

IBM and H2O.ai Partnership Aims to Accelerate Adoption of AI in the Enterprise

H2O.ai and IBM announced a partnership in early June that permits the use of H2O’s Driverless AI on IBM PowerSystems. Driverless AI has automated machine learning capabilities, while PowerAI is a machine learning and deep learning toolkit, and the combination will permit significantly faster processing overall. This builds on the pre-existing integration of H2O’s open source libraries into IBM’s Data Science Experience analytics solution. However, when this announcement was made, IBM had not yet debuted PowerAI Enterprise, so the availability of Driverless AI on PowerAI Enterprise remains TBD.

Announcing PowerAI Enterprise: Bringing Data Science into Production

This month, IBM announced the release of PowerAI Enterprise, which runs on IBM PowerSystems. It’s an expansion of IBM’s PowerAI applied AI offering that extends its coverage to include the entire data science workflow. IBM continues to cover its bases by diversifying its data science offerings, adding PowerAI Enterprise to its existing Data Science Experience and Watson Studio offerings, but this also creates confusion as companies seek to determine which data science platform product suits their needs. We look forward to covering and clarifying this in greater detail.

Alteryx Reveals Newest Platform Release at Inspire 2018

At Alteryx Inspire, Alteryx announced the latest release (2018.2) of the Alteryx Analytics Platform, with improvements such as templates that make common analytic tasks even easier, community search extended across the entire platform, and enhanced onboarding for new users. I detail the new features in my earlier post, Alter(yx)ing Everything at Inspire 2018; the upshot is that Alteryx continues to focus on ease of use for analytics end users.

Introducing Dask for Scalable Machine Learning

Anaconda released Dask, a new Python-based tool for processing large datasets. Python libraries like NumPy, pandas, and scikit-learn are designed to work with data in-memory on a single core; Dask will let data scientists process large datasets in parallel, even on a single computer, without needing to use Spark or another distributed computing framework. This expedites machine learning workflows on large datasets in Python, with the added convenience of being able to remain in your Python work environment.
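To make the idea of parallel processing on a single computer concrete, here is a minimal sketch of a chunked reduction: split the data into chunks, reduce each chunk independently, then combine the partial results. It uses only the Python standard library and is not Dask's API; the function names and chunk size are hypothetical, and threads are used only to keep the sketch portable (they will not speed up pure-Python arithmetic the way a real parallel scheduler can).

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, chunk_size):
    """Yield consecutive slices of `values` with at most `chunk_size` items."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def partial_sum_and_count(chunk):
    """Reduce one chunk to the two numbers needed to compute a mean."""
    return sum(chunk), len(chunk)


def parallel_mean(values, chunk_size=1000, workers=4):
    """Compute a mean by reducing chunks independently, then combining them."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_and_count, chunked(values, chunk_size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("cannot take the mean of an empty dataset")
    return total / count


# Illustrative data only: the integers 1 through 10,000.
data = list(range(1, 10001))
mean = parallel_mean(data)
```

Because each chunk is reduced on its own, no step needs the whole dataset at once, which is the property that lets this style of computation scale past what a single in-memory pass can handle.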

Finally, I’m also working on a Vendor SmartList for the Data Science Platforms space this summer. If you’d like to learn more about this research initiative, or set up a briefing with Amalgam Insights for potential inclusion, please email me at

What Data Science Platform Suits Your Organization’s Needs?

This summer, my Amalgam Insights colleague Hyoun Park and I will be teaming up to address that question. When it comes to data science platforms, there’s no such thing as “one size fits all.” We are writing this landscape because understanding the processes of scaling data science beyond individual experiments and integrating it into your business is difficult. By breaking down the key characteristics of the data science platform market, this landscape will help potential buyers choose the appropriate platform for their organizational needs. We will examine the following questions, which serve as key differentiators among data science platforms, to identify the characteristics, functionalities, and policies that separate platforms supporting introductory data science workflows from those supporting scaled-up, enterprise-grade workflows.

Amalgam’s Assumptions: The baseline order of operations for conducting data science experiments begins with understanding the business problem you’re trying to address. Gathering, prepping, and exploring the data are the next steps, done to extract appropriate features and start creating your model. The modeling process is iterative, and data scientists will adjust their model throughout the process based on feedback. Finally, if and when a model is deemed satisfactory, it can be deployed in some form.

How do these platforms support reproducibility of data, workflows, and results?

One advantage some data science platforms provide is the ability to track and save the data and hyperparameters used in each experiment, so that the experiment can be re-run at any time. Individual data scientists running ad hoc experiments need to do this tracking manually, if they even know to bother with it.
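As a rough illustration of what that manual tracking involves, the sketch below records a hash of the input data, the hyperparameters, and the result for each run, so the run can be repeated and checked later. It is a minimal example using only the Python standard library; the `run_experiment` function is a hypothetical stand-in for real model training, and the record layout is an assumption rather than any platform's format.

```python
import hashlib
import json
import random


def fingerprint(rows):
    """Return a SHA-256 hash of the dataset so later runs can verify it."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_experiment(rows, params):
    """Hypothetical stand-in for model training: a seeded pseudo-random score."""
    rng = random.Random(params["seed"])
    return round(rng.random() * params["learning_rate"] * len(rows), 6)


def tracked_run(rows, params):
    """Run the experiment and return a record with what is needed to repeat it."""
    return {
        "data_sha256": fingerprint(rows),
        "params": dict(params),
        "result": run_experiment(rows, params),
    }


# Illustrative data and hyperparameters only.
rows = [[1.0, 2.0], [3.0, 4.0]]
params = {"learning_rate": 0.1, "seed": 42}

first = tracked_run(rows, params)
# Re-running from the saved parameters reproduces the same record.
second = tracked_run(rows, first["params"])
# The record serializes to JSON, so it can be stored next to the model.
record_json = json.dumps(first, sort_keys=True)
```

Storing the random seed alongside the other hyperparameters is what makes the second run match the first; without it, two runs on identical data could still disagree.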

How secure, governable, and compliant are these platforms compared to corporate, standards-based, and legislative needs?

Data access is fragmented, and in early-stage data science setups, it’s not uncommon for data scientists to copy and paste and store the data they need on their own laptop, because they lack the ability to use that data directly while keeping it secure in an IT-approved manner. Data science platforms can help make this secure access process easier.

How do these platforms support collaboration between data scientists, data analysts, IT, and line-of-business departments?

Your data scientists should be able to share their results in a usable form with the rest of the business, whether these look like reports, dashboards, microservices, or apps. In addition, the consumers of these data outputs need to be able to give feedback to the producers to improve results. To capitalize on data science experiments being done in a company, some level of collaboration is necessary, but this may mean different things to different organizations. Some have shared code repositories. Some use chat. Effectively scaling up data science operations requires a more consistent experience across the board, so that everybody knows where to find what they need to get their work done. Centralizing feedback on models into the platform, associated with the models and their outputs, is one example of enabling the consistency necessary.

How do these platforms support a consistent view of data science based on the user interfaces and user experiences that the platforms provide to all users?

This consistency isn’t just limited to creating a model catalog with centralized feedback – the process of going from individual data scientists operating ad hoc and using their specific preferred tools to a standardized experience can meet resistance. Data science platforms often support a wide variety of such tools, which can ease this transition, but not all data science platforms support the same sets of tools. But moving to a unified experience makes it easier to onboard new data scientists into your environment.

What do data science teams look like when they are using data science platforms?

Some teams consist of a couple of people constructing skunkworks pipelines out of code as an initial or side project. Others may do enough ongoing data science work that they work with line-of-business stakeholders, perhaps with the assistance of a project manager. If data science is core to your business, the team will be large relative to your company size, no matter how large your company is. These teams all have different needs. A focus of this research is to categorize typical experiences across the spectrum by team size and complexity, code-centricness, and other measures.

[Figure: axes labeled Team Complexity and Code-Based or Codeless]

By exploring the people, processes, and technological functionalities associated with data science platforms over this summer, Amalgam Insights looks forward to bringing clarity to the market and providing directional recommendations to the enterprise community. This Vendor SmartList on Data Science Platforms will explore these questions and more in differentiating between a variety of Data Science Platforms currently in the market including, but not limited to: Alteryx, Anaconda, Cloudera, Databricks, Dataiku, Domino, H2O.ai, IBM, KNIME, MathWorks, Oracle, RapidMiner, SAP, SAS Viya, Teradata, TIBCO, and other startups and new entrants in this space that establish themselves over the summer of 2018.

If you’d like to learn more about this research initiative, or set up a briefing with Amalgam Insights for potential inclusion, please email me at


Workday Surprises the IPO Market and Acquires Adaptive Insights

Key Stakeholders: Chief Information Officers, Chief Financial Officers, Chief Operating Officers, Chief Digital Officers, Chief Technology Officers, Accounting Directors and Managers, Sales Operations Directors and Managers, Controllers, Finance Directors and Managers, Corporate Planning Directors and Managers
Why It Matters: Workday snatched Adaptive Insights away from the public markets only days before IPO, acquiring a proven enterprise planning…


Market Milestone: Oracle Builds Data Science Gravity By Purchasing

Bridging the Gap

Industry: Data Science Platforms
Key Stakeholders: IT managers, data scientists, data analysts, database administrators, application developers, enterprise statisticians, machine learning directors and managers, current customers, current Oracle customers
Why It Matters: Oracle released a number of AI tools in Q4 2017, but until now, it lacked a data science platform to support complete data…


DBAs, Update Your Resumes! Oracle Announces the Availability of Oracle Autonomous Data Warehouse Cloud

On March 27th, Oracle announced availability of the Oracle Autonomous Data Warehouse Cloud, a service that will spin up a data warehouse and provide automated security, high availability, performance tuning, scaling, patching, and administration at a cost guaranteed to be half of equivalent Amazon Web Services resources through May 2019. Built on Oracle Database 18c, this new service is both a godsend and a warning call for IT.

As Amalgam said last December, Oracle’s push towards what they are calling the “Autonomous Database” and “Autonomous Cloud” is an important step forward in envisioning a new generation of IT where the operational tasks of rules-based administration, monitoring, and iterative performance tuning are handled without direct human intervention. This will allow IT departments to drive more infrastructure into the cloud and reduce the overall Total Cost of Ownership. This is a fundamental change and differs radically from cloud providers such as Amazon and Microsoft that are providing granular services, but are not replacing the management of those services.

Here’s what you should expect


Anaplan States Planning Is Dead, Focuses on the Era of Real-Time Decision

Recommended Reading for: Finance, Sales Operations, Supply Chain Management, IT Management, and Enterprise Strategy Personnel
Companies Mentioned: Anaplan, IBM, SAP, Oracle, MicroStrategy, Tableau, DataRobot, TROVE Data, Louis Vuitton, Premji Invest, Salesforce Ventures, Top Tier Capital Partners, Baillie Gifford, Granite Ventures, Industry Ventures, Meritech Capital, Constellation Research, Ventana Research, IDC, Mint Jutras, ISG, Gartner, Apps Run the World, TechVentive

On March 6th and 7th, 2018, Amalgam Insights attended Anaplan Hub 18. Anaplan has been on Amalgam analysts’ radar for several years, as we consider Anaplan’s Hyperblock foundation and ability to serve multi-departmental planning in enterprises without a year or more of setup to be fundamental advantages. As we have covered this company, we have been waiting for Anaplan to reach its breakthrough moment where it takes its place as one of the true market leaders in enterprise applications. It is in this context that we attended Anaplan Hub and judged our interactions with Anaplan executives, customers, and partners.

This report provides updates on Anaplan’s key business metrics, executive insights from an analyst-only panel, keynote and product announcements, a 2018 perspective on customer success stories with Anaplan, and Amalgam’s expectations for Anaplan in 2018 and beyond as both a real-time planning application and a Platform as a Service.

Anaplan Key Business Updates


Blockchain! What is it Good For?

Diamond - Immutable and Hardened
Tom Petrocelli, Amalgam Insights Contributing Analyst

Blockchain looks to be one of those up-and-coming technologies that is constantly being talked about. Many of the largest IT companies – IBM, Microsoft, and Oracle to name a few – plus a not-for-profit or two are heavily promoting blockchain. Clearly, there is intense interest, much of it fueled by exotic-sounding cryptocurrencies such as Bitcoin and Ethereum. The big question I get asked – and analysts are supposed to be able to answer the big questions – is “What can I use blockchain for?”

Market Milestone: Red Hat Acquires CoreOS Changing the Container Landscape

We have just published a new document from Tom Petrocelli analyzing Red Hat’s $250 million acquisition of CoreOS and why it matters for DevOps and Systems Architecture managers. This report is recommended for CIOs, System Architects, IT Managers, System Administrators, and Operations Managers who are evaluating CoreOS and Red Hat as container solutions to support…


The Brain Science of Effective Corporate Soft Skills Training

Cognitive and Behavior Systems of Learning

Companies Mentioned: Deloitte, Salesforce, SAP, Cornerstone, Saba, Skillsoft, Fivel, PageUp, PeopleFluent, Talentsoft, Oracle, SilkRoad, IBM, Lumesse, Litmos, D2L, LearnCore, and Lessonly

Soft skills are “people skills”, and they are extremely important in the commercial sector. They involve showing and feeling empathy, embracing diversity, and understanding that we all have biases that we need to be aware of and keep in check. They involve effective interpersonal interactions and real-time communication skills and are relevant at all corporate levels. Whether office staff who interface with clients, office managers who interface with employees and their superiors, or the C-suite who provide the leadership and vision for the company, effective soft skills matter. An individual with strong soft skills can be an effective collaborator, leader, and “good” citizen. They not only know “what” behaviors are appropriate and inappropriate, but they know “how” to generate those behaviors and do so in a highly effective manner.

Dual Learning Systems in the Brain: Implications for Corporate Training

Effective training is critical in all business sectors. In 2017, over $360 billion was spent on training worldwide, with over $160 billion spent in the U.S. alone. Given the ever-changing nature of the corporate landscape, as new technologies are introduced (e.g., AI) or upgraded (e.g., constant software upgrades), and as new challenges arise (e.g., sexual harassment in the workplace), corporate training must evolve to meet the growing need.