Data Science Platforms News Roundup, June 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cloudera, Databricks, Dataiku, Datawatch, Domino, H2O, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta.

Databricks Conquers AI Dilemma with Unified Analytics

At Spark + AI, Databricks announced new capabilities for its Unified Analytics Platform, including MLflow and Databricks Runtime for ML. MLflow is an open source multi-cloud framework intended to standardize and simplify machine learning workflows so that machine learning work actually gets put into production. Databricks Runtime for ML scales deep learning with new GPU support for AWS and Azure, along with preconfigured environments for the most popular machine learning frameworks such as scikit-learn, TensorFlow, Keras, and XGBoost. The net result of these new capabilities is that Databricks users will be able to get their machine learning work done faster.

H2O and IBM Partnership Aims to Accelerate Adoption of AI in the Enterprise

H2O and IBM announced a partnership in early June that permits the use of H2O’s Driverless AI on IBM PowerSystems. Driverless AI has automated machine learning capabilities, while PowerAI is a machine learning and deep learning toolkit, and the combination will permit significantly faster processing overall. This builds on the pre-existing integration of H2O’s open source libraries into IBM’s Data Science Experience analytics solution. When this announcement was made, however, IBM had not yet debuted PowerAI Enterprise, so the availability of Driverless AI on PowerAI Enterprise remains TBD.

Announcing PowerAI Enterprise: Bringing Data Science into Production

This month, IBM announced the release of PowerAI Enterprise, which runs on IBM PowerSystems. It’s an expansion of IBM’s PowerAI applied AI offering that extends its coverage to include the entire data science workflow. IBM continues to cover its bases by diversifying its data science offerings, adding PowerAI to its existing Data Science Experience and Watson Studio offerings. This also creates confusion, however, as companies seek to determine which data science platform product suits their needs. We look forward to covering and clarifying this in greater detail.

Alteryx Reveals Newest Platform Release at Inspire 2018

At Alteryx Inspire, Alteryx announced the latest release (2018.2) of the Alteryx Analytics Platform, with improvements such as making common analytic tasks even easier via templates, extending community search across the entire platform, and enhancing onboarding for new users. I detail the new features in my earlier post, Alter(yx)ing Everything at Inspire 2018; the upshot is that Alteryx continues to focus on ease of use for analytics end users.

Introducing Dask for Scalable Machine Learning

Anaconda released Dask, a new Python-based tool for processing large datasets. Python libraries like NumPy, pandas, and scikit-learn are designed to work with data in-memory on a single core; Dask will let data scientists process large datasets in parallel, even on a single computer, without needing to use Spark or another distributed computing framework. This expedites machine learning workflows on large datasets in Python, with the added convenience of being able to remain in your Python work environment.
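
To make the chunk-and-combine idea above concrete, here is a minimal sketch in plain Python. It is not Dask's API; the function names (split_into_chunks, partial_stats, chunked_mean) and the sample data are hypothetical, and threads stand in for the scheduler a real framework would provide. It only illustrates the general pattern of reducing a large dataset piece by piece and merging the small partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Yield consecutive slices of `values`, each at most `chunk_size` long."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def partial_stats(chunk):
    """Reduce one chunk to a small summary: (sum, count)."""
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=1000, workers=4):
    """Compute a mean by combining per-chunk summaries.

    Threads keep this sketch self-contained; a real framework would
    schedule the chunks across cores or machines instead.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        summaries = list(pool.map(partial_stats, split_into_chunks(values, chunk_size)))
    total = sum(s for s, _ in summaries)
    count = sum(n for _, n in summaries)
    return total / count


# Hypothetical stand-in for a dataset too large to process in one pass.
data = list(range(1, 10001))
print(chunked_mean(data))
```

The point of the design is that each chunk is reduced to a tiny summary before anything is combined, so no single step needs the whole dataset at once. Tools in this category generally automate the splitting, scheduling, and merging so that the calling code looks like ordinary single-machine Python.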

Finally, I’m also working on a Vendor SmartList for the Data Science Platforms space this summer. If you’d like to learn more about this research initiative, or set up a briefing with Amalgam Insights for potential inclusion, please email me at

Informatica Prepares Enterprise Data for the Era of Machine Learning and the Internet of Things

From May 21st to May 24th, Amalgam Insights attended Informatica World. Both my colleague Tom Petrocelli and I were able to attend and gain insights on the present and future of Informatica. Based on discussions with Informatica executives, customers, and partners, I gathered the following takeaways.

Informatica made a number of announcements that fit well into the new era of Machine Learning that is driving enterprise IT in 2018. Tactically, Informatica’s announcement of providing its Intelligent Cloud Services, its Integration Platform as a Service offering, natively on Azure represents a deeper partnership with Microsoft. Informatica’s data integration, synchronization, and migration services go hand-in-hand with Microsoft’s strategic goal of getting more data into the Azure cloud and shifting the data gravity of the cloud. Amalgam believes that this integration will also help increase the value of Azure Machine Learning Studio, which now will have more access to enterprise data.

Lynne Baer: Clarifying Data Science Platforms for Business

Word cloud of data science software and terms

My name is Lynne Baer, and I’ll be covering the world of data science software for Amalgam Insights. I’ll investigate data science platforms and apps to solve the puzzle of getting the right tools to the right people and organizations.

“Data science” is on the tip of every executive’s tongue right now. The idea that new business initiatives (and improvements to existing ones) can be found in the data a company is already collecting is compelling. Perhaps your organization has already dipped its toes in the data discovery and analysis waters – your employees may be managing your company’s data in Informatica, or performing statistical analysis in Statistica, or experimenting with Tableau to transform data into visualizations.

But what is a Data Science Platform? Right now, if you’re looking to buy software for your company to do data science-related tasks, it’s difficult to know which applications will actually suit your needs. Do you already have a data workflow you’d like to build on, or are you looking to the structure of an end-to-end platform to set your data science initiative up for success? How do you coordinate a team of data scientists to take better advantage of existing resources they’ve already created? Do you have coders in-house already who can work with a platform designed for people writing in Python, R, Scala, or Julia? Are there more user-friendly tools out there your company can use if you don’t? What do you do if some of your data requires tighter security protocols around it? Or if some of your data models themselves are proprietary and/or confidential?

All of these questions are part and parcel of the big one: How can companies tell what makes a good data science platform for their needs before investing time and money? Are traditional enterprise software vendors like IBM, Microsoft, SAP, and SAS dependable in this space? What about companies like Alteryx, KNIME, and RapidMiner? Other popular platforms under consideration should also include Anaconda, Angoss (recently acquired by Datawatch), Domino, Databricks, Dataiku, MapR, MathWorks, Teradata, and TIBCO. And then there are new startups like Sentenai, focused on streaming sensor data, and slightly more established companies like Cloudera looking to expand from their existing offerings.

Over the next several months, I’ll be digging deeply to answer these questions, speaking with vendors, users, and investors in the data science market. I would love to speak with you, and I look forward to continuing this discussion. And if you’ll be at Alteryx Inspire in June, I’ll see you there.

28 Hours as an Industry Analyst at Strata Data


Last week, I attended Strata Data Conference at the Javits Center in New York City to catch up with a wide variety of data science and machine learning users, enablers, and thought leaders. In the process, I had the opportunity to listen to some fantastic keynotes, to chat with 30+ companies looking for solutions and 30+ vendors presenting at the show, and to attend alongside a number of luminary industry analysts and thought leaders including Ovum’s Tony Baer, EMA’s John Myers, Aberdeen Group’s Mike Lock, and Hurwitz & Associates’ Judith Hurwitz.

From this whirlwind tour of executives, I gathered many takeaways: some from the keynotes and vendors that I can share, and some from end users that I unfortunately have to keep confidential. To give you an idea of what an industry analyst notes, the following is a short summary of what I took from the keynotes and from each vendor that I spoke to:

Keynotes: The key themes that really got my attention were the idea that AI requires ethics, brought up by Joanna Bryson, and the idea that all data is biased, which danah boyd discussed. This idea that data and machine learning have their own weaknesses that require human intervention, training, and guidance is incredibly important. Over the past decade, technologists have put their trust in Big Data and the idea that data will provide answers, only to find that a naive and “unbiased” analysis of data has its own biases. Context and human perspective are inherent to translating data into value: this does not change just because our analytic and data training tools are increasingly nuanced and intelligent in nature.

Behind the hype of data science, Big Data, analytic modeling, robotic process automation, DevOps, DataOps, and artificial intelligence is this fundamental need to understand that data, algorithms, and technology all have inherent biases as the following tweet shows:

Informatica Unleashes AI, Brand, Cloud, and Data-Driven Disruption at Informatica World 2017

New Informatica Brand for New Informatica Aspirations
Amalgam Insights (AI) recently attended Informatica World 2017, where executives, partners, and customers provided backing for Informatica’s ability to support “The Disruptive Power of Data,” (an Informatica-trademarked phrase) as well as its positioning as the Enterprise Cloud Data Management leader.

