Data Science Platforms News Roundup, June 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cloudera, Databricks, Dataiku, Datawatch, Domino, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta.

Databricks Conquers AI Dilemma with Unified Analytics

At Spark + AI Summit, Databricks announced new capabilities for its Unified Analytics Platform, including MLflow and Databricks Runtime for ML. MLflow is an open source, multi-cloud framework intended to standardize and simplify machine learning workflows so that machine learning work actually gets put into production. Databricks Runtime for ML scales deep learning with new GPU support for AWS and Azure, along with preconfigured environments for the most popular machine learning frameworks such as scikit-learn, TensorFlow, Keras, and XGBoost. The net result of these new capabilities is that Databricks users will be able to get their machine learning work done faster.

IBM and H2O.ai Partnership Aims to Accelerate Adoption of AI in the Enterprise

H2O.ai and IBM announced a partnership in early June that permits the use of H2O’s Driverless AI on IBM PowerSystems. Driverless AI has automated machine learning capabilities, while PowerAI is a machine learning and deep learning toolkit, and the combination will permit significantly faster processing overall. This builds on the pre-existing integration of H2O’s open source libraries into IBM’s Data Science Experience analytics solution. When this announcement was made, however, IBM had not yet debuted PowerAI Enterprise, so the availability of Driverless AI on PowerAI Enterprise remains TBD.

Announcing PowerAI Enterprise: Bringing Data Science into Production

This month, IBM announced the release of PowerAI Enterprise, which runs on IBM PowerSystems. It’s an expansion of IBM’s PowerAI applied AI offering that extends its coverage to include the entire data science workflow. IBM continues to cover its bases by diversifying its data science offerings, adding PowerAI to its existing Data Science Experience and Watson Studio offerings, but this also creates confusion as companies seek to determine which data science platform product suits their needs. We look forward to covering and clarifying this in greater detail.

Alteryx Reveals Newest Platform Release at Inspire 2018

At Alteryx Inspire, Alteryx announced the latest release (2018.2) of the Alteryx Analytics Platform, with improvements such as templates that make common analytic tasks even easier, community search that extends across the entire platform, and enhanced onboarding for new users. I detail the new features in my earlier post, Alter(yx)ing Everything at Inspire 2018; the upshot is that Alteryx continues to focus on ease of use for analytics end users.

Introducing Dask for Scalable Machine Learning

Anaconda released Dask, a new Python-based tool for processing large datasets. Python libraries like NumPy, pandas, and scikit-learn are designed to work with data in-memory on a single core; Dask will let data scientists process large datasets in parallel, even on a single computer, without needing to use Spark or another distributed computing framework. This expedites machine learning workflows on large datasets in Python, with the added convenience of being able to remain in your Python work environment.
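
For readers who want to see the underlying idea, here is a minimal sketch of splitting a dataset into chunks, processing the chunks independently, and combining the partial results. It uses only the Python standard library and is not Dask's API; the names chunked, partial_stats, and parallel_mean are hypothetical, and threads stand in for whatever workers a real scheduler would use.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(seq, size):
    """Yield consecutive slices of seq with at most `size` items each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]


def partial_stats(chunk):
    """Return (count, total) for one chunk; chunks do not depend on each other."""
    return len(chunk), sum(chunk)


def parallel_mean(values, chunk_size=1000, workers=4):
    """Compute a mean as a per-chunk map step followed by a combine step."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(partial_stats, chunked(values, chunk_size)))
    count = sum(n for n, _ in parts)
    total = sum(t for _, t in parts)
    return total / count


data = list(range(10_000))
mean = parallel_mean(data)
print(mean)
```

Because each chunk is handled on its own, the same pattern can in principle be handed to multiple cores or machines; in this sketch the threads mainly illustrate the structure rather than deliver a real speedup.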

Finally, I’m also working on a Vendor SmartList for the Data Science Platforms space this summer. If you’d like to learn more about this research initiative, or set up a briefing with Amalgam Insights for potential inclusion, please email me at

Lynne Baer: Clarifying Data Science Platforms for Business

Word cloud of data science software and terms

My name is Lynne Baer, and I’ll be covering the world of data science software for Amalgam Insights. I’ll investigate data science platforms and apps to solve the puzzle of getting the right tools to the right people and organizations.

“Data science” is on the tip of every executive’s tongue right now. The idea that new business initiatives (and improvements to existing ones) can be found in the data a company is already collecting is compelling. Perhaps your organization has already dipped its toes in the data discovery and analysis waters – your employees may be managing your company’s data in Informatica, or performing statistical analysis in Statistica, or experimenting with Tableau to transform data into visualizations.

But what is a Data Science Platform? Right now, if you’re looking to buy software for your company to do data science-related tasks, it’s difficult to know which applications will actually suit your needs. Do you already have a data workflow you’d like to build on, or are you looking to the structure of an end-to-end platform to set your data science initiative up for success? How do you coordinate a team of data scientists to take better advantage of existing resources they’ve already created? Do you have coders in-house already who can work with a platform designed for people writing in Python, R, Scala, or Julia? Are there more user-friendly tools out there your company can use if you don’t? What do you do if some of your data requires tighter security protocols around it? Or if some of your data models themselves are proprietary and/or confidential?

All of these questions are part and parcel of the big one: How can companies tell what makes a good data science platform for their needs before investing time and money? Are traditional enterprise software vendors like IBM, Microsoft, SAP, and SAS dependable in this space? What about companies like Alteryx, KNIME, and RapidMiner? Other popular platforms under consideration should also include Anaconda, Angoss (recently acquired by Datawatch), Domino, Databricks, Dataiku, MapR, MathWorks, Teradata, and TIBCO. And then there are new startups like Sentenai, focused on streaming sensor data, and slightly more established companies like Cloudera looking to expand from their existing offerings.

Over the next several months, I’ll be digging deeply to answer these questions, speaking with vendors, users, and investors in the data science market. I would love to speak with you, and I look forward to continuing this discussion. And if you’ll be at Alteryx Inspire in June, I’ll see you there.

Cloudera Analyst Conference Makes The Case for Analytic & AI Insights at Scale

On April 9th and 10th, Amalgam Insights attended Cloudera’s fifth Industry Analyst and Influencer Conference (which I’ll self-servingly refer to as the Analyst Conference since I attended as an industry analyst) in Santa Monica. Cloudera sought to make the case that it is evolving beyond the offerings it is currently best known for, a Hadoop distribution and commercial data lake, to become a machine learning and analytics platform. In doing so, Cloudera was keenly aware of its need to progress beyond the role of multi-petabyte storage at scale to a machine learning solution.
Cloudera’s Challenges in Enterprise Machine Learning 

Blockchain! What is it Good For?

Diamond - Immutable and Hardened
Tom Petrocelli, Amalgam Insights Contributing Analyst

Blockchain looks to be one of those up-and-coming technologies that is constantly being talked about. Many of the largest IT companies – IBM, Microsoft, and Oracle to name a few – plus a not-for-profit or two are heavily promoting blockchain. Clearly, there is intense interest, much of it fueled by exotic-sounding cryptocurrencies such as Bitcoin and Ethereum. The big question I get asked – and analysts are supposed to be able to answer the big questions – is “What can I use blockchain for?”

Data and Analytic Strategies for Developing Ethical IT: a BrightTALK webinar

Recommended Audience: CIOs, Enterprise Architects, Data Managers, Analytics Managers, Data Scientists, IT Managers

Vendors Mentioned: Trifacta, Paxata, Datameer, Datawatch, Lavastorm, Alation, Tamr, Unifi, 1010Data, Podium Data, IBM, Domo, Microsoft, Information Builders, Board, MicroStrategy, Cloudera, RapidMiner, Domino Data Lab, Dataiku, TIBCO, SAS, Amazon Web Services, Google, DataRobot.

In case you missed it, I just finished…


Microsoft: The New Player in Quantum Computing


During the week of September 25th, 2017, Microsoft made a huge announcement at its annual Ignite and Envision conferences. Microsoft has become one of a small number of companies that are demonstrating quantum computing. IBM is another company pursuing this rather futuristic computing model.

For those who are not up to date on quantum computing, it uses quantum properties such as superposition and entanglement to develop a new way of computing. Current computers are built around tiny electronic switches called transistors that allow for two states, which represent the binary system we have today. Quantum computers leverage quantum states that give us ones, zeros, and combinations of one and zero. This means a single qubit, the quantum equivalent of a bit, can represent many more states than the bit can. This is, of course, a gross oversimplification, but quantum computing promises to deliver denser and exponentially faster computing.
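
To make "combinations of one and zero" a little more concrete, here is a small illustrative sketch in plain Python. It models a single qubit as a pair of complex amplitudes and applies a Hadamard gate, a standard operation that turns a definite zero into an equal mix of zero and one. The function names are hypothetical, and this is a classroom-style simulation, not how Microsoft's or IBM's tools are programmed.

```python
import math


def apply_hadamard(state):
    """Apply the Hadamard gate to a qubit given as (amplitude_of_0, amplitude_of_1)."""
    a, b = state
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))


def probabilities(state):
    """Chance of measuring 0 or 1: the squared magnitude of each amplitude."""
    return tuple(abs(x) ** 2 for x in state)


zero = (1 + 0j, 0 + 0j)      # a qubit that is definitely 0
plus = apply_hadamard(zero)  # an equal superposition of 0 and 1
p0, p1 = probabilities(plus)
print(p0, p1)
```

A classical bit would have to be either 0 or 1 at this point; the simulated qubit carries both amplitudes at once until it is measured.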

There are a number of problems with practical quantum computing. The hardware is still in a nascent stage and must be cooled to a temperature that is quite a bit colder than deep space. This makes it much more likely that quantum computing will be purchased via a cloud model than on-premises. The other inhibitor is that there is no standard programming model for quantum computing. IBM has demonstrated a visual programming model that shows how quantum computing works, but it is clearly not going to be a serious way to write real programs. Microsoft, on the other hand, showed a more standard-looking curly-bracket programming language. This application layer makes quantum computing more accessible to existing programmers who are more used to the current model of computing.

When quantum computing becomes practical – I would predict that is at least 5 years away, perhaps longer – it won’t be for everyday computing tasks. The current model is already more than adequate for those tasks. It’s also unlikely that the capabilities of quantum computers, especially the information-dense qubit, and their costs will have much of a place in transactional computing. Instead, quantum computing will be used for analyzing very large and complex data sets for simulation and AI. That’s fine because the AI and analytics market is still new and its future needs are not yet completely known. Those future computing needs are what quantum computing is meant to address. Even today’s big data applications can stretch computing capabilities and force batch analytics instead of real-time for some use cases.

Microsoft’s entry into what has been an otherwise esoteric corner of the computing world signals that quantum computing is on the path to being real. It has a long way to go and many obstacles to overcome, but it’s no longer just science fiction or academic. It will be years, but it is on the way to becoming mainstream.

Note: This post was originally published on Tom’s Take.

Microsoft Infuses Products with Machine Learning and the Social Graph

This past week (September 25 – 27, 2017) Microsoft held its Ignite and Envision Conferences. The co-conferences encompass both technology (Ignite) and the business of technology (Envision). Microsoft’s announcements reflected that duality with esoteric technology subjects such as mixed reality and quantum computing on equal footing with digital transformation, a mainstay of modern business transformation projects. There were two announcements that, in my opinion, will have the most impact in the short-term because they were more foundational.

The first announcement was that machine learning was being integrated into every Microsoft productivity and business product. Most large software companies are adding machine learning to their platforms, but no company has Microsoft’s reach into modern businesses. Like IBM, SAP, and Oracle, Microsoft can embed machine learning in business applications such as CRM. Microsoft can also integrate machine learning into productivity applications, as can Google. IBM can do both, but IBM’s office applications aren’t close to having the market penetration of Microsoft Office 365. Microsoft has the opportunity to embed machine learning everywhere in a business, a capability that none of its competitors has.

4 Key Executive ASC 606 Lessons Microsoft Is Teaching Us

Microsoft OnPrem Annuity Revenue
Drawing of Revenue Curve
Revenue (from Pixabay)

Note: To read Part 1 of Amalgam’s coverage of Microsoft’s ASC 606 adoption, please see Microsoft “Early Adopts” New ASC 606 Revenue Recognition Standard.

Recommended Audience: CFOs, Chief Revenue Officers, CIOs, COOs, IT Finance, and Sales Operations professionals seeking to understand how ASC 606 revenue recognition changes will affect their responsibilities.

On August 3rd, 2017, Microsoft held an investor metrics conference call led by:

  • Chris Suh, GM, Investor Relations
  • Frank Brod, Chief Accounting Officer
  • John Seethoff, Deputy General Counsel and Corporate Secretary

This call was focused on its implementation of new accounting standards, including ASC 606 for revenue recognition and ASC 842 for lease accounting.

There have been multiple acquisitions and announcements in the revenue recognition space as IT vendors ensure that they can support the ASC 606 standard, including:
Continue reading “4 Key Executive ASC 606 Lessons Microsoft Is Teaching Us”

Microsoft “Early Adopts” New ASC 606 Revenue Recognition Standard

The ASC 606 Apocalypse is at hand!
Apocalypse by Michael Lehenbauer on Flickr

Note: This topic is of key importance for CFOs using or considering a subscription-based business model and for CIOs tasked with aligning technology to revenue recognition. Part 2 of this topic is 4 Key Executive ASC 606 Lessons Microsoft is Teaching Us.

On July 20, 2017, Microsoft reported a very successful Q4 FY17, with strong GAAP and non-GAAP results:

· Revenue was $23.3 billion GAAP, and $24.7 billion non-GAAP
· Operating income was $5.3 billion GAAP, and $7.0 billion non-GAAP
· Net income was $6.5 billion GAAP, and $7.7 billion non-GAAP
· Diluted earnings per share was $0.83 GAAP, and $0.98 non-GAAP

But the part that got my attention was a relatively minor two-paragraph note near the bottom of the earnings announcement on ASC 606 revenue recognition:
Continue reading “Microsoft “Early Adopts” New ASC 606 Revenue Recognition Standard”