On April 9th and 10th, Amalgam Insights attended Cloudera’s fifth Industry Analyst and Influencer Conference (which I’ll self-servingly refer to as the Analyst Conference since I attended as an industry analyst) in Santa Monica. Cloudera sought to make the case that it is evolving beyond the offerings it is currently best known for, a Hadoop distribution and commercial data lake, to become a machine learning and analytics platform. In doing so, Cloudera was extremely self-aware of its need to progress beyond the role of multi-petabyte storage at scale to a machine learning solution.
Cloudera’s Challenges in Enterprise Machine Learning
Cloudera’s conference came in the face of multiple challenges. First, Cloudera’s stock had just plunged about 40% after the company lowered its revenue guidance for the rest of the year. For an industry analyst, this kind of market activity tends to be important only if it fundamentally affects the company’s operations or ability to maintain product and service roadmap progress. Frankly, it is hard to understand how, in a rational market, a $20 million change in annual revenue could realistically lead to a billion dollar drop in market cap. Amalgam is not a financial analysis firm, but from my non-financial perspective, it seems extremely odd to make that kind of valuation change on a cloud company that is in transition: either you believe in the evolution from data provider to analytic provider or you have already decided that Cloudera’s future is solely as a data provider. Changing your mind based on $20 million in 2018 business is an extremely short-sighted reason to decide one way or the other.
But enough about that; the market is what it is. The more interesting set of challenges from Amalgam’s perspective is that Cloudera seeks to gain traction as a machine learning workbench and analytic store. These are the value-added use cases that will drive Cloudera’s long-term growth and evolution. The Cloudera Data Science Experience, in particular, is interesting because this framework directly competes against a number of well-funded startups also seeking to manage the future of data science, including Domino Data, Databricks, Dataiku, H2O.ai, MapR, and RapidMiner, as well as established companies such as Alteryx (once a startup darling, but now a publicly traded company), IBM, MathWorks, Microsoft, SAS, Teradata, and TIBCO. In short, there are a number of companies competing for the mindshare of data scientists seeking to either code or model predictive analytics for enterprise use cases. And this is the case that Cloudera sought to make at this Analyst Conference.
OK, so did Cloudera make its case?
Over the past calendar year, Cloudera has made several key moves towards increased machine learning and analytic support.
- Cloudera has launched its Altus Analytic DB, built on Impala, to support BI and SQL-based data in the cloud.
- Altus Data Engineering supports scale-out data sets on both Amazon Web Services and Microsoft Azure.
- Apache Kudu provides a fast analytics capability for IoT and log data.
- Cloudera’s Data Science Workbench provides a straightforward capability for data scientists to bring R, Python, Scala, and other data science tools into a compliant and secure environment.
- Cloudera started its Shared Data Experience to support complex data applications based on a shared data catalog.
- Cloudera has also acquired Fast Forward Labs, Hilary Mason’s applied research firm focused on best practices for data science.
The competitive stance would be to say that these capabilities, by and large, are not new or unique to Cloudera and, for the most part, that would be a fair statement. The biggest differentiation, though, is in bringing Cloudera’s analytics and machine learning to its existing customer base, which includes 400+ Global 2000 companies among approximately 700 large global enterprises, as well as a variety of data-devouring startups, all of which require a combination of strong data governance, shared data collaboration, and relevant guidance and services.
This stance comes in the context of Cloudera’s continued targeting of companies focused on data monetization and on the investments needed to treat data as a protected asset that is always available, massively scalable, and deliverable based on customer demand. Amalgam notes that this messaging was consistent across all of Cloudera’s executive presenters. In particular, the presentation by Amy O’Connor, Cloudera’s Chief Data and Information Officer, was especially helpful in showing how Cloudera executes on delivering its products and how Cloudera uses its own technologies to support sales, marketing, support, and security use cases.
Cloudera’s Analytics and Machine Learning Businesses
The big picture perspective is that Cloudera has executed well on the analytic side and that the machine learning story is still a work in progress based both on Cloudera’s existing products and the state of this market. On the analytics side, Cloudera has an analytic database business of over $100 million in revenue built on data optimization, discovery-based data marts for text and advanced analytics, and operational data marts for log data, web data, and IoT-based data. This matters because analytic data is an important foundation for developing reporting, analytics, and machine learning on a shared and consistent set of data.
Based on the metrics Cloudera shared, the analytic database business is mature yet still has significant growth potential as Cloudera continues to evolve the database with automated workload management and metadata management capabilities to both reduce the total cost of ownership and increase the business context associated with this data. It is hard to argue that Cloudera is not executing on this front or that Cloudera is not well positioned to provide analytics at massive scale both for standalone reporting and for managing workloads and applications.
Cloudera’s Machine Learning story was relatively comprehensive, spanning its support of a wide variety of data and cloud environments through the Shared Data Experience, Cloudera’s Data Science Workbench, and the integration of Fast Forward Labs to provide primary education on topics such as natural language generation, image analytics, probabilistic programming, semantic recommendations, and real-time streams. Cloudera’s combination of products, services, and applied research is a model that more vendors should seek to emulate in emerging technology areas to educate business communities on how to support practical application.
But Amalgam’s doubts come from the inherent dynamics of an emerging market. Although Cloudera has a significant percentage of large analytic workloads and a working toolkit, Cloudera’s buying audience has traditionally been those responsible for data warehouses, data marts, and enterprise analytics environments. To cross into the machine learning world, Cloudera will need to bridge the gap between analytic workloads and the worlds of data science and machine learning development. From a practical perspective, this means shifting its brand from operational database administration teams to the agile, experimental lone wolves who code and run massive experiments. Because data science is still a relatively new enterprise practice with little to no formal governance, large organizations are currently not driven to manage data science with the same rigor and structure that core enterprise data is held to. In addition, early adopter enterprises that have built out data science teams will likely have developers who are set in their own approaches and toolkits, which can make standardization and internal adoption of any tool difficult.
Amalgam believes that the need for enterprise-grade data science exists and will grow over time. But the honest truth is that it will take time for this market to mature to the point that a majority of large global enterprises will have a team of distributed data scientists collaborating with each other on key business challenges in the same way that businesses have developed application development teams. This evolution is inevitable in a global business environment where analytics and automation are key drivers for improvement, but this change will realistically take another two to three years as global enterprises increase their pool of data scientists and the need for data science management, lineage, and collaboration increases. The growth of Cloudera’s data science business will track the enterprise adoption of data science teams that expand beyond two or three data scientists and beyond a single location. This is where Cloudera will both excel and be able to take advantage of the “data gravity” associated with its existing data.
Overall, Cloudera’s combined data science offering is moving towards “where the puck is headed,” to use the often-cited Wayne Gretzky quote. But this approach is slightly ahead of the current data science market, which means that the short-term prospects for Cloudera’s data science offerings will depend on the uptake of the Data Science Workbench by individual data scientists in the Cloudera ecosystem and on the demand for research from Fast Forward Labs. Expect 2018 to be a consolidation year for Cloudera on this front, where its data science and machine learning offerings will continue to merge into converged packages of education, services, deployment, and workload management to provide a scalable approach for enterprise machine learning.
Recommendations based on Cloudera Analyst Conference
Cloudera is shifting from a data company to an analytics company. Enterprises that understand that their data archives and operational data are potential assets and not just archived liabilities should consider Cloudera’s capabilities as an analytic store both as an on-prem and a cloud-based solution.
Cloudera is now a starting point to consolidate data science teams as enterprise data science initiatives scale and operationalize over the next few years. Cloudera Data Science Workbench is a capable data science tool in its own right, but realistically, small data science teams will likely make independent decisions regarding their initial toolkits and portfolios.
Amalgam believes that Cloudera’s data science offerings provide their greatest value when data science becomes operationalized and enterprises seek to gain insight on all of their trusted data. As data science teams grow and need to consolidate their R, Python, Scala, and other code in a consistent and collaborative environment, Cloudera will be one of the few options available for developing a DevOps-like rigor around data science, and that rigor will likely be augmented with Fast Forward-based best practices and comprehensive tools for ongoing workload management.
Conclusion
At Cloudera’s Analyst Conference, Cloudera made its case as an analytics and machine learning provider based on its DNA as an enterprise data provider. Amalgam’s biggest takeaway is that Cloudera is taking a long-term approach to its product development with the assumption that both Big Data analytics and Machine Learning will become core capabilities in enterprise IT that require both a well-governed platform and enterprise-grade support. Cloudera is not positioning itself to compete directly with any particular machine learning startup, but rather as a comprehensive enterprise solution that could potentially partner with niche vendors along the machine learning value chain. In this regard, Amalgam believes that Cloudera successfully presented its vision for the future and provided realistic guidance for what to expect from Cloudera in the near future.
If you would like more detail on Cloudera’s machine learning efforts and how particular aspects of Cloudera’s Data Science Workbench and Data Science Experience match up against enterprise competitors, please set up a free initial inquiry with Amalgam Insights at info@amalgaminsights.com.