With Cloudera’s S-1, Hadoop and Big Data Finally Come of Age

On Friday, March 31st, Cloudera filed its S-1, signaling its intention to go public. The timing looks good considering the recent successful IPOs of Alteryx, Mulesoft, and Snap. But how does Cloudera actually compare with other tech companies in its prospects for short- and medium-term success?

Cloudera’s S-1 filing starts by describing the near-term growth potential of the Internet of Things and IDC’s estimate of 30 billion internet-connected mobile devices in 2020. Every analyst and consulting firm has its own estimate of whether this figure will be 20 billion, 30 billion, or 40 billion, but the most important aspects of this growth are that:

  • this growth is happening exponentially
  • only about 10% of these devices will have some sort of cellular connection, with the rest being dependent on other forms of connectivity such as Wi-Fi, Bluetooth, or RFID
  • this expansion is leading to an increase of multiple orders of magnitude in transactional data traffic

Cloudera also seeks to position itself as part of three IDC-defined markets expected to grow rapidly:

(i) Dynamic Data Management Systems;

(ii) Cognitive/AI Systems and Content Analytics Software; and

(iii) Advanced and Predictive Analytics Software

For those of us who have covered Cloudera over the years, the combination of these three markets seems like a bit of a stretch. Cloudera excels as an Enterprise Hadoop solution for data and content storage and has built up over $260 million in annual revenue, albeit at a current net annual loss of $187 million. However, technology analysts do not typically categorize Cloudera as a market-leading artificial intelligence or predictive analytics solution. For these future-facing technologies, Cloudera certainly holds a lot of the data that will be analyzed, but Cloudera is not a standalone solution for machine learning.

I suspect Cloudera would argue that the analytic tools needed for machine learning and advanced analytics are open source and readily available, making data gravity the true competitive differentiator for enterprise machine learning. However, enterprises are far more comfortable with archiving and ingesting data at scale than with setting up machine learning at scale. For that reason, I believe the real challenge still lies in creating the advanced analytic models and machine learning capabilities that can unlock the value of today’s massive enterprise data environments.

Despite this criticism, I don’t mean to denigrate Cloudera. This company has been instrumental in defining Big Data and currently has 1,470 employees focused on being the Enterprise Big Data platform. Over the past couple of years, Cloudera’s subscription revenue has risen steadily as a share of total revenue, from 67% in 2014 to 72% in 2015 to 77% in 2016, while overall revenue has grown by over 50% per year. Cloudera’s lack of profitability is due to a common trait of subscription startups: sales and marketing expenses, which were roughly 100% of total revenue in 2014 and 2015 and 78% of total revenue in 2016.

Overall, this looks like another IPO out of the Box playbook. Despite some uncertainty, Box has been a mostly stable stock, both in providing guidance and in growing at the pace the market expects. Given that Cloudera will continue to be a market leader for enterprise data in the near future, investors should expect Cloudera to be a reliable holding.

However, the real potential for Cloudera is to step into the machine learning ring more actively, either through services or by investing in a technology toolkit that would allow Cloudera customers to quickly adopt machine learning capabilities aligned to line-of-business use cases such as sales, marketing, and supply chain. This is a roadmap that analytic and enterprise application market leaders are providing in 2017, and Cloudera has the foundational technology to move up a level in enterprise value from data platform to context platform. As this occurs, Cloudera will start unlocking the full asset value of the data that it has been collecting over the past nine years and fulfilling the expectations that enterprises have had for “Big Data” and Hadoop over the past decade.