Data Prep – Amalgam Insights

On February 7, 2022, Alteryx completed its acquisition of Trifacta, a data engineering company known for promoting “data wrangling” and for bringing to the forefront the challenge of cleansing data to make Big Data useful and to support machine learning. Alteryx announced its intention to acquire Trifacta on January 6 for $400 million, with an additional $75 million dedicated to an employee retention pool.

Trifacta was founded in 2012 by Stanford Ph.D. Sean Kandel, then-Stanford professor Jeffrey Heer, and Berkeley professor Joe Hellerstein as a data preparation solution at a time when Big Data was becoming a common enterprise technology. The company was built on Wrangler, a visualization of data transforms that tackled a fundamental problem: reducing the estimated 50-80% of work time that data analysts and data scientists spent preparing data for analytical use.

Over the past decade, Trifacta raised $224 million, the last round being $100 million in September 2019. Trifacta quickly established itself as a top solution for data professionals seeking to cleanse data. In a report I wrote in 2015, one of my recommendations was “Consider Trifacta as a general data cleansing and transformation solution. Trifacta is best known for supporting both Hadoop and Big Data environments, including support for JSON, Avro, ORC, and Parquet.” (MarketShare Selects a Data Transformation Platform to Enhance Analyst Productivity, Blue Hill Research, February 2015)

Over the next seven years, Trifacta continued to advance as a data preparation and data engineering solution as it evolved to support major cloud platforms. Starting in 2018, three key trends emerged in the data preparation space.

First, data preparation companies focused on the major cloud platforms starting with Amazon Web Services, then Microsoft Azure and Google Cloud. This focus reflected the gravity of net-new analytic and AI data shifting from on-premises resources into the cloud and was a significant portion of Trifacta’s product development efforts over the past few years.

Second, data preparation firms started to be acquired by larger analytic and machine learning providers, such as Altair’s 2018 acquisition of Datawatch and DataRobot’s 2019 acquisition of Paxata. Trifacta was the last market-leading data preparation company still available for acquisition, having developed the data preparation and wrangling market.

Third, the task of data preparation evolved into a new role of data engineering as enterprises grew to understand that the structure, quality, and relationships of data had to be well defined to get the insights and directional guidance that Big Data had been presumed to hold. As this role became more established, data preparation solutions had to shift towards workflows defined by DataOps and data engineering best practices. It was no longer enough for data cleansing and preparation simply to be done; they had to be part of governed process workflows and automation within a larger analytic ecosystem.

All this is to provide guidance on what to expect as Trifacta now joins Alteryx. Although Trifacta and Alteryx are both often grouped as “data preparation” solutions, their roles in data engineering are different enough that I rarely see situations where both solutions are equally suited for a specific use case. Trifacta excels as a visual tool to support data preparation and transformation on the top cloud platforms, while Alteryx has long been known for its support of low-code and no-code analytic workflows that help automate complex analytic transformations of data. Alteryx has developed leading products across process automation, analytic blending in Designer, location-based analytics in Location, machine learning support, and analytics at scale through Alteryx Server.

Although Alteryx provides data cleansing capabilities, its interface does not provide the same level of immediate visual feedback at scale that Trifacta provides, which is why organizations often use both Trifacta and Alteryx. With this acquisition, Trifacta can be used by technical audiences to identify, prepare, and cleanse data and develop highly trusted data sources so that line-of-business data analysts can spend less time finding data and more time providing guidance to the business at large.

Recommendations and Insights for the Data Community

Alteryx clients that consider using Trifacta should be aware that this will likely result in an increased number of analytically accessible data sources. More always sounds better, but this also means that from a practical perspective, your organization may require a short-term reassessment of the data sources, connections, and metrics that are being used for business analysis based on this new data preparation and engineering capability. In addition, this merger can be used as an opportunity to bring data engineering and data analyst communities closer together as they coordinate responsibilities for data cleansing and data source curation. Trifacta provides some additional scalability in this regard that can be leveraged by organizations that optimize their data preparation capabilities.

This acquisition will also accelerate Alteryx’s move to the cloud, as Trifacta provides both an entry point for accessing a variety of cloud data sources and a team of developers, engineers, and product managers with deep knowledge of the major cloud data platforms. Given that Trifacta was purchased for roughly 10% of Alteryx’s market capitalization, the value of moving to the cloud more quickly could potentially justify this acquisition all on its own as an acquihire.

Look at DataOps, analytic workflows, and MLOps as part of a continuum of data usage rather than a set of silos. Trifacta has 12,000 customers, with an average of four seats per customer focused on data preparation and engineering. With this acquisition, the Trifacta and Alteryx teams can work together more closely in aligning those four data engineers to the ~30 analytic users that Alteryx averages for each of its 7,000+ customers. The net result is an opportunity to bring DataOps, RPA, analytic workflows, and MLOps together into an integrated environment rather than the current set of silos that often prevent companies from understanding how data changes can affect analytic results.
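To put those averages in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it treats the approximate figures quoted above as exact ("7,000+" is taken as a lower bound of 7,000 and "~30" as exactly 30), and the variable names are purely illustrative.

```python
# Figures quoted in the paragraph above, treated as exact for illustration only.
trifacta_customers = 12_000
trifacta_seats_per_customer = 4      # average data prep / engineering seats
alteryx_customers = 7_000            # "7,000+" in the text, so a lower bound
alteryx_users_per_customer = 30      # "~30" in the text, so an approximation

# Implied totals across each customer base.
trifacta_seats = trifacta_customers * trifacta_seats_per_customer
alteryx_users = alteryx_customers * alteryx_users_per_customer

# Analysts supported per data engineer at a hypothetical customer using both products.
analysts_per_engineer = alteryx_users_per_customer / trifacta_seats_per_customer

print(f"Implied Trifacta seats: {trifacta_seats:,}")
print(f"Implied Alteryx analytic users: {alteryx_users:,}")
print(f"Analytic users per data engineer: {analysts_per_engineer:.1f}")
```

Under these assumptions, the averages imply roughly 48,000 Trifacta seats against at least 210,000 Alteryx users, or about 7.5 analytic users for every data engineer at an organization that matches both averages.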

It has been a pleasure seeing Trifacta become one of the few startups to successfully define an emerging market, data prep, and to coin a term, “data wrangling,” that gained market acceptance with both users and competitors. Many firms try to do this with little success, but Trifacta is the notable exception: its efforts will outlive its time as a standalone company. Trifacta leaves a legacy of establishing the importance of data quality, preparation, and transformation in the enterprise data environment in a world where raw data is imperfect, but necessary to support business guidance. And as Trifacta joins Alteryx, this combined ability to support data from its raw starting point to machine learning models and outputs across a hybrid cloud will continue to be a strong starting point for organizations seeking to provide employees with more control and choice over their analytic inputs and outputs.

If you are currently evaluating Alteryx or Trifacta and need additional guidance, please feel free to contact us at research@amalgaminsights.com to discuss your current selection process and how you are estimating the potential business value of your purchase.


Alteryx Acquires Trifacta: Considerations for DataOps, MLOps, & the Analytic Community
