Alteryx Acquires Trifacta: Considerations for DataOps, MLOps, & the Analytic Community

On February 7, 2022, Alteryx completed its acquisition of Trifacta, a data engineering company known for popularizing “data wrangling” and for bringing to the forefront the challenge of cleansing data to make Big Data useful and to support machine learning. Alteryx announced its intent to acquire Trifacta on January 6 for $400 million, with an additional $75 million dedicated to an employee retention pool.

Trifacta was founded in 2012 by Stanford Ph.D. Sean Kandel, then-Stanford professor Jeffrey Heer, and Berkeley professor Joe Hellerstein as a data preparation solution at a time when Big Data was starting to become a common enterprise technology. The company was formed around Wrangler, a visual data transformation tool that tackled a fundamental problem: reducing the estimated 50-80% of work time that data analysts and data scientists spent preparing data for analytical use.

Over the past decade, Trifacta raised $224 million with its last round being a $100 million round raised in September 2019. Trifacta quickly established itself as a top solution for data professionals seeking to cleanse data. In a report I wrote in 2015, one of my recommendations was “Consider Trifacta as a general data cleansing and transformation solution. Trifacta is best known for supporting both Hadoop and Big Data environments, including support for JSON, Avro, ORC, and Parquet.” (MarketShare Selects a Data Transformation Platform to Enhance Analyst Productivity, Blue Hill Research, February 2015)

Over the next seven years, Trifacta continued to advance as a data preparation and data engineering solution as it evolved to support major cloud platforms. During this time, three key trends emerged in the data preparation space starting in 2018.

First, data preparation companies focused on the major cloud platforms starting with Amazon Web Services, then Microsoft Azure and Google Cloud. This focus reflected the gravity of net-new analytic and AI data shifting from on-premises resources into the cloud and was a significant portion of Trifacta’s product development efforts over the past few years.

Second, data preparation firms started to be acquired by larger analytic and machine learning providers, such as Altair’s 2018 acquisition of Datawatch and DataRobot’s 2019 acquisition of Paxata. Having developed the data preparation and wrangling market, Trifacta was the last market-leading data preparation company still available for acquisition.

Third, the task of data preparation evolved into a new role of data engineering as enterprises grew to understand that the structure, quality, and relationships of data had to be well defined to get the insights and directional guidance that Big Data had been presumed to hold. As this role became more established, data preparation solutions had to shift towards workflows defined by DataOps and data engineering best practices. It was no longer enough for data cleansing and preparation to be done, but for them to be part of governed process workflows and automation within a larger analytic ecosystem.

All this is to provide guidance on what to expect as Trifacta now joins Alteryx. Although Trifacta and Alteryx are both often grouped as “data preparation” solutions, their roles in data engineering are different enough that I rarely see situations where both solutions are equally suited for a specific use case. Trifacta excels as a visual tool for data preparation and transformation on the top cloud platforms, while Alteryx has long been known for its support of low-code and no-code analytic workflows that help automate complex analytic transformations of data. Alteryx has developed leading products across process automation, analytic blending in Designer, location-based analytics in Location, machine learning support, and Alteryx Server for analytics at scale.

Although Alteryx provides data cleansing capabilities, its interface does not provide the same level of immediate visual feedback at scale that Trifacta provides, which is why organizations often use both Trifacta and Alteryx. With this acquisition, Trifacta can be used by technical audiences to identify, prepare, and cleanse data and develop highly trusted data sources so that line-of-business data analysts can spend less time finding data and more time providing guidance to the business at large.

Recommendations and Insights for the Data Community

Alteryx clients that consider using Trifacta should be aware that this will likely result in an increased number of analytically accessible data sources. More always sounds better, but this also means that from a practical perspective, your organization may require a short-term reassessment of the data sources, connections, and metrics that are being used for business analysis based on this new data preparation and engineering capability. In addition, this merger can be used as an opportunity to bring data engineering and data analyst communities closer together as they coordinate responsibilities for data cleansing and data source curation. Trifacta provides some additional scalability in this regard that can be leveraged by organizations that optimize their data preparation capabilities.

This acquisition will also accelerate Alteryx’s move to the cloud, as Trifacta provides both an entry point for accessing a variety of cloud data sources and a team of developers, engineers, and product managers with deep knowledge of the major cloud data platforms. Given that Trifacta was purchased for roughly 10% of Alteryx’s market capitalization, the value of moving to the cloud more quickly could justify this acquisition on its own as an acquihire.

Look at DataOps, analytic workflows, and MLOps as part of a continuum of data usage rather than a set of silos. Trifacta brings 12,000 customers with an average of four seats per customer focused on data preparation and engineering. With this acquisition, the Trifacta and Alteryx teams can work together more closely in aligning those four data engineers with the ~30 analytic users that Alteryx averages across each of its 7,000+ customers. The net result is an opportunity to bring DataOps, RPA, analytic workflows, and MLOps together into an integrated environment rather than the current set of silos that often prevents companies from understanding how data changes can affect analytic results.

It has been a pleasure seeing Trifacta become one of the few startups to successfully define an emerging market (data prep) and to coin a term (“data wrangling”) that gained acceptance with both users and competitors. Many firms try to do this with little success, but Trifacta is the notable exception whose efforts will outlive its time as a standalone company. Trifacta leaves a legacy of establishing the importance of data quality, preparation, and transformation in the enterprise data environment, in a world where raw data is imperfect but necessary to support business guidance. And as Trifacta joins Alteryx, this combined ability to support data from its raw starting point to machine learning models and outputs across a hybrid cloud will continue to be a strong starting point for organizations seeking to give employees more control and choice over their analytic inputs and outputs.

If you are currently evaluating Alteryx or Trifacta and need additional guidance, please feel free to contact us at research@amalgaminsights.com to discuss your current selection process and how you are estimating the potential business value of your purchase.

February 4: From BI to AI (Alteryx, Citrix, DataRobot, Informatica, Microsoft Azure, Onehouse, Pecan, Teradata, TIBCO, Yellowfin)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Funding

Our Opportunity to Build Something Even Bigger: Series C Funding Announcement – Pecan AI

On February 3, Pecan, a low-code predictive analytics platform, raised $66M in Series C funding. Insight Partners led the round, with new investor GV also participating, as well as existing investors Dell Technologies Capital, GGV Capital, Mindset Ventures, S-Capital, and Vintage Investment Partners. The funding will be used to accelerate R+D and increase headcount.

Amalgam’s Insight: Pecan epitomizes the idea of helping companies move from BI to AI with its capability to help SQL-savvy data analysts to conduct data science. As a bridge technology between BI and AI, Pecan’s approach to providing predictive models for general use is a capability enterprises will need to pursue (whether with Pecan or another vendor) to empower their data analysts for the emerging era of machine learning that has been in progress for the last half-decade.

Onehouse Supercharges Data Lakes for AI and Machine Learning With $8 Million in Seed Funding From Greylock and Addition

On February 2, Onehouse, a lakehouse service built atop Apache Hudi to make data lakes faster, cheaper, and easier to access, emerged from stealth with $8M in seed funding. Investment firms Greylock and Addition co-led the funding round; the money will be used for R+D. Onehouse is fully managed and cloud-native, accelerating the speed at which data lakes can be set up. Amalgam Insights’ Hyoun Park is quoted in the press release announcing the launch of Onehouse.

Amalgam’s Insight: The lakehouse, an amalgamation of data lake and data warehouse, is an important construct for data architects seeking to unlock the value of the “Big Data” they have collected over the past decade. The overwhelming volume and variety of enterprise data makes a traditional data warehouse approach challenging to support for all relevant data. However, lakehouses are challenging to support and Onehouse’s approach of providing a managed service for lakehouses will be valuable for companies seeking to take this approach but lacking the personnel to access the analytic value of semi-structured data.

Acquisitions

Idera, Inc. Acquires Yellowfin International Pty Ltd

On January 28, Idera announced that they had acquired Yellowfin International, an embedded data analytics and BI platform. Yellowfin will join Idera’s Developer Tools business, expanding the capabilities of that suite in a new direction, enhancing the ability of Idera to cross-sell BI and analytics functionality to existing and new customers.

Amalgam’s Insight: Yellowfin has been a long-time favorite of Amalgam Insights with its market-leading visualization and user-focused data exploration capabilities combined with its extreme scalability. In joining Idera, Yellowfin now joins a suite of solutions that will enhance Yellowfin’s embedded business intelligence capabilities and provide developers with tools for more robust and user-friendly applications.

Teradata Announces Global Partnership with Microsoft

On February 2, Teradata announced a global partnership with Microsoft where it would more fully integrate the Teradata Vantage platform with Microsoft Azure. Though Teradata is already significantly integrated with over 60 existing Azure data services, this announcement signals a deepening of the existing relationship between the two companies.


Amalgam’s Insight: This deal reflects Microsoft Azure’s continued willingness to partner with analytic and data companies that compete with other areas of Microsoft. For Teradata, this partnership helps current clients migrate to an enterprise cloud that is developer-friendly, while Microsoft gains more data as it competes against Amazon in the cloud infrastructure market.

Citrix to be Acquired by Vista Equity Partners and Evergreen Coast Capital for $16.5 Billion | TIBCO Software

On January 31, Vista Equity Partners and Evergreen Coast Capital Corporation announced that they would be acquiring Citrix, a digital workspace and application delivery platform, for $16.5B. As part of the transaction, Citrix will merge with TIBCO, which is currently owned by Vista, bringing together Citrix’s secure digital workspace and app delivery capabilities with TIBCO’s data and analytics under one roof, with the goal of accelerating Citrix’s SaaS transition while creating a company that serves 98% of the Fortune 500.

Amalgam’s Insight: We will be working on a deeper exploration of this acquisition, which at first glance mirrors Idera’s acquisition of Yellowfin in creating a larger enterprise application company with a variety of capabilities across data management, security, and IT management. Given that Vista Equity Partners acquired TIBCO in 2014 for $4.3 billion, this will prove to be a busy year for TIBCO as it quickly integrates Citrix and presents the combined company for an impending acquisition or IPO.

Updates and Launches

Alteryx introduces the newest version of the Alteryx Platform (2021.4)

Alteryx launched the latest version of the Alteryx Platform, 2021.4, on February 3. Key improvements include enhanced server APIs to allow for further administrative automation; the Named Entity Recognition text mining tool which automatically extracts data from images; new connectors for Anaplan, Google Drive, Outlook 365, and Automation Anywhere; and the Data Connection Manager, which will simplify sharing data sources across an organization.

Amalgam’s Insight: Alteryx’s market leadership as an analytic workflow platform is enhanced with this combination of connectors, data sharing, and automation capabilities. This version update comes at a time when Alteryx’s next stage of growth is dependent on supporting enterprise-wide use cases for analytic insight and providing the administrative governance necessary to quickly deploy these use cases.

Informatica Announces New PoD in UK to Support Growing Demand for Data Sovereignty | Informatica

On February 3, Informatica announced a new UK Point of Delivery for its Intelligent Data Management Cloud. Brexit has complicated the understanding and enforcement of data privacy and locality requirements, especially in regulated industries.

Amalgam’s Insight: By debuting a geographically appropriate cloud to support organizations doing business in the UK, Informatica helps those organizations comply with relevant data-related laws and regulations. Localized delivery sites will continue to be a trend in the data industry, as global organizations must increase their investment in the UK or risk losing business to better-prepared competitors.

Hiring

Alteryx Announces Leadership Changes to Accelerate Next Phase of Cloud Growth | Alteryx

On February 1, Alteryx announced several personnel changes. Paula Hansen has been promoted to President and Chief Revenue Officer, while Keith Pearce has been named as the company’s new CMO. Previously, Pearce was the SVP of Corporate Marketing for Genesys. In addition, COO Scott Davidson will step down from his role as of mid-March.

Amalgam’s Insight: We covered the hiring of Paula Hansen in our May 2021 update. This promotion made sense, as Alteryx has had a combined President/Chief Revenue Officer in the past. Keith Pearce has a strong record of solutions and vertical marketing across his career, which fits Alteryx’s need to dig further into each vertical now that it has reached a critical mass of accounts. Alteryx’s challenge is no longer name recognition, but account development and education: two areas where Pearce has excelled in his past roles.

DataRobot Hires Google’s Debanjan Saha as Chief Operating Officer – DataRobot AI Cloud

On February 2, DataRobot welcomed Debanjan Saha as their new Chief Operating Officer. Saha was previously the VP and GM of Data Analytics at Google, overseeing analytics on Google Cloud and BigQuery; before that, Saha developed and launched the Amazon Aurora relational database at AWS.

Amalgam’s Insight: Saha has a long record of managing cutting-edge cloud solutions at IBM, Amazon, and Google across virtualization, database, and data management technologies. As DataRobot has quickly grown from a machine learning automation solution to a broader MLOps and engineering platform, Saha’s managerial background will be valuable in pushing DataRobot’s development and monetization of the end-to-end needs for enterprise machine learning.

January 28: From BI to AI (anch.AI, Dataiku, DataRobot, Domino, Dremio, Firebolt, Informatica, Meta)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Funding

Dremio Doubles Valuation to $2 Billion with $160M Investment Towards Reinventing SQL for Data Lakes – Dremio

On January 25, Dremio announced that it had raised $160M in a Series E funding round, only a year after its $135M Series D round in January 2021. Adams Street Partners led the funding round, joined by fellow new investors DTCP and StepStone Group. Existing investor participation came from Cisco Investments, Insight Partners, Lightspeed Venture Partners, Norwest Venture Partners, and Sapphire Ventures. The funding will go towards R+D, customer service, customer education and community-building, and contributions to open source initiatives. Amalgam Insights’ Hyoun Park was quoted in TechTarget on the Dremio investment: Dremio raises $160M for cloud data lake platform technology.

Firebolt Announces Series C Round at $1.4 billion Valuation

On January 26, Firebolt, a cloud data warehouse, announced a $100M Series C round. Aikeon Capital led the round, with participation from new investors Glynn Capital and Sozo Ventures, and existing investors Angular Ventures, Bessemer Venture Partners, Dawn Capital, K5 Global, TLV Partners, and Zeev Ventures. The funds will primarily go towards expanding the product and engineering teams. Firebolt also announced that Mosha Pasumansky would assume the CTO position, coming over from Google BigQuery, and that Firebolt would be opening a Seattle office.

anch.AI, former AI Sustainability Center, Secures $2.1M in Seed Funding to Launch Ethical AI Governance Platform

On January 26, anch.AI launched its ethical AI governance platform, and secured $2.1M in seed funding. Benhamou Global Ventures led the round, with participation from Terrain Invest, Frederik Andersson, Kent Janer, and Magnus Rausing. The funding will go towards further development of the platform.

Updates and Enhancements

Domino Data Lab Unveils Platform to Accelerate Model Velocity

On January 26, Domino Data Lab debuted Domino 5.0, a major new release of their MLOps platform. Key new capabilities include autoscaling clusters to give data science teams easier access to compute infrastructure; data collectors that allow teams to securely share and reuse common data access patterns; and integrated monitoring of models in production, with automated insights that compare production data to training data to assess and diagnose model drift. The latest version is available immediately to existing Domino customers, with a trial version available for new customers.

Dataiku Achieves ISO 27001 Certification | Dataiku

On January 27, Dataiku announced that they were now ISO 27001 certified, citing it as a “business imperative” to protect sensitive customer data from improper access and security breaches. ISO 27001 certification is a consideration for enterprises needing to not only prevent security breaches, but also ensure data is appropriately domiciled to comply with regulations like GDPR and CCPA.

DataRobot Launches MoreIntelligent.ai to Share Untold Stories on the Future of AI – DataRobot AI Cloud

DataRobot continues its AI education efforts with this week’s launch of MoreIntelligent.ai, an expansion of their More Intelligent Tomorrow podcast. Content will include research and analysis, prescriptive takeaways to inform AI practitioner action, and interviews with prominent AI leaders. The prominence DataRobot is giving the More Intelligent brand suggests that education about AI continues to be key to growing the AI market.

Introducing Meta’s Next-Gen AI Supercomputer | Meta

On January 24, Meta unveiled the AI Research SuperCluster, aiming to be the fastest supercomputer in the world when it’s completed in mid-2022. Meta plans to use the RSC to build stronger AI models which will analyze text, images, and video together in hundreds of languages, as a step on the path towards the metaverse.

Hiring

Informatica Appoints Jim Kruger as Chief Marketing Officer to Accelerate Cloud Growth | Informatica

On January 24, Informatica appointed Jim Kruger as the Chief Marketing Officer. Kruger was previously the CMO at Veeam Software, Intermedia, and Polycom, bringing years of experience in the CMO role as an experienced marketer who understands how to communicate around complex technologies.

January 21: From BI to AI (DataRobot, Diversio, Domino, Prophecy, Vectice)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Funding

Prophecy raises Series A to Industrialize Data Refining

Prophecy, a low-code data engineering platform, raised $25M in Series A funding this week. The round was led by Insight Partners, with other participants from existing investors Berkeley SkyDeck and SignalFire, and new investor Dig Ventures. The funding will go towards building out the platform, as well as investing in the go-to-market side. Prophecy seeks to standardize data refinement for use at scale, making the process more predictable and visible.

Vectice Announces $15.6M in Seed and Series A Funding

On January 18, Vectice, a data science knowledge capture company, announced it had raised a $12.6M Series A round. The round was co-led by Crosslink Capital and Sorenson Ventures. Additional participants included Global Founders Capital (GFC), Silicon Valley Bank, and Spider Capital. Vectice will use the funds to further expand its team and to onboard select accounts into its beta program. Vectice automatically captures the assets that data science teams generate throughout a project and generates documentation throughout the project lifecycle.

Diversio Announces Series A Funding

Also this week, Diversio, a diversity, equity, and inclusion (DEI) platform, raised $6.5M in Series A funding. Participants included Chandaria Family Holdings, First Round Capital, and Golden Ventures. Plans for the funding include expanding the sales and client success teams, accelerating product development, and amplifying marketing efforts. Diversio combines analytics, AI, and subject matter expertise to understand where DEI efforts at organizations are getting derailed and to offer action plans for setting and meeting DEI goals.

Updates

DataRobot’s State of AI Bias Report Reveals 81% of Technology Leaders Want Government Regulation of AI Bias – DataRobot AI Cloud

On January 18, DataRobot released its State of AI Bias Report, assessing how AI bias can impact organizations, along with ways to mitigate said bias. Common challenges organizations face include the inability to understand the reasons for a specific AI decision, or the correlation between inputs and outputs, along with the difficulty of developing trustworthy algorithms and determining what data is used to train a given model. All of these challenges have led to some combination of lost revenue, customers, and employees, along with legal fees and reputation damage to the company; organizations are seeking guidance to avoid these issues.

Events

Domino Data Lab Hosts January 26 Virtual Event: Unleashing Exceptional Performance with Data Science

On Wednesday, January 26, Domino Data Lab will host a free one-hour virtual event: “Unleashing Exceptional Performance,” focusing on data science. Featured speakers include surgeon and author Dr. Atul Gawande, and Janssen Research and Development’s Chief Data Science Officer and Global Head of Strategy and Operations Dr. Najat Khan. There will be two sessions to accommodate various time zones, one at 1300 GMT and one at 11 am PT/2 pm ET. To register for the event, please visit the Domino event registration site.

Taking a More Analytic Approach to Wordle

The hottest online game of January 2022 is Wordle, a deceptively addictive puzzle where one tries to guess a five-letter word starting from scratch. Perhaps you’ve started seeing a lot of posts built from grids of green, yellow, and gray squares.

In the unlikely case you haven’t tried Wordle out yet, let me help enable you with this link: https://www.powerlanguage.co.uk/wordle/

OK, that said, the rules of this game are fairly simple: you have six chances to guess the word of the day. The game was adorably created by software developer Josh Wardle for his partner to enjoy, and its simplicity has made it a welcome online escape in the New Year. The website isn’t trying to sell you anything. It isn’t designed to “go viral.” All it does is ask you to guess a word.

But for those who have played the game, the question quickly comes up on how to play this game better. Are there quantitative tricks that can be used to make our Wordle attempts more efficient? How do we avoid that stressful sixth try where the attempt is “do or die?”

For the purposes of this blog, we will not be going into any of Wordle’s own sources directly, because what fun would that be?

Here are a few tips for Wordle based on some basic analytic data problem-solving strategies.

Step 1: Identify the relevant universe of data

One way to model an initial guess is to think about the distribution of letters in the English language. Any fan of the popular game show “Wheel of Fortune” has learned to identify R, S, T, L, N, and E as frequently used letters. But how common are those letters?

One analysis of the Oxford English Dictionary done by Lexico.com shows that the relative frequency of letters in the English language is as follows:

Letter  Frequency    Letter  Frequency
A        8.50%       N        6.65%
B        2.07%       O        7.16%
C        4.54%       P        3.17%
D        3.38%       Q        0.20%
E       11.16%       R        7.58%
F        1.81%       S        5.74%
G        2.47%       T        6.95%
H        3.00%       U        3.63%
I        7.54%       V        1.01%
J        0.20%       W        1.29%
K        1.10%       X        0.29%
L        5.49%       Y        1.78%
M        3.01%       Z        0.27%

This is probably a good enough starting point. Or is it?

Step 2: Augment or improve data, if possible

The Stanford GraphBase has a repository of 5,757 five-letter words that can serve as a starting point for analysis. We know this isn’t exactly the Wordle word bank, as the New York Times wrote an article describing how Wardle and his partner Palak Shah whittled the word bank down to a pool of roughly 2,500 words. We can use this to come up with a more specific distribution of letters. So, how does that differ?

Surprisingly, there’s enough of a difference that we need to decide which option to use. We know that a lot of plural words end in s, for instance, which is reflected in the Stanford data. If I were doing this for work, I would look at all of the s-ending words, determine which of those were plural, and then cleanse that data, since I assume Wordle does not have duplicate plurals. But since Wordle is not a mission-critical project, I’ll stick with the Stanford data, as it holds a number of other useful insights.
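To make this step concrete, here is a minimal Python sketch of how per-position letter frequencies can be tallied from a word list. The tiny inline list is my own stand-in for illustration; in practice you would load the full Stanford GraphBase word file.

```python
from collections import Counter

# Stand-in list for illustration; in practice, load the 5,757-word
# Stanford GraphBase list, one five-letter word per line.
words = ["arise", "rates", "laser", "fuzzy", "jumpy"]

# Tally how often each letter appears in each of the five positions.
position_counts = [Counter() for _ in range(5)]
for w in words:
    for i, c in enumerate(w):
        position_counts[i][c] += 1

# Convert counts into per-position relative frequencies.
n = len(words)
position_freq = [{c: k / n for c, k in pc.items()} for pc in position_counts]

print(position_freq[4].get("s", 0.0))  # share of words ending in "s" -> 0.2
```

Running the same tally over the real word list is what produces the position-by-position breakdown discussed below.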

Step 3: Identify the probable outcomes

So, what are the chances that a specific letter will show up in each word? Wordle isn’t just about the combination of potential letters that can be translated into words. In a theoretical sense, there are 26^5 = 11,881,376 potential five-letter combinations. But in reality, we know that AAAAA and ZZZZZ are not words.

Here’s a quick breakdown of how often each letter shows up in each position in the Stanford five-letter data along with a few highlights of letter positions that stand out as being especially common or especially rare.

The 30.64% of words ending in “s” are overwhelmingly plural nouns or singular verbs, which leads to the big question of whether one believes that “s-ending” words are in Wordle or not. If they are, this chart works well. If not, we can use the Oxford estimate instead, which gives us less granular information. Treating each of the five positions as an independent draw, the chance that a letter with overall frequency p appears somewhere in the word is:

  1 - (1 - p)^5

But with the Stanford data, we can do one better and look both at the probability of each letter in each position and at the overall odds that a letter is used anywhere in the word:

  1 - [(1 - First) * (1 - Second) * (1 - Third) * (1 - Fourth) * (1 - Fifth)]

Applying this to every letter, we come to the following table and chart.
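Both estimates translate directly into small functions. A minimal Python sketch (a straightforward rendering of the formulas above, with an illustrative input):

```python
def p_in_word_overall(p: float) -> float:
    """Oxford-style estimate: chance that a letter with overall
    frequency p appears at least once across five positions,
    treating each position as an independent draw."""
    return 1 - (1 - p) ** 5

def p_in_word_positional(p1: float, p2: float, p3: float,
                         p4: float, p5: float) -> float:
    """Stanford-style estimate built from per-position frequencies."""
    return 1 - (1 - p1) * (1 - p2) * (1 - p3) * (1 - p4) * (1 - p5)

# With "e" at an overall 11.16% frequency, the rough chance that it
# appears somewhere in a five-letter word comes out near 45%.
print(round(p_in_word_overall(0.1116), 3))  # -> 0.447
```

Note that when all five positional frequencies equal the overall frequency, the two estimates coincide, which is a handy sanity check.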

I highlighted the three letters most likely to show up. I didn’t show off the next tier only because I was trying to highlight what stood out most. In general, I try to highlight the top 10% of data that stands out just because I assume that more than that means that nothing really stands out. My big caveat here is that I’m not a visual person and have always loved data tables more than any type of visualization, but I realize that is not common.

Step 4: Adjust analysis based on updated conditions

As we gain a better understanding of our Wordle environment, the game provides clues about which letters are associated with the word in question. Letters that are in the word of the day but not in the right position are highlighted in yellow. Based on the probabilities we have, we can now adjust our assumptions. For instance, let’s look at the letter “a.”

If we are looking at a word that has the letter “a” but know it is not in the first position, we now know we’ve cut down the pool of candidate words by about 10%. We can also see that if that “a” isn’t in the second position, it’s probably in the third position.
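In practice, this adjustment amounts to filtering the candidate pool against each clue. A minimal sketch (the helper name and the toy word list are mine, not Wordle’s):

```python
def consistent(word: str, has: str = "", not_at: tuple = ()) -> bool:
    """True if `word` contains every letter in `has` and avoids each
    (letter, index) pair in `not_at` -- i.e., it matches a yellow clue."""
    return (all(c in word for c in has)
            and all(word[i] != c for c, i in not_at))

words = ["arise", "rates", "laser", "about", "fuzzy"]

# A yellow "a" on the first tile: the word contains "a", but not at index 0.
remaining = [w for w in words if consistent(w, has="a", not_at=(("a", 0),))]
print(remaining)  # -> ['rates', 'laser']
```

Recomputing the letter frequencies over the filtered pool after each guess gives updated probabilities for the next guess.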

Step 5: Provide results that will lead to making a decision

Based on the numbers, we can now estimate a roughly 50% chance that “a” is in the second position: 16% of five-letter words have an “a” there, out of the 31.57% of words that contain an “a” outside the first position. That is just one small example of the level of detail that can be derived from the numbers. But if I am providing this information with the goal of helping with guidance, I am probably not going to provide these tables as a starting point. Rather, I would start by providing guidance on what action to take. The starting point would likely be something like:
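The arithmetic behind that 50% figure is simply a conditional probability, using the two percentages quoted in the text:

```python
# Figures quoted above, from the per-position breakdown of the word list.
p_a_in_second = 0.16     # P(five-letter word has "a" in the second position)
p_a_not_first = 0.3157   # P(word contains "a", but not in the first position)

# P(second position | "a" present, not first) = 0.16 / 0.3157
print(round(p_a_in_second / p_a_not_first, 3))  # -> 0.507
```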

The letters used more than 20% of the time in five-letter words are the vowels a, e, i, and o and the consonants l, n, r, s, and t, much as one would expect from watching Wheel of Fortune. Top words to start with based on these criteria include “arise,” “laser,” and “rates.”

In contrast, if one wishes to make the game more challenging, one should start with words that are unlikely to provide an initial advantage. Words such as “fuzzy” and “jumpy” are relatively poor starting points from a statistical perspective.
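As a quick check on these starting-word recommendations, here is a sketch that scores a word by summing the Oxford frequencies of its distinct letters. The scoring rule is my own simplification (repeated letters add no new information), not part of the original analysis:

```python
# Relative letter frequencies (%) from the Lexico.com analysis cited earlier.
FREQ = {
    "a": 8.50, "b": 2.07, "c": 4.54, "d": 3.38, "e": 11.16, "f": 1.81,
    "g": 2.47, "h": 3.00, "i": 7.54, "j": 0.20, "k": 1.10, "l": 5.49,
    "m": 3.01, "n": 6.65, "o": 7.16, "p": 3.17, "q": 0.20, "r": 7.58,
    "s": 5.74, "t": 6.95, "u": 3.63, "v": 1.01, "w": 1.29, "x": 0.29,
    "y": 1.78, "z": 0.27,
}

def coverage(word: str) -> float:
    """Sum the frequencies of a word's distinct letters."""
    return sum(FREQ[c] for c in set(word.lower()))

print(round(coverage("arise"), 2))  # -> 40.52
print(round(coverage("fuzzy"), 2))  # -> 7.49
```

The gap between “arise” and “fuzzy” quantifies exactly why one is a strong opener and the other a deliberately difficult one.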

Conclusion

First, this common approach to data definitely showed me a lot about Wordle that I wouldn’t have known otherwise. I hope this approach helps you both in thinking about your own Wordle strategy and in further exploring the process behind Wordle and other data. And it all came down to a few basic steps: identifying the relevant universe of data, augmenting that data, identifying probable outcomes, adjusting the analysis as conditions change, and providing results that lead to a decision.

So, having done all this analysis, how much do analytics help the Wordle experience? One of the things that I find most amazing about the process of playing Wordle is how our brains approximate the calculations made here from a pattern recognition perspective that reflects our use of language. Much as our brain is effectively solving the parallax formula every time we catch a ball thrown in the air, our brains also intuitively make many of these probabilistic estimates based on our vocabulary every time we play a game of Wordle.

I think that analytic approaches like this help to demonstrate the types of “hidden” calculations that often are involved in the “gut reactions” that people make in their decision-making. Gut reactions and analytic reactions have often been portrayed as binary opposites in the business world, but gut reactions can also be the amalgamation of intelligence, knowledge, past experiences, and intuitive feelings all combined to provide a decision that can be superior or more innovative in comparison to pure analytic decisions. Analytics are an important part of all decision-making, but it is important not to discount the human component of judgment in the decision-making process.

And as far as Wordle goes, I think it is fun to try the optimized version a few times to see how it contrasts with your standard process. On the flip side, this data also provides guidance on how to make Wordle harder by starting with words that are less likely to be helpful. But ultimately, Wordle is a way to have fun, and analytics is best used to add to that fun, not to turn Wordle into an engineering exercise. Happy word building and good luck!

Posted on

Observable raises a $35 million B round for data collaboration

On January 13, 2022, Observable raised a $35.6 million Series B round led by Menlo Ventures with participation from existing investors Sequoia Capital and Acrew Capital. This round increases the total amount raised by Observable to $46.1 million. Observable is interesting to the enterprise analytics community because it provides a platform that helps data users collaborate throughout the data workflow of data discovery, analysis, and visualization.

Traditionally, data discovery, contextualization, analytics, and visualization may each be supported by a different solution within an organization. This complexity is multiplied by the variety of data sources and platforms that must be supported and by the number of people involved at each stage. The result is an unwieldy number of handoffs, the risk of using the wrong tool for the job, and an extended development process caused by the inability of multiple people to work simultaneously on creating a better version of the truth. Observable provides a single solution that helps data users connect, analyze, and display data, along with a library of data visualizations that suggests potentially new ways to present data.

From a business perspective, one of the biggest challenges of business intelligence and analytics has traditionally been the inability to engage relevant stakeholders to share and contextualize data for business decisions. The 2020s are going to be a decade of consolidation for analytics where enterprises have to make thousands of data sources available and contextualized. Businesses have to bridge the gaps between business intelligence and artificial intelligence, which are mainly associated with the human aspects of data: departmental and vertical context, categorization, decision intelligence, and merging business logic with analytic workflows.

This is where the opportunity lies for Observable: allowing the smartest people across all aspects of the business to translate, annotate, and augment a breadth of data sources into directional, contextualized decisions, aided by the head start of visualizations and analytic processes shared by a community of over five million users. And by allowing users to share these insights across all relevant applications and websites, Observable brings insights to users so they can drive decisions in all the places where decisions are made.

Observable goes to market with a freemium model that allows companies to try out Observable for free and then to add editors at tiers of $12/user/month and $40/user/month (pricing as of January 13, 2022). This level of pricing makes Observable relatively easy to try out.

Amalgam Insights currently recommends Observable for enterprises and organizations with three or more data analysts, data scientists, and developers who are collaboratively working on complex data workflows that lead to production-grade visualization. Although it can be more generally used for building analytic workflows collaboratively, Observable provides one of the most seamless and connected collaborative experiences for creating and managing complex visualizations that Amalgam Insights has seen.

Posted on

January 7: From BI to AI (Alteryx, Databricks, Fractal, Meta, Qlik, Trifacta, WEKA)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Acquisitions and Partnerships

Alteryx Announces Acquisition of Trifacta

Yesterday, January 6, Alteryx announced that it has agreed to acquire Trifacta for $400M in a cash offer. Trifacta and Alteryx have historically been viewed as competitors, but Trifacta’s greater depth of capability in data engineering and cleansing complements Alteryx’s strengths in analytic workflows.

Product Launches and Updates

AI that understands speech by looking as well as hearing

Today, January 7, Meta debuted Audio-Visual Hidden Unit BERT (AV-HuBERT), a self-supervised framework for understanding speech that combines video input from lip movements and audio input from speech, both as raw unlabeled data. The goal is to improve accuracy even in environments where audio input may be compromised, such as from loud background noise.

Financial Transactions

Qlik Announces Confidential Submission of Draft Registration Statement Related to Proposed Public Offering

On Thursday, January 6, Qlik announced that it had confidentially submitted its draft registration statement related to a proposed IPO. The expected IPO comes over five years after private equity investment firm Thoma Bravo purchased Qlik and took the company private.

Fractal announces US$ 360 million investment from TPG

On Wednesday, January 5, Fractal, an AI and advanced analytics provider, announced that TPG, a global asset firm, will be investing $360M in Fractal. Puneet Bhatia and Vivek Mohan of TPG will join Fractal’s board of directors as part of the deal.

WEKA Increases Funding to $140 Million to Accelerate AI Data Platform Adoption in the Enterprise

WEKA, a data storage platform, announced on Tuesday, January 4, that they have raised $73M in a Series C funding round, raising total funding to $140M. The oversubscribed round was led by Hitachi Ventures. Other participants include Cisco, Hewlett Packard Enterprise, Ibex Investors, Key 1 Capital, Micron, MoreTech Ventures, and NVIDIA. The funding will go towards accelerating go-to-market activities, operations, and engineering.

Hiring

Databricks Appoints Naveen Zutshi as Chief Information Officer

Finally, on Wednesday, January 5, Databricks announced that it had appointed Naveen Zutshi as its new Chief Information Officer. Zutshi joins Databricks from Palo Alto Networks, where he was CIO for six years, expanding the company into new security categories and scaling it up at speed. Prior to that, Zutshi was SVP of Technology at Gap Inc., overseeing global infrastructure, operations, and security for the retailer.

Posted on

November 12: From BI to AI (Domino, H2O.ai, IBM, Informatica, Tableau)

Product Launches and Enhancements

IBM to Add New Natural Language Processing Enhancements to Watson Discovery

On November 10, IBM revealed new natural language processing enhancements planned for IBM Watson Discovery. Business users will be able to train Watson Discovery to surface insights more quickly on a corpus of industry-specific documents without needing traditional data science skills. Specific capability enhancements include pre-trained document structure understanding, automatic text pattern detection, and a custom entity extractor feature that will help identify industry-specific words and phrases with specific contexts. The announced enhancements are forthcoming, though IBM did not announce a target release date.

Informatica Announces Cloud Data Marketplace

On November 11, Informatica debuted their Cloud Data Marketplace. The Cloud Data Marketplace will allow Informatica business users to “shop” for both datasets and AI and analytics models, surfacing existing assets to encourage reuse of more-vetted resources rather than duplicating efforts by re-gathering data or building a model from scratch. Informatica Cloud Data Marketplace is available today with consumption-based pricing on Informatica’s Intelligent Data Management Cloud.

Tableau Outlines Product Vision and the Future of Analytics at Tableau Conference 2021

On November 9, at Tableau Conference 2021, Tableau announced a host of innovations for the Tableau platform and ecosystem, focused on bringing analytic capabilities to the workflows and environments workers already use. Highlights include Model Builder, a new feature in Tableau Business Science that allows Tableau users to build predictive models using Einstein Discovery; and Scenario Planning, another new Tableau Business Science feature to compare scenarios and “what-ifs,” supported by Einstein AI.

Partnerships

Domino Data Lab Expands Collaboration with NVIDIA and TCS with New Enterprise MLOps Solutions for Modern IT Stacks

On November 9, Domino Data Lab announced a fully-managed offering with solutions partner Tata Consultancy Services that allows Domino customers to run high-performance computing and data science workloads on NVIDIA DGX systems, hosted in the TCS Enterprise Cloud. This marks the next step in a deepening relationship between Domino and NVIDIA, with the Domino integration into the NVIDIA AI Enterprise suite on the horizon.

Funding

H2O.ai Closes $100 Million in Funding Led by Customer Commonwealth Bank of Australia

On November 8, H2O.ai closed $100M in Series E funding. The round was led by customer Commonwealth Bank of Australia, with participation by existing investors Crane Venture Partners and Goldman Sachs Asset Management and new investor Pivot Investment Partners. The funding will be used to scale up partnerships, sales, marketing, and customer success at a global level.

Posted on

August 20: From BI to AI (Adapdix, Apollo GraphQL, Cloudera, Databricks, Edge Intelligence, Monte Carlo, SnapLogic, TigerGraph)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Product Launches and Updates

Cloudera Introduces Cloudera DataFlow for the Public Cloud

On August 16, Cloudera launched Cloudera DataFlow for the Public Cloud to better manage customer data flows. When too many data flows are deployed into a single cluster, performance often falters, yet choosing larger infrastructure footprints “just in case” is expensive. Cloudera DataFlow was created to automate and manage complex cloud-native data flow operations, automatically scale up and down said streaming data flows more efficiently, and cut customers’ cloud costs. Cloudera DataFlow is generally available on AWS now.

Latest Release of the SnapLogic Platform: Self-service Integration and Automation

On August 17, SnapLogic announced its August 2021 product release, introducing no-code SnapLogic Flows for business users, ELT support for Databricks’ Delta Lake, and zero-downtime upgrades, along with updates to its API lifecycle and development portal. SnapLogic Flows will enable business users to construct data flows and apps that integrate with popular business software such as Salesforce without needing to know how to code, while allowing IT to provide guardrails and requirements to oversee said apps. New features in SnapLogic API lifecycle management include the ability to maintain, improve, unpublish, deprecate, and retire APIs, ensuring that older versions aren’t used in error.

Funding

Apollo GraphQL Announces $130 Million Series D Investment to Power the Future of Graph and Application Development

On August 17, Apollo GraphQL announced a $130M Series D funding round. Insight Partners led the round, with participation from existing funders Andreessen Horowitz, Matrix Partners, and Trinity Ventures, and new investor Next47. The funding will be used for continued R&D on open source graph technology to make app development faster and more accessible.

Monte Carlo Raises Series C, Brings Funding to $101M to Help Companies Trust Their Data

On August 17, Monte Carlo, a data reliability company, announced a $60M Series C funding round, led by ICONIQ Growth. Salesforce Ventures, along with existing investors Accel, GGV Capital, and Redpoint Ventures, all participated. Monte Carlo will use the funds to expand its product offerings, support more use cases, and open up to new markets.

Acquisitions

Adapdix acquires Edge Intelligence to bring data and AI closer together

Adapdix, an edge AI/ML platform, announced the acquisition of Edge Intelligence, a data management platform, on August 16. Edge Intelligence will improve Adapdix’s existing EdgeOps Data Mesh with better data management capabilities and allow Adapdix to expand its existing offerings in edge automation.

Hiring

Fermín Serna Joins Databricks as Chief Security Officer

On August 19, Databricks announced that they had appointed Fermín Serna as the company’s new Chief Security Officer. Serna is coming over from Citrix, where he was the Chief Information Security Officer; before this, Serna was the Head of Product Security at Google. At Databricks, Serna will lead the network, platform and user security programs, as well as governance and compliance efforts.

TigerGraph Adds Industry Leader and Trailblazer to its Executive Team; Announces Fall Graph + AI Summits

On August 19, TigerGraph, a graph analytics platform, announced that it had hired Dr. Jay Yu as Vice President of Product Innovation and as GM of TigerGraph’s San Diego Innovation Center. Dr. Yu comes to TigerGraph after 18 years at Intuit, where he led the Financial Knowledge Graph project and encouraged graph technology adoption in large commercial use cases. TigerGraph also announced two Graph + AI Summits for this fall: October 5 in San Francisco and October 19 in New York. Both hybrid events will offer in-person attendance and be livestreamed to virtual attendees.

Posted on 1 Comment

August 13: From BI to AI (DataRobot, Mindtech, NodeGraph, Oracle, Qlik, Snorkel AI, Talend)

Funding

Snorkel AI Raises $85 Million at $1 Billion Valuation for Data-Centric AI

On August 9, Snorkel AI, a programmatic data labeling platform, snagged an $85M Series C round at a $1B valuation. Addition and various BlackRock funds and accounts led the round, with participation from previous investors Greylock, GV, Lightspeed Venture Partners, Nepenthe Capital, and Walden. The funding will go towards scaling Snorkel AI’s engineering team and growing its go-to-market team for global sales.

Product Launches and Updates

Mindtech Chameleon 21.1

On August 11, Mindtech announced updates to Chameleon, their synthetic image creation and curation platform for training visual AI systems. Data scientists and machine learning engineers will be able to create the exact annotated images they need to train their visual AI models. Key new features and enhancements include Simulator, which uses real-world behavior modeling to create synthetic data sets, and Curation Manager, which performs visual analysis of synthetic and real datasets to identify diversity and bias. Chameleon 21.1 is available for immediate licensing.

Oracle Announces MySQL Autopilot for MySQL HeatWave Service

On August 10, Oracle announced MySQL Autopilot, a new component of Oracle’s MySQL HeatWave service. Autopilot automates HeatWave, a MySQL query acceleration engine in the Oracle cloud, by building machine learning models to help it learn how to perform optimally. Oracle also debuted MySQL Scale-out Data Management at the same time to improve the performance of reloading data into HeatWave by 100x.

Talend Announces Latest Innovations to Support Journey to Healthier Data

On August 11, Talend announced updates to Talend Data Fabric, its data integration and governance platform. Key innovations include native integration with Databricks 7.3 and AWS EMR 6.2 on Apache Spark 3 to enable faster advanced analytics at scale, private connectivity between Talend and AWS or Azure to support HIPAA and PCI compliance, and adding read/write capabilities to a campaign directly from a data pipeline.

Acquisitions

Qlik Acquires NodeGraph To Enhance End-to-End Analytics Data Pipelines With Interactive Data Lineage and Drive ‘Explainable BI’

On August 12, Qlik acquired NodeGraph, a metadata management platform. NodeGraph’s interactive data lineage function will contribute to Qlik’s “explainable BI” capabilities, while the governance aspects will enhance the Qlik data fabric, and NodeGraph’s impact analysis capabilities will expand Qlik’s SaaS offerings.

Hiring

Customer-Focused C-Suite Appointments Bolster DataRobot’s Executive Leadership Team

On August 12, DataRobot welcomed three new appointments to its C-suite. Jay Schuren moves up to become DataRobot’s first Chief Data Science Officer, having joined in 2017 with the Nutonian acquisition. Sirisha Kadamalakalva joined DataRobot as its first Chief Strategy Officer from Bank of America, where she was Managing Director and Global Head of AI/ML, Analytics, and CRM Software Investment Banking. Steve Jenner came over from Zscaler, where he was Vice President of Worldwide Sales Engineering.