February 18: From BI to AI (Arcion, Atakama, BigID, Dataiku, H2O.ai, HPE, Kyligence, Informatica, Striim, Sway AI)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Funding and Financial

Arcion, Formerly Blitzz, Secures $13M Series A To Transform Data Infrastructure

Arcion, a data mobility platform, secured a series A funding round of $13M. Bessemer Venture Partners led the round, with participation from Databricks. Bessemer VP Sakib Dadi joined the board of directors as part of the process. The funding is likely to go towards building both the product and team. Arcion also brought on board CEO Gary Hagmueller from an EIR position at Redpoint Ventures. Prior to that, Hagmueller was the CEO at Dgraph Labs, the President and CEO at Clara Analytics, and spent most of a decade at Ayasdi as first CFO, then as COO.

Amalgam’s Insight: Cloud data pipelines continue to be both an important enabler for conducting analytics at scale and a capability that is still difficult to operationalize and automate. This funding round is part of a continuing wave of investment focused on improving access to large stores of data. In addition, Hagmueller’s experience in productizing previously unmanageable data challenges is a noted strength that will prove to be useful for Arcion.

Informatica Reports Fourth Quarter and Full-Year 2021 Financial Results

Enterprise cloud data management company Informatica reported its Q4 and 2021 financial results this week. While revenue was up, particularly subscription revenue – a known target as they move customers to the cloud and onto subscription plans – Informatica missed on earnings, and had a rough ride in the stock market this week.

Amalgam’s Insight: Tech vendors often go private while converting to the cloud precisely because public markets overreact to tactical margin changes and struggle to value long-term business transitions correctly. Time and time again, financial markets misread these vendors, then suddenly wake up to the results when a vendor beats expectations. There is no fundamental difference between Informatica at $20 per share today and at $38 per share two months ago other than external geopolitical and economic forces and the fact that Informatica is farther along in migrating clients to the cloud. At this point, Informatica is valued similarly to Fivetran, an interesting comparison when one looks at the revenue of these two companies.

Product Launches and Enhancements

H2O.ai Democratizes Deep Learning with H2O Hydrogen Torch

H2O.ai, an AI cloud company, debuted H2O Hydrogen Torch, a no-code deep learning training engine. Hydrogen Torch is focused on making image, video, and natural language processing models with deep learning via a code-free interface, dealing with unstructured data that companies have not generally been able to analyze sufficiently to derive value from.

Amalgam’s Insight: Enterprises have the ability to analyze images and videos to support a wide variety of customer service, logistics, sales, and marketing use cases but still struggle to build models off these quirky and unstructured data sources that are often larger than the entirety of the structured data that is currently being managed. H2O Hydrogen Torch should prove to be a valuable solution for companies seeking to translate media libraries into business value.

Kyligence’s Intelligent Data Cloud Platform Now Available on Google Cloud

Kyligence, a data services and management platform provider, announced that its data cloud platform Kyligence Cloud was now available in beta on Google Cloud. Kyligence Cloud is a big data OLAP solution, providing high-performance analytic capability in a cloud-native environment while allowing analysts and business users to connect to it with familiar tools.

Amalgam’s Insight: Moving structured and performant data into the cloud continues to be important, but this announcement also demonstrates the continued progression of Google Cloud as a location for analytic data to reside. Google continues to gain access to high performance analytic data as both startups and enterprises see it as a cost-effective and user-interface friendly solution for data.

Sway AI Announces Its No-Code Artificial Intelligence (AI) Platform

On February 15, Sway AI announced its no-code AI platform, targeted at both technical and business users. Sway AI’s focus is on allowing enterprises to build and deploy AI without needing to invest heavily in expensive hiring or complex toolkits upfront.

Amalgam’s Insight: This announcement hits many of the buzzwords businesses are hearing for model building: AI, no-code, business-friendly, rapid deployment. But beyond this hype, the critical challenge here is in helping the people who understand business data best to create relevant models.

Partnerships

Striim and Hewlett Packard Enterprise partner to offer high performance, mission-critical solutions with real-time analytics

Striim, a real-time data integration platform, announced a partnership with HPE in the form of Striim for HPE NonStop, a high-performance distributed data transaction solution that allows organizations to analyze streaming data for business insights.

Amalgam’s Insight: Streaming data is reaching mainstream visibility as the need for immediate analysis continues to grow. HPE gains another tool to help sell its high-performance hardware, while Striim gains another channel backed by a sales team with a strong solutions-selling background.

BigID + Atakama: Data-Centric File Encryption | BigID

BigID, a data intelligence platform, announced a partnership with Atakama, an encryption company. From this partnership, Atakama is building a solution on the BigID platform that will read BigID’s data labeling and tagging, and automatically encrypt files based on the sensitivity of the data contained within.

Amalgam’s Insight: Data identity and trust are massive themes driving new markets associated with blockchain, NFTs, and the general Web3 experience. But practical aspects of data trust and governance are still formidable challenges for the enterprise, including metadata management and data governance. This partnership helps bring more order to the chaos of existing Big Data environments.

Hiring

Dataiku Announces Edward Bush as Chief Operating Officer to Support Rapid Growth and Bolster Employee Experience

On February 16, Dataiku announced the promotion of Edward Bush to Chief Operating Officer. Bush joined Dataiku in 2017 as the VP of Finance. Prior to Dataiku, Bush was the VP of Finance and Business Operations at VTS.

Amalgam’s Insight: This promotion occurs soon after Dataiku upgraded its board with former Tableau CMO Elissa Fink and former Mimecast CFO Peter Campbell and closed a $400 million funding round. It serves as a strong vote of confidence in Bush to support the employee culture as Dataiku pursues its next order of magnitude of growth.

February 11: From BI to AI (Alteryx, Census, DvSum, Qwak, ScaleUp:AI, Scandit, Starburst, Superconductive, Trifacta, Wallaroo, ZL Tech)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Acquisitions

Alteryx Closes Acquisition of Trifacta | Alteryx

On February 7, Alteryx announced that it had closed its acquisition of Trifacta, as noted in the January 7 From BI to AI. Amalgam Insights’ Hyoun Park provided recommendations and insights on Alteryx’ acquisition of Trifacta earlier this week.

Funding

Starburst Raises $250 Million to Lead the Market Shift to Faster Analytics on Decentralized Data | Starburst

Starburst, a data mesh analytics company, announced at its Datanova conference this week that it had raised a $250M Series D funding round. Alkeon Capital led the round, with participation from new investors Altimeter and B Capital Group as well as existing investors Andreessen Horowitz, Coatue Management, Index Ventures and Salesforce Ventures.

Amalgam’s Insight: One of the top challenges of this decade is in accelerating time-to-value on scattered and distributed data. The race to both develop this market and achieve market leadership is occurring quickly and Starburst’s focus on distributed query provides a flexible approach to consider.

Scandit, the Smart Data Capture Leader, Announces $150m Series D Investment Led by Warburg Pincus – Scandit

Scandit, a computer vision-based edge data capture company, announced on February 9 that it had completed a Series D funding round of $150M. Warburg Pincus led the round, with additional participation from existing investors Atomico, Forestay Capital, G2VP, GV, Kreos, NGP Capital, Schneider Electric, Sony Innovation Fund by IGV and Swisscom Ventures. The funding will go towards continuing Scandit’s global expansion, with a particular focus on APAC, specifically Japan, Singapore, and South Korea.

Amalgam’s Insight: Visual data such as barcodes, QR codes, and text continue to provide valuable capabilities in supporting the On-Demand economy. The ability to translate visual data into workflows, documentation, and transactions based on a single scan is still maturing and Scandit’s new round of funding will allow it to support e-commerce, support, and logistics challenges in the APAC region.

$60m to make operational analytics a reality | Census

Census, a business-user targeted data layer, has raised $60M in a Series B round led by Tiger. Previous investors Andreessen Horowitz and Sequoia and new investor Insight Partners also participated in the round. The funding will go towards expanding the product, including adding more data connectors to their library, adding new CI/CD (continuous integration and delivery) features, and building governance into their platform via a business-wide knowledge graph.

Amalgam’s Insight: Although $60 million is not a huge amount for the likes of Tiger Global, A16z, Sequoia, and Insight Partners, it is notable that four of the biggest venture capital firms with a data background saw the value of Census to enable business knowledge graphs. The holy grail of the business graph has been a goal for over a decade since the days that “Social Business” was still a buzzword. However, it has been difficult in practice to translate the vision of a fully interconnected data graph within the business into a reality. If this round goes as planned, don’t be surprised to see Census become a unicorn startup in the next couple of years.

Superconductive Raises $40M in Series B Funding to Revolutionize the Speed and Integrity of Data Collaboration

Superconductive, the provider of open source data quality tool Great Expectations, has raised a $40M Series B round. Tiger Global led the round, with participation from CRV, Index, and Root Ventures. The funds will be used for R+D towards releasing their first commercial product, as well as growing the organization through hiring.

Amalgam’s Insight: One of the new key phrases in today’s world of machine learning is that analytic and machine learning models are only as good as the data they are based on. Superconductive seeks to create a more collaborative experience to create better data pipelines so that all relevant data experts and developers can do their part to keep data clean.

Machine Learning Innovator Wallaroo Wins Backing from Microsoft’s M12 in $25M Series A Round | Business Wire

Wallaroo, a machine learning operationalization company, has closed a $25M Series A round of funding. M12, Microsoft’s venture arm, led the round, along with participation from existing investors Boldstart Ventures, Contour Venture Partners, Eniac Ventures, and Greycroft, as well as new investors NSS Advisors and Ridgeline Partners. Wallaroo will use the funds to both improve their existing product and release a free version, as well as grow sales and marketing.

Amalgam’s Insight: Wallaroo seeks to reduce the cost of operationalizing machine learning. This has obvious repercussions for Microsoft, which is in a race with Amazon and Google to put as many demanding workloads onto its cloud as soon as possible. Wallaroo’s claims of reducing time to production by over 90% should prove to be valuable in getting more models into production.

Qwak looks to automate MLOps processes | VentureBeat

MLOps company Qwak raised $15M in a Series A round, joining a number of companies eager to help companies operationalize their machine learning models. New investors include Leaders Fund and StageOne Ventures, while previous investor Amiti Ventures and individual investors also participated. The funds will go towards product development, as well as expanding sales and marketing.

Amalgam’s Insight: The pain point of getting models into production is a continuing theme this week, tackled by a pair of animal-named companies. Qwak seeks to be a holistic machine learning engineering solution: a single platform where data scientists can build, test, and deploy models, with the claim of being able to get a model from script to production in less than five minutes.

Product Launches and Updates

Starburst Unveils New Data Product Functionality to Accelerate Data Mesh Journey | Starburst

Starburst also announced new capabilities for its Starburst Enterprise product at Datanova. New and improved features include access control to secure data products for consistent governance, the ability for data engineers and producers to define relevant metadata in said data products, and rating and sharing of said data products to make data accessible as quickly as possible.

Amalgam’s Insight: The battle for context continues to be a massive challenge and Starburst’s additions of data governance and metadata definitions align to the need for business users to trust the data that they are providing to their customers.

ZL Tech Introduces New Solution to Transform Out-of-Sight Corporate Knowledge to Business Insight – ZL Tech

On February 8, ZL Tech announced improvements to ZL People Analytics, its SaaS solution for unstructured information management. Instead of being confined to a “sandboxed” structured database, ZL People Analytics includes unstructured data such as email, documents, and company chat in its purview while allowing that data to remain in place to address governance and regulatory concerns, making the data search process more efficient.

Amalgam’s Insight: Text analytics can be both cumbersome to support and a source of governance nightmares as GDPR, CCPA, and other personal information management laws have become standard practice. By supporting analytics on top of semi-structured and unstructured data, this offering helps companies get more information while keeping data centralized and in-location.

DvSum Launches its Next Generation Data Catalog | Business Wire

On February 7, DvSum debuted its augmented data catalog solution. Key features include automatic cataloging, classification, and curation of data, as well as recommending new entities and business terms for an organization’s business glossary. There is a free tier available; premium plans based on the number of data sources and users start at $1k/month.

Amalgam’s Insight: The data catalog has become an important part of the data manager’s toolkit in defining the business view of the world. However, data curation is still a relatively expensive endeavor and DvSum is seeking to provide context while maintaining a cost-efficient offering.

Events

April 6-7, 2022: ScaleUp AI

On April 6 and 7, Insight Partners will host ScaleUp:AI, an AI industry conference, in New York and virtually. Confirmed speakers include Databricks CEO Ali Ghodsi; Allie K. Miller, Global Head of Machine Learning Business Development, Startups, and Venture Capital at AWS; Google Brain cofounder Andrew Ng; Humana Chief Digital Health and Analytics Officer Heather Carroll Cox; Fiddler AI CEO Krishna Gade; and SentinelOne CEO and cofounder Tomer Weingarten. The in-person event is sold out, but virtual passes are still available; register for the event at ScaleUp:AI.

Alteryx Acquires Trifacta: Considerations for DataOps, MLOps, & the Analytic Community

On February 7, 2022, Alteryx completed its acquisition of Trifacta, a data engineering company known for promoting “data wrangling” and for bringing to the forefront the challenge of cleansing data to make Big Data useful and to support machine learning. Alteryx announced its intention to acquire Trifacta on January 6 for $400 million, with an additional $75 million dedicated to an employee retention pool.

Trifacta was founded in 2012 by Stanford Ph.D. Sean Kandel, then-Stanford professor Jeffrey Heer, and Berkeley professor Joe Hellerstein as a data preparation solution at a time when Big Data started to become a common enterprise technology. The company was built on Wrangler, a data transformation visualization tool that tackled a fundamental problem: reducing the estimated 50-80% of work time that data analysts and data scientists spent preparing data for analytical use.

Over the past decade, Trifacta raised $224 million with its last round being a $100 million round raised in September 2019. Trifacta quickly established itself as a top solution for data professionals seeking to cleanse data. In a report I wrote in 2015, one of my recommendations was “Consider Trifacta as a general data cleansing and transformation solution. Trifacta is best known for supporting both Hadoop and Big Data environments, including support for JSON, Avro, ORC, and Parquet.” (MarketShare Selects a Data Transformation Platform to Enhance Analyst Productivity, Blue Hill Research, February 2015)

Over the next seven years, Trifacta continued to advance as a data preparation and data engineering solution as it evolved to support major cloud platforms. During this time, three key trends emerged in the data preparation space starting in 2018.

First, data preparation companies focused on the major cloud platforms starting with Amazon Web Services, then Microsoft Azure and Google Cloud. This focus reflected the gravity of net-new analytic and AI data shifting from on-premises resources into the cloud and was a significant portion of Trifacta’s product development efforts over the past few years.

Second, data preparation firms started to be acquired by larger analytic and machine learning providers, such as Altair’s 2018 acquisition of Datawatch and DataRobot’s 2019 acquisition of Paxata. Trifacta was the last market-leading independent data preparation company available for acquisition after having developed the data preparation and wrangling market.

Third, the task of data preparation evolved into a new role of data engineering as enterprises grew to understand that the structure, quality, and relationships of data had to be well defined to get the insights and directional guidance that Big Data had been presumed to hold. As this role became more established, data preparation solutions had to shift towards workflows defined by DataOps and data engineering best practices. It was no longer enough for data cleansing and preparation to be done, but for them to be part of governed process workflows and automation within a larger analytic ecosystem.

All this is to provide guidance on what to expect as Trifacta now joins Alteryx. Although Trifacta and Alteryx are both often grouped as “data preparation” solutions, their roles in data engineering are different enough that I rarely see situations where both solutions are equally suited for a specific use case. Trifacta excels as a visual tool to support data preparation and transformation on the top cloud platforms, while Alteryx has long been known for its support of low-code and no-code analytic workflows that help automate complex analytic transformations of data. Alteryx has developed leading products across process automation, analytic blending in Designer, location-based analytics in Location, machine learning support, and Alteryx Server for analytics at scale.

Although Alteryx provides data cleansing capabilities, its interface does not provide the same level of immediate visual feedback at scale that Trifacta provides, which is why organizations often use both Trifacta and Alteryx. With this acquisition, Trifacta can be used by technical audiences to identify, prepare, and cleanse data and develop highly trusted data sources so that line-of-business data analysts can spend less time finding data and more time providing guidance to the business at large.

Recommendations and Insights for the Data Community

Alteryx clients that consider using Trifacta should be aware that this will likely result in an increased number of analytically accessible data sources. More always sounds better, but this also means that from a practical perspective, your organization may require a short-term reassessment of the data sources, connections, and metrics that are being used for business analysis based on this new data preparation and engineering capability. In addition, this merger can be used as an opportunity to bring data engineering and data analyst communities closer together as they coordinate responsibilities for data cleansing and data source curation. Trifacta provides some additional scalability in this regard that can be leveraged by organizations that optimize their data preparation capabilities.

This acquisition will also accelerate Alteryx’s move to the cloud, as Trifacta provides both an entry point for accessing a variety of cloud data sources and a team of developers, engineers, and product managers with deep knowledge of the major cloud data platforms. Given that Trifacta was purchased for roughly 10% of Alteryx’ market capitalization, the value of moving to the cloud more quickly could potentially justify this acquisition all on its own as an acquihire.

Look at DataOps, analytic workflows, and MLOps as part of a continuum of data usage rather than a set of silos. Trifacta brings 12,000 customers with an average of four seats per customer focused on data preparation and engineering. With this acquisition, the Trifacta and Alteryx teams can work together more closely in aligning those four data engineers to the ~30 analytic users that Alteryx averages for each of its 7,000+ customers. The net result is an opportunity to bring DataOps, RPA, analytic workflows, and MLOps together into an integrated environment rather than the current set of silos that often prevents companies from understanding how data changes can affect analytic results.

It has been a pleasure seeing Trifacta become one of the few startups to successfully define an emerging market (data preparation) and to coin a term (“data wrangling”) that gained acceptance both with users and with competitors. Many firms try to do this with little success, but Trifacta is a notable exception whose work will outlive its time as a standalone company. Trifacta leaves a legacy of establishing the importance of data quality, preparation, and transformation in the enterprise data environment in a world where raw data is imperfect, but necessary to support business guidance. And as Trifacta joins Alteryx, this combined ability to support data from its raw starting point to machine learning models and outputs across a hybrid cloud will continue to be a strong starting point for organizations seeking to provide employees with more control and choice over their analytic inputs and outputs.

If you are currently evaluating Alteryx or Trifacta and need additional guidance, please feel free to contact us at research@amalgaminsights.com to discuss your current selection process and how you are estimating the potential business value of your purchase.

February 4: From BI to AI (Alteryx, Citrix, DataRobot, Informatica, Microsoft Azure, Onehouse, Pecan, Teradata, TIBCO, Yellowfin)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Funding

Our Opportunity to Build Something Even Bigger: Series C Funding Announcement – Pecan AI

On February 3, Pecan, a low-code predictive analytics platform, raised $66M in Series C funding. Insight Partners led the round, with new investor GV also participating, as well as existing investors Dell Technologies Capital, GGV Capital, Mindset Ventures, S-Capital, and Vintage Investment Partners. The funding will be used to accelerate R+D and increase headcount.

Amalgam’s Insight: Pecan epitomizes the idea of helping companies move from BI to AI with its capability to help SQL-savvy data analysts to conduct data science. As a bridge technology between BI and AI, Pecan’s approach to providing predictive models for general use is a capability enterprises will need to pursue (whether with Pecan or another vendor) to empower their data analysts for the emerging era of machine learning that has been in progress for the last half-decade.

Onehouse Supercharges Data Lakes for AI and Machine Learning With $8 Million in Seed Funding From Greylock and Addition

On February 2, Onehouse, a lakehouse service built atop Apache Hudi to make data lakes faster, cheaper, and easier to access, emerged from stealth with $8M in seed funding. Investment firms Greylock and Addition co-led the funding round; the money will be used for R+D. Onehouse is fully managed and cloud-native, accelerating the speed at which data lakes can be set up. Amalgam Insights’ Hyoun Park is quoted in the press release announcing the launch of Onehouse.

Amalgam’s Insight: The lakehouse, an amalgamation of data lake and data warehouse, is an important construct for data architects seeking to unlock the value of the “Big Data” they have collected over the past decade. The overwhelming volume and variety of enterprise data makes a traditional data warehouse approach challenging to support for all relevant data. However, lakehouses are challenging to support and Onehouse’s approach of providing a managed service for lakehouses will be valuable for companies seeking to take this approach but lacking the personnel to access the analytic value of semi-structured data.

Acquisitions

Idera, Inc. Acquires Yellowfin International Pty Ltd

On January 28, Idera announced that they had acquired Yellowfin International, an embedded data analytics and BI platform. Yellowfin will join Idera’s Developer Tools business, expanding the capabilities of that suite in a new direction, enhancing the ability of Idera to cross-sell BI and analytics functionality to existing and new customers.

Amalgam’s Insight: Yellowfin has been a long-time favorite of Amalgam Insights with its market-leading visualization and user-focused data exploration capabilities combined with its extreme scalability. In joining Idera, Yellowfin now joins a suite of solutions that will enhance Yellowfin’s embedded business intelligence capabilities and provide developers with tools for more robust and user-friendly applications.

Teradata Announces Global Partnership with Microsoft

On February 2, Teradata announced a global partnership with Microsoft where it would more fully integrate the Teradata Vantage platform with Microsoft Azure. Though Teradata is already significantly integrated with over 60 existing Azure data services, this announcement signals a deepening of the existing relationship between the two companies.


Amalgam’s Insight: This partnership shows Microsoft Azure’s continued willingness to partner with analytic and data companies that compete with other areas of Microsoft. For Teradata, this partnership helps current clients migrate to an enterprise cloud that is developer-friendly, while Microsoft gains more data as it competes against Amazon in the cloud infrastructure market.

Citrix to be Acquired by Vista Equity Partners and Evergreen Coast Capital for $16.5 Billion | TIBCO Software

On January 31, Vista Equity Partners and Evergreen Coast Capital Corporation announced that they would be acquiring Citrix, a digital workspace and application delivery platform, for $16.5B. As part of the transaction, Citrix will merge with TIBCO, which is currently owned by Vista, bringing together Citrix’s secure digital workspace and app delivery capabilities with TIBCO’s data and analytics under one roof, with the goal of accelerating Citrix’s SaaS transition while creating a company that serves 98% of the Fortune 500.

Amalgam’s Insight: We will be working on a deeper exploration of this acquisition, which at first glance mirrors Idera’s acquisition of Yellowfin in creating a larger enterprise application company with a variety of capabilities across data management, security, and IT management. Given that Vista Equity Partners acquired TIBCO in 2014 for $4.3 billion, this will prove to be a busy year for TIBCO in quickly integrating Citrix and presenting the combined company for an impending acquisition or IPO.

Updates and Launches

Alteryx introduces the newest version of the Alteryx Platform (2021.4)

Alteryx launched the latest version of the Alteryx Platform, 2021.4, on February 3. Key improvements include enhanced server APIs to allow for further administrative automation; the Named Entity Recognition text mining tool which automatically extracts data from images; new connectors for Anaplan, Google Drive, Outlook 365, and Automation Anywhere; and the Data Connection Manager, which will simplify sharing data sources across an organization.

Amalgam’s Insight: Alteryx’s market leadership as an analytic workflow platform is enhanced with this combination of connectors, data sharing, and automation capabilities. This version update comes at a time when Alteryx’s next stage of growth is dependent on supporting enterprise-wide use cases for analytic insight and providing the administrative governance necessary to quickly deploy these use cases.

Informatica Announces New PoD in UK to Support Growing Demand for Data Sovereignty | Informatica

On February 3, Informatica announced a new UK Point of Delivery for its Intelligent Data Management Cloud. Brexit has complicated the understanding and enforcement of data privacy and locality requirements, especially in regulated industries.

Amalgam’s Insight: By debuting a geographically appropriate cloud to support organizations doing business in the UK, Informatica helps those organizations comply with relevant data laws and regulations. Localized delivery sites will continue to be a trend in the data industry as global organizations need to increase their investment in the UK or risk losing business to better-prepared competitors.

Hiring

Alteryx Announces Leadership Changes to Accelerate Next Phase of Cloud Growth | Alteryx

On February 1, Alteryx announced several personnel changes. Paula Hansen has been promoted to President and Chief Revenue Officer, while Keith Pearce has been named as the company’s new CMO. Previously, Pearce was the SVP of Corporate Marketing for Genesys. In addition, COO Scott Davidson will step down from his role as of mid-March.

Amalgam’s Insight: We covered the hiring of Paula Hansen in our May 2021 update. This promotion made sense as Alteryx has had a President/Chief Revenue Officer in the past. Keith Pearce has a strong record of solutions and vertical marketing across his career which fits Alteryx’ need to dig further into each vertical now that it has reached a critical mass of accounts. Alteryx’ challenge is no longer name recognition, but account development and education: two areas where Pearce has excelled in his past roles.

DataRobot Hires Google’s Debanjan Saha as Chief Operating Officer – DataRobot AI Cloud

On February 2, DataRobot welcomed Debanjan Saha as their new Chief Operating Officer. Saha was previously the VP and GM of Data Analytics at Google, overseeing analytics on Google Cloud and BigQuery; before that, Saha developed and launched the Amazon Aurora relational database at AWS.

Amalgam’s Insight: Saha has a long record of managing cutting-edge cloud solutions at IBM, Amazon, and Google across virtualization, database, and data management technologies. As DataRobot has quickly grown from a machine learning automation solution to a broader MLOps and engineering platform, Saha’s managerial background will be valuable in pushing DataRobot’s development and monetization of the end-to-end needs for enterprise machine learning.

January 28: From BI to AI (anch.AI, Dataiku, DataRobot, Domino, Dremio, Firebolt, Informatica, Meta)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Funding

Dremio Doubles Valuation to $2 Billion with $160M Investment Towards Reinventing SQL for Data Lakes – Dremio

On January 25, Dremio announced that it had raised $160M in a Series E funding round. This new round comes only a year after a $135M Series D round in January 2021. Adams Street Partners led the funding round, joined by fellow new investors DTCP and StepStone Group. Existing investor participation came from Cisco Investments, Insight Partners, Lightspeed Venture Partners, Norwest Venture Partners, and Sapphire Ventures. The funding will go towards R+D, customer service, customer education and community-building, and contributions to open source initiatives. Amalgam Insights’ Hyoun Park was quoted in TechTarget on the Dremio investment: Dremio raises $160M for cloud data lake platform technology.

Firebolt Announces Series C Round at $1.4 billion Valuation

On January 26, Firebolt, a cloud data warehouse, announced a $100M Series C round. Alkeon Capital led the round, with participation from new investors Glynn Capital and Sozo Ventures, and existing investors Angular Ventures, Bessemer Venture Partners, Dawn Capital, K5 Global, TLV Partners, and Zeev Ventures. The funds will primarily go towards expanding the product and engineering teams. Firebolt also announced that Mosha Pasumansky would assume the CTO position, coming over from Google BigQuery, and that Firebolt would be opening a Seattle office.

anch.AI, former AI Sustainability Center, Secures $2.1M in Seed Funding to Launch Ethical AI Governance Platform

On January 26, anch.AI launched its ethical AI governance platform, and secured $2.1M in seed funding. Benhamou Global Ventures led the round, with participation from Terrain Invest, Frederik Andersson, Kent Janer, and Magnus Rausing. The funding will go towards further development of the platform.

Updates and Enhancements

Domino Data Lab Unveils Platform to Accelerate Model Velocity

On January 26, Domino Data Lab debuted Domino 5.0, a major new release of their MLOps platform. Key new capabilities include autoscaling clusters to give data science teams easier access to compute infrastructure; data collectors that will allow teams to securely share and reuse common data access patterns; and integrated monitoring of models in production, with automated insights that compare production data to training data to assess and diagnose model drift. The latest version is available immediately to existing Domino customers, with a trial version available for new customers.

Dataiku Achieves ISO 27001 Certification | Dataiku

On January 27, Dataiku announced that they were now ISO 27001 certified, citing it as a “business imperative” to protect sensitive customer data from improper access and security breaches. ISO 27001 certification is a consideration for enterprises needing to not only prevent security breaches, but also ensure data is appropriately domiciled to comply with regulations like GDPR and CCPA.

DataRobot Launches MoreIntelligent.ai to Share Untold Stories on the Future of AI – DataRobot AI Cloud

DataRobot continues its AI education efforts with this week’s launch of MoreIntelligent.ai, an expansion of their More Intelligent Tomorrow podcast. Content will include research and analysis, prescriptive takeaways to inform AI practitioner action, and interviews with prominent AI leaders. The prominence DataRobot is giving its More Intelligent content suggests that education about AI continues to be key to growing the AI market.

Introducing Meta’s Next-Gen AI Supercomputer | Meta

On January 24, Meta unveiled the AI Research SuperCluster (RSC), which Meta aims to make the fastest AI supercomputer in the world when it is completed in mid-2022. Meta plans to use the RSC to build stronger AI models which will analyze text, images, and video together in hundreds of languages, as a step on the path towards the metaverse.

Hiring

Informatica Appoints Jim Kruger as Chief Marketing Officer to Accelerate Cloud Growth | Informatica

On January 24, Informatica appointed Jim Kruger as the Chief Marketing Officer. Kruger was previously the CMO at Veeam Software, Intermedia, and Polycom, bringing years of experience in the CMO role as an experienced marketer who understands how to communicate around complex technologies.

January 21: From BI to AI (DataRobot, Diversio, Domino, Prophecy, Vectice)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Funding

Prophecy raises Series A to Industrialize Data Refining

Prophecy, a low-code data engineering platform, raised $25M in Series A funding this week. The round was led by Insight Partners, with participation from existing investors Berkeley SkyDeck and SignalFire and new investor Dig Ventures. The funding will go towards building out the platform, as well as investing in the go-to-market side. Prophecy seeks to standardize data refinement for use at scale, making the process more predictable and visible.

Vectice Announces $15.6M in Seed and Series A Funding

On January 18, Vectice, a data science knowledge capture company, announced it had raised a $12.6M Series A round, bringing its combined seed and Series A funding to $15.6M. The round was co-led by Crosslink Capital and Sorenson Ventures. Additional participants included Global Founders Capital (GFC), Silicon Valley Bank, and Spider Capital. Vectice will use the funds to further expand its team and to onboard select accounts into its beta program. Vectice automatically captures the assets that data science teams generate throughout a project, and generates documentation throughout the project lifecycle.

Diversio Announces Series A Funding

Also this week, Diversio, a diversity, equity, and inclusion (DEI) platform, raised $6.5M in Series A funding. Participants included Chandaria Family Holdings, First Round Capital, and Golden Ventures. Plans for the funding include expanding the sales and client success teams, accelerating product development, and amplifying marketing efforts. Diversio combines analytics, AI, and subject matter expertise to understand where DEI efforts at organizations are getting derailed, and offers action plans for setting and meeting DEI goals.

Updates

DataRobot’s State of AI Bias Report Reveals 81% of Technology Leaders Want Government Regulation of AI Bias – DataRobot AI Cloud

On January 18, DataRobot released its State of AI Bias Report, assessing how AI bias can impact organizations, along with ways to mitigate said bias. Common challenges organizations face include the inability to understand the reasons for a specific AI decision, or the correlation between inputs and outputs, along with the difficulty of developing trustworthy algorithms and determining what data is used to train a given model. All of these challenges have led to some combination of lost revenue, customers, and employees, along with legal fees and reputation damage to the company; organizations are seeking guidance to avoid these issues.

Events

Domino Data Lab Hosts January 26 Virtual Event: Unleashing Exceptional Performance with Data Science

On Wednesday, January 26, Domino Data Lab will host a free one-hour virtual event: “Unleashing Exceptional Performance,” focusing on data science. Featured speakers include surgeon and author Dr. Atul Gawande, and Janssen Research and Development’s Chief Data Science Officer and Global Head of Strategy and Operations Dr. Najat Khan. There will be two sessions to accommodate various timezones, one at 1300 GMT and one at 11 am PT/2 pm ET. To register for the event, please visit the Domino event registration site.

Taking a More Analytic Approach to Wordle

The hottest online game of January 2022 is Wordle, a deceptively addictive game where one tries to guess a five-letter word starting from scratch. Perhaps you’ve started seeing a lot of posts sharing grids of green, yellow, and gray squares that chart how a player’s guesses went.

In the unlikely case you haven’t tried Wordle out yet, let me help enable you with this link: https://www.powerlanguage.co.uk/wordle/

OK, that said, the rules of this game are fairly simple: you have six chances to guess the word of the day. The game, adorably, was created by software developer Josh Wardle for his partner to enjoy. But its simplicity has made it a welcome online escape in the New Year. The website isn’t trying to sell you anything. It isn’t designed to “go viral.” All it does is ask you to guess a word.

But for those who have played the game, the question quickly comes up on how to play this game better. Are there quantitative tricks that can be used to make our Wordle attempts more efficient? How do we avoid that stressful sixth try where the attempt is “do or die?”

For the purposes of this blog, we will not be digging directly into any Wordle source code, because what fun would that be?

Here are a few tips for Wordle based on some basic analytic problem-solving strategies.

Step 1: Identify the relevant universe of data

One way to model an initial guess is to think about the distribution of letters in the English language. Any fan of the popular game show “Wheel of Fortune” has learned to identify R, S, T, L, N, and E as frequently used letters. But how common are those letters?

One analysis of the Oxford English Dictionary done by Lexico.com shows that the relative frequency of letters in the English language is as follows:

Letter   Frequency      Letter   Frequency
A        8.50%          N        6.65%
B        2.07%          O        7.16%
C        4.54%          P        3.17%
D        3.38%          Q        0.20%
E        11.16%         R        7.58%
F        1.81%          S        5.74%
G        2.47%          T        6.95%
H        3.00%          U        3.63%
I        7.54%          V        1.01%
J        0.20%          W        1.29%
K        1.10%          X        0.29%
L        5.49%          Y        1.78%
M        3.01%          Z        0.27%
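
For a quick sanity check on those published frequencies, here is a minimal Python sketch (not from the original post; the dictionary simply encodes the percentages from the table above) that ranks the letters:

    # Relative letter frequencies (percent) from the Oxford-based table above.
    oxford_freq = {
        "a": 8.50, "b": 2.07, "c": 4.54, "d": 3.38, "e": 11.16, "f": 1.81,
        "g": 2.47, "h": 3.00, "i": 7.54, "j": 0.20, "k": 1.10, "l": 5.49,
        "m": 3.01, "n": 6.65, "o": 7.16, "p": 3.17, "q": 0.20, "r": 7.58,
        "s": 5.74, "t": 6.95, "u": 3.63, "v": 1.01, "w": 1.29, "x": 0.29,
        "y": 1.78, "z": 0.27,
    }

    # Rank the letters from most to least common and show the top six.
    for letter, pct in sorted(oxford_freq.items(), key=lambda kv: kv[1], reverse=True)[:6]:
        print(f"{letter}: {pct:.2f}%")

Sorting the table this way puts E, A, R, I, O, and T at the top, which lines up fairly well with the Wheel of Fortune intuition.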

This is probably a good enough starting point. Or is it?

Step 2: Augment or improve data, if possible

Stanford GraphBase has a repository of 5,757 five-letter words that can be used as a starting point for analysis. We know this isn’t exactly the Wordle word bank, as the New York Times wrote an article describing how Wardle and his partner Palak Shah whittled the word bank down to a 2,500-word pool. We can use this repository to come up with a more specific distribution of letters. So, how does that differ?

Surprisingly, there’s enough of a difference that we need to decide on which option to use. We know that a lot of plural words end in s, for instance, which is reflected in the Stanford data. If I were doing this for work, I would look at all of the s-ending words and determine which of those were plural, then cleanse that data since I assume Wordle does not have duplicate plurals. But since Wordle is not a mission-critical project, I’ll stick with using the Stanford data as it has a number of other useful insights.
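
For illustration, here is a minimal sketch of how one could compute that positional distribution in Python. It assumes the Stanford five-letter word list has been saved locally as a plain text file with one word per line; the file name sgb-words.txt is an assumption for this example, not something specified in the post.

    from collections import Counter

    # Assumed local copy of the Stanford GraphBase five-letter word list,
    # one word per line.
    with open("sgb-words.txt") as f:
        words = [w.strip().lower() for w in f if len(w.strip()) == 5]

    total = len(words)

    # Count how often each letter appears in each of the five positions.
    position_counts = [Counter(word[i] for word in words) for i in range(5)]

    for i, counts in enumerate(position_counts, start=1):
        top_three = ", ".join(f"{letter} ({n / total:.1%})" for letter, n in counts.most_common(3))
        print(f"Position {i}: {top_three}")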

Step 3: Identify the probable outcomes

So, what are the chances that a specific letter will show up in each word? Wordle isn’t just about the combinations of letters that are possible; it is about the combinations that form real words. In a theoretical sense, there are 26^5, or 11,881,376, potential five-letter combinations. But in reality, we know that AAAAA and ZZZZZ are not words.

Here’s a quick breakdown of how often each letter shows up in each position in the Stanford five-letter data along with a few highlights of letter positions that stand out as being especially common or especially rare.

The 30.64% of words ending in “s” are overwhelmingly plural nouns or singular verbs, which leads to the big question of whether one believes that “s-ending” words are in Wordle or not. If they are, this chart works well. If not, we can use the Oxford estimate instead, which gives us less granular information: for a letter with overall frequency [probability], the chance that it shows up somewhere in a five-letter word is roughly

1 – (1 – [probability])^5

But with the Stanford data, we can do one better and look at the probability of each letter in each position, estimating the overall odds that a letter is used anywhere in the word as

1 – [(1 – First) * (1 – Second) * (1 – Third) * (1 – Fourth) * (1 – Fifth)]

where each term is the share of words with that letter in that position. Applying this to every letter, we come to the following table and chart.

I highlighted the three letters most likely to show up. I didn’t show off the next tier only because I was trying to highlight what stood out most. In general, I try to highlight the top 10% of the data because I assume that highlighting more than that means nothing really stands out. My big caveat here is that I’m not a visual person and have always loved data tables more than any type of visualization, but I realize that is not common.
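
To make that formula concrete, here is a rough sketch that applies it to every letter, using the same assumed sgb-words.txt file as above. Because the formula treats the five positions as independent, it produces an estimate rather than an exact count of the words containing each letter.

    from collections import Counter
    from string import ascii_lowercase

    with open("sgb-words.txt") as f:
        words = [w.strip().lower() for w in f if len(w.strip()) == 5]

    total = len(words)
    # Share of words with each letter in each of the five positions.
    position_counts = [Counter(word[i] for word in words) for i in range(5)]

    # 1 - [(1 - First) * (1 - Second) * (1 - Third) * (1 - Fourth) * (1 - Fifth)]
    usage_odds = {}
    for letter in ascii_lowercase:
        miss = 1.0
        for i in range(5):
            miss *= 1 - position_counts[i][letter] / total
        usage_odds[letter] = 1 - miss

    # The coarser Oxford-style estimate for a letter with overall frequency p
    # would instead be 1 - (1 - p) ** 5.

    for letter, odds in sorted(usage_odds.items(), key=lambda kv: kv[1], reverse=True)[:10]:
        print(f"{letter}: {odds:.1%}")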

Step 4: Adjust analysis based on updated conditions

As we gain a better understanding of our Wordle environment, the game provides clues on which letters are associated with the word in question. Letters that are in the word of the day but are not in the right position are highlighted in yellow. Based on the probabilities we have, we can now adjust our assumptions. For instance, let’s look at the letter “a”.

If we are looking at a word that has the letter “a”, but we know it is not in the first position, we now know we’ve cut down the pool of words we’re considering by about 10%. We can also see that if that “a” isn’t in the second position, it’s probably in the third position.
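
A small sketch of that adjustment, under the same word-list assumption as earlier: keep only the words consistent with a yellow “a” in the first position, then look at where the “a” tends to land among the survivors.

    from collections import Counter

    with open("sgb-words.txt") as f:
        words = [w.strip().lower() for w in f if len(w.strip()) == 5]

    # Yellow "a" in the first position: the word contains "a", but not as its first letter.
    candidates = [w for w in words if "a" in w and w[0] != "a"]
    print(f"{len(candidates)} of {len(words)} words remain in play")

    # Where does "a" sit among the remaining candidates?
    a_positions = Counter(
        i + 1 for w in candidates for i, letter in enumerate(w) if letter == "a"
    )
    for position in sorted(a_positions):
        print(f"position {position}: {a_positions[position] / len(candidates):.1%}")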

Step 5: Provide results that will lead to making a decision

Based on the numbers, we can now estimate that there’s roughly a 50% chance that “a” is in the second position: about 16% of five-letter words have an “a” in the second position, out of the 31.57% of words that contain an “a” outside the first position. That is just one small example of the level of detail that can be derived from the numbers. But if I am providing this information with the goal of helping with guidance, I am probably not going to provide these tables as a starting point. Rather, I would start by providing guidance on what action to take. The starting point would likely be something like:

The letters used more than 20% of the time in five-letter words are the vowels a, e, i, and o and the consonants l, n, r, s, & t, much as one would expect from watching Wheel of Fortune. Top words to start with based on these criteria include “arise,” “laser,” and “rates.”

In contrast, if one wishes to make the game more challenging, one should start with words that are unlikely to provide an initial advantage. Words such as “fuzzy” and “jumpy” are relatively poor starting points from a statistical perspective.
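
One way to turn these probabilities into an opening-word ranking, sketched under the same assumptions as the earlier snippets, is to score each word by the combined usage odds of its distinct letters; words like “arise” and “laser” should land near the top of such a ranking, while words like “fuzzy” and “jumpy” should land near the bottom.

    from collections import Counter

    with open("sgb-words.txt") as f:
        words = [w.strip().lower() for w in f if len(w.strip()) == 5]

    total = len(words)
    position_counts = [Counter(word[i] for word in words) for i in range(5)]

    def usage_odds(letter):
        # Chance the letter appears somewhere in a word, per the formula above.
        miss = 1.0
        for i in range(5):
            miss *= 1 - position_counts[i][letter] / total
        return 1 - miss

    def score(word):
        # Sum usage odds over distinct letters, so repeated letters add nothing.
        return sum(usage_odds(letter) for letter in set(word))

    ranked = sorted(words, key=score, reverse=True)
    print("Strong starting words:", ranked[:5])
    print("Weak starting words:", ranked[-5:])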

Conclusion

This common approach to data definitely showed me a lot about Wordle that I wouldn’t have known otherwise. I hope it helps you both in thinking about your own Wordle strategy and in further exploring the process behind Wordle and other data. And it all started with some basic steps: identifying the relevant universe of data, augmenting or improving that data where possible, identifying the probable outcomes, adjusting the analysis based on updated conditions, and providing results that lead to a decision.

So, having done all this analysis, how much do analytics help the Wordle experience? One of the things that I find most amazing about the process of playing Wordle is how our brains approximate the calculations made here from a pattern recognition perspective that reflects our use of language. Much as our brain is effectively solving the parallax formula every time we catch a ball thrown in the air, our brains also intuitively make many of these probabilistic estimates based on our vocabulary every time we play a game of Wordle.

I think that analytic approaches like this help to demonstrate the types of “hidden” calculations that often are involved in the “gut reactions” that people make in their decision-making. Gut reactions and analytic reactions have often been portrayed as binary opposites in the business world, but gut reactions can also be the amalgamation of intelligence, knowledge, past experiences, and intuitive feelings all combined to provide a decision that can be superior or more innovative in comparison to pure analytic decisions. Analytics are an important part of all decision-making, but it is important not to discount the human component of judgment in the decision-making process.

And as far as Wordle goes, I think it is fun to try the optimized version of Wordle a few times to see how it contrasts with your standard process. On the flip side, this data also provides guidance on how to make Wordle harder by using words that are less likely to be helpful. But ultimately, Wordle is a way for you to have fun, and analytics is best used to help you have more fun rather than to turn Wordle into an engineering exercise. Happy word building and good luck!

Observable raises a $35 million B round for data collaboration

On January 13, 2022, Observable raised a $35.6 million Series B round led by Menlo Ventures with participation from existing investors Sequoia Capital and Acrew Capital. This round increases the total amount raised by Observable to $46.1 million. Observable is interesting to the enterprise analytics community because it provides a platform to help data users collaborate throughout the data workflow of data discovery, analysis, and visualization.

Traditionally, data discovery, contextualization, analytics, and visualization are often supported by different solutions within an organization. This complexity is multiplied by the variety of data sources and platforms that must be supported and the number of people who need to be involved at each stage, which leads to an unwieldy number of handoffs, the risk of using the wrong tool for the job, and an extended development process resulting from the inability of multiple people to simultaneously work on creating a better version of the truth. Observable provides a single solution to help data users connect, analyze, and display data, along with a library of data visualizations that provides guidance on potentially new ways to present data.

From a business perspective, one of the biggest challenges of business intelligence and analytics has traditionally been the inability to engage relevant stakeholders to share and contextualize data for business decisions. The 2020s are going to be a decade of consolidation for analytics where enterprises have to make thousands of data sources available and contextualized. Businesses have to bridge the gaps between business intelligence and artificial intelligence, which are mainly associated with the human aspects of data: departmental and vertical context, categorization, decision intelligence, and merging business logic with analytic workflows.

This is where the opportunity lies for Observable: allowing the smartest people across all aspects of the business to translate, annotate, and augment a breadth of data sources into directional and contextualized decisions, while using the head start of visualizations and analytic processes shared by a community of over five million users. And by allowing users to share these insights across all relevant applications and websites, Observable can bring those insights to the users and drive decisions in all relevant places.

Observable goes to market with a freemium model that allows companies to try out Observable for free and then to add editors at tiers of $12/user/month and $40/user/month (pricing as of January 13, 2022). This level of pricing makes Observable relatively easy to evaluate.

Amalgam Insights currently recommends Observable for enterprises and organizations with three or more data analysts, data scientists, and developers who are collaboratively working on complex data workflows that lead to production-grade visualization. Although it can be more generally used for building analytic workflows collaboratively, Observable provides one of the most seamless and connected collaborative experiences for creating and managing complex visualizations that Amalgam Insights has seen.

January 7: From BI to AI (Alteryx, Databricks, Fractal, Meta, Qlik, Trifacta, WEKA)

If you would like your announcement to be included in Amalgam Insights’ weekly data and analytics roundups, please email lynne@amalgaminsights.com.

Acquisitions and Partnerships

Alteryx Announces Acquisition of Trifacta

Yesterday, January 6, Alteryx announced that it would acquire Trifacta for $400M in a cash offer. Trifacta and Alteryx have historically been viewed as competitors, but Trifacta’s greater depth of capability in data engineering and cleansing complements Alteryx’ strengths in analytic workflows.

Product Launches and Updates

AI that understands speech by looking as well as hearing

Today, January 7, Meta debuted Audio-Visual Hidden Unit BERT (AV-HuBERT), a self-supervised framework for understanding speech that combines video input from lip movements and audio input from speech, both as raw unlabeled data. The goal is to improve accuracy even in environments where audio input may be compromised, such as from loud background noise.

Financial Transactions

Qlik Announces Confidential Submission of Draft Registration Statement Related to Proposed Public Offering

On Thursday, January 6, Qlik announced that it had confidentially submitted its draft registration statement related to a proposed IPO. The expected IPO comes over five years after private equity investment firm Thoma Bravo purchased Qlik and took it private.

Fractal announces US$ 360 million investment from TPG

On Wednesday, January 5, Fractal, an AI and advanced analytics provider, announced that TPG, a global asset firm, will be investing $360M in Fractal. Puneet Bhatia and Vivek Mohan of TPG will join Fractal’s board of directors as part of the deal.

WEKA Increases Funding to $140 Million to Accelerate AI Data Platform Adoption in the Enterprise

WEKA, a data storage platform, announced on Tuesday, January 4, that they have raised $73M in a Series C funding round, raising total funding to $140M. The oversubscribed round was led by Hitachi Ventures. Other participants include Cisco, Hewlett Packard Enterprise, Ibex Investors, Key 1 Capital, Micron, MoreTech Ventures, and NVIDIA. The funding will go towards accelerating go-to-market activities, operations, and engineering.

Hiring

Databricks Appoints Naveen Zutshi as Chief Information Officer

Finally, on Wednesday, January 5, Databricks announced that it had appointed Naveen Zutshi as its new Chief Information Officer. Zutshi joins Databricks from Palo Alto Networks, where he was the CIO for six years, expanding Palo Alto Networks into new security categories and scaling up at speed. Prior to that, Zutshi was the SVP of Technology at Gap Inc., overseeing global infrastructure, ops, and security for the retailer.

November 12: From BI to AI (Domino, H2O.ai, IBM, Informatica, Tableau)

Product Launches and Enhancements

IBM to Add New Natural Language Processing Enhancements to Watson Discovery

On November 10, IBM revealed new natural language processing enhancements planned for IBM Watson Discovery. Business users will be able to train Watson Discovery to surface insights more quickly on a corpus of industry-specific documents without needing traditional data science skills. Specific capability enhancements include pre-trained document structure understanding, automatic text pattern detection, and a custom entity extractor feature that will help identify industry-specific words and phrases with specific contexts. The announced enhancements are forthcoming, though IBM did not announce a target release date.

Informatica Announces Cloud Data Marketplace

On November 11, Informatica debuted their Cloud Data Marketplace. The Cloud Data Marketplace will allow Informatica business users to “shop” for both datasets and AI and analytics models, surfacing existing assets to encourage reuse of more-vetted resources rather than duplicating efforts by re-gathering data or building a model from scratch. Informatica Cloud Data Marketplace is available today with consumption-based pricing on Informatica’s Intelligent Data Management Cloud.

Tableau Outlines Product Vision and the Future of Analytics at Tableau Conference 2021

On November 9, at Tableau Conference 2021, Tableau announced a host of innovations for the Tableau platform and ecosystem, focused on bringing analytic capabilities to the workflows and environments workers already use. Highlights include Model Builder, a new feature in Tableau Business Science that allows Tableau users to build predictive models using Einstein Discovery; and Scenario Planning, another new Tableau Business Science feature to compare scenarios and “what-ifs,” supported by Einstein AI.

Partnerships

Domino Data Lab Expands Collaboration with NVIDIA and TCS with New Enterprise MLOps Solutions for Modern IT Stacks

On November 9, Domino Data Lab announced a fully-managed offering with solutions partner Tata Consultancy Services that allows Domino customers to run high-performance computing and data science workloads on NVIDIA DGX systems, hosted in the TCS Enterprise Cloud. This marks the next step in a deepening relationship between Domino and NVIDIA, with the Domino integration into the NVIDIA AI Enterprise suite on the horizon.

Funding

H2O.ai Closes $100 Million in Funding Led by Customer Commonwealth Bank of Australia

On November 8, H2O.ai closed $100M in Series E funding. The round was led by customer Commonwealth Bank of Australia, with participation from existing investors Crane Venture Partners and Goldman Sachs Asset Management and new investor Pivot Investment Partners. The funding will be used to scale up partnerships, sales, marketing, and customer success at a global level.