
Analyzing Tableau Next for the Era of AI

As we start our farewell tour of the analyst world, Amalgam Insights had the opportunity to attend Tableau Conference, which has consistently been one of the key events for data and analytics throughout my analyst career. On a personal level, the most influential Tableau Conference was in 2013, when then-CEO Christian Chabot gave an inspiring speech on the role of discovery. It was so long ago that Tableau was just starting to get a Mac-native version and had just launched a data connection interface!

But even then, Tableau stood for a fundamental transformation in business intelligence and data discovery, one that up-ended the definition of the business intelligence market and saw monolithic billion-dollar-revenue companies fall to the vision of Tableau. It is one of the few times in my career that I have seen the analyst industry unanimously agree that a market needed to be effectively redefined. And this redefinition came from the understanding that the data analyst job was changing from the order-taking report-builder role I had held early in my career to that of a data discoverer, contextualizer, and storyteller. That role has continued to evolve to this day, and every analytics vendor has had to describe its ability to empower the data analyst.

Contextualizing the Background for Tableau Conference 2025

We face another fundamental shift in the data analyst world, this time driven by AI. Over the past three years, the emergence of generative AI and related agentic AI capabilities has created a new interface for computing and brought terms such as retrieval-augmented generation, vibe coding, and agentic analytics into the technical vernacular. In this light, vendors have sought to state that AI is here to help the data analyst and that no jobs will be lost.


Here at Amalgam Insights, I aim to provide a more realistic perspective. There are jobs in the data analyst world that are focused on the basic creation of dashboards and some basic data cleansing. These jobs may have been steady and reliable for many years through the last era of the data analyst and the rise of the data preparation solutions that emerged over the past decade. But those jobs are going to disappear as agents start taking over basic capabilities, and I believe this is a trend that will hold across analytic platforms and solutions. Let’s be honest about what AI can do.

At the same time, AI struggles to maintain attention across multi-stage tasks, lacks the creativity to look outside its training data or to proactively find new data, and still shows weaknesses in both strategy and storytelling when working outside a generic context. All of this provides context for this year’s Tableau Conference 2025 and the big announcement of Tableau Next, which offers some potential answers for the future of the data analyst as well as the future of Tableau as a Salesforce-based solution.

Understanding Tableau Next

Tableau’s current CEO, Ryan Aytay, kicked off Tableau Conference 2025 with an honest appeal to the data analyst on the future of Tableau and the commitment that exists for supporting Tableau as a set of solutions as well as a community. Since taking over as CEO of Tableau two years ago, Aytay has faced the interesting challenge of integrating Tableau with Salesforce in a way that takes advantage of Salesforce’s massive investments in data and AI without losing the magic that has made Tableau one of the most influential technologies and tech communities of the 21st century.

Tableau Next, which is currently available as part of the Tableau+ SKU, has been developed as three interrelated sets of technologies to support a more agentic use of analytics, which simply means that human analysts will have greater access to AI for handling the busy work of data while business users will have greater access to natural language querying of data.

The first of the three sets of technical capabilities is the foundational layers used to support the data. Tableau describes these layers as an open data layer, a semantic layer, a visualization layer, and an action layer. For many long-time Tableau users, there has never really been a need to think about much more than the semantic and visualization layers, but as we’ve already established, the times are a-changin’. The most interesting aspect of these layers to me is the reuse of Salesforce platform capabilities to expand Tableau’s functionality. That sentence alone probably needs some explaining.

So, when most people first hear of integration with Salesforce, they think of integration with Salesforce CRM or some combination of Sales Cloud, Service Cloud, or maybe Marketing Cloud. That is not what is meant here. Others with knowledge of Salesforce may think this refers to Heroku or some sort of application development platform. But that is not what is meant here, either. Rather, the type of integration I am referring to here is between Tableau and specific functional aspects of the Salesforce Platform.

For instance, the open data layer that Tableau is providing to support access to external data platforms like Databricks and Snowflake uses the Salesforce Data Cloud to provide real-time data access to a wide variety of data lakes, data sources, and business applications. This capability is something that Tableau needed to build to improve data analyst access to data and avoid unnecessary imports and exports. Zero-copy data access is always preferable when possible. The open data layer does not completely eliminate the need to duplicate and transfer data, but it does reduce this need while allowing for greater data access and orchestration.

The action layer is a reuse of another long-time Salesforce platform capability, the process automation capability of Flow. This use of Flow is really interesting to me because it will allow for greater process automation within Tableau itself, and using the action layer should be less clunky than the current use of Tableau External Actions. Despite Tableau’s dominant market position, it has not been seen as a leading data workflow automation solution. It is no secret that the data analyst will be asked both to support the AI-based creation of commoditized work and to take more responsibility for automating the contextualization and insight creation associated with new data.

Tableau Semantics is an additional step towards supporting context. Data context is scattered across data warehouses, data catalogs, ETL tools, and even retrieval-augmented generation jobs used for AI models. Although the initial version of Tableau Semantics is focused on Tableau Next, Tableau Cloud, Tableau Server, and the Salesforce platform, which includes access to the recently announced Salesforce Agentforce, future versions are expected to extend Tableau Semantics to third-party semantic layers, which is where I believe the real value will lie. Having a full enterprise semantic layer under the umbrella of the solution that is often the first tool to uncover big strategic insights from data would be extremely helpful.

Currently, the visualization layer of Tableau Next is probably the area that will get the most scrutiny from veteran users, especially as Tableau Next is designed to be an agentic solution while the classic Tableau solution is still most optimized for providing the widest variety of visual capabilities. Tableau Next’s visualizations shine in terms of performance and supporting real-time data in a composable and API-friendly manner. Visualizations in Tableau Next are focused on being application components, which is important as the data analyst will be held increasingly responsible for sharing data outputs across every imaginable channel including reports, dashboards, visualizations, workflow automations, APIs, applications, and agents. The less work analysts have to spend on making visualizations app and API-ready, the better.

Tableau Next Launches Agentic Analytic Skills


The second area of Tableau Next that analysts will notice is the set of pre-built agentic analytics skills, most of which are scheduled to come out in June 2025. Data analysts have understandably been concerned about the agentic and AI capabilities coming into data analyst work, as there has been a lot of press over the past three years about AI taking away technical jobs and even pressure from employers to avoid hiring employees to do work that AI can do. So, what is Tableau Next’s AI intended to do?

At Tableau Conference, Tableau Chief Product Officer Southard Jones provided some guidance on areas where Tableau Next would provide additional context for data analysts, especially in areas that data analysts typically either do not enjoy or are unable to keep up with because of the overwhelming volume of potential requests.

Data Pro is an agentic assistant for data preparation and quality that aligns with the Tableau brand promise of supporting data analysts. If the average data analyst rated their tasks on a 1-5 scale, I am sure that data prep and quality would rank a consistent 1, the bane of any good analyst’s goal of making data insights shine. Even those concerned about AI will likely welcome any agentic help supporting prep, cleansing, and transformation tasks.
Tableau’s second agentic capability is Concierge, which answers natural language data requests and is aimed at the business user.

Although the data analyst can use Concierge to look over large reams of data and provide some guidance on how to describe the data in human language, the primary intention here seems to be in helping the average business user to create, organize, and leverage basic data charts and visualizations. It will be interesting to see both how these queries end up leading to more complex requests for data analyst support and if this insight creation may end up creating a massive amount of outputs that need to be curated and rationalized by… of course, the data analyst.

Although I am generally a fan of giving more people the self-service access to data that they seek, I do wonder if there will be unexpected challenges from giving people who lack data fluency the access to explore data and rapidly create insights that lead to “top 3 opportunities,” “top 4 opportunities,” “top 5 opportunities,” and “top 6 opportunities” all ending up in a departmental dashboard or notebook or holding area of some sort that needs to be cleaned up. Perhaps, in the near future, we will be trading data prep for analytic debt and curation as the cost of getting sales the data they tactically need in real time. As long as the business value is there, this is not really a problem as much as a bit of a redefinition of the data analyst’s role. Both Data Pro and Concierge are scheduled for a June 2025 Generally Available launch.

The third Tableau agentic capability, Inspector, is not scheduled for June, but later in 2025. Inspector is designed to constantly monitor data for changes and then provide alerts related to those changes. This is a capability that I believe will be very interesting to combine with Tableau’s action layer, as an alert based on either an IT problem or a customer service problem could quickly set off a series of root-cause analysis visualizations, data pings, agentic descriptions, or a rapid response team. Inspector also extends Tableau’s functionality to become increasingly proactive and automated to fit with the data analyst’s increasing need to orchestrate responses while designing data-driven explanations of the truth.
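To make the monitor-then-alert pattern concrete, here is a minimal sketch of threshold-based change detection of the kind Inspector describes: compare successive metric snapshots and raise an alert when a metric moves beyond a set threshold. This is purely illustrative of the concept; the function, record format, and threshold are my own assumptions, not Tableau's actual API.

```python
# Conceptual sketch: compare successive metric snapshots and emit alerts
# when a change crosses a fractional threshold. Illustrative only --
# not Tableau's actual Inspector implementation.

def detect_changes(previous: dict, current: dict, threshold: float = 0.2) -> list:
    """Return alerts for metrics that moved more than `threshold` (fractional)."""
    alerts = []
    for metric, new_value in current.items():
        old_value = previous.get(metric)
        if old_value in (None, 0):
            continue  # no baseline to compare against
        change = (new_value - old_value) / abs(old_value)
        if abs(change) >= threshold:
            alerts.append({"metric": metric, "change": round(change, 3)})
    return alerts

yesterday = {"open_tickets": 120, "avg_resolution_hours": 8.0}
today = {"open_tickets": 180, "avg_resolution_hours": 8.2}
print(detect_changes(yesterday, today))
# → [{'metric': 'open_tickets', 'change': 0.5}]
```

In a real deployment, the alert payload is exactly the kind of event that could then trigger downstream actions, which is why pairing this capability with an action layer is so interesting.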


Finally, Tableau is including both internal and external marketplaces to help Tableau users share and reuse existing data and analytic assets. Over time, internal marketplaces may end up superseding some of the need for static dashboards as marketplace subscriptions or selections allow for greater customization of data connections and data assets. The ability to potentially build valuable data products on an external marketplace could be interesting as well, depending on whether the customer has data that is commercially valuable.

Recommendations for Current and Potential Tableau Customers

Overall, the Tableau Next offering demonstrates both Tableau’s ability to leverage capabilities from the Salesforce platform and the desire to help Tableau data analysts as analysts are being asked to do more and different activities based on their data and analytic capabilities. Based on these functionalities and capabilities, Amalgam Insights provides the following suggestions.

First, look at these capabilities as part of an exploration of how your data analyst job will change. Automation will take away some of your dashboarding and reporting responsibilities. Look at Tableau Next as a starting point to see whether there are opportunities to become more of a virtuoso graphic visualizer working off open-ended questions or whether data analyst skills can be used to orchestrate enterprise analytic outputs and requests more intelligently. I’ll be going deeper into the demands on the next-generation data analyst in a separate piece, but this is the time to draw up your 100-day plan to figure out what kind of data analyst you want to be and your 1,000-day plan to get from here to there.

Second, look at the new capabilities across the data layer, semantics, and especially the action layer. Data analysts will be increasingly asked to provide more programmatic and automated access to analytics. Although some of this access will be handled through self-service agents, there will still be demands to manually define actions and to translate agentic and generative AI requests into auditable actions and workflows. See how Tableau’s action layer compares to existing data and process automation solutions in your organization and determine how much overlap may exist from a functional perspective.
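The "auditable actions" idea above can be sketched in a few lines: wrap every automated step so that each agent- or AI-initiated action leaves a record of who requested it, when, and what happened. The function names, the `refresh_extract` action, and the record format are all hypothetical illustrations, not Tableau's Flow-based action layer.

```python
# Toy sketch of auditable automated actions: every action executed on
# behalf of an agent or AI request is logged with requester and result.
# Illustrative only -- not Tableau's actual action layer.

import datetime

audit_log = []

def run_action(name: str, requested_by: str, action, *args):
    """Execute an action and record who requested it and what happened."""
    result = action(*args)
    audit_log.append({
        "action": name,
        "requested_by": requested_by,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "result": result,
    })
    return result

def refresh_extract(dataset: str) -> str:
    # Hypothetical placeholder for a real data refresh step.
    return f"refreshed {dataset}"

run_action("refresh_extract", "concierge-agent", refresh_extract, "sales_pipeline")
print(audit_log[0]["requested_by"], "->", audit_log[0]["result"])
```

The point of the sketch is the design choice, not the code: when agents can trigger workflows, the audit trail becomes part of the workflow itself rather than an afterthought.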

Third, Amalgam Insights believes this is a good time to look at Concierge in terms of how it will fundamentally change analytic usage in your organization. Concierge is both a massive opportunity and risk, as is any technology that can potentially be used by half of your employees without a lot of training. The UX for Concierge is the same natural language that we have all learned in our generative AI experiments. But Amalgam Insights still recommends a careful multi-stage approach with an intermediate stage of testing with trusted users before opening up Concierge to everyone. The concern here is less about whether end users will get what they want and more about ensuring that there are not unintended consequences for the data and analytics teams. But an approach like Concierge is necessary for business data to have its intended impact on the majority of employees. We have learned that even Tableau can’t turn the majority of employees into data analysts in most companies, so Tableau has taken the next step of making data easier to query and explore.

Tableau Next introduces a new set of capabilities for the data analyst portfolio as data responsibilities continue to expand. The integration of Salesforce platform capabilities into Tableau should be seen as a net positive, as the data and agentic capabilities that Tableau Next is inheriting are strategically important to the whole of Salesforce. The biggest challenge that Tableau analysts now face is to figure out how to reconfigure their job responsibilities to reflect what can now be automated. From a practical perspective, it is likely that analysts need to proactively identify and learn the “value-added” activities that should be supported with the additional time that AI and automation make available. Amalgam Insights provides these recommendations as a preview for our upcoming video “The Future of the Data Scientist.”


Research Note: Informatica’s Spring 2025 Release Focuses on the Truth, the Whole Truth, and a Lot of AI

Though the promise of both generative and agentic AI has captured the imaginations of the vast majority of executive teams, businesses that have taken on substantial AI projects or proofs of concept have quickly found that AI is only as good as the data used to train models and agents as well as the data used to augment generative workloads and tools. In contrast to the past decade of Big Data, where quality often took a backseat to the volumes and variety of data owned by the enterprise, the new era of AI forces companies to look at data quality, metadata at a holistic and universal level, and the ability to delete or modify outdated metadata and data relationships to support models and modern AI approaches. In addition, the constant research coming out monthly on the state of AI from every major consulting firm (Accenture, Deloitte, BCG, KPMG, etc.) keeps showing that the biggest reason for AI failure is that companies consistently spend too much money trying to do it themselves and not enough time doing their due diligence on the out-of-box or easily configured solutions that are already in the market.

In this light, this research brief is intended to provide highlights of the new Informatica Spring 2025 release that Amalgam Insights believes will provide the greatest business value to the IT audience. This note is not intended to be a fully comprehensive overview of the release, but rather focused on the key features that will provide the greatest new sources of business value for Informatica customers.

From a data integration perspective, Informatica has created a CLAIRE-based Copilot for data integration that can generate ingestion, replication, and integration pipelines accompanied by automated documentation of the business and technical logic associated with each pipeline. It is no secret that logic documentation is the bane of many an integration engineer, as this task massively slows down the productivity of an integration team in creating and editing pipelines. But in an agile technology world where data integration to support timely and contextually shared versions of the truth is increasingly more important than simply creating a hardcoded single version of truth, the ability to quickly interpret, modify, and remove pipelines as needed is a vital capability that will likely save a minimum of one to two engineers’ worth of work in the first year for the average enterprise moving toward an organization-wide AI deployment.

Strategically, the most important new capability Informatica is bringing to market is the use of CLAIRE Intelligent Structure Discovery (ISD) to find patterns in unstructured data. For the past 15 years, businesses have been collecting data with the expectation that there are hidden gems. But the dirty honest truth is that most companies have lacked the tools, approach, or visibility to even detect the patterns that would lead to value. And, true to Informatica’s long-held status as a neutral third party in data, CLAIRE ISD is designed to allow users to bring their own preferred large language model, which makes sense in a world where everyone from OpenAI to Anthropic to Cohere to Tencent to Alibaba to Meta to DeepSeek is providing new and better models on a seemingly daily basis.

On the integration platform as a service (iPaaS) side of Informatica, there are some interesting announcements on hyperscaler connectors and a high-performance application integration runtime that will be vital for cloud-based companies seeking greater agility and more flexible API usage. And there is a Copilot to support the generation of in-app insights and app-to-app integrations that will be useful for more casual users seeking to create basic integrations, find new data, and summarize data environments. But as an analyst, I will plainly state that I expected an integration leader like Informatica to provide this in the Era of GenAI.

But Amalgam Insights’ perspective is that the GenAI recipes announced are going to be the most interesting new capability for businesses in iPaaS, as they enable process integration with hyperscalers, key enterprise applications, and vertical-specific use cases in areas including patient care and insurance claims. It is interesting to see integration players, including Informatica, being asked to coordinate supply chain, commerce, and customer service processes while orchestrating data movement and integration, as this is often assumed to be more aligned to ServiceNow’s role in the enterprise. This is not to say that Informatica and ServiceNow will directly compete against each other, but there are more similarities on the process automation side than some may assume.

Informatica has also provided updates to master data management in this new release, including the use of CLAIRE GPT for exploring and documenting master data while matching external data to golden records. To be philosophical for a moment, truth can sometimes feel more ephemeral as time goes by, data changes, and our environmental scenarios change. The ability to quickly update golden records of data is increasingly important to provide more potential context for each individual accessing these records for their own use cases. In an AI world, companies must augment data quickly, they must access golden records of truth quickly, and they must be able to expand the value and context of those golden records across all other relevant data and relevant models to maximize the value of their data.

This Spring Release provides upgrades for Data Governance and Privacy (DGP) as well, and the capability that Amalgam Insights finds most interesting is actually not the generative AI functionality. Informatica has included CLAIRE-generated glossary definitions to make data more consistently well described, and this does matter. But the most interesting functionality here is the integration of data access management with data governance and the catalog to enable more granular, individually defined data controls. To make customer experience, user experience, employee experience, and perhaps even agentic experience better over time, each human user, AI agent, or hybrid skill capacity needs to be provided with appropriate data access that may be broader or narrower than that of someone with a nominally similar role. This integration has the potential to make data ecosystems and governance more human, which is a bit of a trend in the generative AI world where we are starting to make computers think more like humans rather than forcing us to place the correct bits into the correct bucket of memory to access the appropriate workflow based on predefined computing resources.
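The idea of individually defined data controls can be sketched as a simple attribute-based check: each user or agent carries attributes, and each dataset policy requires certain attribute values before granting access. The policy format, attribute names, and principals below are hypothetical illustrations of the pattern, not Informatica's actual access model.

```python
# Minimal sketch of attribute-based data access: a policy lists the
# attributes a principal (human user or AI agent) must match.
# Hypothetical structures -- not Informatica's actual implementation.

def can_access(principal: dict, policy: dict) -> bool:
    """Grant access only if every attribute the policy requires matches."""
    return all(principal.get(attr) == value for attr, value in policy.items())

claims_policy = {"department": "claims", "region": "EMEA"}

adjuster = {"name": "human analyst", "department": "claims", "region": "EMEA"}
agent = {"name": "summarizer agent", "department": "claims", "region": "APAC"}

print(can_access(adjuster, claims_policy))  # → True
print(can_access(agent, claims_policy))     # → False
```

Note that the same check applies identically to a human and an agent, which is the point of tying access management into governance: two principals with nominally similar roles can still receive different scopes.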

On a final note, there have been multiple platform improvements as well, but the one that caught Amalgam Insights’ attention the most was the improvement to Informatica Processing Unit (IPU) consumption tracking, which now supports tagging for chargeback use cases. The open-ended tagging is a valuable starting point, but Informatica’s additional capabilities and the data sources it is typically connected to make it trivial to tag IPU consumption with general ledger-based financial categories that the business cares most about, whether it be cost center, profit center, geography, or strategic initiative. And for those in IT who still have to do actual work, integration with GitLab will be welcomed as the tasks of data integration and governance come closer and closer to the rest of the software development lifecycle and need similar versioning and configuration controls.
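The chargeback mechanics described above amount to summing metered consumption by tag. A minimal sketch, assuming a hypothetical record format (the field names and figures are invented for illustration, not Informatica's metering schema):

```python
# Sketch of tag-based chargeback: aggregate metered consumption units by
# a cost-center tag so usage maps to the financial categories the
# business tracks. Hypothetical record format for illustration.

from collections import defaultdict

def chargeback_by_tag(records: list, tag: str = "cost_center") -> dict:
    """Aggregate consumption units by the given tag; untagged usage is pooled."""
    totals = defaultdict(float)
    for record in records:
        totals[record.get(tag, "untagged")] += record["units"]
    return dict(totals)

usage = [
    {"job": "nightly_etl", "units": 42.0, "cost_center": "finance"},
    {"job": "crm_sync", "units": 17.5, "cost_center": "sales"},
    {"job": "adhoc_query", "units": 3.0},  # no tag yet
]
print(chargeback_by_tag(usage))
# → {'finance': 42.0, 'sales': 17.5, 'untagged': 3.0}
```

The "untagged" bucket is the practical wrinkle: open-ended tagging only produces a clean chargeback report once the tagging discipline itself is enforced.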

Key Recommendation: Amalgam Insights’ analysts understand as well as anyone that a vendor like Informatica can often be taken for granted as a core vendor that excels at data integration and governance. And there is nothing wrong with being consistent and reliable for core IT. But Amalgam Insights recommends that Informatica customers take a closer look at the Spring 2025 release as the significant CLAIRE AI augmentations across every major Informatica category, the ability to discover new truths in unstructured data, and the ability to further augment the truth quickly are all important abilities that AI-ready companies will need to support.


Rob Enslin Joins Workday as Chief Commercial Officer

Workday announced today that Robert Enslin, a longtime enterprise software executive best known for his time at SAP before his more recent stints at Google Cloud and UiPath, will be joining Workday as its chief commercial officer.

I’ll be especially interested in seeing if this hire improves Workday’s positioning of its financial suite, where many of the sourcing, planning, and analytics pieces are there but Workday is still struggling to gain CFO mindshare and displace incumbents at the enterprise level.

And honestly, a lot of this is because Workday still approaches its business from an HR-first mindset, which is clear when you look at its AI assistant announcements and partner announcements. It is not enough to just say “HR and finance” instead of HR in press releases when the actual products are still focused on talent management and individuals.

I would love to see Workday focus more on the office of the CFO and the idea of talent-and-skills based finance or finance for the innovation-based business, which requires talent and subject matter expertise. These are areas where traditional monolithic ERPs struggle and where smaller finance startups lack visibility to employee skills. Perhaps an analyst firm or consulting firm that Workday listens to will bring this up somewhere down the road.

Or perhaps Rob Enslin will get to flex the skills and positioning that he showed at SAP to push Workday forward into being a true enterprise software player rather than the HR specialist it is best known for being. The products are there, the roadmap and integrations are mostly in place, and the partnership intentions are there. Now for the go-to-market to solidify and for Workday Finance to be more than a me-too add-on.


Salesforce Announces New Agentic Management Capabilities

Today, Salesforce announced a variety of agentic management tools to automate testing, prototype in sandbox environments, and manage usage.

The two aspects that I am most interested in across the board are:

First, the AI-generated testing in Agentforce Testing Center, where I think it is going to be vital for agents to be stress-tested with the help of AI. It will obviously be easier for AI to come up with a wide variety of potential tests for an agent.

In the next few months, it will honestly be fairly trivial to build a standard agent within most large enterprise application platforms. But the challenge will be in testing these agents to run at enterprise scale, and with the variety of languages, context, grammar, jargon, and patois that may exist across the world in describing demands.

As George Bernard Shaw said, “England and America are two countries separated by a common language,” and that can be multiplied by the countries, rules, and backgrounds that global companies are trying to support with their Salesforce agents.

The other part that I am most excited about is what Salesforce calls Utterance Analysis. This is a real-time analysis of agent usage based on user inputs, requests, and query outputs. There has long been a struggle in translating event logs into useful data simply because logs are overwhelming. Salesforce’s efforts in this area are an important step forward in incorporating log data into more practical and consumable analytic form factors.

The one big question this press release does not tackle is around the orchestration and ongoing management of agent portfolios. Is it possible to find duplicate or similar agents and avoid the technical debt associated with managing hundreds or thousands of agents going forward? It is a stated goal of Marc Benioff to have one billion agents built in a year. That is a great goal, but anyone who has ever worked in IT or in sales ops knows that one billion custom objects, workflows, tests, agents, or any other documented items will always be an administrative burden.

Although I believe that Salesforce is making progress in this area, it is no secret that we look to Salesforce to provide a standard for enterprise governance of CRM and related applications. And I think this is an opportunity for Salesforce to show leadership in the ongoing management of agent portfolios at a time when the data and metadata in Salesforce are increasingly important both to Salesforce’s value as a strategic partner and to its market capitalization as a publicly traded company.


How the NBA is Teaching IT Procurement & Accounting to Work Together

Sports has increasingly become a showcase for back-end business capabilities that have long eschewed the spotlight: analytics, data, accounting, etc…

This recent ESPN article on the Knicks showcases the importance of their contract pro and of combining strategic procurement (contract negotiations, KPIs, expiration dates, payment terms, vendor and client responsibilities) with the accounting knowledge to enforce and fully leverage those terms. And the Knicks’ player procurement lead, Brock Aller, gets a nice glow-up here because of his expertise across these areas in his complex spend category: player contracts and options.

Basketball has increasingly made “cap-ology,” the management of each team’s salary cap, an important topic, as it often defines the practical limits of how much a professional basketball team can choose to improve. There is a practical lesson here for strategic IT procurement (or really all procurement) professionals on how to structure, reallocate, and maximize IT investment on a fixed budget or within a budget cap. I especially like that laddered rates, date-specific cutoffs and performance terms, and the trading of commoditized or overlooked assets for cash or optionality are all mentioned or hinted at here.

Even if I’m not a fan, the resurgence of the New York Knicks is a great case for how procurement and accounting need to work more closely together, ideally with a bridge person, to maximize value.


Informatica World 2024: CFO Considerations for Financial Stewardship in the Era of AI

Amalgam Insights recently had the privilege of attending Informatica World 2024. This is a must-track event for every data professional if for no other reason than Informatica’s market leadership across data integration, master data management, data catalog, data quality, API management, and data marketplace offerings. It is hard to have a realistic understanding of the current state of enterprise data without looking at where Informatica is. And at a time when data is front-and-center as the key enabler for high-quality AI tools, 2024 is a year where companies must be well-versed in the various levels of data governance, management, and augmentation needed to make enterprise data valuable.

Of course, Informatica has embraced AI fully, almost to the point where I wonder if there will be a rebrand to AInformatica later this year! But all kidding aside, my focus in listening to the opening keynote was in hearing about how CEO Amit Walia and a group of select product leaders, customers, and partners would help build the case for how Informatica increases business value from the CFO office’s perspective.

Of course, there are a variety of ways to create value from a Data FinOps (the financial operations for data management) perspective, such as eliminating duplicate data sources, reducing the size of data through quality and cleansing efforts, optimizing data transformation and analytic queries, enhancing the business context and data outputs associated with data, and increasing the accessibility, integration, and connectedness of long-tail data to core data and metadata. But in the Era of AI, there is one major theme and Informatica defined exactly what it is.

Everybody’s ready for AI except your data.

Informatica kicked off its keynote with an appeal to imagination and showing “AI come to life” with the addition of relevant, high-quality data. Some of CEO Amit Walia’s first words were in warning that AI does not create value and is vulnerable to negative bias, lack of trust, and business risks without access to relevant and well-contextualized data. His assertion that data management (of course, an Informatica strength) “breathes life into AI” is both poetic and true from a practical perspective. The biggest weakness in enterprise AI today is the lack of context and anchoring because of dirty data and missing metadata that were ignored in an era of Big Data when we threw everything into a lake and hoped for the best. Informatica faces the challenge of cleaning up the mess created over the past decade as both the number of apps and volume of data have increased by an order of magnitude.

From a customer perspective, Informatica provided context from two Chief Data Officers during this keynote: Royal Caribbean’s Rafeh Masood and Takeda’s Barbara Latulippe. Both spoke about the need to be “AI Ready” with a focus on starting with a comprehensive data management and integration strategy. Masood’s 4Cs strategy for Gen AI of Clarity, Connecting the Dots, Change Management, and Continual Learning spoke to the fundamental challenges of anchoring AI with data and creating a data-driven culture to get to AI. As Amit Walia stated at the beginning: everybody is ready for AI except your data.

Latulippe’s approach at Takeda provided some additional tactics that should resonate with financial buyers, such as moving to the cloud to reduce data center sites, purchasing data from a variety of sources to augment and improve the value of corporate data as an asset, and consolidating data vendors from eight to two and increasing the operational role of Informatica within the organization in the process. Latulippe also mentioned a 40% cost reduction from building a unified integration hub and a data factory investment that provided a million dollars in savings from improved data preparation and cleansing. (In using these metrics as a guidepost for potential savings, Amalgam Insights cautions that the financial benefits associated with the data factory are dependent on the value of the work that data engineers and data analysts are able to pursue by avoiding scut work: some companies may not have additional data work to conduct while others may see even greater value by shifting labor to AI and high business value use cases.)

Amit Walia also brought four of Informatica’s product leaders on stage to provide roadmaps across Master Data Management, Data Governance, Data Integration, and Data Management. Manouj Tahilani, Brett Roscoe, Sumeet Agrawal, and Gaurav Pathak walked the audience through a wide range of capabilities, many of which were focused on AI-enhanced methods of tracking data lineage, creating pipelines and classifications, and improving metadata and relationship creation above and beyond what is already available with CLAIRE, Informatica’s AI-powered data management engine.

Finally, the keynote ended with what has become a tradition: enshrining the Microsoft-Informatica relationship with a discussion from a high-level Microsoft executive. This year, Scott Guthrie provided the honors in discussing the synergies between Microsoft Fabric and Informatica’s Data Management Cloud.

Recommendations for the CFO Looking at Data Challenges and CIOs Seeking to Be Financial Stewards

Beyond the hype of AI is a new set of data governance and management responsibilities that must be pursued if companies are to avoid unexpected AI bills and functional hallucinations. Data environments must be designed so that all business data can now be used to help center and contextualize AI capabilities. On the FinOps and financial management side of data, a couple of capabilities that especially caught my attention were:

IPU consumption and chargeback: The Informatica Data Management Cloud, the cloud-based offering for Informatica’s data management capabilities, is priced in Informatica Processing Units (IPUs) based on its billing schedule. The ability to charge back consumption to departments, locations, and relevant business units is increasingly important in ensuring that data is fully accounted for as an operational cost or as a cost of goods sold, as appropriate. The Total Cost of Ownership for new AI projects cannot be fully understood without understanding the data management costs involved.
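As a hypothetical illustration of how IPU-based chargeback might work in practice, the sketch below allocates a month of metered consumption across business units. The unit rate, department names, and usage figures are all invented for illustration and do not reflect actual Informatica billing terms.

```python
# Hypothetical IPU chargeback sketch. The rate and usage records below are
# invented for illustration; actual IPU pricing comes from the Informatica
# billing schedule negotiated in each contract.

IPU_RATE_USD = 200.0  # assumed contract price per IPU (illustrative)

# Metered monthly IPU consumption per business unit (illustrative)
usage = {
    "marketing_analytics": 120.5,
    "supply_chain": 310.0,
    "finance_reporting": 85.25,
}

def chargeback(usage_by_unit, rate):
    """Allocate the monthly data management bill to each business unit."""
    return {unit: round(ipus * rate, 2) for unit, ipus in usage_by_unit.items()}

bill = chargeback(usage, IPU_RATE_USD)
total = sum(bill.values())
for unit, cost in sorted(bill.items()):
    print(f"{unit}: ${cost:,.2f}")
print(f"total: ${total:,.2f}")
```

Feeding allocations like these into the general ledger is what lets data management costs land as operational expense or cost of goods sold per business unit, as described above.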

Multiple mentions of FinOps, mostly aligned to Informatica’s ability to optimize data processing and compute configurations. CLAIRE GPT is expected to further help with this analysis as it provides greater visibility into the data lineage, model usage, data synchronization, and other potential contributors to high-cost transactions, queries, agents, and applications.

And the greatest contribution to data productivity is the potential for CLAIRE GPT to accelerate the creation of new data workflows with documented and governed lineage from weeks to minutes. This “weeks to minutes” value proposition is fundamentally what CFOs should be looking for from a productivity perspective, rather than more granular process mapping improvements that may promise to shave a minute off of a random process. Grab the low-hanging fruit that will result in getting 10x or 100x more work done in areas where Generative AI excels: complex processes and workflows defined by complex human language.

CFOs should be aware that, in general, we are starting to reach a point where every standard IT task that has traditionally taken several weeks to approve, initiate, assign resources, write, test, govern, move to production, and deploy in an IT-approved manner is becoming either a templated or a Generative AI-supported capability that can be done in a few minutes. This may be an opportunity to reallocate data analysts and engineers to higher-level opportunities, just as the self-service analytics capabilities of a decade ago allowed many companies to advance their data abilities from report and dashboard building to higher-level data analysis. We are about to see another quantum leap in some data engineering areas. This is a good time to evaluate where large bottlenecks exist in making the company more data-driven and to invest in Generative AI capabilities that can quickly help move one or more full-time equivalents to higher-value roles such as product and revenue support or optimizing data environments.

Based on my time at Informatica World, it was clear that Informatica is ready to massively accelerate standard data quality and governance challenges that have been bottlenecks. Whether companies are simply looking for a tactical way to accelerate access to the thousands of apps and data sources that are relevant to their business or if they are more aggressively pursuing AI initiatives in the near term, the automation and generative AI-powered capabilities introduced by Informatica provide an opportunity for companies to step forward and improve the quality and relevance of their data in a relatively cost-effective manner compared to legacy and traditional data management tools.


What Happened In Tech? – AI has its Kardashians Moment with OpenAI’s Chaotic Weekend

The past week has been “Must See TV” in the tech world as AI darling OpenAI provided a season of Reality TV to rival anything created by Survivor, Big Brother, or the Kardashians. Although I often joke that my professional career has been defined by the well-known documentaries of “The West Wing,” “Pitch Perfect,” and “Silicon Valley,” I’ve never been a big fan of the reality TV genre as the twists and turns felt too contrived and over the top… until now.

Starting on Friday, November 17th, when The Real Housewives of OpenAI started its massive internal feud, every organization working on an AI project has been watching to see what would become of the overnight sensation that turned AI into a household concept with the massively viral ChatGPT and related models and tools.

So, what the hell happened? And, more importantly, what does it mean for the organizations and enterprises seeking to enter the Era of AI and the combination of generative, conversational, language-driven, and graphic capabilities that are supported with the multi-billion parameter models that have opened up a wide variety of business processes to natural language driven interrogation, prioritization, and contextualization?

The Most Consequential Shake Up In Technology Since Steve Jobs Left Apple

The crux of the problem: OpenAI, the company we all know as the creator of ChatGPT and the technology provider for Microsoft’s Copilots, was fully controlled by another entity, OpenAI, the nonprofit. This nonprofit was driven by a mission of creating general artificial intelligence for all of humanity. The charter starts with: “OpenAI’s mission is to ensure that artificial general intelligence (AGI) – by which we mean highly autonomous systems that outperform humans at most economically valuable work – benefits all of humanity. We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome.”

There is nothing in there about making money. Or building a multi-billion dollar company. Or providing resources to Big Tech. Or providing stakeholders with profit other than highly functional technology systems. In fact, further in the charter, it even states that if a competitor shows up with a project that is doing better at AGI, OpenAI commits to “stop competing with and start assisting this project.”

So, that was the primary focus of OpenAI. If anything, OpenAI was built to prevent large technology companies from being the primary force and owner of AI. In that context, four of the six board members of OpenAI decided that OpenAI’s efforts to commercialize technology were in conflict with this mission, especially given the speed of going to market and the shortcuts being made from a governance and research perspective.

As a result, they fired CEO Sam Altman and removed President Greg Brockman, who had been responsible for architecting the resources and infrastructure associated with OpenAI, from the board. That action begat rapid chaos for this 700+ employee organization, which was allegedly about to see an $80 billion valuation.

A Convoluted Timeline For The Real Housewives Of Silicon Valley

Friday: OpenAI’s board fires its CEO and kicks its president Greg Brockman off the board. CTO Mira Murati, who was informed the night before, is appointed interim CEO. Brockman steps down later that day.

Saturday: Employees are up in arms and several key employees leave the company, leading to immediate action by Microsoft going all the way up to CEO Satya Nadella to basically ask “what is going on? And what are you doing with our $10 billion commitment, you clowns?!” (Nadella probably did not use the word clowns, as he’s very respectful.)

Sunday: Altman comes into the office to negotiate with Microsoft and OpenAI’s investors. Meanwhile, OpenAI announces a new CEO, Emmett Shear, who was previously the CEO of video game streaming company Twitch. Immediately, everyone questions what he’ll actually be managing as employees threaten to quit, refuse to show up to an all-hands meeting, and show Altman overwhelming support on social media. A tumultuous Sunday ends with an announcement by Microsoft that Altman and Brockman will lead Microsoft’s AI group.

Monday: A letter shows up asking the current board to resign, with over 700 employees threatening to quit and move to the Microsoft subsidiary run by Altman and Brockman. Co-signers include board member and OpenAI chief scientist Ilya Sutskever, who cast one of the four board votes to oust Altman in the first place.

Tuesday: The new CEO of OpenAI, Emmett Shear, states that he will quit if the OpenAI board can’t provide evidence of why they fired Sam Altman. Late that night, Sam Altman officially comes back to OpenAI as CEO with a new board consisting initially of Bret Taylor, former co-CEO of Salesforce; Larry Summers, former Secretary of the Treasury; and Adam d’Angelo, one of the former board members who voted to fire Sam Altman. Helen Toner of Georgetown and Tasha McCauley, both seen as effective altruists firmly aligned with OpenAI’s original mission, step down from the board.

Wednesday: Well, that’s today as I’m writing this out. Right now, there are still a lot of questions about the board, the current purpose of OpenAI, and the winners and losers.

Keep In Mind As We Consider This Wild And Crazy Ride

OpenAI was not designed to make money. Firing Altman may have been defensible from the perspective of OpenAI’s charter to build safe general AI for everyone and to avoid large tech oligopolies. But if that’s the case, OpenAI should not have taken Microsoft’s money. OpenAI wanted to have its cake and eat it too, with a board unused to managing donations and budgets at that scale.

Was firing Altman even the right move? One could argue that productization puts AI into more hands and helps prepare society for an AGI world. To manage and work with superintelligences, one must first integrate AI into one’s life and the work Altman was doing was putting AI into more people’s hands in preparation for the next stage of global access and interaction with superintelligence.

At the same time, the vast majority of current OpenAI employees are on the for-profit side and signed up, at least in part, because of the promise of a stock-based payout. I’m not saying that OpenAI employees don’t also care about ethical AI usage, but even the secondary market for OpenAI at a multi-billion dollar valuation would help pay for a lot of mortgages and college bills. But tanking the vast majority of employee financial expectations is always going to be a hard sell, especially if they have been sold on a profitable financial outcome.

OpenAI is expensive to run: probably well over 2 billion dollars per year, including the massive cloud bill. Any attempt to slow down AI development or reduce access to current AI tools needs to be tempered by the financial realities of covering costs. It is amazing to think that OpenAI’s board was so naïve as to think it could just get rid of the guy who was, in essence, its top fundraiser and revenue officer without worrying about how to cover that gap.

Primary research and go-to-market activities are very different. Normally there is a church-and-state type of wall between these two areas exactly because they are to some extent at odds with each other. The work needed to make new, better, safer, and fundamentally different technology often conflicts with the activity used to sell existing technology. And this is a division that has been well established for decades in academia, where patented or protected technologies are monetized by a separate for-profit organization.

The Effective Altruism movement: this is an important catchphrase in the world of AI, and it is more than a dictionary definition. It is shorthand for a specific view of developing artificial general intelligence (superintelligences beyond human capacity) with the goal of supporting a population of 10^58 humans millennia from now. This is one extreme of the AI world, which is countered by a “doomer” mindset holding that AI will be the end of humanity.

Practically, most of us are in between with the understanding that we have been using superhuman forces in business since the Industrial Revolution. We have been using Google, Facebook, data warehouses, data lakes, and various statistical and machine learning models for a couple of decades that vastly exceed human data and analytic capabilities.

And the big drama question for me: What is Adam d’Angelo still doing on the board as someone who actively caused this disaster to happen? There is no way to get around the fact that this entire mess was due to a board-driven coup, and he was part of the coup. It would be surprising to see him stick around for more than a few months, especially now that Bret Taylor is on the board, as Taylor provides an overlap of the experiences and capabilities that d’Angelo possesses, but at greater scale.

The 13 Big Lessons We All Learned about AI, The Universe, and Everything

First, OpenAI needs better governance in several areas: board, technology, and productization.

  1. Once OpenAI started building technologies with commercial repercussions, the delineation between the non-profit work and the technology commercialization needed to become much clearer. This line should have been crystal clear before OpenAI took a $10 billion commitment from Microsoft and should have been advised by a board of directors that had any semblance of experience in managing conflicts of interest at this level of revenue and valuation. In particular, Adam d’Angelo as the CEO of a multi-billion dollar valued company and Helen Toner of Georgetown should have helped to draw these lines and make them extremely clear for Sam Altman prior to this moment.
  2. Investors and key stakeholders should never be completely surprised by a board announcement. The board should only take actions that have previously been communicated to all major stakeholders. Risks need to be defined beforehand when they are predictable. This conflict was predictable and, by all accounts, had been brewing for months. If you’re going to fire a CEO, make sure your stakeholders support you and that you can defend your stance.
  3. “You come at the king, you best not miss.” As Omar said in the famed show “The Wire,” you cannot try to take out the head of an organization unless your follow-up plan is tight.
  4. OpenAI’s copyright challenges feel similar to when Napster first became popular as a file-sharing platform for music. We had to collectively figure out how to avoid digital piracy while maintaining the convenience that Napster provided for sharing music and other files. Although the productivity benefits make generative AI worth experimenting with, always make sure that you have a backup process or capability for anything supported with generative AI.

    OpenAI and other generative AI firms have also run into challenges regarding the potential copyright issues associated with their models. Although a number of companies are indemnifying clients from damages associated with any outputs associated with their models, companies will likely still have to stop using any models or outputs that end up being associated with copyrighted material.

    From Amalgam Insights’ perspective, the challenge with some foundational models is that training data is used to build the parameters or modifiers associated with a model. This means that the copyrighted material is being used to help shape a product or service that is being offered on a commercial basis. Although there is no legal precedent either for or against this interpretation, the initial appearance of this language fits with the common sense definitions of enforcing copyright on a commercial basis. This is why the data collating approach that IBM has taken to generative AI is an important differentiator that may end up being meaningful.
  5. Don’t take money if you’re not willing to accept the consequences. It is a common non-profit mistake to accept funding and simply hope it won’t affect the research. But the moment research is primarily dependent on one single funder, there will always be compromises. Make sure those compromises are expressly delineated in advance and determine whether the research is worth doing under those circumstances.
  6. Licensing nonprofit technologies and resources should not paralyze the core non-profit mission. Universities do this all the time! Somebody at OpenAI, both in the board and at the operational level, should be a genius at managing tech transfer and commercial utilization to help avoid conflicts between the two institutions. There is no reason that the OpenAI nonprofit should be hamstrung by the commercialization of its technology because there should be a structure in place to prevent or minimize conflicts of interest other than firing the CEO.

    Second, there are also some important business lessons here.
  7. Startups are inherently unstable. Although OpenAI is an extreme example, there are many other more prosaic examples of owners or boards who are unpredictable, uncontrollable, volatile, vindictive, or otherwise unmanageable in ways that force businesses to close up shop or to struggle operationally. This is part of the reason that half of new businesses fail within five years.
  8. Loyalty matters, even in the world of tech. It is remarkable that Sam Altman was backed by over 90% of his team on a letter saying that they would follow him to Microsoft. This includes employees who were on visas and were not independently rich, but still believed in Sam Altman more than the organization that actually signed their paychecks. Although it never hurts to also have Microsoft’s Kevin Scott and Satya Nadella in your corner and to be able to match compensation packages, this also speaks to the executive responsibility to build trust by creating a better scenario for your employees than others can provide. In this Game of Thrones, Sam Altman took down every contender to the throne in a matter of hours.
  9. Microsoft has most likely pulled off a transaction that ends up being all but an acquisition of OpenAI. It looks like Microsoft will end up with the vast majority of OpenAI’s talent as well as an unlimited license to all technology developed by OpenAI. Considering that OpenAI was about to support a stock offering at an $80 billion valuation, that’s quite the bargain for Microsoft. In particular, Bret Taylor’s ascension to the board is telling, as his work at Twitter was in the best interests of the shareholders of Twitter in accepting and forcing an acquisition that was well in excess of the publicly-held value of the company. Similarly, Larry Summers, as the former president of Harvard University, is experienced in balancing non-profit concerns with the extremely lucrative business of Harvard’s endowment and intellectual property. As this board is expanded to as many as nine members, expect more of a focus on OpenAI as a for-profit entity.
  10. With Microsoft bringing OpenAI closer to the fold, other big tech companies that have made recent investments in generative AI now have to bring those partners closer to the core business. Salesforce, NVIDIA, Alphabet, Amazon, Databricks, SAP, and ServiceNow have all made big investments in generative AI and need to lock down their access to generative AI models, processors, and relevant data. Everyone is betting on their AI strategy to be a growth engine over the next five years and none can afford a significant misstep.
  11. Satya Nadella’s handling of the situation shows why he is one of the greatest CEOs in business history. This weekend could have easily been an immense failure and a stock price-toppling event for Microsoft. But in a clutch situation, Satya Nadella personally came in with his executive team to negotiate a landing for OpenAI and to provide a scenario that would be palatable both to the market and for clients. The greatest CEOs have both the strategic skills to prepare for the future and the tactical skills to deal with immediate crises. Nadella passes with flying colors on all counts and proves once again that behind the velvet glove of Nadella’s humility and political savvy is an iron fist of geopolitical and financial power that is deftly wielded.
  12. Carefully analyze AI firms that may have similar charters for supporting safe AI, and potentially slowing down or stopping product development for the sake of a higher purpose. OpenAI ran into challenges in trying to interpret its charter, but the charter’s language is pretty straightforward for anyone who did their due diligence and took the language seriously. Assume that people mean what they say. Also, consider that there are other AI firms that have similar philosophies to OpenAI, such as Anthropic, which spun off of OpenAI for reasons similar to the OpenAI board reasoning of firing Sam Altman. Although it is unlikely that Anthropic (or large firms with safety-first philosophies like Alphabet and Meta’s AI teams) will fall apart similarly, the charters and missions of each organization should be taken into account in considering their potential productization of AI technologies.
  13. AI is still an emerging technology. Diversify, diversify, diversify. It is important to diversify your portfolio and make sure that you are able to duplicate experiments on multiple foundation models when possible. The marginal cost of supporting duplicate projects pales in comparison to the need to support continuity and gain greater understanding of the breadth of AI output possibilities. With the variety of large language models, software vendor products, and machine learning platforms on the market, this is a good time to experiment with multiple vendors while designing process automation and language analysis use cases.

8 Keys to Managing the Linguistic Copycats that are Large Language Models

Over the past year, Generative AI has taken the world by storm as a variety of large language models (LLMs) appeared to solve a wide variety of challenges based on basic language prompts and questions.

A partial list of market-leading LLMs currently available include:

Amazon Titan
Anthropic Claude
Cohere
Databricks Dolly
Google Bard, based on PaLM 2
IBM Watsonx
Meta Llama
OpenAI’s GPT

The biggest question regarding all of these models is simple: how to get the most value out of them. And most users fail because they are unused to the most basic concept of a large language model: they are designed to be linguistic copycats.

As Andrej Karpathy of OpenAI stated earlier this year,

"The hottest new programming language is English."

And we all laughed at the concept for being clever as we started using tools like ChatGPT, but most of us did not take this seriously. If English really is being used as a programming language, what does this mean for the prompts that we use to request content and formatting?

I think we haven’t fully thought out what it means for English to be a programming language, either in terms of how to “prompt” the model correctly or in terms of the assumptions that an LLM holds as a massive block of text that is otherwise disconnected from the real world and lacks the sensory input or broad-based access to new data that would allow it to “know” current language trends.

Here are 8 core language-based concepts to keep in mind when using LLMs or considering the use of LLMs to support business processes, automation, and relevant insights.

1) Language and linguistics tools are the relationships that define the quality of output: grammar, semantics, semiotics, taxonomies, and rhetorical flourishes. There is a big difference between asking for “write 200 words on Shakespeare” vs. “elucidate 200 words on the value of Shakespeare as a playwright, as a poet, and as a philosopher based on the perspective of Edmund Malone and the English traditions associated with blank verse and iambic pentameter as a preamble to introducing the Shakespeare Theatre Association.”

I have been a critic of the quality that LLMs provide from an output perspective, most recently in my perspective “Instant Mediocrity: A Business Guide to ChatGPT in the Enterprise” (https://amalgaminsights.com/2023/06/06/instant-mediocrity-a-business-guide-to-chatgpt-in-the-enterprise/). But I readily acknowledge that the outputs one can get from LLMs will improve. Expert context will provide better results than prompts that lack subject matter knowledge.

2) Linguistic copycats are limited by the rules of language that are defined within their model. Asking linguistic copycats to provide language formats or usage that are not commonly used online or in formal writing will be a challenge. Poetic structures or textual formats referenced must reside within the knowledge of the texts that the model has seen. However, since Wikipedia is a source for most of these LLMs, a contextual foundation exists to reference many frequently used frameworks.

3) Linguistic copycats are limited by the frequency of vocabulary usage that they are trained on. It is challenging to get an LLM to use expert-level vocabulary or jargon to answer prompts because the LLM will typically settle for the most commonly used language associated with a topic rather than elevated or specific terms.

This propensity to choose the most common language associated with a topic makes it difficult for LLM-based content to sound unique or have specific rhetorical flourishes without significant work from the prompt writer.

4) Take a deep breath and work on this. Linguistic copycats respond to the scope, tone, and role mentioned in a prompt. A recent study found that, across a variety of LLMs, the prompt that provided the best answer for solving a math problem and providing instructions was not a straightforward request such as “Let’s think step by step,” but “Take a deep breath and work on this problem step-by-step.”

Using a language-based perspective, this makes sense. The explanations of mathematical problems that include some language about relaxing or not stressing would likely be designed to be more thorough and make sure the reader was not being left behind at any step. The language used in a prompt should represent the type of response that the user is seeking.

5) Linguistic copycats only respond to the prompt and the associated prompt engineering, custom instructions, and retrieval data that they can access. It is easy to get carried away with the rapid creation of text that LLMs provide and mistake this for something resembling consciousness, but the response being created is a combination of grammatical logic and the computational ability to take billions of parameters into account across possibly a million or more different documents. This ability to access relationships across 500 or more gigabytes of information is where LLMs do truly have an advantage over human beings.

6) Linguistic robots can only respond based on their underlying attention mechanisms that define their autocompletion and content creation responses. In other words, linguistic robots make judgment calls on which words are more important to focus on in a sentence or question and use that as the base of the reply.

For instance, in the sentence “The cat, who happens to be blue, sits in my shoe,” linguistic robots will focus on the subject “cat” as the most important part of the sentence. The phrase “happens to be” implies that blueness is not the cat’s most important trait. The cat is blue. The cat sits. The cat is in my shoe. The words include an internal rhyme and are fairly nonsensical. And then the next stage of this process is to autocomplete a response based on the context provided in the prompt.

7) Linguistic robots are limited by a token limit for inputs and outputs. Typically, a token is about four characters while the average English content word is about 6.5 characters (https://core.ac.uk/download/pdf/82753461.pdf). So, when an LLM talks about supporting 2048 tokens, that can be seen as about 1260 words, or about four pages of text, for concepts that require a lot of content. In general, think of a page of content as being about 500 tokens and a minute of discussion typically being around 200 tokens when one is trying to judge how much content is either being created or entered into an LLM.
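The arithmetic above can be sketched directly. The averages used (roughly 4 characters per token, roughly 6.5 characters per English content word) are the rough figures cited in this section, not exact tokenizer behavior, so treat the results as back-of-the-envelope estimates.

```python
# Rough token-to-word arithmetic using the averages cited above:
# ~4 characters per token, ~6.5 characters per English content word.

CHARS_PER_TOKEN = 4.0
CHARS_PER_WORD = 6.5

def tokens_to_words(tokens: float) -> float:
    """Estimate how many words fit in a given token budget."""
    return tokens * CHARS_PER_TOKEN / CHARS_PER_WORD

def words_to_tokens(words: float) -> float:
    """Estimate how many tokens a given word count will consume."""
    return words * CHARS_PER_WORD / CHARS_PER_TOKEN

print(round(tokens_to_words(2048)))  # -> 1260, matching the estimate above
```

Estimates like these are useful for quickly judging whether a document or transcript will fit within a model’s context window before submitting it.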

8) Every language is dynamic and evolves over time. LLMs that provide good results today may provide significantly better or worse results tomorrow simply because language usage has changed or because there are significant changes in the sentiment of a word. For instance, the English word “trump” has gained since 2015 a variety of political relationships and emotional associations that are now standard in 2023 language usage. Be aware of these changes across languages and time periods in making requests, as seemingly innocuous and commonly used words can quickly gain new meanings that may not be obvious, especially to non-native speakers.

Conclusion

The most important takeaway from the now-famous Karpathy quote is to take it seriously, not only in terms of using English as a programming language to access structures and conceptual frameworks, but also in understanding that there are many varied nuances built into the usage of the English language. LLMs often incorporate these nuances even if those nuances haven’t been directly built into models, simply based on the repetition of linguistic, rhetorical, and symbolic language usage associated with specific topics.

From a practical perspective, this means that the more context and expertise provided when asking an LLM for information and expected outputs, the better the answer that will typically be returned. As one writes prompts for LLMs and seeks the best possible response, Amalgam Insights recommends providing the following details in any prompt:

Tone, role, and format: This should include a sentence that shows, by example, the type of tone you want. It should explain who you are or who you are writing for. And it should provide a form or structure for the output (essay, poem, set of instructions, etc…). For example, “OK, let’s go slow and figure this out. I’m a data analyst with a lot of experience in SQL, but very little understanding of Python. Walk me through this so that I can explain this to a third grader.”

Topic, output, and length: Most prompts start with the topic or include only the topic. But it is important to also include perspective on the size of the output. For example, “I would like a step-by-step description of how to extract specific sections from a text file into a separate file. Each instruction should be relatively short and comprehensible to someone without formal coding experience.”

Frameworks and concepts to incorporate: This can include any commonly known, documented process or structure, such as an Eisenhower Matrix, Porter's Five Forces, or the Overton Window. As a simple example, one could ask, “In describing each step, compare each step to the creation of a pizza, wherever possible.”

Combining these three sections into a single prompt should provide a response that is encouraging, relatively easy to understand, and compares the code to creating a pizza.
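The three-section structure above can be sketched as a simple prompt template. This is a minimal illustration assembling the example sentences from this article; the variable names are my own and are not part of any LLM vendor's API.

```python
# Assemble a prompt from the three recommended sections:
# (1) tone, role, and format; (2) topic, output, and length;
# (3) frameworks and concepts to incorporate.

tone_role_format = (
    "OK, let's go slow and figure this out. I'm a data analyst with a lot "
    "of experience in SQL, but very little understanding of Python. Walk "
    "me through this so that I can explain this to a third grader."
)

topic_output_length = (
    "I would like a step-by-step description of how to extract specific "
    "sections from a text file into a separate file. Each instruction "
    "should be relatively short and comprehensible to someone without "
    "formal coding experience."
)

frameworks = (
    "In describing each step, compare each step to the creation of a "
    "pizza, wherever possible."
)

# Join the sections with blank lines so the LLM sees three distinct parts.
prompt = "\n\n".join([tone_role_format, topic_output_length, frameworks])
print(prompt)
```

Keeping the sections as separate strings makes it easy to reuse the same tone and framework guidance across many topic requests.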

In adapting business processes based on LLMs to make information more readily available for employees and other stakeholders, be aware of these biases, foibles, and characteristics associated with prompts as your company explores this novel user interface and user experience.

Posted on

Zoom Faces Challenges in Navigating the Age of Generative AI

Note: This piece was accurate as of the time it was written, but on August 11th, Zoom edited its Service Agreement to remove the most egregious claims around content ownership. Its current language is more focused on the limited license needed to deliver content and establishes that user content is owned by the user. Amalgam Insights considers the changes made as of August 11th to be more in-line both with industry standards and with enterprise compliance concerns.

On August 7, 2023, Zoom announced a change to its terms and conditions in response to language discovered in Zoom’s service agreement that gave Zoom nearly unlimited capability to collect data and an unlimited license to use this information going forward for any commercial use. In doing so, Zoom has brought up a variety of intellectual property and AI issues that are important for every software vendor, IT department, and software sourcing group to consider over the next 12-18 months.

Analyzing Zoom’s Service Agreement Language

This discovery was a few months in the making, as these changes appear to have initially been made back in March 2023, when Zoom was launching some AI capabilities. Looking at each section, we can see that 10.2 and 10.3 focus on the usage of data.

Although this data usage may seem aggressive at first, one has to understand that Zoom's primary function is video conferencing, which requires moving both video and audio data across multiple servers to get from one point to another. This requires Zoom to have broad permission to transfer all data involved in a standard video conference or webinar, which includes all the data being used and all of the service data created. So, in this case, Amalgam Insights believes this access to data is not such a big deal, as Zoom probably needed to update this language simply to support even basic augmentations, such as cleaning up audio or improving visual quality with any sort of artificial intelligence or machine learning capabilities.

However, in Amalgam Insights' perspective, Section 10.4 is a much more aggressive set of terms. This change provides Zoom with a broad-ranging commercial license to any data used on Zoom's platform. This means that your face, your voice, and any trade secrets, patents, or trademarks used on Zoom now become commercially usable by Zoom. Whether this was the intention or not, this section both sounds aggressive and crosses the line on the treatment that companies expect for their own data.

This is an extremely aggressive stance by most intellectual property standards. And it stands out as conflicting in comparison to how data is positioned by Microsoft and Salesforce, enterprise application platform companies that aren't exactly considered innocent or naïve in terms of running a business.

What went wrong here? Zoom is traditionally known as a company that is, for the most part, end user-centric. Zoom's mission includes the goal to “improve the quality and effectiveness of communications. We deliver happiness.” And Eric Yuan's early stories about wanting to speak with loved ones remotely and refusing to do on-site meetings in promoting the power of remote meetings are part of the Zoom legend.

However, Zoom is also facing the challenge of meeting institutional shareholder demands to increase stock value. When Zoom's stock rose in the pandemic, it reached such amazing heights that it led to extreme pressure for Zoom to figure out how to 5X or 10X its company revenue quickly. Knowing that the stock was in a bit of a bubble, Zoom initially tried to purchase Five9, a top-notch cloud contact center solution, but ran into problems during the acquisition process as the stock prices of each company ended up being too volatile to come to an agreement on both the value and price of the stock involved.

And I speculate that at this point Zoom is focused on bringing its stock back up to pandemic heights, a bubble that may honestly never be reached again. For Zoom, 2020 was a dot-com-like event, where its valuation wildly exceeded its revenue. And as other video conferencing and event software solutions ended up quickly improving their products, Zoom's core conferencing capabilities started to be seen as a somewhat commoditized capability.

Following the mission of the company would have meant looking more deeply at communications-based processes, collaboration, transcription, and perhaps even emoji and social media enhancement: all of the ways that we communicate with each other. But the problem is that there is really only one play right now that can quickly lead to a doubling or tripling of a stock price, and that is AI. There's no doubt that the amount of video and audio that Zoom processes on a daily basis can train a massive language model, as well as other machine learning models focused on re-creating and enhancing video and audio.

Positioned in a way that made clear Zoom would enhance current communication capabilities, an announcement of new AI features could have been very positive for Zoom. Zoom has taken initial steps to integrate AI into its platform with the Meeting Summary and Team Chat Compose products. But given the limited capabilities of these products, the licensing language used in the service agreement seems excessive.

The language used in Section 10 of Zoom's service agreement is very clear about maintaining the right to license and commercialize all aspects of any data collected by Zoom. And that statement has not been modified. Whether this is because of an overactive lawyer, Zoom's future ambitions, or promises made to a board or institutional investors is beyond my pay grade and visibility. But I do know that the phrase is obviously not user-friendly, and Zoom is not providing visibility into those changes at the administrative level. The language and buttons used to support Zoom's AI model and commercialization efforts are very different on the administrative page compared to the language used in the service agreement.

Image from Zoom’s August 7th blog post

Understanding that legal language can take time to change, it makes sense to wait a few days to see if Zoom reverts to prior language or further modifies section 10 to represent a more user-friendly and client-friendly promise. And I think this language reflects a couple of issues that go far beyond Zoom.

First, service agreements for software companies in general are often treated as an exercise in providing companies with maximum flexibility while taking away basic rights from end users. This is not just a product management issue; this is an industry issue where this language and behavior is considered status quo both in the technology industry and in the legal profession. When companies like Alphabet and Meta (previously Facebook) were able to get away with the level of data collection associated with supporting each free user without facing governance or compliance consequences in most of the world, that set a standard for tech companies' corporate counsel. Honestly, the language used in Zoom's current service agreement as of August 7, 2023 is not out of scope for many companies in the consumer world that provide social technologies.

The second issue is the overwhelming pressure that exists to be first or early to market in AI. The remarkable success of ChatGPT and other OpenAI-related models has shown that there is demand for AI that is either interesting or useful and can be easily used and accessed by the typical user or customer. This demand is especially high for any company that has a significant amount of text, data, audio, or video. The March 2023 announcement of BloombergGPT is only the starting point of what will be a wide variety of custom language models and machine learning models coming to market over the next 12 to 18 months.

Zoom obviously wants to be part of that discussion, and there are other companies, such as Microsoft, Adobe, and Alphabet, as well as noted startups like OpenAI, that have already done amazing AI work with audio and video. Part of the reason that this stands out is that Zoom is one of the first companies to change its policies to aggressively seek a permanent commercial license to all user content while forcing an opt-out process that lacks auditability or documentation regarding how users can trust that their data is no longer being used to train models or support any other commercial activities Zoom may wish to pursue. But Amalgam Insights is absolutely sure that Zoom will not be the last company to do this by any means. This language and the response should also serve as both a warning and a lesson to all other companies seeking to significantly change their service agreements to support AI projects.

What is next for Zoom?

From Amalgam Insights’ perspective, there are three potential directions that Zoom can pursue going forward.

One, do nothing or make minimal changes to the current policy. Consumer and social media-based technology policies have set a precedent for the level of data and licensing access in Zoom's service agreement, but this level of customer data usage is considered extreme in most business software agreements. Will Zoom end up being a test case for pushing the boundaries of business data use? This seems unlikely, given that Zoom has not traditionally been considered an aggressive company in pushing customer norms. Zoom does try to move fast and scale fast, but Zoom's mistakes have typically been due more to incomplete processes than to acts of commission intentionally pushing boundaries.

Two, rewrite parts of Section 10 that are intrusive from a licensing and commercial usage perspective. Amalgam Insights hopes that this is an opportunity for Zoom to lead from an end user licensing or service agreement perspective in making agreements more transparent and in using more exact legal language that feels cooperative instead of coercive. The legal approach of including all possible scenarios may be considered professionally competent, but the business optics are antagonistic.

Three, come out with an explicit enterprise version of technology that is not managed under these current rules set in section 10 so that data is not explicitly used for models and cannot easily be turned on through a simple toggle switch in the administration console. As my friend and data management analyst extraordinaire Tony Baer stated on LinkedIn (where you should be following him) “The solution for Zoom is to be more explicit: an enterprise version where data, no matter how anonymized, is not shared for Generative AI or any other Zoom commercial purpose whatsoever, and maybe a more general and/or freemium edition (which is how many consumers have already been roped in) where Zoom can do its Gen AI thing.”

Recommendations

The first recommendation is actually aimed at the CIO office, procurement office, and other software purchasers. Be aware that your software provider is going to pursue AI and will likely need to change the terms and conditions associated with your account to do so. This is a challenge, as multinational enterprises now face the possibility of approaching or exceeding 1,000 apps and data sources under management, and even businesses of 250 employees or fewer average one app per employee. There is a massive race toward aggregating data, building custom AI models, and commercializing the outputs as benchmarks, workflows, automation, and guidance. But Zoom is not a one-off situation, and your organization isn't going to escape the issues brought up in Zoom's service agreement language just by moving to another provider. This is an endemic and market-wide challenge, far beyond what Zoom is experiencing.

The second recommendation: one solution to this problem may be for vendors to split their product into public consumer-facing products and private products from a EULA and terms-and-conditions perspective. This wouldn't be the worst approach, and it would maintain the consumer expectation of free services subsidized by data and access while giving businesses the confidence that they are working with a solution that will protect their intellectual property from being accessed or recreated by a machine learning model. This also potentially allows for more transparency in legal language as this product split is considered. Tech lawyer Cathy Gellis stated, “There can be the lawyerly temptation to phrase them (terms of service) as broadly as possible to give you the most flexibility as you continue to develop your service. But the problem with trying to proactively obtain as many permissions as you can is that users may start to think you will actually use them and react to that possibility.” In 2023, software vendors should assume that corporate clients will be wary of any language that puts trade secrets, patents, trademarks, or personally identifiable information at risk. Any changes to terms of service or service agreements should be reviewed both from a buy-side and a sell-side perspective. This may include bringing in procurement or specialized software purchasing teams to reflect the customer's perspective.

The third recommendation goes back to the ethical AI work that Amalgam Insights did several years ago. AI must be conducted in the context of the same culture and goals that are pervasive within the company. Any AI policy that goes significantly outside the culture, norms, and expectations of the company will stand out. And this can be a challenge, because AI has in many cases been treated as an experiment rather than as a formalized technical capability. As AI development and policy are shaped, this is a time when new products, governance, and documentation need to be tightly aligned to core business and mission principles. AI is a test of every company's culture and purpose, and this is a time when the corporate ability to execute on lofty qualitative ideals will be actively challenged.

Zoom’s misstep in aggressively pursuing rights and access to client data should not just be seen as a specific organizational misstep, but as part of a set of trends that are important for enterprise, IT, purchasing, and legal departments as well as all software and data source vendors seeking to pursue AI and further monetize deep digital assets. The next 12 to 18 months are going to be a wild time in the technology market as every software vendor pursues some sort of AI strategy, and there will be mountains of new legal language, technical capabilities, and compliance aspects to review.