
Torii raises a $10 Million A Round to Automate SaaS Management

On February 18, 2021, Torii, a management and automation solution for Software as a Service (SaaS) portfolios, announced a $10 million Series A funding round led by Wing Venture Capital with participation from its prior investors, Entree Capital, Global Founders Capital, Scopus Ventures, and Uncork Capital.

Context

SaaS management is a unique technology challenge for the enterprise, both because of the sheer number of applications running in a large business (Amalgam Insights estimates that organizations with over 1,000 employees average 500 or more apps in production use for business purposes) and because the decentralized nature of SaaS purchasing often allows any employee to bring a new app into the business without direct security, governance, or management input from IT.

About Torii

Over the past year, Torii has increased revenue by 400% and grown to 25 employees. Torii’s role in SaaS management focuses on providing a discovery engine to find all of the SaaS in an organization, then providing automation to support the service orders that add and remove employees from each account, comparable to the telecom concept of MAC-D (Moves, Adds, Changes, and Disconnects). Torii’s discovery capabilities are aided by integrations with identity management solutions such as JumpCloud, Okta, OneLogin, and SailPoint, as well as connections to expense reimbursement software such as Concur and Expensify. From a usage perspective, Torii also supports a variety of direct integrations with enterprise software solutions, provides web browser extensions to find new apps, and supports an app directory for SaaS applications being used in the enterprise.

As an interesting aside, Torii was founded by Uri Haramati, who was a founder of current social media darling Houseparty.

Context for this funding

Torii’s growth has been driven by the increased need for SaaS and was accelerated both by the global COVID-19 pandemic and the subsequent need to digitize workflows and onboard all remote employees to applications capable of managing work. The core driver associated with Torii and its competitors comes from the need to manage SaaS portfolios and licenses. This need is reminiscent of a similar challenge that was created a decade ago to manage the rapidly growing fleets of smartphones and tablets that quickly infiltrated the enterprise. But the biggest difference that Amalgam Insights sees between these two areas is that SaaS management is a purely digital challenge that can run across a variety of devices whereas mobile device management also included a physical device component.

This round of funding comes at a time when SaaS management is growing both in awareness and investment. Based on the growth of prior IT trends over the past 20 years, Amalgam Insights believes that the current market opportunity for managing SaaS across financial, operational, and technical management is over $2 billion and continues to expand along with the SaaS market as a whole. Given that even large enterprise SaaS applications such as Salesforce, Workday, and ServiceNow continue to grow 20-30% year-over-year, it is no surprise that Amalgam Insights expects the SaaS market to continue growing 20% in 2021 and the SaaS management opportunity to grow at roughly the same rate.

As of 2021, SaaS Management has become a legitimate market of competitors with their own specializations, strengths, and market focus. In this market, Torii competes with startups such as BetterCloud, Blissfully, Cleanshelf, CoreView, Productiv, and Zylo to manage SaaS portfolios in the enterprise space, as well as traditional Software Asset Management, IT Asset Management, and security providers such as Flexera, SailPoint, ServiceNow, and Snow Software.

Recent examples of investment in the SaaS Management space include:

  • Zylo raising a $22 Million Series B round in September 2019
  • CoreView’s acquisition of Alpin in October 2019 and its $10 million Series B round in October 2020 to manage Office 365
  • Productiv raising a $20 million Series B round in November 2019
  • Cleanshelf raising an $8 million Series A round in March 2020
  • BetterCloud raising a $75 million Series F round in May 2020
  • SailPoint’s acquisition of Intello in February 2021

What to expect from Torii

With this round of funding, Amalgam Insights expects that Torii will invest in talent and resources to pursue what is still a massive latent market. From a tactical perspective, Torii will likely double its headcount over the next year based on both its funding and revenue growth, which will help as it both enters net-new deals and competes against its peers in the SaaS Management market.

Amalgam Insights also expects that Torii will be pressed to pursue a wide variety of opportunities associated with audit, compliance, process and workflow automation, service automation, sourcing and contract management, and security management. This set of challenges and opportunities will require focus in the short-term and will likely require another round of funding in the next couple of years to fully pursue.

These are interesting times for the SaaS Management market as a set of vendors has started to coalesce in this space. As these companies grow over the next three-to-five years, the size of their market opportunity will potentially double, with at least a couple of these companies achieving exits along the lines of AirWatch (now part of VMware), Apptio, MobileIron (now part of Ivanti), and Tangoe. With this round of funding and the opportunity in place, Amalgam Insights believes that Torii is well positioned to be a competitive option in the SaaS Management market for the next few years and should be considered by enterprises seeking to discover their hidden SaaS accounts and automate SaaS management across the hundreds, if not thousands, of apps currently charged to the company and supported on corporate devices and the corporate network.


Managing Inventory for Kubernetes Cost Management

Last week, we mentioned why Kubernetes is an important management concern for cloud-based cost management. To manage Kubernetes from a financial perspective, Amalgam Insights believes that the foundational starting point is discovering and documenting the relevant technical, financial, and operational aspects of container inventory.

Kubernetes Cost Management requires a complete inventory of containers that includes the documentation of namespaces, clusters, and pods associated with each node. This accounting allows companies to see how their Kubernetes environment is currently structured and provides the starting point for building a taxonomy for Kubernetes.
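
To make this concrete, below is a minimal sketch of what such a discovery pass might look like using the official Kubernetes Python client; the credentials and any cost-allocation labels are illustrative assumptions rather than a prescribed approach.

```python
# A minimal discovery sketch using the official Kubernetes Python client
# (pip install kubernetes). Assumes local kubectl credentials are already
# configured; any cost-allocation labels shown are illustrative.
from collections import defaultdict
from kubernetes import client, config

config.load_kube_config()  # reads the local ~/.kube/config
v1 = client.CoreV1Api()

inventory = defaultdict(list)
for pod in v1.list_pod_for_all_namespaces().items:
    node = pod.spec.node_name or "unscheduled"
    inventory[node].append({
        "namespace": pod.metadata.namespace,
        "pod": pod.metadata.name,
        # cost-allocation tags, if teams have applied them
        "labels": pod.metadata.labels or {},
    })

for node, pods in sorted(inventory.items()):
    print(f"{node}: {len(pods)} pods")
```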

In addition, a container-based inventory also needs to include the technical context associated with each container. Containers must be tracked along with the cloud-based storage and compute services and resources associated with them across their lifespans. Since the portability of containers is a key value proposition, companies must focus on the time-series tracking of the assets, services, and resource allocations tied to each container.

These changes must also be tracked on an ongoing basis, as containers are not static assets like physical servers. Although IT organizations are used to looking at usage itself on a time-series basis, IT assets and services are typically tracked only when they are moved, added, changed, or deleted. Now, assets and services must also be tracked when they are reassigned and reallocated across containers and workloads. These time-based reassignments are difficult to capture without a strategy for tracking changes over time.
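
As a hedged illustration of this time-series point, the sketch below appends a timestamped snapshot whenever a container's assignments change; the field names are hypothetical and would map to whatever assets and services an organization actually tracks.

```python
# Illustrative time-series record of container resource assignments.
# Field names are hypothetical stand-ins for whatever assets, services,
# and tags an organization actually tracks.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AssignmentSnapshot:
    container_id: str
    node: str
    cpu_request: str            # e.g. "500m"
    memory_request: str         # e.g. "512Mi"
    volumes: list[str]
    taken_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

history: list[AssignmentSnapshot] = []

# Append a snapshot whenever an assignment changes (or on a schedule),
# so costs can later be attributed across the container's full lifespan.
history.append(AssignmentSnapshot(
    container_id="billing-api-7f9c", node="node-3",
    cpu_request="500m", memory_request="512Mi",
    volumes=["pvc-reports"]))
```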

Inventories must also be tagged from an operational perspective, where containers and clusters are associated with relevant applications, functions, resources, and technical issues. This is an opportunity to tag containers, namespaces, and clusters with relevant monitoring tools, technical dependencies, cloud providers, applications, and other requirements for supporting containers.

From a practical perspective, this combination of operational, financial, and technical tagging ensures that a container can be managed, duplicated, altered, migrated, or terminated without unintended effects on relevant working environments. There is no point in saving a few dollars, euros, or yuan only to impair an important test or production environment.

Kubernetes inventory management requires a combination of operational, financial, and technical information tracked over time to fully understand both the business dependencies and cost efficiencies associated with containerizing applications.

To learn more about Kubernetes Cost Management and key vendors to consider, read our groundbreaking report on the top Kubernetes Cost Management vendors.


Tidelift Launches Catalogs to Support Open Source Maintenance

On February 2, 2021, Tidelift announced several updates to its Tidelift Subscription designed to help companies manage open source in their IT and software environments. Amalgam Insights has covered Tidelift in the past as an emerging vendor dedicated to solving the thorny problem of supporting open-source maintenance and paying maintainers across the breadth of open source projects used in the business world.

Amalgam Insights finds this announcement by Tidelift interesting for supporting open-source portfolios for a few reasons.

To learn more about this announcement and our recommendations for the Open Source community, read the full report: Tidelift Catalogs Clean Up the Enterprise Open Source Portfolio


Market Alert: Box Acquires SignRequest to Develop Internal Electronic Signatures

Key Takeaway: “This opportunity for existing Box customers to embed e-signature more deeply into their document approval processes is a multi-billion dollar opportunity when the analytics, automation, workflows, and business process optimization opportunities are all taken into account.”


On February 3, 2021, Box announced its intention to acquire SignRequest, a Dutch e-signature vendor founded in 2014, for an estimated $55 million to develop Box Sign. Box plans to launch Box Sign in the summer of 2021 and to make it available both for personal and enterprise plans. Amalgam Insights believes this is an interesting opportunity for multiple reasons.

First, consider that Box’s entire go-to-market strategy is driven around placing enterprise standards around cloud-based content. This has always been its key driver and was the foundational starting point that allowed Box to succeed at a time when cloud-based content management startups were proliferating a decade ago. As a starting point, let’s use “enterprise standards” as shorthand for the governance, security, analytics, and automation necessary to translate basic data and activity into the context and foundation needed to support businesses. Adding e-signatures allows Box to better serve its pharmaceutical, healthcare, government, legal, and other regulated clients with contractual and personal information transfers.

Second, the emergence of the COVID pandemic has driven the need to develop remote work capabilities and highlighted weaknesses in paper-based workflows that organizations have avoided for decades. The disease-driven digital transformation happening now is forcing companies to conduct the operational equivalent of changing the tires on a car while driving on a highway and requires complex problem-solving solutions that are well-packaged and readily available. This need drove the revenue of enterprise Software-as-a-Service companies in 2020 and will continue to drive growth as the majority of companies still need to fill gaps in their digital work toolsets.

Third, with internal e-signature, Box can now add human trust, activity, response time, and human-driven automation to a variety of documents and activities where it was previously dependent on partners. Human sign-off is a key data component, but it’s not the be-all, end-all of work. This is an opportunity to add signature-based approval as a foundational metadata component to every document, workroom, and content-based collaboration that Box supports, a vital area that no company has fully conquered. Looking at the enterprise market, companies that have started taking on this challenge include Workiva and ServiceNow, both obvious cloud SaaS darlings from a revenue growth and valuation perspective.

I’m hoping this is a step towards Box becoming a Workiva (and eventually ServiceNow) competitor and starting to push activity analytics, machine learning-driven optimizations, and workflow capabilities to the market. The content activity Box supports has immense latent value in benchmarking, authorizing, and rationalizing work. This trusted activity was one of the areas that some analysts, including myself, hoped that blockchain would serve. But reality has proven that unlocking value from trusted activity requires hybrid activity that includes people, documents, and transactions.

This hybrid activity management along with the analytics, automation, trust, and force multiplier productivity that could result from this combination of human trust, document context, timely context, and related documents and workflows is the true promise of this acquisition. Existing document management vendors either lack the enterprise governance, platform standardization, automation, or functional capabilities to bring authorization and work together to the masses in a cost-efficient manner. Box’s business model that includes both freemium and enterprise models provides a unique opportunity to bridge the gaps in e-signature adoption, content, and business scale to provide both a better e-signature product and a next-generation trust platform driven by e-signature.

The takeaways here are two-fold. First, look closely at Box to see how it brings Box Sign to market in 2021. This opportunity for existing Box customers to embed e-signature more deeply into their document approval processes is a multi-billion dollar opportunity when the analytics, automation, workflows, and business process optimization opportunities are all taken into account. Second, expect enterprise workflow and content vendors ranging from ServiceNow to Workiva to OpenText both to change their e-signature offerings and to start a product war supporting greater advancement in signature-based capabilities, data management, and analytics as Box threatens to change the game.


Databricks and DataRobot Funding Rounds Highlight a Rising Trend in Tech Funding: the Investipartner

In the tech era, one of the key buzzwords to describe businesses going to market is the idea of “coopetition,” where companies choose to work together towards common goals while competing with each other.

Coined by Novell founder Raymond Noorda, this neologism now describes a common occurrence in the technology world and has been a key operational aspect of Microsoft’s rapid ascent since Satya Nadella took office as CEO. Under Nadella, Microsoft is happy to sell its cloud infrastructure services while supporting competitive applications, as in the 2019 announcement of supporting Salesforce on Azure. Needless to say, coopetition is both a mature and expected business practice.

In the 2020s, this idea of coopetition has transformed and evolved as several tech trends have accelerated the pace of business:

  • Large tech companies have billions of dollars in cash on hand, see their stock trading at record highs, and need to continue growing rapidly.
  • Venture capital and private equity-backed companies have improved their ability to build “unicorns,” startups that grow to billion-dollar valuations within a few years. This size increasingly prevents even relatively large companies from purchasing these startups.
  • Data, analytics, and machine learning have grown radically, with both data and algorithmic models continuing to expand at a triple-digit pace year over year.
  • Customers are interested in purchasing best-in-breed point solutions to solve specific problems.
  • Customers are increasingly comfortable quickly knitting these solutions together through a shared platform, use of APIs, virtualization, and containerization.

This combination of technology creation and consumption makes it difficult for incumbent vendors to build and bring tools to market in a relevant time frame before startups pop up and rapidly gain market share. In light of this challenge, Amalgam Insights notes that a number of recent funding announcements show signs of a modernization of “coopetition,” where vendors in competitive or adjacent markets invest in a quickly emerging and growing partner that solves issues related to their own solutions.

Rather than purchasing the company outright or creating their own version, these vendors choose to take a minority stake while maintaining a shared go-to-market or partnering strategy. This additional step beyond coopetition, involving an equity investment, is a trend that Amalgam Insights calls “investipartnering,” where companies make equity commitments to go-to-market partners. Recent examples include:

DataRobot, an automated machine learning solution that has quickly acquired and developed machine learning preparation, operations, and deployment capabilities, raised a $320 million Series F round in December 2020. The round added investipartners Snowflake, Salesforce Ventures, and Hewlett Packard Enterprise, accompanying go-to-market approaches that pair analytics and cloud infrastructure with DataRobot’s ability to develop and operationalize machine learning.

Databricks, a unified analytics platform built by the creators of Apache Spark, announced its $1 billion Series G round in February 2021. This round included new investors Amazon Web Services and Salesforce Ventures, as well as additional investment from Microsoft. In addition, Databricks took investment from the Canada Pension Plan Investment Board, which is currently a private equity owner of Informatica. Databricks competes in data management, machine learning, and analytics against each of these investors to some extent, but is also seen as a strategic partner.

Of course, this approach requires that the startup be willing to partner with an established company in a space where the startup may also be positioned for further growth. It also requires that the large investing company have the humility to realize that it may not be best suited to create the solution in question, or that it should diversify its holdings in a particular market.

And this is not a unique or especially new trend. Microsoft’s investments in Apple in 1997 and Facebook in 2007 are both prior examples of investipartnering. However, what is new is the increased frequency with which high-flying companies such as Microsoft, Amazon, Adobe, Salesforce, PayPal, ServiceNow, Zoom, Snowflake, and Workday will continue to play this role in building fast-growing startups.

As large technology companies continue to need growth and startups seek strategic smart money to facilitate their transition from private to public companies, Amalgam Insights expects that the investipartner route will continue to be an attractive one for savvy technology companies that realize that the power of building markets is more important than a basic winner-take-all strategy.


The Need to Manage Kubernetes Cost in the Age of COVID

Kubernetes has evolved from an interesting project to a core aspect of the next generation of software platforms. As a result, enterprise IT departments need to manage the costs associated with Kubernetes to be responsible financial stewards. This pressure to be financially responsible is exacerbated by the COVID-driven pandemic recession currently affecting the majority of the world.

Past recessions have shown that the companies best suited to increase sales after a recession are those that

  • Minimize financial and technical debt
  • Invest in IT
  • Support decentralized and distributed decision-making, and
  • Avoid permanent layoffs.

Although Kubernetes is a technology and not a human resources management capability, it does help support increased business flexibility. Kubernetes cost management is an important exercise to ensure that the business flexibility created by Kubernetes is handled in a financially responsible manner. Technology investments must support multiple business goals: optimizing current business initiatives, supporting new business initiatives, and allowing new business initiatives to scale. Without a financial management component, technology deployments cannot be fully aligned to business goals.

From a cost perspective, Amalgam Insights believes that the IT Rule of 30 also applies to Kubernetes.

The IT Rule of 30 states that any unmanaged IT spend category averages 30% in wasted spend, due to a combination of resource optimization, demand-based scaling, time-based resource scaling, and pricing optimization opportunities that technical buyers often miss.

IT engineers and developers are not typically hired for their sourcing, procurement, and finance skills, so it is understandable that their focus is not going to be on cost optimization. However, as Kubernetes-related technology deployments start exceeding $300,000, companies begin to see six-figure savings opportunities just from optimizing Kubernetes and the related cloud and hardware resources used to support containerized workloads.
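
As a quick worked illustration of the Rule of 30 applied to Kubernetes (the spend figure below is hypothetical):

```python
# Worked example of the IT Rule of 30: roughly 30% of an unmanaged
# IT spend category is recoverable. The spend figure is hypothetical.
annual_kubernetes_spend = 350_000   # cloud, hardware, and related services
WASTE_RATE = 0.30                   # the Rule of 30

savings_opportunity = annual_kubernetes_spend * WASTE_RATE
print(f"Estimated annual savings: ${savings_opportunity:,.0f}")
# -> Estimated annual savings: $105,000 (a six-figure opportunity)
```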

To learn more about Kubernetes Cost Management and key vendors to consider, read our groundbreaking report on the top Kubernetes Cost Management vendors.


Informatica Supports the Data-Enabled Customer Experience with Customer 360

On January 11, 2021, Informatica announced its Customer 360 SaaS solution for customer Master Data Management. Built on the Informatica Intelligent Cloud Services platform, Informatica’s Customer 360 solution provides data integration, data governance, data quality, reference data management, process orchestration, and master data management as an integrated SaaS (Software as a Service)-based solution.

It can be easy to take the marketing aspects of master data management solutions for granted, as every solution on the market focused on customer data seems to claim that it helps manage relationships, provide personalization, and support data privacy and compliance while offering a single version of the truth. The similarity of these marketing positions creates confusion about how new offerings in the master data space differ. And the idea of providing a “single version of the truth” is becoming less relevant in an era where data grows and changes faster than ever before; relevant data based on a shared version of the truth matters more than having one monolithic and complete version of the truth documented and reified in a single master data repository.

Customer master data also poses challenges for companies, as this data has to be considered in the context of the expanded role that data now plays in defining the customer. In the Era of COVID, customer interactions and relationships have largely been driven by remote, online transactions, as in-person shopping has been sharply curtailed by a combination of health concerns, regulations, and, in some cases, supply chain interruptions that have affected the availability of commodities and finished goods. In this context, customer data management plays an increasingly important role in driving customer relationships, not only by personalizing data but also by managing the metadata needed to build appropriate hierarchies and relationships. Both now and as we move past the current time of Coronavirus, companies must support the data-enabled customer and reduce barriers to commerce and purchasing.

In exploring the Informatica Customer 360 solution, Amalgam Insights found several compelling aspects that enterprise organizations should consider as they build out their customer master data and seek a solution to maintain data consistency across all applications.

First, Amalgam Insights considers the combination of data management, metadata management, and data cleansing capabilities in Customer 360 to be an important capability. Customer data is notorious for becoming dirty and inaccurate because it is linked to the characteristics of human lives: home addresses, email addresses, phone numbers, purchase activities, and preferences.

In this context, a master data solution focused on customer data must support clean and relevant data along with the business context provided by reference data and other relevant metadata. Rather than treating master data as a static and standalone data record, Customer 360 brings together the context and cleansing needed to maximize both the value and accuracy of master data.

Second, Customer 360’s use of artificial intelligence and machine learning will help businesses to maintain an accurate and shared version of the truth. AI is used in this solution to

  • assist with data matching across data sets
  • provide “smart” field governance to auto-correct data in master data fields with defined formats such as zip codes or country abbreviations (see the sketch after this list)
  • use Random Forests to support active learning for blocking and matching
  • support deep learning techniques for text matching and cleansing text that may be difficult to parse
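
To ground the “smart” field governance idea, here is a minimal, hypothetical sketch of format-based auto-correction for fields such as US zip codes and country abbreviations; Informatica’s actual implementation is ML-driven and far more sophisticated than this rule-based stand-in.

```python
# Hypothetical illustration of format-based field governance. This is a
# simple rule-based stand-in, not Informatica's ML-driven implementation.
import re

COUNTRY_ABBREVIATIONS = {"usa": "US", "u.s.": "US", "united states": "US",
                         "uk": "GB", "great britain": "GB"}

def normalize_zip(value: str) -> str | None:
    """Return a five-digit US zip code, or None if unrecoverable."""
    digits = re.sub(r"\D", "", value)
    if 1 <= len(digits) <= 5:
        return digits.zfill(5)   # spreadsheets often strip leading zeros
    if len(digits) == 9:
        return digits[:5]        # ZIP+4 entered without the hyphen
    return None

def normalize_country(value: str) -> str | None:
    return COUNTRY_ABBREVIATIONS.get(value.strip().lower())

print(normalize_zip("2134"))      # -> 02134
print(normalize_country("U.S."))  # -> US
```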

Third, Informatica’s Customer 360 solution provides a strong foundation for application development, based both on a shared microservices architecture and on investments in DevOps-friendly capabilities, including metadata versioning supported by automated testing and defined DevOps pipelines. The ability to open up both the services available on the Customer 360 solution and the underlying data to custom applications and integrations will help the data-driven business make relevant data more accessible.

The Customer 360 product also includes simplified consumption-based pricing, a redesigned user interface for a simpler user experience, and improved integration capabilities, including real-time, streaming, and batch functionality that reflects the changing nature of data. Amalgam Insights looks forward to seeing how a pay-as-you-go SaaS approach to Customer 360 is received, as this combination is relatively new in the world of master data management implementations, which are often treated as massive CapEx projects.

Overall, Amalgam Insights sees Informatica’s Customer 360 as a valuable step forward for both the master data management and customer data markets.

This combined vision of providing consumption-based pricing, a contextual and intelligently augmented version of the truth, and a combination of data capabilities designed to maximize the accuracy of customer data within a modern UX is a compelling offering for organizations seeking a customer data management solution.

As a result, Amalgam Insights recommends Customer 360 for organizations interested in minimizing the amount of time spent in cleansing, fixing, and finding customer data while accelerating time-to-context. Organizations focused on progressing beyond standard non-AI-enabled data cleansing and governance processes are best positioned to maximize the value of this offering. 


Tom Petrocelli’s Retirement Message to All of You

Well, best to rip off the band-aid. 

I’m retiring at the end of the year. That’s right, on January 1, 2021 I will be officially and joyfully retired from the IT industry. No more conferences, papers, designs, or coding unless I want to. Truth be told, I’m still pretty young to retire. Some blame has to be laid at the feet of the pandemic. Being in the “trend” industry also sometimes makes you aware of negative changes coming up. The pandemic is driving some of those, including tighter budgets. This will just make everything harder. Many aspects of my job that I like, especially going to tech conferences, will be gone for a while or maybe forever.

I can’t blame it all on the pandemic though. Some of it is just demographics. Ours is a youthful industry with a median age in the early-to-mid 40s. To be honest, I’m getting tired of being the oldest, or one of the oldest, people in the room. It’s not as if I’m personally treated as an old person. In fact, I’m mostly treated as younger than I am, which means a certain comfort making “old man” jokes around me. No one thinks that I will take offense at the ageism, I suppose. It’s not really “offense” as much as it’s irritation.

There will be a good number of things I will miss. I really love technology and love being among people who love it as much as I do. What I will miss the most is the people I’ve come to know throughout the years. It’s a bit sad that I can’t say goodbye in person to most of them. I will especially miss the team here at Amalgam Insights. Working with Hyoun, Lisa, and everyone else has been a joy. Thanks for that you all.

My career has spanned a bit over 36 years (which may surprise some of you… I hope) and changes rarely experienced in any industry. When I started fresh from college in 1984, personal computers were new, and the majority of computing was still on the mainframes my Dad operated. No one could even imagine walking around with orders of magnitude more computing power in our pockets. So much has changed. 

If you will indulge me, I would like to present a little parting analysis. Here is “What has changed during my career”.

  1. When I started, mainframes were still the dominant form of computing. Now they are the dinosaur form of computing. Devices of all kinds wander the IT landscape, but personal computers and servers still dominate the business world. How long before we realize that cyberpunk goal of computers embedded in our heads? Sooner than I would like.
  2. At the beginning of my career, the most common way to access a remote computer was a 300 baud modem. The serial lines that terminals used to speak to the mainframes and minicomputers of the time ran at similar speeds. The bandwidth of those devices was roughly 0.03 Mbps. Now, a home connection to an ISP is 20-50 Mbps or more and a corporate desktop can expect 1 Gbps connections. That’s more than 30,000 times what was common in the 80s.
  3. Data storage has gotten incredibly cheap compared to the 1980s. The first 10 MB hard drive I purchased for a $5,000 PC cost almost US$1,000 in 1985 dollars. For a tenth of that price I can now order a 4 TB hard drive (and have it delivered the next day). Adjusted for inflation, that $1,000 hard drive cost roughly $2,500 in 2020 dollars, 25 times what the modern 4 TB drive costs.
  4. Along with mainframes, monolithic software has disappeared from the back end. Instead, client-server computing has given way to n-Tier as the main software platform. Not for long though. Distributed computing is in the process of taking off. It’s funny. At the beginning of my career I wrote code for distributed systems, which was an oddity back then. Now, after more than 30 years it’s becoming the norm. Kind of like AI.
  5. Speaking of AI, artificial intelligence was little more than science fiction. Even impressive AI was more about functions like handwriting recognition, which was created at my alma mater, the University at Buffalo, for the post office. Nothing like we see today. We are still, thankfully, decades or maybe centuries from real machine cognition. I’ll probably be dead before we mere humans need to bow to our robot overlords. 
  6. When I began my career, it was very male and white. My first manager was a woman and we had two other women software engineers in our group. This was as weird as a pink polka-dotted rhinoceros walking through the break room. Now, the IT industry is… still very male and white. There are more women, people with disabilities, and people of color than there were then, but not quite the progress I had hoped for.
  7. IBM was, at that time, the dominant player in the computer industry. Companies such as Oracle and Cisco were just getting started, Microsoft was still basically a garage operation, and Intel was still best known for its memory chips. Now, IBM struggles to stay alive; Cisco, Oracle, Intel, and Microsoft are the established players in the industry; and Amazon, an online store, is at the top of the most important trend in computing in the last 20 years, cloud computing. So many companies have come and gone, I don’t even bother to keep track.
  8. In the 1980s, the computer industry was almost entirely American, with a few European and Japanese companies in the market. Now, it’s still mostly American but for the first time since the dawn of the computer age, there is a serious contender: China. I don’t think they will dominate the industry the way the US has, but they will be a clear and powerful number two in the years to come. The EU is also showing many signs of innovation in the software industry.
  9. At the start of my career, you still needed paper encyclopedias. Within 10 years, you could get vast amounts of knowledge on CDs. Today, all the world’s data is available at our fingertips. I doubt young people today can even imagine what it was like before the Internet gave us access to vast amounts of data in an instant. To them, it would be like living in a world where state-of-the-art data storage is a clay tablet with cuneiform writing on it.
  10. What we wore to work has changed dramatically. When I started my career, we were expected to wear business dress. That was a jacket and tie with dress slacks for men, and a dress or power suit for women. In the 90s that shifted to business casual. Polo shirts and khakis filled up our closets. Before the pandemic, casual became proper office attire with t-shirts and jeans acceptable. At the start of my career, dressing like that at work could get you fired. Post pandemic, pajamas and sweatpants seem to be the new norm, unless you are on a Zoom call. Even so, pants are becoming optional.
  11. Office communication has also changed dramatically. For eons the way to communicate with co-workers was “the memo.” You wrote a note in longhand on paper and handed it to a secretary who typed it up. If it was going to more than one person, the secretary would duplicate it with a Xerox machine and place it in everyone’s mailboxes. You had to check your mailbox every day to make sure that you didn’t have any memos. It was slow, and the secretaries knew everyone’s business. We still have vestiges of this old system in our email systems. CC stands for carbon copy, which was a way of duplicating a memo. In some companies, everyone on the “To:” list received a freshly typed copy while the CC list received a copy made with carbon paper and a duplicating machine. As much as you all might hate email, it is so much better (and faster) than the old ways of communicating.
  12. When I started my first job, I became the second member of my immediate family that was in the IT industry. My Dad was an operations manager in IBM shops. Today, there are still two members of our immediate family that are computer geeks. My son is also a software developer. He will have to carry the torch for the Petrocelli computer clan. No pressure though…
  13. Remote work? Ha! Yeah, no. Not until the 90s, and even then it was supplementary to my go-to-the-office job. I did work out of my house during one of my startups, but I was only 10 minutes from my partner. My first truly remote job was in 2000 and it was very hard to do. This was before residential broadband and smartphones. Now, it’s so easy to do with lots of bandwidth to my house, cheap networking, Slack, and cloud services to make it easy to stay connected. Unfortunately, not everyone has this infrastructure or the technical know-how to deal with network issues. We’ve come a long way, but not far enough, as many of you have recently discovered.

So, goodbye my audience, my coworkers, and especially my friends. Hopefully, the universe will conspire to have us meet again. In the meantime, it’s time for me to devote more time to charity, ministry, and just plain fun. What can I say? It’s been an amazing ride. See ya!

(Editor’s Note: It has been a privilege and an honor to work with Tom over the past few years. Tom has always been on the bucket list of analysts I wanted to work with in my analyst career and I’m glad I had the chance to do so. Please wish Tom well in his next chapter! – Hyoun)


Why Babelfish for Aurora PostgreSQL is a Savage and Aggressive Announcement by AWS

On December 1st at AWS re:Invent, Amazon announced its plans to open source Babelfish for PostgreSQL in Q1 of 2021 under the Apache 2.0 license. Babelfish for PostgreSQL is a service that allows PostgreSQL databases to support SQL Server requests and communication without requiring schema rewrites or custom SQL.

As those of you who work with data know, this is an obvious shot across the bow by Amazon to make it easier than ever to migrate away from SQL Server and towards PostgreSQL. Amazon is targeting Microsoft in yet another attempt to push database migration.

Over my 25 years in tech (and beyond), there have been many, many attempts to push database migration, and the vast majority have failed. Nothing in IT has the gravitational pull of the enterprise database, mostly because the business risks of migration have almost never warranted the potential operational and cost savings.

So, what makes Babelfish for PostgreSQL different? PostgreSQL is more flexible than traditional relational databases in managing geospatial data and is relatively popular, placing fourth on the DB-Engines ranking as of December 2, 2020. So the demand to use PostgreSQL as a transactional database fundamentally exists at a grassroots level.

In addition, the need to create and store data is continuing to grow exponentially. There is no longer a “single source of truth” as there once was in the days of monolithic enterprise applications. Today, the “truth” is distributed, multi-faceted, and rapidly changing based on new data and context, which is often better set up in new or emerging databases rather than retrofitted into an existing legacy database tool and schema.

The aspect that I think is fundamentally most important is that Babelfish for PostgreSQL allows PostgreSQL to understand SQL Server’s proprietary T-SQL. This removes the need to rewrite schemas and code for the applications that are linked to SQL Server prior to migration.
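
As a sketch of what this means in practice, an application's existing SQL Server client code, T-SQL included, can simply be pointed at a Babelfish endpoint; the connection details below are hypothetical, and pyodbc with the SQL Server ODBC driver is a common client-side choice rather than anything Babelfish-specific.

```python
# Hypothetical sketch: an existing SQL Server client pointed at a
# Babelfish endpoint. Babelfish speaks SQL Server's TDS wire protocol,
# so the driver and the T-SQL stay unchanged; only SERVER changes.
import pyodbc  # pip install pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=my-babelfish-cluster.example.com,1433;"  # was the SQL Server host
    "DATABASE=orders;UID=app_user;PWD=example-only")

cursor = conn.cursor()
# TOP is T-SQL syntax that vanilla PostgreSQL would reject; Babelfish
# translates it, so the query does not need to be rewritten with LIMIT.
cursor.execute("SELECT TOP 5 order_id, total FROM dbo.orders "
               "ORDER BY total DESC")
for row in cursor.fetchall():
    print(row.order_id, row.total)
```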

And it doesn’t hurt that the open source PostgreSQL community has traditionally been both open and not dominated by any one vendor. So, although this project will help Amazon, Amazon will not be driving the majority of the project or providing the majority of its contributors.

My biggest caveat is that Babelfish is still a work in progress. For now, it’s an appropriate tool for standard transactional database use cases, but you will want to closely check data types. And if you have a specialized industry vertical or use case associated with the application, you may need an industry-specific contributor to help with developing Babelfish for your migration.

As for the value, there is both operational value and financial value. From an operational perspective, PostgreSQL is typically easier to manage than SQL Server and provides more flexibility to migrate and host the database based on your preferences. There is also an obvious cost benefit: removing the inherent license cost of SQL Server will likely cut the cost of the database itself by 60%, give or take, on Amazon Web Services. For companies that are rapidly spinning up services and creating data, this can be a significant cost over time.

For now, I think the best move is to start looking at the preview of Babelfish on Amazon Aurora to get a feel for the data translations and transmissions, since Babelfish for PostgreSQL likely won’t be open sourced for another couple of months. This will allow you to assess the maturity of Babelfish for your current and rapidly growing databases. Given the likely gaps that exist in Babelfish at the moment, the best initial use cases for this tool are databases where fixed text values make up the majority of data being transferred.

As an analyst, I believe this announcement is one of the few in my lifetime that will result in a significant migration of relational database hosting. I’m not predicting the death of SQL Server, by any means, and this tool is really best suited for smaller transactional databases, terabyte-scale and below, at this point. (Please don’t think of this as a potential tool for your SQL Server data warehouse at this point!)

But the concept, the proposed execution, and the value proposition of Babelfish all line up in a way that is client and customer-focused, rather than a heavy-handed attempt to force migration for vendor-related revenue increases.


Underspecification, Deep Evidential Regression, and Protein Folding: Three Big Discoveries in Machine Learning

This past month has been a banner month for Machine Learning as three key reports have come out that change the way that the average lay person should think about machine learning. Two of these papers are about conducting machine learning while considering underspecification and using deep evidential regression to estimate uncertainty. The third report is about a stunning result in machine learning’s role to improve protein folding.

The first report was written by a team of 40 Google researchers, titled Underspecification Presents Challenges for Credibility in Modern Machine Learning. Behind the title is the basic problem that certain predictors can lead to nearly identical results in a testing environment, but provide vastly different results in a production environment. It can be easy to simply train a model or to optimize a model to provide a strong initial fit. However, savvy machine learning analysts and developers will realize that their models need to be aligned not only to good results, but to the full context of the environment, language, risk profile, and other aspects of the problem in question.

The paper suggests conducting additional real-world stress tests for models that may seem similar and to understand the full scope of requirements associated with the model in question. As with much of the data world, the key for avoiding underspecification seems to come back to strong due diligence and robust testing rather than simply trusting the numbers.
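
As a hedged sketch of that due diligence, the toy example below trains several models that differ only in random seed, then compares them on held-out test data and on a deliberately shifted stress set; near-identical test scores combined with diverging stress scores are the underspecification warning sign. The data and models here are illustrative and are not drawn from the paper.

```python
# Toy underspecification check: models that differ only by random seed
# can score almost identically on i.i.d. test data yet diverge on a
# shifted "stress" set. Data and models are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# A crude distribution shift: add noise to the first five features.
rng = np.random.default_rng(0)
X_stress = X_test + rng.normal(0, 2.0, X_test.shape) * (np.arange(20) < 5)

for seed in range(5):
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    print(f"seed={seed}  test={model.score(X_test, y_test):.3f}  "
          f"stress={model.score(X_stress, y_test):.3f}")
```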

The second report is Deep Evidential Regression, written by a team of MIT and Harvard authors who describe their approach as follows:

In this paper, we propose a novel method for training non-Bayesian NNs to estimate a continuous target as well as its associated evidence in order to learn both aleatoric and epistemic uncertainty. We accomplish this by placing evidential priors over the original Gaussian likelihood function and training the NN to infer the hyperparameters of the evidential distribution.

http://www.mit.edu/~amini/pubs/pdf/deep-evidential-regression.pdf

From a practical perspective, this method provides a relatively simple way to understand how “uncertain” your neural net is compared to the reality that it is trying to reflect. This paper moves beyond the standard measures of variance and accuracy to start trying to understand how confident we can be in the models being created. From my perspective, this concept couples well with the problem of underspecification. Together, I believe these two papers will help data scientists go a long way towards cleaning up models that look superficially good, but fail to reflect real world results.
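
For intuition, here is a minimal sketch of how the paper's evidential outputs translate into uncertainty estimates. The function below is a stand-in for a trained network head; the two closed-form moments, E[sigma^2] = beta/(alpha-1) for aleatoric uncertainty and Var[mu] = beta/(nu*(alpha-1)) for epistemic uncertainty, follow the paper's Normal-Inverse-Gamma parameterization.

```python
# Sketch: converting evidential (Normal-Inverse-Gamma) outputs into
# uncertainty estimates, following Amini et al. A trained model would
# emit (gamma, nu, alpha, beta) per input from its evidential head.
def evidential_uncertainty(gamma, nu, alpha, beta):
    """gamma: predicted mean; nu, alpha, beta: evidential parameters."""
    prediction = gamma
    aleatoric = beta / (alpha - 1)           # E[sigma^2]: noise in the data
    epistemic = beta / (nu * (alpha - 1))    # Var[mu]: the model's own doubt
    return prediction, aleatoric, epistemic

# Illustrative values only: a small nu means little accumulated evidence,
# so epistemic uncertainty stays high even if the noise estimate is low.
print(evidential_uncertainty(gamma=2.5, nu=0.2, alpha=3.0, beta=1.0))
# -> (2.5, 0.5, 2.5)
```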

Finally, I would be remiss if I didn’t mention the success of DeepMind’s program, AlphaFold, in the Critical Assessment of Structure Prediction challenge, which focuses on protein-structure predictions.

Although DeepMind has been working on AlphaFold for years, this current version tested yesterday provided results that were a quantum leap compared to prior years.

From DeepMind: https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology

The reason that protein folding is so difficult to calculate is that there are multiple levels of structure to a protein. We learn about amino acids, which are the building blocks of proteins and are defined by DNA. The A’s, T’s, C’s, and G’s provide an alphabet that defines the linear sequence of a protein, with groups of three nucleotides defining an amino acid.

But then there’s a secondary structure, where internal bonding can make the proteins line up as alpha helices or beta sheets. The totality of these secondary structures, this combination of helix and sheet shapes, makes up the tertiary structure.

And then multiple chains of tertiary structure can come together into a quaternary structure, which is the end game for building a protein. If you really want to learn the details, Khan Academy has a nice video to walk you through the details, as I’ve skipped all of the chemistry.

But the big takeaway: there are four levels of increasingly complicated chemical structure for a protein, each with its own set of interactions that make it very computationally challenging to guess what a protein would look like based just on having the basic DNA sequence or the related amino acid sequence.

Billions of computing hours have been spent on trying to figure out some vague idea of what a protein might look like and billions of lab hours have then been spent trying to test whether this wild guess is accurate or, more likely, not. This is why it is an amazing game-changer to see that DeepMind has basically nailed what the quaternary structure looks like.

This version of AlphaFold is an exciting Nobel Prize-caliber discovery. I think this will be the first Nobel Prize driven by deep learning, and this discovery is an exciting validation of the value of AI at a practical level. At this point, AlphaFold is the “data prep” tool for protein folding, with the same potential to greatly reduce the effort needed to simply make sure that a protein is feasible.

This discovery will improve our ability to create drugs, explore biological systems, and fundamentally understand how mutations affect proteins on a universal scale.

This is an exciting time to be a part of the AI community and to see advances being made literally on a weekly basis. As an analyst in this space, I look forward to seeing how these, and other discoveries, filter down to tools that we are able to use for business and at home.