Posted on

Tom Petrocelli’s Retirement Message to All of You

Well, best to rip off the band-aid. 

I’m retiring at the end of the year. That’s right, on January 1, 2021, I will be officially and joyfully retired from the IT industry. No more conferences, papers, designs, or coding unless I want to. Truth be told, I’m still pretty young to retire. Some blame has to be laid at the feet of the pandemic. Being in the “trend” industry also sometimes makes you aware of negative changes coming, and the pandemic is driving some of those, including tighter budgets, which will just make everything harder. Many aspects of my job that I like, especially going to tech conferences, will be gone for a while, or maybe forever.

I can’t blame it all on the pandemic, though. Some of it is just demographics. Ours is a youthful industry, with a median age somewhere in the early-to-mid 40s. To be honest, I’m getting tired of being the oldest, or one of the oldest, people in the room. It’s not as if I’m personally treated as an old person. In fact, I’m mostly treated as younger than I am, which means a certain comfort making “old man” jokes around me. No one thinks that I will take offense at the ageism, I suppose. It’s not really offense as much as irritation.

There will be a good number of things I will miss. I really love technology and love being among people who love it as much as I do. What I will miss the most is the people I’ve come to know throughout the years. It’s a bit sad that I can’t say goodbye in person to most of them. I will especially miss the team here at Amalgam Insights. Working with Hyoun, Lisa, and everyone else has been a joy. Thanks for that, all of you.

My career has spanned a bit over 36 years (which may surprise some of you… I hope) and changes rarely experienced in any industry. When I started fresh from college in 1984, personal computers were new, and the majority of computing was still on the mainframes my Dad operated. No one could even imagine walking around with orders of magnitude more computing power in our pockets. So much has changed. 

If you will indulge me, I would like to present a little parting analysis. Here is “What has changed during my career”.

  1. When I started, mainframes were still the dominant form of computing. Now they are the dinosaur form of computing. Devices of all kinds wander the IT landscape, but personal computers and servers still dominate the business world. How long before we realize that cyberpunk goal of computers embedded in our heads? Sooner than I would like.
  2. At the beginning of my career, the most common way to access a remote computer was a 300 baud modem. The serial lines that terminals used to speak to the mainframes and minicomputers of the time ran at similar speeds. The bandwidth of those devices was roughly 0.0003 Mbps. Now, a home connection to an ISP is 20–50 Mbps or more, and a corporate desktop can expect a 1 Gbps connection, more than three million times what was common in the 80s.
  3. Data storage has gotten incredibly cheap compared to the 1980s. The first 10 MB hard drive I purchased for a $5,000 PC cost almost US$1,000 in 1985 dollars. For a tenth of that price I can now order a 4 TB hard drive (and have it delivered the next day). Adjusted for inflation, that $1,000 drive cost roughly $2,500 in 2020 dollars, 25 times the price of the modern 4 TB drive, for 400,000 times the capacity.
  4. Along with mainframes, monolithic software has disappeared from the back end, and the client-server computing that replaced it has in turn given way to n-tier architectures as the main software platform. Not for long, though. Distributed computing is in the process of taking off. It’s funny: at the beginning of my career I wrote code for distributed systems, which was an oddity back then. Now, after more than 30 years, it’s becoming the norm. Kind of like AI.
  5. Speaking of AI, artificial intelligence back then was little more than science fiction. Even impressive AI was mostly about narrow functions like handwriting recognition, which was developed at my alma mater, the University at Buffalo, for the post office. Nothing like what we see today. We are still, thankfully, decades or maybe centuries from real machine cognition. I’ll probably be dead before we mere humans need to bow to our robot overlords.
  6. When I began my career, it was very male and white. My first manager was a woman, and we had two other women software engineers in our group. This was as weird as a pink polka-dotted rhinoceros walking through the break room. Now, the IT industry is… still very male and white. There are more women, people with disabilities, and people of color than there were then, but not quite the progress I had hoped for.
  7. IBM was, at that time, the dominant player in the computer industry. Companies such as Oracle and Cisco were just getting started, Microsoft was still basically a garage operation, and Intel was mostly making calculator chips. Now, IBM struggles to stay alive; Cisco, Oracle, Intel, and Microsoft are the established players in the industry; and Amazon, an online store, sits atop the most important trend in computing of the last 20 years: cloud computing. So many companies have come and gone that I don’t even bother to keep track.
  8. In the 1980s, the computer industry was almost entirely American, with a few European and Japanese companies in the market. Now, it’s still mostly American but for the first time since the dawn of the computer age, there is a serious contender: China. I don’t think they will dominate the industry the way the US has, but they will be a clear and powerful number two in the years to come. The EU is also showing many signs of innovation in the software industry.
  9. At the start of my career, you still needed paper encyclopedias. Within 10 years, you could get vast amounts of knowledge on CD’s. Today, all the world’s data is available at our fingertips. I doubt young people today can even imagine what it was like before the Internet gave us access to vast amounts of data in an instant. To them, it would be like living in a world where state of the art data storage is a clay tablet with cuneiform writing on it.
  10. What we wore to work has changed dramatically. When I started my career, we were expected to wear business dress. That was a jacket and tie with dress slacks for men, and a dress or power suit for women. In the 90s that shifted to business casual. Polo shirts and khakis filled up our closets. Before the pandemic, casual became proper office attire with t-shirts and jeans acceptable. At the start of my career, dressing like that at work could get you fired. Post pandemic, pajamas and sweatpants seem to be the new norm, unless you are on a Zoom call. Even so, pants are becoming optional.
  11. Office communication has also changed dramatically. For eons, the way to communicate with co-workers was “the memo.” You wrote a note in longhand on paper and handed it to a secretary who typed it up. If it was going to more than one person, the secretary would duplicate it on a Xerox machine and place it in everyone’s mailboxes. You had to check your mailbox every day to make sure that you didn’t have any memos. It was slow, and the secretaries knew everyone’s business. We still have vestiges of this old system in our email: CC stands for carbon copy, which was a way of duplicating a memo. In some companies, everyone on the “To:” list received a freshly typed copy while the CC list received a copy made with carbon paper or a duplicating machine. As much as you all might hate email, it is so much better (and faster) than the old ways of communicating.
  12. When I started my first job, I became the second member of my immediate family that was in the IT industry. My Dad was an operations manager in IBM shops. Today, there are still two members of our immediate family that are computer geeks. My son is also a software developer. He will have to carry the torch for the Petrocelli computer clan. No pressure though…
  13. Remote work? Ha! Yeah, no. Not until the 90s, and even then it was supplementary to my go-to-the-office job. I did work out of my house during one of my startups, but I was only 10 minutes from my partner. My first truly remote job was in 2000, and it was very hard to do. This was before residential broadband and smartphones. Now it’s easy, with lots of bandwidth to my house, cheap networking, Slack, and cloud services to keep me connected. Unfortunately, not everyone has this infrastructure or the technical know-how to deal with network issues. We’ve come a long way, but not far enough, as many of you have recently discovered.
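
For the numerically inclined, the bandwidth and storage comparisons above can be sanity-checked in a few lines of Python, starting from the raw figures in the text (a 300 baud modem, a 1 Gbps desktop link, and the approximate hard drive prices):

```python
# Rough arithmetic behind items 2 and 3 above. The dollar figures and the
# inflation adjustment are the approximations used in the text, not exact data.
modem_bps = 300                      # a 300 baud modem, ~1 bit per symbol
corporate_bps = 1_000_000_000        # a modern 1 Gbps corporate desktop link
print(f"bandwidth gain: ~{corporate_bps / modem_bps:,.0f}x")

hdd_1985_usd_2020 = 2500             # the 1985 $1,000, 10 MB drive in ~2020 dollars
hdd_2020_usd = 100                   # a ~4 TB drive in 2020
print(f"price ratio: {hdd_1985_usd_2020 / hdd_2020_usd:.0f}x")

# Cost per megabyte collapses even further once capacity is factored in:
per_mb_1985 = hdd_1985_usd_2020 / 10        # $/MB for the 10 MB drive
per_mb_2020 = hdd_2020_usd / 4_000_000      # 4 TB is roughly 4,000,000 MB
print(f"per-MB improvement: ~{per_mb_1985 / per_mb_2020:,.0f}x")
```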

So, goodbye my audience, my coworkers, and especially my friends. Hopefully, the universe will conspire to have us meet again. In the meantime, it’s time for me to devote more time to charity, ministry, and just plain fun. What can I say? It’s been an amazing ride. See ya!

(Editor’s Note: It has been a privilege and an honor to work with Tom over the past few years. Tom has always been on the bucket list of analysts I wanted to work with in my analyst career and I’m glad I had the chance to do so. Please wish Tom well in his next chapter! – Hyoun)

Why Babelfish for Aurora PostgreSQL is a Savage and Aggressive Announcement by AWS

On December 1st at Amazon re:Invent, Amazon announced its plans to open source Babelfish for PostgreSQL in Q1 2021 under the Apache 2.0 license. Babelfish for PostgreSQL is a translation layer that allows PostgreSQL databases to support SQL Server requests and communication without requiring schema rewrites or custom SQL.

As those of you who work with data know, this is an obvious shot across the bow by Amazon to make it easier than ever to migrate away from SQL Server and towards PostgreSQL. Amazon is targeting Microsoft in yet another attempt to push database migration.

Over my 25 years in tech (and beyond), there have been many, many attempts to push database migration, and the vast majority have failed. Nothing in IT has the gravitational pull of the enterprise database, mostly because the potential operational and cost savings of migration have almost never warranted the business risk.

So, what makes Babelfish for PostgreSQL different? PostgreSQL is more flexible than traditional relational databases in managing geospatial data and is relatively popular, placing fourth in the DB-Engines ranking as of December 2, 2020. So, the demand to use PostgreSQL as a transactional database fundamentally exists at a grassroots level.

In addition, the need to create and store data is continuing to grow exponentially. There is no longer a “single source of truth” as there once was in the days of monolithic enterprise applications. Today, the “truth” is distributed, multi-faceted, and rapidly changing based on new data and context, which is often better set up in new or emerging databases rather than retrofitted into an existing legacy database tool and schema.

The aspect that I think is fundamentally most important is that Babelfish for PostgreSQL allows PostgreSQL to understand SQL Server’s proprietary T-SQL. This removes the need to rewrite schemas and code for the applications that are linked to SQL Server prior to migration.
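
To make the dialect gap concrete, here is a toy sketch (my own illustration, not Babelfish’s actual implementation, which works at the wire-protocol level with a full T-SQL parser) of the kind of T-SQL-to-PostgreSQL idiom rewriting involved, using a few well-known differences between the dialects:

```python
import re

# Toy rewrite rules for three well-known T-SQL idioms and their PostgreSQL
# equivalents. This is only meant to illustrate the class of work that
# Babelfish takes off your plate, not how it actually does it.
REWRITES = [
    (re.compile(r"\bGETDATE\(\)", re.IGNORECASE), "now()"),
    (re.compile(r"\bISNULL\(", re.IGNORECASE), "COALESCE("),
    # SELECT TOP n ...  ->  SELECT ... LIMIT n (simple single-clause queries only)
    (re.compile(r"\bSELECT\s+TOP\s+(\d+)\s+(.*)", re.IGNORECASE | re.DOTALL),
     r"SELECT \2 LIMIT \1"),
]

def tsql_to_postgres(sql: str) -> str:
    """Apply each toy rewrite rule in order and return the converted SQL."""
    for pattern, replacement in REWRITES:
        sql = pattern.sub(replacement, sql)
    return sql

print(tsql_to_postgres("SELECT TOP 5 name FROM users WHERE created < GETDATE()"))
# SELECT name FROM users WHERE created < now() LIMIT 5
```

Multiply these three rules by the hundreds of T-SQL constructs in a real application, and the appeal of having the database itself do the translation becomes obvious.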

And it doesn’t hurt that PostgreSQL is an open source project whose community has traditionally been both open and not dominated by any one vendor. So, although this project will help Amazon, Amazon will not be driving the majority of the project or supplying a majority of its contributors.

My biggest caveat is that Babelfish is still a work in progress. For now, it’s an appropriate tool for standard transactional database use cases, but you will want to closely check data types. And if you have a specialized industry vertical or use case associated with the application, you may need an industry-specific contributor to help with developing Babelfish for your migration.

As for the value, there is both operational value and financial value. From an operational perspective, PostgreSQL is typically easier to manage than SQL Server and provides more flexibility to migrate and host the database based on your preferences. There is also an obvious cost benefit: shedding the inherent license cost of SQL Server will likely cut the cost of the database itself by roughly 60%, give or take, on Amazon Web Services. For companies that are rapidly spinning up services and creating data, this can be a significant savings over time.

For now, I think the best move is to start looking at the preview of Babelfish on Amazon Aurora to get a feel for the data translations and transmissions, since Babelfish for PostgreSQL likely won’t be open sourced for another couple of months. This will allow you to gauge the maturity of Babelfish against your current and rapidly growing databases. Given the gaps that likely exist in Babelfish at the moment, the best initial use cases for this tool are databases where fixed text values make up the majority of the data being transferred.

As an analyst, I believe this announcement is one of the few in my lifetime that will result in a significant migration of relational database hosting. I’m not predicting the death of SQL Server, by any means, and this tool is really best suited for smaller transactional databases (terabyte scale and below) at this point. (Please don’t think of this as a potential tool for your SQL Server data warehouse just yet!)

But the concept, the proposed execution, and the value proposition of Babelfish all line up in a way that is client and customer-focused, rather than a heavy-handed attempt to force migration for vendor-related revenue increases.

Underspecification, Deep Evidential Regression, and Protein Folding: Three Big Discoveries in Machine Learning

This past month has been a banner month for Machine Learning as three key reports have come out that change the way that the average lay person should think about machine learning. Two of these papers are about conducting machine learning while considering underspecification and using deep evidential regression to estimate uncertainty. The third report is about a stunning result in machine learning’s role to improve protein folding.

The first report was written by a team of 40 Google researchers, titled Underspecification Presents Challenges for Credibility in Modern Machine Learning. Behind the title is the basic problem that certain predictors can lead to nearly identical results in a testing environment, but provide vastly different results in a production environment. It can be easy to simply train a model or to optimize a model to provide a strong initial fit. However, savvy machine learning analysts and developers will realize that their models need to be aligned not only to good results, but to the full context of the environment, language, risk profile, and other aspects of the problem in question.

The paper suggests conducting additional real-world stress tests for models that may seem similar and to understand the full scope of requirements associated with the model in question. As with much of the data world, the key for avoiding underspecification seems to come back to strong due diligence and robust testing rather than simply trusting the numbers.
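
As a toy illustration of the failure mode (my own construction, not an example from the paper): two predictors can be indistinguishable while a spurious correlation holds in the training and test data, then diverge the moment that correlation breaks in production.

```python
# Two "models" that score identically on a test set where feature x2 happens
# to equal the causal feature x1, but behave very differently once that
# spurious correlation breaks -- the essence of underspecification.
def model_a(x1, x2):
    return x1  # relies on the causal feature

def model_b(x1, x2):
    return x2  # relies on the spuriously correlated feature

# Test set drawn from the training distribution, where x2 == x1:
test_set = [(1.0, 1.0, 1.0), (2.0, 2.0, 2.0)]  # (x1, x2, label)
assert all(model_a(x1, x2) == model_b(x1, x2) == y for x1, x2, y in test_set)

# A production-style "stress test" input where the correlation no longer holds:
print(model_a(3.0, -5.0), model_b(3.0, -5.0))  # 3.0 -5.0
```

No amount of test-set accuracy distinguishes these two models; only the stress test on shifted inputs does, which is exactly the paper's point.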

The second report is Deep Evidential Regression, written by a team of MIT and Harvard authors, who describe their approach as follows:

In this paper, we propose a novel method for training non-Bayesian NNs to estimate a continuous target as well as its associated evidence in order to learn both aleatoric and epistemic uncertainty. We accomplish this by placing evidential priors over the original Gaussian likelihood function and training the NN to infer the hyperparameters of the evidential distribution.

From a practical perspective, this method provides a relatively simple way to understand how “uncertain” your neural net is compared to the reality that it is trying to reflect. This paper moves beyond the standard measures of variance and accuracy to start trying to understand how confident we can be in the models being created. From my perspective, this concept couples well with the problem of underspecification. Together, I believe these two papers will help data scientists go a long way towards cleaning up models that look superficially good, but fail to reflect real world results.
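
For readers who want the mechanics: the paper's evidential distribution is a Normal-Inverse-Gamma whose four hyperparameters yield closed-form uncertainty estimates. Below is a minimal sketch of those formulas (hyperparameter names follow the paper; treat this as a reading aid, not a reference implementation):

```python
# Closed-form uncertainty estimates from a Normal-Inverse-Gamma (NIG)
# evidential distribution: the network predicts four hyperparameters
# (gamma, nu, alpha, beta) per target instead of a single point estimate.
def evidential_uncertainty(gamma, nu, alpha, beta):
    assert nu > 0 and alpha > 1 and beta > 0, "NIG parameter constraints"
    prediction = gamma                     # E[mu]
    aleatoric = beta / (alpha - 1)         # E[sigma^2]: noise inherent in the data
    epistemic = beta / (nu * (alpha - 1))  # Var[mu]: the model's own uncertainty
    return prediction, aleatoric, epistemic

# More accumulated "evidence" (larger nu) shrinks epistemic uncertainty
# while leaving aleatoric uncertainty untouched:
print(evidential_uncertainty(0.0, nu=1.0, alpha=2.0, beta=1.0))   # (0.0, 1.0, 1.0)
print(evidential_uncertainty(0.0, nu=10.0, alpha=2.0, beta=1.0))  # (0.0, 1.0, 0.1)
```

The practical payoff is the split itself: aleatoric uncertainty tells you the data is noisy, while epistemic uncertainty tells you the model has not seen enough evidence, and only the latter can be reduced by collecting more data.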

Finally, I would be remiss if I didn’t mention the success of DeepMind’s program, AlphaFold, in the Critical Assessment of Structure Prediction challenge, which focuses on protein-structure predictions.

Although DeepMind has been working on AlphaFold for years, the current version, tested yesterday, provided results that were a quantum leap beyond prior years.

The reason that protein folding is so difficult to calculate is that there are multiple levels of structure to a protein. The primary structure is the chain of amino acids, the building blocks of proteins, which is basically defined by DNA: the A’s, T’s, C’s, and G’s provide an alphabet that spells out the linear sequence of a protein, with each group of three nucleotides defining an amino acid.
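
That primary-structure step, reading the DNA alphabet three letters at a time, can be sketched in a few lines (only a handful of the 64 codons from the standard genetic code are included here for illustration):

```python
# A minimal sketch of the primary-structure step: read DNA three nucleotides
# (one codon) at a time and map each codon to an amino acid. Only five of the
# 64 codons in the standard genetic code are shown.
CODON_TABLE = {
    "ATG": "Met",  # methionine, also the start codon
    "TTT": "Phe",  # phenylalanine
    "GGC": "Gly",  # glycine
    "AAA": "Lys",  # lysine
    "TAA": None,   # stop codon
}

def translate(dna):
    """Return the amino-acid sequence encoded by a DNA string."""
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino is None:  # a stop codon ends the chain
            break
        residues.append(amino)
    return residues

print(translate("ATGTTTGGCTAA"))  # ['Met', 'Phe', 'Gly']
```

This step is the easy, fully solved part; everything after it, which is how that linear chain folds in three dimensions, is what makes the problem so hard.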

But then there is a secondary structure, where internal bonding makes stretches of the chain coil and fold into alpha helices or beta sheets. The way this combination of helices and sheets packs together in three dimensions makes up the tertiary structure.

And then multiple folded chains can come together into a quaternary structure, which is the end game for building a protein. If you really want to learn the details, Khan Academy has a nice video to walk you through them, as I’ve skipped all of the chemistry.

But the big takeaway: there are four levels of increasingly complicated chemical structure for a protein, each with its own set of interactions that make it very computationally challenging to guess what a protein would look like based just on having the basic DNA sequence or the related amino acid sequence.

Billions of computing hours have been spent on trying to figure out some vague idea of what a protein might look like and billions of lab hours have then been spent trying to test whether this wild guess is accurate or, more likely, not. This is why it is an amazing game-changer to see that DeepMind has basically nailed what the quaternary structure looks like.

This version of AlphaFold is an exciting, Nobel Prize-caliber discovery. I think this will be the first Nobel Prize driven by deep learning, and this discovery is an exciting validation of the value of AI at a practical level. At this point, AlphaFold is the “data prep” tool for protein folding, with the same potential to greatly reduce the effort needed to simply make sure that a protein is feasible.

This discovery will improve our ability to create drugs, explore biological systems, and fundamentally understand how mutations affect proteins on a universal scale.

This is an exciting time to be a part of the AI community and to see advances being made literally on a weekly basis. As an analyst in this space, I look forward to seeing how these, and other discoveries, filter down to tools that we are able to use for business and at home.

Updated Analysis: ServiceNow Acquires Element AI

(Note: Last Updated January 15, 2021 to reflect the announced purchase price.)

On November 30, 2020, ServiceNow announced an agreement to purchase Element AI, which was one of the top-funded and fastest-growing companies in the AI space.

Element AI was founded in 2016 by a supergroup of AI and technology executives with prior exits, including Jean-Francois Gagne, Anne Martel, Nicolas Chapados, Philippe Beaudoin, and Turing Award winner Yoshua Bengio. This team was focused on helping non-technical companies develop AI software solutions, and expectations were only raised by a $102 million Series A round in 2017 followed by a $151 million funding round in 2019.

Element AI’s business model was similar to the likes of Pivotal Labs from a software development perspective or Slalom from an analytics perspective in that Element AI sought to provide the talent, skills, resources, and development plans to help companies adopt AI. The firm was often brought in to support AI projects that were beyond the scope of larger, more traditional consultancies such as Accenture and McKinsey.

However, Element AI faced a crossroads in 2020, caught between several key market trends. First, the barrier to entry for AI has been reduced considerably by the development of AutoML solutions combined with the increased adoption of Python and R. Second, management consulting revenue growth slowed down in 2020, which reduced the velocity of Element AI’s pipeline and made it harder to project the “hockey stick” exponential growth expected of highly funded companies, especially in light of COVID-related contract delays. And third, the ROI associated with AI projects is now better understood to come largely from the automation and optimization of processes within already-existing digital transformation projects, which makes separate AI efforts duplicative in nature, as Amalgam Insights has documented in our Business Value Analysis reports over time.

In the face of these trends, the acquisition of Element AI by ServiceNow is a very logical exit. This acquisition allows investors to get their money back relatively quickly.

(Update: on January 14, 2021, ServiceNow disclosed in a filing that the purchase price was approximately US $230 million, or CDN $295 million. This was a massive discount on the estimated $600 million+ valuation from the September 2019 funding announcement.)

Not every bet on building a multi-billion dollar company works out as planned, but this exercise was successful in creating a team of AI professionals with experience in building enterprise solutions. Amalgam Insights expects that over 200 Element AI employees will end up moving over to ServiceNow to build AI-driven pipelines and solutions under ServiceNow’s chief AI officer Vijay Narayanan. This team was ultimately the key reason for ServiceNow to make the acquisition, as Element AI’s commercial work is expected to be shut down after the close of this acquisition so that Element AI can focus on the ServiceNow platform and the enterprise transformation efforts associated with the million-dollar contracts that ServiceNow creates.

With this acquisition, ServiceNow has also stated that it intends to maintain the Element AI Montreal office as an “AI innovation hub,” which Amalgam Insights highly approves of. Montreal has long been a hub of analytics, data science, and artificial intelligence efforts, and maintaining a hub there would both help ServiceNow from a technical perspective and help assuage some wounds the Canadian government may have from losing a top, government-funded AI company to a foreign acquirer. Given ServiceNow’s international approach to business and Canada’s continued importance to the data, analytics, and AI spaces, this acquisition could be an unexpected win-win relationship between ServiceNow and Canada.

What To Expect Going Forward

With this acquisition, ServiceNow has quickly gained access to a large team of highly skilled AI professionals at a time when its revenues are growing 30% year over year. At this point, ServiceNow must scale quickly simply to keep up with its customers and this acquisition ended up being a necessary step to do so. This acquisition is the fourth AI-related acquisition made by ServiceNow after purchases of Loom Systems for log analytics, Passage AI to support conversational and natural language understanding, and Sweagle to support configuration management (CMDB) for IT management.

At the same time, Amalgam Insights believes this acquisition will provide focus to the Element AI team, which was dealing with the challenge of growing rapidly, while trying to solve the world’s AI problems ranging from AI for Good to defining ethical AI to building AI tools to discovering product-revenue alignment. The demands of trying to solve multiple problems as a startup, even an ambitious and well-funded startup, can be problematic. This acquisition allows Element AI to be a professional services arm and a development resource for ServiceNow’s ever-evolving platform roadmap as ServiceNow continues to expand from its IT roots to take on service, HR, finance, and other business challenges as a business-wide transformational platform.