On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, SnapLogic, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.
In Amalgam’s recent Analyst Insight, “Domo Hajimemashite At Domopalooza 2018, Domo Solves Its Case of Mistaken Identity”, Amalgam introduced a figure showing the 5 Tiers of Technology Value. This pyramid, based on Maslow’s Hierarchy of Needs, demonstrates how technology provides value that can be documented, calculated, and used to build business cases.
5 Tiers of Technology Value
To better understand these five tiers, Amalgam provides this guidance to companies seeking a better understanding of how IT investments are justified, as well as the pros and cons associated with each tier.
Recommended Audience: CIOs, Enterprise Architects, Data Managers, Analytics Managers, Data Scientists, IT Managers
Vendors Mentioned: Trifacta, Paxata, Datameer, Datawatch, Lavastorm, Alation, Tamr, Unifi, 1010data, Podium Data, IBM, Domo, Microsoft, Information Builders, Board, MicroStrategy, Cloudera, H2O.ai, RapidMiner, Domino Data Lab, Dataiku, TIBCO, SAS, Amazon Web Services, Google, DataRobot.
In case you missed it, I just finished…
On CIO.com, analyst Hyoun Park discusses recent cloud pricing changes by Oracle, Amazon, and Google in the context of understanding who is actually providing the cheapest cloud. In this blog, Park posits that Oracle’s new Universal Credits for IaaS and PaaS usage are fundamentally different from traditional cloud pricing models and show that the enterprise cloud is coming of age.
One of Park’s assertions is that the most granular pricing may not be the cheapest because the complexity of detailed pricing prevents companies from optimizing their costs. Will this trend affect your cloud costs?
Also, join Hyoun’s webinar to learn more about managing cloud costs on BrightTALK: Cloud Service Management: Managing Cost, Resources, and Security
On November 29th, Amazon Web Services announced SageMaker, a managed machine learning service that handles the authoring, model training, and hosting of algorithms and frameworks. These capabilities can be used by themselves or as an end-to-end production pipeline.
SageMaker is currently available with a Free tier providing 250 hours of t2.medium notebook usage, 50 hours of m4.xlarge training usage, and 125 hours of m4.xlarge hosting usage for two months. After two months or for additional hours, the service is billed per instance, storage GB, and data transfer GB.
Amalgam Insights will be watching the adoption of SageMaker closely, as it solves several basic problems in machine learning.
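To make the Free tier allowances above concrete, here is a minimal sketch of how a team might estimate which hours fall outside the free allowance. It treats each allowance as a single pool of hours, which is an assumption about how the tier is applied, and the usage figures and category names are hypothetical.

```python
# Free-tier allowances quoted above, in instance-hours.
# Treating each as one pool of hours is an assumption.
FREE_TIER_HOURS = {
    "notebook_t2_medium": 250,
    "training_m4_xlarge": 50,
    "hosting_m4_xlarge": 125,
}


def billable_hours(usage_hours):
    """Return the hours beyond the free allowance for each category."""
    return {
        category: max(0, usage_hours.get(category, 0) - allowance)
        for category, allowance in FREE_TIER_HOURS.items()
    }


# Hypothetical usage for one team during the free-tier window.
example_usage = {
    "notebook_t2_medium": 300,
    "training_m4_xlarge": 40,
    "hosting_m4_xlarge": 125,
}

overage = billable_hours(example_usage)
print(overage)
```

In this hypothetical case only the notebook usage exceeds its allowance, by 50 hours. Storage and data transfer charges are not modeled here.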
On November 29th, Amazon Web Services made a variety of interesting database announcements at Amazon re:Invent: Amazon Neptune, DynamoDB enhancements, and Aurora Serverless. Amalgam found both the Neptune and DynamoDB announcements to be valuable but believes Aurora Serverless was the most interesting of these announcements, both in its direct competition with Oracle and in its personification of a key transitional challenge that all enterprise IT organizations face.
Amazon Neptune is a managed graph database service that Amalgam believes will be important for analyzing relationships, networked environments, process and anomaly charting, pattern sequencing, random walks, and route optimization (such as the classic “traveling salesman” problem). Amazon Neptune is currently in limited preview with no scheduled date for production. Over time, Amalgam expects that Neptune will be an important enhancer for Amazon Kinesis’ streaming data, IoT Platform, Data Pipeline, and EMR (Elastic MapReduce), as graph databases are well-suited to finding the context and value hiding in large volumes of related data.
For the DynamoDB NoSQL database service, Amazon announced two new capabilities. The first is global tables that will be automatically replicated across multiple AWS regions, which will be helpful for global support of production applications. Secondly, Amazon now provides on-demand backups for DynamoDB tables without impacting their availability or speed. With these announcements, DynamoDB comes closer to being a dependable and consistently governed global solution for unstructured and semistructured data.
But the real attention-getter was the announcement of Aurora Serverless, an upcoming relational database offering that will allow end users to pay for database usage and access on a per-second basis. This change is made possible by Amazon’s existing Aurora architecture, which separates storage from compute on a functional basis. This capability will be extremely valuable in supporting highly variable workloads.
How much will Aurora Serverless affect the world of relational databases?
Taking a step back, the majority of business data value is still created by relational data. Relational data is the basis of the vast majority of enterprise applications, the source for business intelligence and business analytics efforts, and the standard format that enterprise employees understand best for creating data. For the next decade, relational data will still be the most valuable form of data in the enterprise and the fight for relational data support will be vital in driving the future of machine learning, artificial intelligence, and digital user experience. To understand where the future of relational data is going, we have to first look at Oracle, who still owns 40+% of the relational database market and is laser-focused on business execution.
In early October, Oracle announced the “Autonomous Database Cloud,” based on Database 18c. The Autonomous Database Cloud was presented as a solution for managing the tuning, updating, performance driving, scaling, and recovery tasks that database administrators typically handle, and was scheduled to be launched in late 2017. This announcement came with two strong guarantees: 1) a telco-like 99.995% availability guarantee, including scheduled downtime, and 2) a promise to provide the database at half the price of Amazon Redshift based on the processing power of the Oracle database.
In doing so, Oracle is using a combination of capabilities based on existing Oracle tuning, backup, and encryption automation and adding monitoring, failure detection, and automated correction capabilities. All of these functions will be overseen by machine learning designed to maintain and improve performance over time. The end result should be that Oracle Autonomous Database Cloud customers would see an elimination of day-to-day administrative tasks and reduced downtime as the machine learning continues to improve the database environment over time.
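As a rough illustration of what the 99.995% availability guarantee mentioned above implies, the following sketch converts an availability percentage into a yearly downtime budget. It assumes a 365-day year and says nothing about how Oracle actually measures or credits downtime; the 99.9% comparison figure is included only for contrast.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_pct, period_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime permitted in a period at a given availability."""
    return period_minutes * (1 - availability_pct / 100)


# The guarantee quoted in the announcement.
oracle_guarantee = allowed_downtime_minutes(99.995)
# A common "three nines" service level, for comparison only.
three_nines = allowed_downtime_minutes(99.9)

print(f"99.995%: {oracle_guarantee:.2f} minutes per year")
print(f"99.9%:   {three_nines:.2f} minutes per year")
```

Under these assumptions, 99.995% works out to roughly 26 minutes of total downtime per year, scheduled maintenance included, against nearly nine hours at 99.9%.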
IT Divergence In Motion: Oracle vs. Amazon
Oracle and Amazon have taken divergent paths in providing their next-generation relational databases, leading to an interesting head-to-head decision for companies seeking enterprise-grade database solutions.
On the one hand, IT organizations that are philosophically seeking to manage IT as a true service have, in Oracle, an automated database option that will remove the need for direct database and maintenance administration. Oracle is removing a variety of traditional corporate controls and replacing them with guaranteed uptime, performance, maintenance, and error reduction. This is an outcome-based approach that is still relatively novel in the IT world.
For those of us who have spent the majority of our careers handling IT at a granular level, it can feel somewhat disconcerting to see many of the manual tuning, upgrading, and security responsibilities being both automated and improved through machine learning. In reality, highly repetitive IT tasks will continue to be automated over time as the transactional IT administration tasks of the 80s and 90s finally come to an end. The Oracle approach is a look towards the future where the goal of database planning is to immediately enact analytic-ready data architecture rather than to coordinate efforts between database structures, infrastructure provisioning, business continuity, security, and networking. Oracle has also addressed questions about the “scale-out” management of its database by providing this automated management layer with price guarantees.
In this path of database management evolution, database administrators must be architects who focus on how the wide variety of data categories (structured, semi-structured, unstructured, streaming, archived, binary, etc.) will fit into the human need for structure, context, and worldview verification.
On the other hand, Amazon’s approach is fundamentally about customer control at extremely granular levels. Aurora is easy to spin up and allows administrators a great deal of choice between instance size and workload capacity. With the current preview of Amazon Aurora Serverless, admins will have even more control over both storage and processing consumption by using the endpoint level as the starting point for provisioning and production. Amazon will target the support of MySQL compatibility in the first half of 2018, then follow with PostgreSQL later in 2018. Billing will occur in Aurora Capacity Units, a combination of processing and memory capacity metered in one-second increments. This granularity of consumption and flexibility of computing will be very helpful in supporting on-demand applications with highly variable or unpredictable usage patterns.
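To illustrate why capacity units metered per second matter for variable workloads, here is a minimal sketch comparing per-second metering with an always-on deployment of the same capacity. The unit price, capacity level, and usage pattern are all hypothetical placeholders, not Amazon's published pricing, and storage charges are left out.

```python
SECONDS_PER_HOUR = 3600

# Placeholder price per capacity unit per hour; NOT an actual Amazon rate.
HYPOTHETICAL_PRICE_PER_UNIT_HOUR = 0.10


def metered_cost(active_seconds, capacity_units, price_per_unit_hour):
    """Cost when capacity is billed only for the seconds it is active."""
    return active_seconds * capacity_units * price_per_unit_hour / SECONDS_PER_HOUR


def always_on_cost(hours, capacity_units, price_per_unit_hour):
    """Cost when the same capacity runs for every hour of the period."""
    return hours * capacity_units * price_per_unit_hour


# Hypothetical workload: 4 capacity units, active 2 hours a day for 30 days.
days = 30
active_seconds = days * 2 * SECONDS_PER_HOUR
metered = metered_cost(active_seconds, 4, HYPOTHETICAL_PRICE_PER_UNIT_HOUR)
always_on = always_on_cost(days * 24, 4, HYPOTHETICAL_PRICE_PER_UNIT_HOUR)

print(f"per-second metering: {metered:.2f}")
print(f"always-on capacity:  {always_on:.2f}")
```

With these placeholder numbers the metered bill is one twelfth of the always-on bill, which is simply the ratio of active hours to total hours. The saving shrinks as the workload approaches continuous use, so the benefit depends entirely on how variable the usage really is.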
But my 20+ years in technology cost administration also lead me to believe that there is an illusory quality of control in the cost and management structure that Amazon is providing. There is nothing wrong with providing pricing at an extremely detailed level, but Amalgam already finds that the vast majority of enterprise cloud spend goes unmonitored on a month-to-month basis at all but the most cursory levels. (For those of you in IT, who is the accountant or expense manager who cross-checks and optimizes your cloud resources on a monthly basis? Oh, you don’t have one?)
Because of that, we at Amalgam believe that additional granularity is more likely to result in billing disputes or complaints. We will also be interested in understanding the details of compute: there can be significant differences in resource pricing based on reserved instances, geography, timing, security needs, and performance needs. Amazon will need to reconcile these compute costs to prevent this service from being an uncontrolled runaway cost. This is the reality of usage-based technology consumption: decades of telecom, network, mobility, and software asset consumption have all demonstrated the risks of pure usage-based pricing.
Amalgam believes that there is room for both as Ease-of-Use vs. Granular Management continues to be a key IT struggle in 2018. Oracle represents the DB option for enterprises seeking governance, automation, and strategic scale while Amazon provides the DB option for enterprises seeking to scale while tightly managing and tracking consumption. The more important issue here is that the Oracle DB vs. Amazon DB announcements represent a microcosm of the future of IT. In one corner is the need to support technology that “just works” with no downtime, no day-to-day administration, and cost reduction driven by performance. In the other corner is the ultimate commoditization of technology where customers have extremely granular consumption options, can get started at minimal cost, and can scale out with little-to-no management.
1) Choose your IT model: “Just Works” vs. “Granular Control.” The Oracle and Amazon announcements show that both models have valid aspects. But inherent in both is the need to scale up and scale out to fit business needs.
2) For “Just Works” organizations, actively evaluate machine learning and automation-driven solutions that reduce or eliminate day-to-day administration. For these organizations, IT no longer represents the management of technology, but the ability to supply solutions that increase in value over time. 2018 is going to be a big year in terms of adding new levels of automation in your organizations.
3) For “Granular Control” organizations, define the technology components that are key drivers or pre-requisites to business success and analyze them extremely closely. In these organizations, IT must be both analytics-savvy and maintain constant vigilance in an ever-changing world. If IT is part of your company’s secret sauce and a fundamental key to differentiated execution, you now have more tools to focus on exactly how, when, and where inflection points take place for company growth, change, or decline.
For additional insights on Amazon’s impact on the future of IT, read Amalgam analyst Tom Petrocelli’s perspective on Amazon Web Services and the Death of IT Ops on InformationWeek.
Yesterday, at the Boston Cloud Services Meetup at the Cambridge IBM Innovation Center, Amalgam Insights (AI) attended a Cloudyn-based event on “Overcoming the Challenges of Multi-Cloud Financial Management.” This presentation was headed by Account Executive Marcus Benson and focused on the challenges that Fortune 500 companies and managed service providers have in managing both complex single-vendor and multi-vendor cloud infrastructure environments.
Cloudyn is a cloud business and financial management solution founded in 2011 and set up as both a multi-tenant and multi-cloud solution running on AWS, Microsoft Azure, and Google Cloud. Cloudyn provides a single-pane-of-glass view for consolidated management and real-time, continuous cost optimization across multiple vendors including Amazon Web Services, Microsoft Azure, Google Cloud, OpenStack, and Docker. Cloudyn has raised over $20 million in venture capital and seed funding and currently targets large enterprises, managed service providers, and companies with over 1 million dollars in annual cloud spend.