On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta.
Vendors and Solutions Mentioned: VMware, CloudHealth Technologies, Cloudyn, Microsoft Azure Cloud Cost Management, Cloud Cruiser, HPE OneSphere, Nutanix Beam, Minjar, Botmetric
Key Stakeholders: Chief Financial Officers, Chief Information Officers, Chief Accounting Officers, Chief Procurement Officers, Cloud Computing Directors and Managers, IT Procurement Directors and Managers, IT Expense Directors and Managers
Key Takeaway: As Best-of-Breed vendors continue to emerge, new technologies are invented, existing services evolve, vendors pursue new and innovative pricing and delivery models, cloud computing remains easy to procure, and IaaS doubles as a spend category every three years, cloud computing management will only grow in complexity and the need for Cloud Service Management will only increase. VMware has made a wise choice in buying into a rapidly growing market and now has a greater opportunity to support and augment complex, decentralized, and hybrid IT environments.
About the Announcement
On August 27, 2018, VMware announced a definitive agreement to acquire CloudHealth Technologies, a Boston-based startup company focused on providing a cloud operations and expense management platform that supports enterprise accounts across Amazon Web Services, Microsoft Azure, and Google Cloud Platform.
Continue reading VMware Purchases CloudHealth Technologies to support Multicloud Enterprises and Continue Investing in Boston
The spectrum of code-centricity on data science platforms ranges from “code-free” to “code-based.” Data science platforms frequently boast both that they provide environments requiring no coding and that they are code-friendly. Where a given platform falls along this spectrum affects who can successfully use it, what tasks they can perform, and at what level of complexity and difficulty. Codeless interfaces supply drag-and-drop simplicity and relatively quick answers to straightforward questions at the expense of customizability and power. Code-based interfaces require specialized coding, data, and statistics skills, but supply the flexibility and power to answer cutting-edge questions.
Codeless and hybrid code environments furnish end users who may lack a significant coding and statistics background with some level of data science capabilities. If a problem is relatively simple, such as a straightforward clustering question to identify customer personas for marketing, graphical interfaces provide the ability to string together data workflows from a pool of available algorithms without needing to know Python or other coding languages. Even for data scientists who do know how to code, pulling together relatively simple models in a drag-and-drop GUI can be faster than coding them manually; it also avoids typos and reduces time spent debugging code technicalities, freeing the builder to focus on the pure logic without distractions.
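To make the comparison concrete, the kind of model a drag-and-drop clustering node encapsulates can be sketched in a few lines of Python. The customer data, the two-feature setup, and k=2 below are illustrative assumptions; a real platform node would wrap a far more robust implementation than this minimal k-means.

```python
# Minimal k-means sketch: cluster customers by (annual spend, visits per month)
# to suggest marketing personas. The data and k=2 are illustrative assumptions.

def kmeans(points, k, iters=20):
    centers = list(points[:k])  # naive initialization: first k points
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        centers = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

customers = [(120, 2), (150, 3), (130, 2),     # low-spend, infrequent visitors
             (900, 12), (950, 14), (880, 11)]  # high-spend, frequent visitors

centers, clusters = kmeans(customers, k=2)
print(centers)  # two persona centroids, one per spending pattern
```

A GUI clustering node hides all of this behind a single draggable box; the point is that the underlying logic is simple enough that a platform can expose it without requiring the user to write it.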
Answering a more advanced question may require some level of custom coding. Your data workflow may be constructed in a hybrid manner, composed of some pre-built models connected to nodes that can include bespoke code. This permits more adaptability of models, and makes them more powerful than those restricted solely to what a given data science platform supplies out of the box. However, even if a data science platform includes the option to include custom code in a hybrid model, taking advantage of this feature requires somebody with coding knowledge to create the code.
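The hybrid pattern can be sketched as a chain of workflow nodes in which one node is bespoke code. The node names, the record schema, and the business rule below are hypothetical, not any specific platform's API; they only illustrate mixing prebuilt and custom steps in one workflow.

```python
# Hypothetical hybrid workflow: prebuilt nodes chained with one custom code node.
# Each node is a function that takes and returns a list of record dicts.

def drop_missing(records):             # "prebuilt" cleansing node
    return [r for r in records if r.get("revenue") is not None]

def normalize_revenue(records):        # "prebuilt" scaling node
    top = max(r["revenue"] for r in records)
    return [{**r, "revenue": r["revenue"] / top} for r in records]

def flag_strategic_accounts(records):  # bespoke node: custom business logic
    return [{**r, "strategic": r["revenue"] > 0.5 and r["region"] == "EMEA"}
            for r in records]

def run_workflow(records, nodes):
    for node in nodes:                 # execute nodes in order, like a GUI canvas
        records = node(records)
    return records

data = [{"region": "EMEA", "revenue": 800},
        {"region": "APAC", "revenue": 1000},
        {"region": "EMEA", "revenue": None}]

result = run_workflow(data, [drop_missing, normalize_revenue,
                             flag_strategic_accounts])
print(result)
```

The custom node slots into the same interface as the prebuilt ones, which is what gives hybrid models their extra adaptability, but someone still has to write that node.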
If the problem being addressed is complex enough, sharper coding, statistics, and data skills are necessary to create appropriately tailored models. At this level of complexity, a code-centric interactive development environment is necessary so that the data scientist can put their advanced skills into model construction and customization.
Data science platforms can equip data science users and teams with multiple interfaces for creating machine learning models. Which interfaces are included says a fair bit about what kind of end users a given platform aims to serve best, and the level of skill expected of the various members of your data science team. A fully inclusive data science platform includes both a GUI environment for data analysts to construct simple workflows (and for project managers and line-of-business stakeholders to understand what a model is doing from a high-level perspective) and a proper coding environment for data scientists to code more complex custom models.
For much of the past 30 years, Microsoft was famous for its hostility toward Free and Open Source Software (FOSS). They reserved special disdain for Linux, the Unix-like operating system that first emerged in the 1990s. Linux arrived on the scene just as Microsoft was beginning to batter Unix with Windows NT. The Microsoft leadership at the time, especially Steve Ballmer, viewed Linux as an existential threat. They approached Linux with an “us versus them” mentality that was, at times, rabid.
It’s not news that times have changed and Microsoft with it. Instead of looking to destroy Linux and FOSS, Microsoft CEO Satya Nadella has embraced it.
Microsoft has begun to meld with the FOSS community, creating Linux-Windows combinations that were unthinkable in the Ballmer era.
In just the past few years Microsoft has:
Continue reading Microsoft Loves Linux and FOSS Because of Developers
On August 15, 2018, Oracle announced the availability of GraphPipe, a network protocol designed to transmit machine learning data between remote processes in a standardized manner, with the goal of simplifying the machine learning model deployment process. The spec is now available on Oracle’s GitHub, along with clients and servers that implement the spec for Python and Go (with a Java client to come soon), and a TensorFlow plugin that allows remote models to be included inside TensorFlow graphs.
Oracle’s goal with GraphPipe is to standardize the process of model deployment regardless of the frameworks utilized in the model creation stage.
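The problem GraphPipe standardizes — shipping tensor data between processes in a framework-neutral form — can be illustrated with a stdlib round-trip. To be clear, GraphPipe itself uses flatbuffers and a defined request/response schema; the encoding below is only an illustrative stand-in for the idea of a shared wire format, not GraphPipe's actual format.

```python
# Illustrative framework-neutral tensor transport: a shape header followed by
# raw float64 values. This is NOT GraphPipe's flatbuffers-based wire format;
# it only sketches the round-trip idea that such a protocol standardizes.
import struct

def encode_tensor(shape, values):
    # header: rank, then each dimension; payload: little-endian float64 values
    header = struct.pack(f"<I{len(shape)}I", len(shape), *shape)
    payload = struct.pack(f"<{len(values)}d", *values)
    return header + payload

def decode_tensor(blob):
    (rank,) = struct.unpack_from("<I", blob, 0)
    shape = struct.unpack_from(f"<{rank}I", blob, 4)
    count = (len(blob) - 4 - 4 * rank) // 8
    values = struct.unpack_from(f"<{count}d", blob, 4 + 4 * rank)
    return shape, list(values)

blob = encode_tensor((2, 3), [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
shape, values = decode_tensor(blob)
print(shape, values)  # (2, 3) [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Once every framework's client and server agree on one such encoding, a TensorFlow producer and, say, a scikit-learn consumer no longer need bespoke glue code for each pairing — which is the deployment simplification Oracle is after.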
I recently wrote a Market Milestone report on Oracle’s launch of Autonomous Transaction Processing, the latest in a string of Autonomous Database announcements made by Oracle following announcements in Autonomous Data Warehousing and the initial announcement of the Autonomous Database late last year.
This string of announcements takes advantage of Oracle’s investments in infrastructure, distributed hardware, data protection and security, and index optimization to create a new set of database services that automate basic support and optimization capabilities. These announcements matter because, as transactional and data-centric business models continue to proliferate, both startups and enterprises should seek a data infrastructure that will remain optimized, secure, and scalable over time without becoming cost and resource intensive. With Oracle Autonomous Transaction Processing, Oracle offers an enterprise-grade data foundation for this next generation of businesses.
One of Amalgam Insights’ key takeaways in this research is the analyst estimate that Oracle ATP could reduce the cost of cloud-based transactional database management by 65% compared to similar services managed on Amazon Web Services. Frankly, companies that need to support net-new transactional databases that must be performant and scalable to support Internet of Things, messaging, and other new data-driven businesses should consider Oracle ATP and should do due diligence on Oracle Autonomous Database Cloud for reducing long-term Total Cost of Ownership. This comparison is based on the costs of a 10 TB Oracle database on a reserved instance on Amazon Web Services versus a similar database on the Oracle Autonomous Database Cloud.
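The arithmetic behind a reduction claim like this is worth making explicit. The monthly cost figure below is a hypothetical placeholder, not Oracle's or AWS's actual pricing; only the 65% reduction rate comes from the analyst estimate above.

```python
# Illustrative TCO arithmetic for a 65% cost-reduction estimate.
# aws_monthly_cost is a hypothetical placeholder for a 10 TB reserved-instance
# workload, not a quoted price; only the 65% rate comes from the report.
aws_monthly_cost = 20_000.0   # hypothetical baseline monthly cost
reduction = 0.65              # analyst-estimated reduction rate

oracle_monthly_cost = aws_monthly_cost * (1 - reduction)
three_year_savings = (aws_monthly_cost - oracle_monthly_cost) * 36

print(oracle_monthly_cost)    # 7000.0
print(three_year_savings)     # 468000.0
```

The point of the exercise: at this reduction rate, the savings compound month over month, which is why the long-term Total Cost of Ownership comparison, not the sticker price, is the number worth diligencing.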
One of the most interesting aspects of the Autonomous Database in general, and one that Oracle will need to explain further, is how to guide companies with existing transactional databases and data warehouses to an Autonomous environment. It is no secret that every enterprise IT department is its own special environment driven by a combination of business rules, employee preferences, governance, regulation, security, and business continuity expectations. At the same time, IT is used to automation and rapid processing in some aspects of technology management, such as threat management and logs for patching and other basic transactions. But given IT’s need for extreme customization, how does IT gain enough visibility into the automated decisions made in indexing and ongoing optimization?
At this point, Amalgam Insights believes that Oracle is pushing a fundamental shift in database management that will likely lead to the automation of manual technical management tasks. This change will be especially helpful for net-new databases, where organizations can use the Autonomous Database Cloud to help establish business rules for data access, categorization, and optimization. This is likely a no-brainer decision, especially for Oracle shops that are strained in their database management resources and seeking to handle more data for new transaction-based business needs or machine learning.
For established database workloads, enterprises will have to think about how, or whether, to transfer existing enterprise databases to the Autonomous Database Cloud. Although enterprises will likely gain some initial performance improvements and potentially reduce the support costs associated with large databases, they will also likely spend time double-checking the decisions and lineage associated with Autonomous Database decisions, both in test and in deployment settings. Amalgam Insights would expect Autonomous Database management to lead to indexing, security, and resource management decisions that may be more optimal than human-led decisions, but with a logic that may not be fully transparent to IT departments that have strongly defined and governed business rules and processes.
Although Amalgam Insights is convinced that Oracle Autonomous Database is the beginning of a new stage of Digitized and Automated IT, we also believe that a next step for Oracle Autonomous Database Cloud will be to create governance, lineage, and audit packages to support regulated industries, legislative demands, and documentation to describe the business rules for Autonomous logic. Amalgam Insights expects that Oracle would want to keep specific algorithms and automation logic as proprietary trade secrets. But without some level of documentation that is traceable and auditable, large enterprises will have to conduct significant work on their own to figure out if they are able to transfer large databases to Oracle Autonomous Database Cloud, which Amalgam Insights would expect to be an important part of Oracle’s business model and cloud revenue projections going forward.
To read the full report with additional insights and details on the Oracle Autonomous Transaction Processing announcement, please download the full report on Oracle’s launch of Autonomous Transaction Processing, available at no cost for a limited time.
I recently received an analyst briefing from Nick Howe, the Chief Learning Officer at Area9 Learning, which offers an adaptive learning solution. Although Area9 Learning was founded in 2006, I have known about area 9 since the 1980s, and it was first “discovered” in 1909. How is that possible?
In 1909, the German anatomist Korbinian Brodmann developed a numbering system for mapping the cerebral cortex based on the organization of cells (called cytoarchitecture). Brodmann area 9, or BA9, includes the prefrontal cortex (a region of brain right behind the forehead) which is a critical structure in the cognitive skills learning system in the brain and functionally serves working memory and attention.
The cognitive skills learning system, prefrontal cortex (BA9), working memory and attention are critical for many aspects of learning, especially hard skills learning.
Continue reading Area9: Leveraging Brain and Computer Science to Build an Effective Adaptive Learning Platform
If your organization already has a data scientist, but your data science workload has grown beyond their capacity, you’re probably thinking about hiring another data scientist. Perhaps even a team of them. But cloning your existing data scientist isn’t the best way to grow your organization’s capacity for doing data science.
Why not simply hire more data scientists? First, many of the tasks listed above fall well outside the core competency of data scientists’ statistical work, and other roles (some of which likely already exist in your organization) can perform these tasks much more efficiently. Second, data scientists who can perform all of these tasks well are a rare find; hoping to find their clones in sufficient numbers on the open market is a losing proposition. Third, though your organization’s data science practice continues to expand, the amount of time your original domain expert can spend with the data scientist on a growing pool of data science projects does not; it’s time to start delegating some tasks to operational specialists.
Companies struggle with all types of compliance issues. Failure to comply with government regulations, such as Dodd-Frank, EPA, or HIPAA, is a significant business risk for many companies. Internally mandated compliance presents problems as well. Security and cost control policies are just as vital as other forms of regulation, since they protect the company from reputational, financial, and operational risks.
Continue reading Infrastructure as Code Provides Advantages for Proactive Compliance
Development organizations continue to feel increasing pressure to produce better code more quickly. To help accomplish that faster-better philosophy, a number of methodologies have emerged that help organizations quickly merge individual code, test it, and deploy it to production. While DevOps is actually a management methodology, it is predicated on an integrated pipeline that drives code smoothly from development to production deployment. To achieve these goals, companies have adopted continuous integration and continuous deployment (CI/CD) tool sets. These tools, from companies such as Atlassian and GitLab, help developers merge individual code into the deployable code bases that make up an application and then push them out to test and production environments.
Cloud vendors have lately been releasing their own CI/CD tools to their customers. In some cases, these are extensions of existing tools, such as Microsoft Visual Studio Team Services on Azure. Google’s recently announced Cloud Build, as well as AWS CodeDeploy and CodePipeline, are CI/CD tools developed specifically for their cloud environments. Cloud CI/CD tools are rarely all-encompassing and often rely on other open source or commercial products, such as Jenkins or Git, to achieve a full CI/CD pipeline.
These products represent more than just new entries into an increasingly crowded CI/CD market. They are clearly part of a longer-term strategy by cloud service providers to become so integrated into the DevOps pipeline that moving to a new vendor or adopting a multi-cloud strategy would be much more difficult. Many developers start with a single cloud service provider in order to explore cloud computing and deploy their initial applications. Adopting the cloud vendor’s CI/CD tools embeds the cloud vendor deeply in the development process. The cloud service provider is no longer sitting at the end of the development pipeline; it is integrated into, and vital to, the development process itself. Even where a cloud service provider’s CI/CD tools support hybrid cloud deployments, they are designed first and foremost for the cloud vendor’s own offerings. Google Cloud Build and Microsoft Visual Studio certainly follow this model.
There is danger here for commercial vendors of CI/CD products outside these cloud platforms. They are now competing with native products integrated into the sales and technical environment of the cloud vendor. Purchasing products from a cloud vendor is as easy as buying anything else through the cloud portal, and those products are immediately aware of the services the cloud vendor offers. No fuss, no muss.
This isn’t a problem for companies committed to a particular cloud service provider. Using native tools designed for the primary environment offers better integration, less work, and ease of use that is hard to achieve with external tools. The cost of these tools is often utility-based and, hence, elastic based on the amount of work product flowing through the pipeline. The trend toward native cloud CI/CD tools also helps explain Microsoft’s purchase of GitHub. GitHub, while cloud agnostic, will be much more powerful when completely integrated into Azure – for Microsoft customers, anyway.
Building tools that strongly embed a particular cloud vendor into the DevOps pipeline is clearly strategic even if it promotes monoculture. There will be advantages for customers as well as cloud vendors. It remains to be seen if the advantages to customers overcome the inevitable vendor lock-in that the CI/CD tools are meant to create.