IBM and Cloudera Join Forces to Expand Data Science Access

On June 21, IBM and Cloudera jointly announced that they were expanding their existing relationship to bring more advanced data science solutions to Hadoop users by developing a shared go-to-market program. IBM will now resell Cloudera Enterprise Data Hub and Cloudera DataFlow, while Cloudera will resell IBM Watson Studio and IBM BigSQL.

In bulking up their joint go-to-market programs, IBM and Cloudera are reaffirming their pre-existing partnership to amplify each other’s capabilities, particularly in heavy data workflows. Cloudera Hadoop is a common enterprise data source, but Cloudera’s existing base of data science users is small despite the growing demand for data science options, and its Data Science Workbench is coder-centric. Offering the more user-friendly IBM Watson Studio gives Cloudera’s existing data customers a convenient option for doing data science without necessarily needing to know Python, R, or Scala. In turn, IBM can now sell Watson Studio, BigSQL, and IBM consulting and services more deeply into Cloudera’s customer base, which broadens its ability to upsell additional offerings.

Because IBM and Cloudera each hold significant amounts of on-prem data, it’s interesting to look at this partnership in terms of the 800-pound gorilla of cloud data: AWS. IBM, Cloudera, and Amazon are all leaders when it comes to the sheer amount of data each holds. But Amazon is the biggest cloud provider on the planet; it holds the plurality of the cloud hosting market, and most of IBM and Cloudera’s customers’ data is on-prem. Because that data is hosted on-prem, it’s data Amazon doesn’t have access to; IBM and Cloudera are teaming up to sell their own data science and machine learning capabilities on that on-prem data, where there may be security or policy reasons to keep it out of the cloud.

A key differentiator in comparing AWS with the IBM-Cloudera partnership lies in AWS’ breadth of machine learning offerings. In addition to having a general-purpose data science and machine learning platform in SageMaker, AWS also offers task-specific tools like Amazon Personalize and Textract that address precise use cases for a number of Amazon customers who don’t need a full-blown data science platform. IBM has some APIs for visual recognition, natural language classification, and decision optimization, but AWS has developed their own APIs into higher-level services. Cloudera customers building custom machine learning models may find that IBM’s Watson Studio suits their needs. However, IBM lacks the variety of off-the-shelf machine learning applications that AWS provides. IBM supplies their machine learning capabilities as individual APIs that an application development team will need to fit together to create their own in-house apps.

Recommendations

  • For Cloudera customers looking to do broad data science, IBM Watson Studio is now an option. This offers Cloudera customers an alternative to Data Science Workbench; in particular, an option that has a more visual interface, with more drag-and-drop capabilities and some level of automation, rather than a more code-centric environment.
  • IBM customers can now choose Cloudera Enterprise Data Hub for Hadoop. IBM and Hortonworks had a long-term partnership; IBM supporting and cross-selling Enterprise Data Hub demonstrates that IBM will continue to sell enterprise Hadoop in some flavor.

Tom Petrocelli Releases Groundbreaking Technical Guide on Service Mesh

On April 2, 2019, Amalgam Insights Research Fellow Tom Petrocelli published Technical Guide: A Service Mesh Primer, which serves as a vital starting point for technical architects and developer teams to understand the current trends in microservices and service mesh. This report provides enterprise architects, CTOs, and developer teams with the guidance they need to understand the microservices architecture, service mesh architecture, and OSI model context necessary to conceptualize service mesh technologies.

In this report, Amalgam Insights provides context in the following areas:

Big Changes in the Cloud Data Migration Market: Attunity and Alooma Get Acquired

Mid-February (Feb. 17 – 23) was a hot week for data and cloud migration companies with two big acquisitions. Google announced on Tuesday, Feb. 19 the acquisition of Alooma to assist with cloud data migration issues. This acquisition aligns well with the 2018 acquisition of Velostrata to support cloud workload migration. This acquisition reflects Google’s continued…


Amazon Expands Toolkit of Machine Learning Services at AWS re:Invent

At AWS re:Invent, Amazon Web Services expanded its toolkit of machine learning application services with the announcements of Amazon Comprehend Medical, Amazon Forecast, Amazon Personalize, and Amazon Textract. These new services augment the capabilities Amazon provides to end users when it comes to text analysis, personalized recommendations, and time series forecasts. The continued growth of these individual services removes obstacles for companies looking to get started with common machine learning tasks on a smaller scale; rather than building a wholesale data science pipeline in-house, these services allow companies to quickly get one task done, and this permits an incremental introduction to machine learning for a given organization. Forecast, Personalize, and Textract are in preview, while Comprehend Medical is available now.

Amazon Comprehend Medical, Forecast, Personalize, and Textract join a collection of machine learning services that includes speech recognition (Transcribe) and translation (Translate), speech-to-text and text-to-speech (Lex and Polly) to power machine conversation such as chatbots and Alexa, general text analytics (Comprehend), and image and video analysis (Rekognition).

New Capabilities

Amazon Personalize lets developers add personalized recommendations into their apps, based on a given activity stream from that app and a corpus of what’s available to be recommended, whether that’s products, articles, or other things. In addition to recommendations, Personalize can also be used to customize search results and notifications. By combining a given search string or location with contextual behavior data, Amazon looks to provide customers with the ability to build trust.

Amazon Forecast builds private, custom time-series forecast models that predict future trends based on a customer’s own data. Customers provide both historical data and related causal data, and Forecast analyzes the data to determine the relevant factors in building its models and providing forecasts.

Amazon Textract extracts text and data from scanned documents, without requiring manual data entry or custom code. In particular, using machine learning to recognize when data is in a table or form field and treat it appropriately will save a significant amount of time over the current OCR standard.

Finally, Amazon Comprehend Medical, an extension of last year’s Amazon Comprehend, uses natural language processing to analyze unstructured medical text such as doctor’s notes or clinical trial records, and extract relevant information from this text.

Recommendations

Organizations doing resource planning, financial planning, or other similar forecasting that currently lack the capability to do time series forecasting in-house should consider using Amazon Forecast to predict product demand, staffing levels, inventory levels, material availability, and to perform financial forecasting. Outsourcing the need to build complex forecasting models in-house lets departments focus on the predictions.

Consumer-oriented organizations that want to build higher levels of engagement with their customers, but that currently provide generic, uncontextualized recommendations (based on popularity or other simple measures), should consider using Amazon Personalize to provide personalized recommendations, search results, and notifications via their apps and website. Providing high-quality relevant recommendations a la minute builds customer trust in the quality of a given organization’s engagement efforts, particularly compared to the average spray-and-pray marketing communication.

Organizations that still depend on physical documents, or that have an archive of physical documents to scan and analyze, should consider using Amazon Textract. OCR’s limits are well-known, especially when it comes to accurately interpreting and formatting semi-structured blocks of text data such as form fields and tables, resulting in significant time devoted to manual post-processing correction. Textract handles complex documents without the need for custom code or maintaining templates; being able to automate text interpretation and analysis further accelerates document processing workflows, and better permits organizations to maintain compliance.

Medical organizations using software that depends on manually-implemented rules to process their medical text should consider using Amazon Comprehend Medical. By removing the need to maintain a list of rules in-house, Comprehend Medical accelerates the ability to extract and analyze medical information from unstructured text fields like doctor’s notes and health records, improving processes such as medical coding, cohort analysis to recruit patients for clinical trials, and health monitoring of patients.

All organizations looking to use machine learning services from external providers need to consider whether outsourcing will work for their circumstances. Data privacy is a key concern, and even more so in regulated verticals with industry-specific rules such as HIPAA. Does the service you want to use respect those rules? From a compliance perspective, why a model gives the results it does needs to be explained as well; merely accepting results from the black box at face value is insufficient. Machine learning products that automatically provide such an explanation in plain English do exist, but this feature is still uncommon and in its infancy.

Conclusion

With its latest announcements, Amazon continues to broaden the scope of customer issues it addresses with machine learning services. Medical companies need better text analytics yesterday, but struggle to comply with HIPAA while assessing the data they have. Customer-facing organizations face stiff competition when their competitor is only a click away. And any company trying to plan for the future based on past data grapples with understanding what factors affect future results. Amazon’s machine learning application services address common tactical business issues by reducing the process of implementing task-specific machine learning models to pure inputs and outputs. These services present outsourcing opportunities for overworked departments struggling to keep up.

Data Science and Machine Learning News, November 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, SnapLogic, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.


Amalgam’s 5 Tiers of Technology Value


In Amalgam’s recent Analyst Insight, “Domo Hajimemashite At Domopalooza 2018, Domo Solves Its Case of Mistaken Identity”, Amalgam introduced a figure showing the 5 Tiers of Technology Value. This pyramid, based on Maslow’s Hierarchy of Needs, demonstrates how technology provides value that can be documented, calculated, and used to build business cases.

Figure: Amalgam 5 Tiers of Technology Value

To better understand these five tiers, Amalgam provides this guidance to companies seeking a better understanding of how IT investments are justified, as well as the pros and cons associated with each tier.


Data and Analytic Strategies for Developing Ethical IT: a BrightTALK webinar

BI to AI on Trusted Data - An Amalgam Insights Research Theme

Recommended Audience: CIOs, Enterprise Architects, Data Managers, Analytics Managers, Data Scientists, IT Managers

Vendors Mentioned: Trifacta, Paxata, Datameer, Datawatch, Lavastorm, Alation, Tamr, Unifi, 1010Data, Podium Data, IBM, Domo, Microsoft, Information Builders, Board, MicroStrategy, Cloudera, H2O.ai, RapidMiner, Domino Data Lab, Dataiku, TIBCO, SAS, Amazon Web Services, Google, DataRobot.

In case you missed it, I just finished up my webinar on Data and Analytic Strategies for Developing Ethical IT. We are headed into a new algorithmic, statistical, and heterogeneous data-defined model of IT where IT ethics and relevance are being challenged. In this webinar, we discussed:

  • Why IT is broken from a support and business perspective
  • The aspects of IT that can be fixed
  • What we can do as IT managers to fix IT
  • Data Prep, Data Unification, Business Intelligence, Data Science, and Machine Learning vendors that can help unlock the Black Boxes and Opt-Out disasters in IT
  • Key Recommendations

This webinar provides context to my ongoing research tracks of “BI to AI on Shared Data” and “IT Management at Scale.” To attend the webinar, please check the embedded view below or click to watch on BrightTALK.


Hyoun Park Discusses Cloud Pricing on CIO.com

Money Bubbles in the Clouds

On CIO.com, analyst Hyoun Park discusses recent cloud pricing changes by Oracle, Amazon, and Google in the context of understanding who is actually providing the cheapest cloud. In this blog, Park posits that Oracle’s new Universal Credits for IaaS and PaaS usage are fundamentally different from the traditional pricing models for cloud and show that the enterprise cloud is coming of age.

One of Park’s assertions is that the most granular pricing may not be the cheapest because the complexity of detailed pricing prevents companies from optimizing their costs. Will this trend affect your cloud costs?

To learn more, click through to CIO.com and read this article: “Is the cheapest cloud pricing flexible or granular?”

Also, join Hyoun’s webinar to learn more about managing cloud costs on BrightTALK: Cloud Service Management: Managing Cost, Resources, and Security

Amazon SageMaker: A Key to Accelerating Enterprise Machine Learning Adoption

On November 29th, Amazon Web Services announced SageMaker, a managed machine learning service that handles the authoring, model training, and hosting of algorithms and frameworks. These capabilities can be used by themselves, or as an end-to-end production pipeline.

SageMaker is currently available with a Free tier providing 250 hours of t2.medium notebook usage, 50 hours of m4.xlarge training usage, and 125 hours of m4.xlarge hosting usage for two months. After two months or for additional hours, the service is billed per instance, storage GB, and data transfer GB.
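
As a rough illustration of how the free allotments interact with billed usage, the sketch below computes how many instance-hours would fall outside the Free tier. The allotment figures come from the paragraph above; the dictionary keys, the `billable_hours` helper, and the example usage numbers are hypothetical names and values chosen for illustration, and no prices are assumed. The post does not state whether the allotments apply per month or across the two-month window, so the sketch simply treats them as a single allotment per period.

```python
# Free-tier allotments quoted in the post, in instance-hours.
# The key names are illustrative labels, not AWS identifiers.
FREE_TIER_HOURS = {
    "notebook_t2_medium": 250,
    "training_m4_xlarge": 50,
    "hosting_m4_xlarge": 125,
}


def billable_hours(used_hours):
    """Return the hours beyond the free allotment for each usage type."""
    return {
        kind: max(0, used_hours.get(kind, 0) - free)
        for kind, free in FREE_TIER_HOURS.items()
    }


# Hypothetical usage figures, for illustration only.
example_usage = {
    "notebook_t2_medium": 300,
    "training_m4_xlarge": 20,
    "hosting_m4_xlarge": 125,
}

print(billable_hours(example_usage))
```

With these example figures, only the notebook usage exceeds its allotment; training and hosting stay within the free hours. Storage and data transfer charges, which the post says are billed per GB, are not modeled here.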

Amalgam Insights anticipates watching the adoption of SageMaker as it solves several basic problems in machine learning.


Amazon Aurora Serverless vs. Oracle Autonomous Database: A Microcosm for The Future of IT

On November 29th, Amazon Web Services made a variety of interesting database announcements at AWS re:Invent: Amazon Neptune, DynamoDB enhancements, and Aurora Serverless. Amalgam found both the Neptune and DynamoDB announcements to be valuable but believes Aurora Serverless was the most interesting of these announcements both in its direct competition with Oracle and its personification of…
