Data Science and Machine Learning News Roundup, May 2019

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, Google, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.

Domino Data Lab Champions Expert Data Scientists While Outpacing Walled-Garden Data Science Platforms

Domino announced key updates to its data science platform at Rev 2, its annual data science leader summit. For data science managers, the new Control Center provides information on what an organization’s data science team members are doing, helping managers address any blocking issues and prioritize projects appropriately. The Experiment Manager’s new Activity Feed supplies data scientists with better organizational and tracking capabilities on their experiments. The Compute Grid and Compute Engine, built on Kubernetes, will make it easier for IT teams to install and administer Domino, even in complex hybrid cloud environments. Finally, the beta Domino Community Forum will allow Domino users to share best practices with each other, as well as submit feature requests and feedback to Domino directly. With governance becoming a top priority across data science practices, Domino’s platform improvements around monitoring and making experiments repeatable will make governance easier for its users.

Informatica Unveils AI-Powered Product Innovations and Strengthens Industry Partnerships at Informatica World 2019

At Informatica World, Informatica publicized a number of key partnerships, both new and enhanced. Most of these partnerships involve additional support for cloud services. This includes storage, both data warehouses (Amazon Redshift) and data lakes (Azure, Databricks). Informatica also announced a new Tableau Dashboard Extension that makes Informatica Enterprise Data Catalog available from within the Tableau platform. Finally, Informatica and Google Cloud are broadening their existing partnership by making Intelligent Cloud Services available on Google Cloud Platform, and providing increased support for Google BigQuery and Google Cloud Dataproc within Informatica. Amalgam Insights attended Informatica World and provides a deeper assessment of Informatica’s partnerships, as well as CLAIRE-ity on Informatica’s AI initiatives.

Microsoft delivers new advancements in Azure from cloud to edge ahead of Microsoft Build conference

Microsoft announced a number of new Azure Machine Learning and Azure AI capabilities. Azure Machine Learning has been integrated with Azure DevOps to provide “MLOps” capabilities that enable reproducibility, auditability, and automation of the full machine learning lifecycle. This marks a notable step toward making the machine learning model process more governable and compliant with regulatory needs. Azure Machine Learning also has a new visual drag-and-drop interface to facilitate codeless machine learning model creation, making the process of building machine learning models more user-friendly. On the Azure AI side, Azure Cognitive Services launched Personalizer, which provides users with specific recommendations to inform their decision-making process. Personalizer is part of the new “Decisions” category within Azure Cognitive Services; other Decisions services include Content Moderator, an API to assist in moderation and reviewing of text, images, and videos; and Anomaly Detector, an API that ingests time-series data and chooses an appropriate anomaly detection model for that data. Finally, Microsoft added a “cognitive search” capability to Azure Search, which allows customers to apply Cognitive Services algorithms to search results of their structured and unstructured content.

Microsoft and General Assembly launch partnership to close the global AI skills gap

Microsoft also announced a partnership with General Assembly to address the dearth of qualified data workers, with the goal of training 15,000 workers by 2022 for various artificial intelligence and machine learning roles. The two companies will found an AI Standards Board to create standards and credentials for artificial intelligence skills. In addition, Microsoft and General Assembly will develop scalable training solutions for Microsoft customers, and establish an AI Talent network to connect qualified candidates to AI jobs. This continues the trend of major enterprises building internal training programs to bridge the data skills gap.

Amalgam Insights Publishes Highly Anticipated SmartList on Service Mesh and Microservices Management

Amalgam Insights has just published my highly anticipated SmartList Market Guide on Service Mesh. It is available this week at no cost as we prepare for KubeCon and CloudNativeCon Europe 2019, which I’ll be attending.

Before you go to the event, get prepared by catching up on the key strategies, trends, and vendors associated with microservices and service mesh. For instance, consider how the Service Mesh market is currently constructed.

For a deep dive on this figure and the three key sectors of the Service Mesh market, insights into the current State of the Market for service mesh, and a look at where key vendors and products including Istio, Linkerd, A10, Amazon, Aspen Mesh, Buoyant, Google, HashiCorp, IBM, NGINX, Red Hat, Solo.io, Vamp, and more fit into today’s microservices management environment, download my report today.

Tom Petrocelli Clarifies How Cloud Foundry and Kubernetes Provide Different Paths to Microservices

DevOps Research Fellow Tom Petrocelli has just published a new report describing the roles that Cloud Foundry Application Runtime and Kubernetes play in supporting microservices. This report explores when each solution is appropriate and lists vendors that offer resources and solutions to support the development of these open source projects.

Organizations and vendors mentioned include: Cloud Foundry Foundation, Cloud Native Computing Foundation, Pivotal, IBM, SUSE, Atos, Red Hat, Canonical, Rancher, Mesosphere, Heptio, Google, Amazon, Oracle, and Microsoft.

To download this report, which has been made available at no cost until the end of February, go to https://amalgaminsights.com/product/analyst-insight-cloud-foundry-and-kubernetes-different-paths-to-microservices

Amazon Expands Toolkit of Machine Learning Services at AWS re:Invent

At AWS re:Invent, Amazon Web Services expanded its toolkit of machine learning application services with the announcements of Amazon Comprehend Medical, Amazon Forecast, Amazon Personalize, and Amazon Textract. These new services augment the capabilities Amazon provides to end users when it comes to text analysis, personalized recommendations, and time series forecasts. The continued growth of these individual services removes obstacles for companies looking to get started with common machine learning tasks on a smaller scale; rather than building a wholesale data science pipeline in-house, these services allow companies to quickly get one task done, and this permits an incremental introduction to machine learning for a given organization. Forecast, Personalize, and Textract are in preview, while Comprehend Medical is available now.

Amazon Comprehend Medical, Forecast, Personalize, and Textract join a collection of machine learning services that include speech recognition (Transcribe) and translation (Translate), speech-to-text and text-to-speech (Lex and Polly) to power machine conversation such as chatbots and Alexa, general text analytics (Comprehend), and image and video analysis (Rekognition).

New Capabilities

Amazon Personalize lets developers add personalized recommendations into their apps, based on a given activity stream from that app and a corpus of what’s available to be recommended, whether that’s products, articles, or other things. In addition to recommendations, Personalize can also be used to customize search results and notifications. By combining a given search string or location with contextual behavior data, Amazon aims to help its customers return more relevant results and, in turn, build trust with their own users.
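
To make the contrast between generic and personalized recommendations concrete, here is a minimal Python sketch. It is a toy illustration only: it does not use Amazon Personalize or reflect how the service works internally, and every name in it (the functions, the sample events, the catalog) is hypothetical. It assumes the activity stream is a list of (user, item) pairs and that the corpus is a set of item names.

```python
from collections import Counter, defaultdict

# Toy illustration only: NOT the Amazon Personalize API or its algorithm.
# All names and data below are hypothetical.


def popularity_recommendations(events, catalog, k=2):
    """Generic baseline: the k most-interacted-with catalog items, same for everyone."""
    counts = Counter(item for _, item in events if item in catalog)
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]


def personalized_recommendations(events, catalog, user, k=2):
    """Recommend items liked by users whose activity overlaps with this user's."""
    items_by_user = defaultdict(set)
    for event_user, item in events:
        if item in catalog:
            items_by_user[event_user].add(item)

    mine = items_by_user.get(user, set())
    scores = Counter()
    for other, items in items_by_user.items():
        if other == user:
            continue
        overlap = len(mine & items)
        if overlap == 0:
            continue  # no shared behavior, so no signal from this user
        for item in items - mine:
            scores[item] += overlap  # weight by how similar the other user is

    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical activity stream: (user, item) pairs.
events = [
    ("ana", "tent"), ("ana", "stove"),
    ("ben", "tent"), ("ben", "lantern"),
    ("cy", "novel"), ("cy", "bookmark"),
    ("dee", "novel"), ("eli", "novel"),
]
catalog = {"tent", "stove", "lantern", "novel", "bookmark"}

generic = popularity_recommendations(events, catalog)
for_ana = personalized_recommendations(events, catalog, "ana")
print(generic, for_ana)
```

In this sample, the popularity baseline surfaces the novel to every user, while the overlap-based version suggests the lantern to a user who has only looked at camping gear. A managed service would presumably use far richer models, but the inputs (an activity stream plus a catalog) are the same shape.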

Amazon Forecast builds private, custom time-series forecast models that predict future trends based on a customer’s own data. Customers provide both historical data and related causal data, and Forecast analyzes the data to determine the relevant factors in building its models and providing forecasts.
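
As a rough illustration of what it means to forecast from historical data plus a related causal variable, the Python sketch below fits a one-variable least-squares line. This is a deliberately simple stand-in, not Amazon Forecast's API or modeling approach; the function names and the sample figures (daily temperature and units sold) are invented for the example.

```python
# Toy illustration only: NOT Amazon Forecast's API or its models.
# Function names and sample data are hypothetical.


def fit_line(related, target):
    """Ordinary least squares for target = intercept + slope * related."""
    n = len(related)
    mean_x = sum(related) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in related)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(related, target))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, related_value):
    """Forecast the target for a known or expected value of the related variable."""
    intercept, slope = model
    return intercept + slope * related_value


# Historical data (units sold) and a related causal series (temperature in Celsius).
temperature_c = [20, 25, 30, 35]
units_sold = [110, 135, 160, 185]

model = fit_line(temperature_c, units_sold)
forecast_at_40c = predict(model, 40)
print(model, forecast_at_40c)
```

The point of the sketch is the shape of the problem: the history alone gives a trend, while the related series explains why the numbers move. Choosing which related factors matter is the part the article says Forecast handles for the customer.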

Amazon Textract extracts text and data from scanned documents, without requiring manual data entry or custom code. In particular, using machine learning to recognize when data is in a table or form field and treat it appropriately will save a significant amount of time over the current OCR standard.

Finally, Amazon Comprehend Medical, an extension of last year’s Amazon Comprehend, uses natural language processing to analyze unstructured medical text such as doctor’s notes or clinical trial records, and extract relevant information from this text.

Recommendations

Organizations doing resource planning, financial planning, or other similar forecasting that currently lack the capability to do time series forecasting in-house should consider using Amazon Forecast to predict product demand, staffing levels, inventory levels, and material availability, and to perform financial forecasting. Outsourcing the need to build complex forecasting models in-house lets departments focus on the predictions.

Consumer-oriented organizations that currently provide generic, uncontextualized recommendations (based on popularity or other simple measures) and want to build higher levels of engagement with their customers should consider using Amazon Personalize to provide personalized recommendations, search results, and notifications via their apps and website. Providing high-quality relevant recommendations à la minute builds customer trust in the quality of a given organization’s engagement efforts, particularly compared to the average spray-and-pray marketing communication.

Organizations that still depend on physical documents, or that have an archive of physical documents to scan and analyze, should consider using Amazon Textract. OCR’s limits are well-known, especially when it comes to accurately interpreting and formatting semi-structured blocks of text data such as form fields and tables, resulting in significant time devoted to manual correction in post-processing. Textract handles complex documents without the need for custom code or maintaining templates; being able to automate text interpretation and analysis further accelerates document processing workflows, and better permits organizations to maintain compliance.

Medical organizations using software that depends on manually implemented rules to process their medical text should consider using Amazon Comprehend Medical. By removing the need to maintain a list of rules in-house, Comprehend Medical accelerates the ability to extract and analyze medical information from unstructured text fields like doctors’ notes and health records, improving processes such as medical coding, cohort analysis to recruit patients for clinical trials, and health monitoring of patients.

All organizations looking to use machine learning services from external providers need to consider whether outsourcing will work for their circumstances. Data privacy is a key concern, and even more so in regulated verticals with industry-specific rules such as HIPAA. Does the service you want to use respect those rules? From a compliance perspective, why a model gives the results it does needs to be explained as well; merely accepting results from the black box at face value is insufficient. Machine learning products that automatically provide such an explanation in plain English do exist, but this feature is still uncommon and in its infancy.

Conclusion

With its latest announcements, Amazon continues to broaden the scope of customer issues it addresses with machine learning services. Medical companies need better text analytics yesterday, but struggle to comply with HIPAA while assessing the data they have. Customer-facing organizations face stiff competition when their competitor is only a click away. And any company trying to plan for the future based on past data grapples with understanding what factors affect future results. Amazon’s machine learning application services address common tactical business issues by reducing the work of implementing task-specific machine learning models to pure inputs and outputs. These services present outsourcing opportunities for overworked departments struggling to keep up.

Data Science and Machine Learning News, November 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, SnapLogic, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.
