Data Science Platforms News Roundup, June 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cloudera, Databricks, Dataiku, Datawatch, Domino, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta.

Databricks Conquers AI Dilemma with Unified Analytics

At Spark + AI, Databricks announced new capabilities for its Unified Analytics Platform, including MLflow and Databricks Runtime for ML. MLflow is an open source multi-cloud framework intended to standardize and simplify machine learning workflows to ensure machine learning gets put into production. Databricks Runtime for ML scales deep learning with new GPU support for AWS and Azure, along with preconfigured environments for the most popular machine learning frameworks such as scikit-learn, TensorFlow, Keras, and XGBoost. The net result of these new capabilities is that Databricks users will be able to get their machine learning work done faster.

IBM and H2O.ai Partnership Aims to Accelerate Adoption of AI in the Enterprise

H2O.ai and IBM announced a partnership in early June that permits the use of H2O’s Driverless AI on IBM PowerSystems. Driverless AI has automated machine learning capabilities, while PowerAI is a machine learning and deep learning toolkit, and the combination will permit significantly faster processing overall. This builds on the pre-existing integration of H2O’s open source libraries into IBM’s Data Science Experience analytics solution. However, when this announcement was made, IBM had not yet debuted PowerAI Enterprise, so the availability of Driverless AI on PowerAI Enterprise remains TBD.

Announcing PowerAI Enterprise: Bringing Data Science into Production

This month, IBM announced the release of PowerAI Enterprise, which runs on IBM PowerSystems. It’s an expansion of IBM’s PowerAI applied AI offering that extends its coverage to include the entire data science workflow. IBM continues to cover its bases by diversifying its data science offerings, adding PowerAI to its existing Data Science Experience and Watson Studio offerings, but this also creates confusion as companies seek to determine which data science platform product suits their needs. We look forward to covering and clarifying this in greater detail.

Alteryx Reveals Newest Platform Release at Inspire 2018

At Alteryx Inspire, Alteryx announced the latest release (2018.2) of the Alteryx Analytics Platform, with improvements such as making common analytic tasks even easier via templates, extending community search across the entire platform, and enhanced onboarding for new users. I detail the new features in my earlier post, Alter(yx)ing Everything at Inspire 2018; the upshot is that Alteryx continues to focus on ease of use for analytics end users.

Introducing Dask for Scalable Machine Learning

Anaconda released Dask, a new Python-based tool for processing large datasets. Python libraries like NumPy, pandas, and scikit-learn are designed to work with data in-memory on a single core; Dask will let data scientists process large datasets in parallel, even on a single computer, without needing to use Spark or another distributed computing framework. This expedites machine learning workflows on large datasets in Python, with the added convenience of being able to remain in your Python work environment.
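The idea of processing a dataset in parallel pieces can be sketched with the Python standard library alone: split the data into chunks, reduce each chunk independently in a worker pool, then combine the partial results. This is a minimal illustration of that pattern, not Dask's API. The function names `chunked`, `partial_sum_and_count`, and `parallel_mean` are hypothetical, and a thread pool is used only to keep the sketch self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(values, chunk_size):
    """Yield consecutive slices of `values` with at most `chunk_size` items."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def partial_sum_and_count(chunk):
    """Reduce one chunk to the two numbers needed to compute a mean."""
    return sum(chunk), len(chunk)


def parallel_mean(values, chunk_size=1000, workers=4):
    """Compute the mean by reducing chunks independently, then combining."""
    if not values:
        raise ValueError("values must not be empty")
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is reduced on its own; no worker needs the full dataset.
        partials = list(pool.map(partial_sum_and_count, chunked(values, chunk_size)))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count
```

Because each chunk is reduced separately, no single step has to hold or scan the whole dataset at once, which is the property that lets chunk-based tools work with data larger than one worker could comfortably handle.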


Finally, I’m also working on a Vendor SmartList for the Data Science Platforms space this summer. If you’d like to learn more about this research initiative, or set up a briefing with Amalgam Insights for potential inclusion, please email me at lynne@amalgaminsights.com.

What Data Science Platform Suits Your Organization’s Needs?

This summer, my Amalgam Insights colleague Hyoun Park and I will be teaming up to address that question. When it comes to data science platforms, there’s no such thing as “one size fits all.” We are writing this landscape because understanding the processes of scaling data science beyond individual experiments and integrating it into your business is difficult. By breaking down the key characteristics of the data science platform market, this landscape will help potential buyers choose the appropriate platform for their organizational needs. We will examine the following questions, which serve as key differentiators in data science platform purchasing decisions, to determine which characteristics, functionalities, and policies distinguish platforms supporting introductory data science workflows from those supporting scaled-up, enterprise-grade workflows.

Amalgam’s Assumptions: The baseline order of operations for conducting data science experiments begins with understanding the business problem you’re trying to address. Gathering, prepping, and exploring the data are the next steps, done to extract appropriate features and start creating your model. The modeling process is iterative, and data scientists will adjust their model throughout the process based on feedback. Finally, if and when a model is deemed satisfactory, it can be deployed in some form.

How do these platforms support reproducibility of data, workflows, and results?

One advantage some data science platforms provide is the ability to track and save the data and hyperparameters run in each experiment, so that that experiment can be re-run at any time. Individual data scientists running ad hoc experiments need to do this tracking manually, if they even know to bother with it.
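To make the manual version of this tracking concrete, here is a minimal sketch using only the Python standard library. It records a fingerprint of the exact data used plus the hyperparameters of a run, so the run can be identified and repeated later. The function names and the record layout are hypothetical and do not reflect any particular platform's format.

```python
import hashlib
import json


def fingerprint(data_bytes):
    """Return a SHA-256 hex digest identifying the exact bytes of a dataset."""
    return hashlib.sha256(data_bytes).hexdigest()


def make_run_record(data_bytes, hyperparameters):
    """Bundle the information needed to re-run an experiment later."""
    return {
        "data_sha256": fingerprint(data_bytes),
        "hyperparameters": dict(hyperparameters),
    }


def serialize(record):
    """Serialize with sorted keys so identical runs produce identical text."""
    return json.dumps(record, sort_keys=True)
```

A data scientist working ad hoc might call `make_run_record` with the raw bytes of a training file and a dictionary of settings, then append the serialized record to a log file. If the data changes by even one byte, the fingerprint changes, which flags that a later run is no longer a true repeat of the earlier one.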

How secure, governable, and compliant are these platforms compared to corporate, standards-based, and legislative needs?

Data access is fragmented, and in early-stage data science setups, it’s not uncommon for data scientists to copy and paste and store the data they need on their own laptop, because they lack the ability to use that data directly while keeping it secure in an IT-approved manner. Data science platforms can help make this secure access process easier.

How do these platforms support collaboration between data scientists, data analysts, IT, and line-of-business departments?

Your data scientists should be able to share their reports in a usable form with the rest of the business, whether this looks like reports, dashboards, microservices, or apps. In addition, the consumers of these data outputs need to be able to give feedback to the producers to improve results. To capitalize on data science experiments being done in a company, some level of collaboration is necessary, but this may mean different things to different organizations. Some have shared code repositories. Some use chat. Effectively scaling up data science operations requires a more consistent experience across the board, so that everybody knows where to find what they need to get their work done. Centralizing feedback on models into the platform, associated with the models and their outputs, is one example of enabling the consistency necessary.

How do these platforms support a consistent view of data science based on the user interfaces and user experiences that the platforms provide to all users?

This consistency isn’t just limited to creating a model catalog with centralized feedback – the process of going from individual data scientists operating ad hoc and using their specific preferred tools to a standardized experience can meet resistance. Data science platforms often support a wide variety of such tools, which can ease this transition, but not all data science platforms support the same sets of tools. But moving to a unified experience makes it easier to onboard new data scientists into your environment.

What do data science teams look like when they are using data science platforms?

Some teams consist of a couple of people constructing skunkworks pipelines out of code as an initial or side project. Others may do enough ongoing data science work that they work with line-of-business stakeholders, perhaps with the assistance of a project manager. If data science is core to your organization’s business, the data science team will be large relative to your company’s size, however large your company is. These teams all have different needs. A focus of this research is to categorize typical experiences across the spectrum by team size and complexity, code-centricness, and other measures.

Team Complexity / Code-Based or Codeless

By exploring the people, processes, and technological functionalities associated with data science platforms over this summer, Amalgam Insights looks forward to bringing clarity to the market and providing directional recommendations to the enterprise community. This Vendor SmartList on Data Science Platforms will explore these questions and more in differentiating between a variety of Data Science Platforms currently in the market including, but not limited to: Alteryx, Anaconda, Cloudera, Databricks, Dataiku, Domino, H2O.ai, IBM, KNIME, MathWorks, Oracle, RapidMiner, SAP, SAS Viya, Teradata, TIBCO, and other startups and new entrants in this space that establish themselves over the summer of 2018.

If you’d like to learn more about this research initiative, or set up a briefing with Amalgam Insights for potential inclusion, please email me at lynne@amalgaminsights.com.

 

Lynne Baer: Clarifying Data Science Platforms for Business

Word cloud of data science software and terms

My name is Lynne Baer, and I’ll be covering the world of data science software for Amalgam Insights. I’ll investigate data science platforms and apps to solve the puzzle of getting the right tools to the right people and organizations.

“Data science” is on the tip of every executive’s tongue right now. The idea that new business initiatives (and improvements to existing ones) can be found in the data a company is already collecting is compelling. Perhaps your organization has already dipped its toes in the data discovery and analysis waters – your employees may be managing your company’s data in Informatica, or performing statistical analysis in Statistica, or experimenting with Tableau to transform data into visualizations.

But what is a Data Science Platform? Right now, if you’re looking to buy software for your company to do data science-related tasks, it’s difficult to know which applications will actually suit your needs. Do you already have a data workflow you’d like to build on, or are you looking to the structure of an end-to-end platform to set your data science initiative up for success? How do you coordinate a team of data scientists to take better advantage of existing resources they’ve already created? Do you have coders in-house already who can work with a platform designed for people writing in Python, R, Scala, or Julia? Are there more user-friendly tools out there your company can use if you don’t? What do you do if some of your data requires tighter security protocols around it? Or if some of your data models themselves are proprietary and/or confidential?

All of these questions are part and parcel of the big one: How can companies tell what makes a good data science platform for their needs before investing time and money? Are traditional enterprise software vendors like IBM, Microsoft, SAP, and SAS dependable in this space? What about companies like Alteryx, H2O.ai, KNIME, and RapidMiner? Other popular platforms under consideration should also include Anaconda, Angoss (recently acquired by Datawatch), Domino, Databricks, Dataiku, MapR, MathWorks, Teradata, and TIBCO. And then there are new startups like Sentenai, focused on streaming sensor data, and slightly more established companies like Cloudera looking to expand from their existing offerings.

Over the next several months, I’ll be digging deeply to answer these questions, speaking with vendors, users, and investors in the data science market. I would love to speak with you, and I look forward to continuing this discussion. And if you’ll be at Alteryx Inspire in June, I’ll see you there.

Cloudera Analyst Conference Makes The Case for Analytic & AI Insights at Scale

On April 9th and 10th, Amalgam Insights attended Cloudera’s fifth Industry Analyst and Influencer Conference (which I’ll self-servingly refer to as the Analyst Conference since I attended as an industry analyst) in Santa Monica. Cloudera sought to make the case that it was evolving beyond the offerings it is currently best known for, a Hadoop distribution and commercial data lake, to become a machine learning and analytics platform. In doing so, Cloudera was extremely self-aware of its need to progress beyond the role of multi-petabyte storage at scale to a machine learning solution.
Cloudera’s Challenges in Enterprise Machine Learning 

Anaplan States Planning Is Dead, Focuses on the Era of Real-Time Decision


Recommended Reading for: Finance, Sales Operations, Supply Chain Management, IT Management, and Enterprise Strategy Personnel
Companies Mentioned: Anaplan, IBM, SAP, Oracle, MicroStrategy, Tableau, DataRobot, TROVE Data, Louis Vuitton, Premji Invest, Salesforce Ventures, Top Tier Capital Partners, Baillie Gifford, Granite Ventures, Industry Ventures, Meritech Capital, Constellation Research, Ventana Research, IDC, Mint Jutras, ISG, Gartner, Apps Run the World, TechVentive

On March 6th and 7th, 2018, Amalgam Insights attended Anaplan Hub 18. Anaplan has been on Amalgam analysts’ radar for several years, as we consider Anaplan’s Hyperblock foundation and ability to serve multi-departmental planning in enterprises without a year or more of setup to be fundamental advantages. As we have covered this company, we have been waiting for Anaplan to reach its breakthrough moment where it takes its place as one of the true market leaders in enterprise applications. It is in this context that we attended Anaplan Hub and judged our interactions with Anaplan executives, customers, and partners.

This report provides updates on Anaplan’s key business metrics, executive insights from an analyst-only panel, keynote and product announcements, a 2018 perspective on customer success stories with Anaplan, and Amalgam’s expectations for Anaplan in 2018 and beyond as both a real-time planning application and a Platform as a Service.

Anaplan Key Business Updates


Blockchain! What is it Good For?

Diamond - Immutable and Hardened
Tom Petrocelli, Amalgam Insights Contributing Analyst

Blockchain looks to be one of those up-and-coming technologies that is constantly being talked about. Many of the largest IT companies – IBM, Microsoft, and Oracle to name a few – plus a not-for-profit or two are heavily promoting blockchain. Clearly, there is intense interest, much of it fueled by exotic-sounding cryptocurrencies such as Bitcoin and Ethereum. The big question I get asked – and analysts are supposed to be able to answer the big questions – is “What can I use blockchain for?”

Market Milestone: Red Hat Acquires CoreOS Changing the Container Landscape

We have just published a new document from Tom Petrocelli analyzing Red Hat’s $250 million acquisition of CoreOS and why it matters for DevOps and Systems Architecture managers. This report is recommended for CIOs, System Architects, IT Managers, System Administrators, and Operations Managers who are evaluating CoreOS and Red Hat as container solutions to support…


Data and Analytic Strategies for Developing Ethical IT: a BrightTALK webinar

Recommended Audience: CIOs, Enterprise Architects, Data Managers, Analytics Managers, Data Scientists, IT Managers
Vendors Mentioned: Trifacta, Paxata, Datameer, Datawatch, Lavastorm, Alation, Tamr, Unifi, 1010Data, Podium Data, IBM, Domo, Microsoft, Information Builders, Board, MicroStrategy, Cloudera, H2O.ai, RapidMiner, Domino Data Lab, Dataiku, TIBCO, SAS, Amazon Web Services, Google, DataRobot

In case you missed it, I just finished…


Calero Purchases European TEM Leader A&B Groep: What to Expect?


Note: This blog contains excerpts from Amalgam’s Due Diligence Dossier on Calero. To get the full report, click here.

In January 2018, Calero announced two acquisitions, Comview and A&B Groep. These acquisitions have increased Calero’s headcount by over 70 employees, added geographic footprint, demonstrated a specific profile for acquisition, and demonstrated the willingness of new owners, Riverside Partners, to quickly take action within 120 days of acquiring Calero. This combination of acquisition, execution, and stated focus results in the need to re-evaluate Calero in the context of these significant changes.

The Brain Science of Effective Corporate Soft Skills Training

Cognitive and Behavior Systems of Learning

Companies Mentioned: Deloitte, Salesforce, SAP, Cornerstone, Saba, Skillsoft, Fivel, PageUp, PeopleFluent, Talentsoft, Oracle, SilkRoad, IBM, Lumesse, Litmos, D2L, LearnCore, and Lessonly

Soft skills are “people skills”, and they are extremely important in the commercial sector. They involve showing and feeling empathy, embracing diversity, and understanding that we all have biases that we need to be aware of and keep in check. They involve effective interpersonal interactions and real-time communication skills and are relevant at all corporate levels. Whether office staff who interface with clients, office managers who interface with employees and their superiors, or the C-suite who provide the leadership and vision for the company, effective soft skills matter. An individual with strong soft skills can be an effective collaborator, leader, and “good” citizen. They not only know “what” behaviors are appropriate and inappropriate, but they know “how” to generate those behaviors and do so in a highly effective manner.