This summer, my Amalgam Insights colleague Hyoun Park and I will be teaming up to address that question. When it comes to data science platforms, there’s no such thing as “one size fits all.” We are writing this landscape because understanding the processes of scaling data science beyond individual experiments and integrating it into your business is difficult. By breaking down the key characteristics of the data science platform market, this landscape will help potential buyers choose the appropriate platform for their organizational needs. We will examine the questions that serve as key differentiators among data science platforms, to determine which characteristics, functionalities, and policies separate platforms supporting introductory data science workflows from those supporting scaled-up, enterprise-grade workflows.
On May 22, Domino held its first Analyst Seminar in advance of its Rev conference for data science leaders. Domino provides an open data science platform to coordinate data science initiatives across enterprises, integrating data scientists, IT, and line of business.
At the Analyst Seminar, Domino introduced its Model Management framework: five pillars supporting a core belief that, as a best practice, data science should not be just a siloed department or team, and that its resulting models should drive the business. For this to be possible, all relevant stakeholders across the enterprise will need to buy into data science initiatives, as this will involve changes to existing business processes in order to take advantage of the knowledge gained from data science projects.
In early June, Amalgam Insights attended Alteryx Inspire ’18, where Alteryx Chairman and CEO Dean Stoecker led an energetic keynote to inspire Alteryx users to “Alter(yx) Everything.” Based on conversations I had with Alteryx executives, partners, and end-users, I came away with the strong impression that Alteryx wants to make advanced analytics and data science tasks as easy and quick as possible for a broad audience that may not know code, and that it wants to expand that community and its capabilities as quickly as possible. Data scientists and analytics-knowledgeable employees are in high demand, and the shortage is projected to worsen as the demand for these capabilities grows; data is growing faster than the existing data analyst and data scientist community can keep up with.
Key Stakeholders: IT managers, data scientists, data analysts, database administrators, application developers, enterprise statisticians, machine learning directors and managers, existing enterprise Cloudera customers
Why It Matters: As Cloudera continues its pivot towards becoming a full-service machine learning and analytics platform, its latest updates enhance its ability to retain existing customers of its commercial data lake…
Industry: Data Science Platforms
Key Stakeholders: IT managers, data scientists, data analysts, database administrators, application developers, enterprise statisticians, machine learning directors and managers, current DataScience.com customers, current Oracle customers
Why It Matters: Oracle released a number of AI tools in Q4 2017, but until now, it lacked a data science platform to support complete data science workflows. With this acquisition, Oracle now has an end-to-end platform to manage these workflows and support collaboration among teams of data scientists and business users, and it joins other major enterprise software companies in being able to operationalize data science.
Top Takeaways: Oracle acquired DataScience.com to retain customers with data science needs in-house rather than risk losing their data science-based business to competitors. However, Oracle has not yet defined a timeline for rolling out the unified data science platform, or its future availability on the Oracle Cloud.
Oracle Acquires DataScience.com
On May 16, 2018, Oracle announced that it had agreed to acquire DataScience.com, an enterprise data science platform that Oracle expects to add to the Oracle Cloud environment. With Oracle’s debut of a number of AI tools last fall, this latest acquisition telegraphs Oracle’s intent to expedite its entrance into the data science platform market by buying its way in.
Oracle is reviewing DataScience.com’s existing product roadmap and will supply guidance in the future, but it intends to provide a single unified data science platform in concert with Oracle Cloud Infrastructure and its existing SaaS and PaaS offerings, empowering customers with a broader suite of machine learning tools and a complete workflow.
Everybody is talking about Google Duplex, announced earlier this week at Google I/O. Judging by previous interactions with IVRs when calling vendors for customer support, Duplex is an impressive leap forward in natural language AI, and offers hope of making some clerical tasks easier to complete. Duplex will be tested further by a limited number of users in Google Assistant this summer, refining its ability to complete specific tasks: getting holiday hours for a business, making restaurant reservations, and scheduling appointments specifically at a hair salon.
So what does this mean for most businesses?
My name is Lynne Baer, and I’ll be covering the world of data science software for Amalgam Insights. I’ll investigate data science platforms and apps to solve the puzzle of getting the right tools to the right people and organizations.
“Data science” is on the tip of every executive’s tongue right now. The idea that new business initiatives (and improvements to existing ones) can be found in the data a company is already collecting is compelling. Perhaps your organization has already dipped its toes in the data discovery and analysis waters – your employees may be managing your company’s data in Informatica, or performing statistical analysis in Statistica, or experimenting with Tableau to transform data into visualizations.
But what is a Data Science Platform? Right now, if you’re looking to buy software for your company to do data science-related tasks, it’s difficult to know which applications will actually suit your needs. Do you already have a data workflow you’d like to build on, or are you looking to the structure of an end-to-end platform to set your data science initiative up for success? How do you coordinate a team of data scientists to take better advantage of existing resources they’ve already created? Do you have coders in-house already who can work with a platform designed for people writing in Python, R, Scala, or Julia? Are there more user-friendly tools out there your company can use if you don’t? What do you do if some of your data requires tighter security protocols around it? Or if some of your data models themselves are proprietary and/or confidential?
All of these questions are part and parcel of the big one: How can companies tell what makes a good data science platform for their needs before investing time and money? Are traditional enterprise software vendors like IBM, Microsoft, SAP, and SAS dependable in this space? What about companies like Alteryx, H2O.ai, KNIME, and RapidMiner? Other popular platforms worth considering include Anaconda, Angoss (recently acquired by Datawatch), Domino, Databricks, Dataiku, MapR, Mathworks, Teradata, and TIBCO. And then there are new startups like Sentenai, focused on streaming sensor data, and slightly more established companies like Cloudera looking to expand from their existing offerings.
Over the next several months, I’ll be digging deeply to answer these questions, speaking with vendors, users, and investors in the data science market. I would love to speak with you, and I look forward to continuing this discussion. And if you’ll be at Alteryx Inspire in June, I’ll see you there.