Data Science and Machine Learning News Roundup, January 2019

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Amazon, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, Google, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.

Cloudera and Hortonworks Complete Planned Merger

In early January, Cloudera and Hortonworks completed their planned merger. With this, Cloudera becomes the default machine learning ecosystem for Hadoop-based data, while giving Hortonworks customers an easy pathway into machine learning and analytics capabilities.

Study: 89 Percent of Finance Teams Yet to Embrace Artificial Intelligence

A study conducted by the Association of International Certified Professional Accountants (AICPA) and Oracle revealed that 89% of organizations have not deployed AI to their finance groups. Although a correlation exists between companies with revenue growth and companies that are using AI, the key takeaway is that artificial intelligence is still in the early adopter phase for most organizations.

Gartner Magic Quadrant for Data Science and Machine Learning Platforms

In late January, Gartner released its Magic Quadrant for Data Science and Machine Learning Platforms. New to the Data Science and Machine Learning MQ this year are both DataRobot and Google – two machine learning offerings with completely different audiences and scope. DataRobot offers an automated machine learning service targeted towards “citizen data scientists,” while Google’s machine learning tools, though part of Google Cloud Platform, are more of a DIY data pipeline targeted towards developers. By contrast, I find it curious that Amazon’s SageMaker machine learning platform – and its own collection of task-specific machine learning tools, despite their similarity to Google’s – failed to make the quadrant, given this quadrant’s large umbrella.

While data science and machine learning are still emerging markets, the contrasting demands made of these technologies by citizen data scientists and by cutting-edge developers warrant splitting the next Data Science and Machine Learning Magic Quadrant into separate reports targeted to the considerations of each of these audiences. In particular, the continued growth of automated machine learning technologies will likely drive such a split, as citizen data scientists pursue a “good enough” solution that provides quick results.

Data Science and Machine Learning News, October 2018

On a monthly basis, I will be rounding up key news associated with the Data Science Platforms space for Amalgam Insights. Companies covered will include: Alteryx, Anaconda, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataRobot, Datawatch, Domino, Elastic, H2O.ai, IBM, Immuta, Informatica, KNIME, MathWorks, Microsoft, Oracle, Paxata, RapidMiner, SAP, SAS, Tableau, Talend, Teradata, TIBCO, Trifacta, TROVE.


What Data Science Platform Suits Your Organization’s Needs?

This summer, my Amalgam Insights colleague Hyoun Park and I will be teaming up to address that question. When it comes to data science platforms, there’s no such thing as “one size fits all.” We are writing this landscape because it is difficult to understand how to scale data science beyond individual experiments and integrate it into your business. By breaking down the key characteristics of the data science platform market, this landscape will help potential buyers choose the appropriate platform for their organizational needs. We will examine the following questions, which serve as key differentiators, to determine which characteristics, functionalities, and policies distinguish platforms supporting introductory data science workflows from those supporting scaled-up, enterprise-grade workflows.


Lynne Baer: Clarifying Data Science Platforms for Business

Word cloud of data science software and terms

My name is Lynne Baer, and I’ll be covering the world of data science software for Amalgam Insights. I’ll investigate data science platforms and apps to solve the puzzle of getting the right tools to the right people and organizations.

“Data science” is on the tip of every executive’s tongue right now. The idea that new business initiatives (and improvements to existing ones) can be found in the data a company is already collecting is compelling. Perhaps your organization has already dipped its toes in the data discovery and analysis waters – your employees may be managing your company’s data in Informatica, or performing statistical analysis in Statistica, or experimenting with Tableau to transform data into visualizations.

But what is a Data Science Platform? Right now, if you’re looking to buy software for your company to do data science-related tasks, it’s difficult to know which applications will actually suit your needs. Do you already have a data workflow you’d like to build on, or are you looking to the structure of an end-to-end platform to set your data science initiative up for success? How do you coordinate a team of data scientists to take better advantage of existing resources they’ve already created? Do you have coders in-house already who can work with a platform designed for people writing in Python, R, Scala, or Julia? Are there more user-friendly tools out there your company can use if you don’t? What do you do if some of your data requires tighter security protocols around it? Or if some of your data models themselves are proprietary and/or confidential?

All of these questions are part and parcel of the big one: How can companies tell what makes a good data science platform for their needs before investing time and money? Are traditional enterprise software vendors like IBM, Microsoft, SAP, and SAS dependable in this space? What about companies like Alteryx, H2O.ai, KNIME, and RapidMiner? Other popular platforms under consideration should also include Anaconda, Angoss (recently acquired by Datawatch), Domino, Databricks, Dataiku, MapR, MathWorks, Teradata, and TIBCO. And then there are new startups like Sentenai, focused on streaming sensor data, and slightly more established companies like Cloudera looking to expand from their existing offerings.

Over the next several months, I’ll be digging deeply to answer these questions, speaking with vendors, users, and investors in the data science market. I would love to speak with you, and I look forward to continuing this discussion. And if you’ll be at Alteryx Inspire in June, I’ll see you there.

Cloudera Analyst Conference Makes The Case for Analytic & AI Insights at Scale

On April 9th and 10th, Amalgam Insights attended Cloudera’s fifth Industry Analyst and Influencer Conference (which I’ll self-servingly refer to as the Analyst Conference since I attended as an industry analyst) in Santa Monica. Cloudera sought to make the case that it is evolving beyond the offerings it is currently best known for, a Hadoop distribution and commercial data lake, to become a machine learning and analytics platform. In doing so, Cloudera was extremely self-aware of its need to progress beyond the role of multi-petabyte storage at scale to a machine learning solution.

Cloudera’s Challenges in Enterprise Machine Learning

Data and Analytic Strategies for Developing Ethical IT: a BrightTALK webinar

BI to AI on Trusted Data – An Amalgam Insights Research Theme

Recommended Audience: CIOs, Enterprise Architects, Data Managers, Analytics Managers, Data Scientists, IT Managers

Vendors Mentioned: Trifacta, Paxata, Datameer, Datawatch, Lavastorm, Alation, Tamr, Unifi, 1010Data, Podium Data, IBM, Domo, Microsoft, Information Builders, Board, MicroStrategy, Cloudera, H2O.ai, RapidMiner, Domino Data Lab, Dataiku, TIBCO, SAS, Amazon Web Services, Google, DataRobot.

In case you missed it, I just finished up my webinar on Data and Analytic Strategies for Developing Ethical IT. We are headed into a new algorithmic, statistical, and heterogeneous data-defined model of IT where IT ethics and relevance are being challenged. In this webinar, we discussed:

  • Why IT is broken from a support and business perspective
  • The aspects of IT that can be fixed
  • What we can do as IT managers to fix IT
  • Data Prep, Data Unification, Business Intelligence, Data Science, and Machine Learning vendors that can help unlock the Black Boxes and Opt-Out disasters in IT
  • Key Recommendations

This webinar provides context to my ongoing research tracks of “BI to AI on Shared Data” and “IT Management at Scale.” To attend the webinar, please check the embedded view below or click to watch on BrightTALK.


28 Hours as an Industry Analyst at Strata Data 2017


Companies Mentioned: Aberdeen Group, Actian, Alation, Arcadia Data, Attunity, BMC, Cambridge Semantics, Cloudera, Databricks, Dataiku, DataKitchen, Datameer, DataRobot, Domino Data Lab, EMA, HPE, Hurwitz and Associates, IBM, Informatica, Kogentix, LogTrust, Looker, Mesosphere, Micro Focus, MicroStrategy, Ovum, Paxata, Podium Data, Qubole, SAP, Snowflake, Strata Data, Tableau, Tamr, Tellius, Trifacta.

Last week, I attended Strata Data Conference at the Javits Center in New York City to catch up with a wide variety of data science and machine learning users, enablers, and thought leaders. In the process, I had the opportunity to listen to some fantastic keynotes, to chat with 30+ companies looking for solutions and 30+ vendors presenting at the show, and to attend alongside a number of luminary industry analysts and thought leaders including Ovum’s Tony Baer, EMA’s John Myers, Aberdeen Group’s Mike Lock, and Hurwitz & Associates’ Judith Hurwitz.

From this whirlwind tour of executives, I gathered many takeaways: some from the keynotes and vendors that I can share, and some from end users that I unfortunately have to keep confidential. To give you an idea of what an industry analyst notes, the following is a short summary of what I took from the keynotes and from each vendor that I spoke to:

Keynotes: The key themes that really got my attention were the idea that AI requires ethics, brought up by Joanna Bryson, and the idea that all data is biased, which danah boyd discussed. This idea that data and machine learning have their own weaknesses that require human intervention, training, and guidance is incredibly important. Over the past decade, technologists have put their trust in Big Data and the idea that data will provide answers, only to find that a naive and “unbiased” analysis of data has its own biases. Context and human perspective are inherent to translating data into value: this does not change just because our analytic and data training tools are increasingly nuanced and intelligent in nature.

Behind the hype of data science, Big Data, analytic modeling, robotic process automation, DevOps, DataOps, and artificial intelligence is this fundamental need to understand that data, algorithms, and technology all have inherent biases as the following tweet shows:


With Cloudera’s S-1, Hadoop and Big Data Finally Come of Age

On Friday, March 31st, Cloudera filed its S-1 with the intention to IPO. The timing looks good considering the recent successful IPOs of Alteryx, MuleSoft, and Snap. But how does Cloudera actually match up with other tech companies in terms of being successful in the short and medium term?

Cloudera’s S-1 filing starts by describing the near-term growth potential of the Internet of Things and IDC’s estimate of 30 billion internet-connected mobile devices in 2020. Every analyst and consulting firm has some idea of whether this is going to be 20 billion, 30 billion, or 40 billion, but the most important aspects of this growth are that:
