Last week, I attended the Strata Data Conference at the Javits Center in New York City to catch up with a wide variety of data science and machine learning users, enablers, and thought leaders. In the process, I had the opportunity to listen to some fantastic keynotes, chat with 30+ companies looking for solutions and 30+ vendors presenting at the show, and spend time with a number of luminary industry analysts and thought leaders including Ovum’s Tony Baer, EMA’s John Myers, Aberdeen Group’s Mike Lock, and Hurwitz & Associates’ Judith Hurwitz.
From this whirlwind tour of executives, I came away with a lot of takeaways from the keynotes and vendors that I can share, and from end users that I unfortunately have to keep confidential. To give you an idea of what an industry analyst notes, here is a short summary of my takeaways from the keynotes and from each vendor that I spoke to:
Keynotes: The key themes that really got my attention were the idea that AI requires ethics, raised by Joanna Bryson, and that all data is biased, which danah boyd discussed. This idea that data and machine learning have their own weaknesses that require human intervention, training, and guidance is incredibly important. Over the past decade, technologists have put their trust in Big Data and the idea that data will provide answers, only to find that a naive and “unbiased” analysis of data has its own biases. Context and human perspective are inherent to translating data into value: this does not change just because our analytic and data training tools are increasingly nuanced and intelligent in nature.
Behind the hype of data science, Big Data, analytic modeling, robotic process automation, DevOps, DataOps, and artificial intelligence is this fundamental need to understand that data, algorithms, and technology all have inherent biases as the following tweet shows:
— Diane Kim (@_DianeKim) October 4, 2017
The most important lesson that we have to remember is that as we pursue advanced technologies over the next 25 years including AI, the Internet of Things, wearables, quantum computing, mechanical enhancement, DNA as identifier and data storage, genetic enhancement, neural interfaces, and eventually implants and potentially the Kurzweilian Singularity where intelligence becomes non-biological, we have to think about the ethics and assumptions at each point to ensure that technology serves humanity. As an industry analyst, my career path is defined in part by when these technologies reach proof-of-concept and production status. Although some of these trends are still 20-25 years away, I still look at them now to provide guidance when these new technologies enter the business world.
With that quick look into the future-facing mind of the analyst, here are a number of vendor announcements and positioning statements that caught my attention in the here and now as we struggle today with machine learning, data governance, and BI across a wide variety of data types.
Actian – Actian has been in a state of flux over the past few years as it quickly acquired a number of Big Data technologies and then started to phase out certain products. In speaking with Actian, Amalgam saw that there have been both executive changes and a renewed focus on new products over the past year, including a recent announcement of Apache Spark support for the newest version of Actian VectorH as well as near-term plans to develop further cloud and IoT products. Actian has stated a focus on bringing data management, integration, and analytics together for hybrid data, which includes both massive application and analytic workloads as well as the rapid data of IoT. At a time when high-performance data management is increasingly important both at scale and at high velocity, Amalgam believes Actian is in a strong position to re-establish its status as a high-performance and hybrid data vendor.
Alation – Alation’s collaborative data cataloging approach has been an important enabler for enterprise information management approaches, which helped justify its $23 million Series B round in July. At its booth, Alation prominently showcased its new partnership with Paxata to move directly from data cataloging and lineage to data preparation. Amalgam looks forward to seeing how the market treats this partnership in light of demand for end-to-end data wrangling from raw source data to contextualized information.
Arcadia Data – Arcadia Data’s data visualization for Big Data has been a market-leading offering for the past couple of years. With the validation of its IoT analytics capabilities and the ongoing development of assisted insights for complex and hybrid data, Arcadia continues to build out improved visualization for larger and more varied forms of data. In addition, Arcadia is building out additional support for the specific data associated with financial services organizations, showing increased industry focus.
Attunity – Amalgam has covered Attunity’s data integration, replication, and change data capture capabilities for several years with the assumption that this approach of focusing on deltas and changes in data would be fundamental to supporting larger volumes of data. This strength was apparent in the Attunity 6.0 Data Integration Platform launch, which showed increased support for cloud sources (including Amazon, Azure, and Snowflake), streaming data platforms, and integration with streaming metadata. Attunity is now being used for some of the largest data stores in the world, and streaming data is becoming a standard analytic data source, one that requires metadata, lineage, and centralized management in the majority of enterprises.
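For readers less familiar with the change data capture concept behind this approach, a minimal sketch may help: rather than recopying a full table on every sync, only the rows that changed are replicated. This is purely illustrative; the function and snapshot names are hypothetical and do not reflect Attunity’s actual implementation, which reads database transaction logs rather than comparing snapshots.

```python
# Illustrative change-data-capture sketch: diff two {key: row} snapshots
# into a list of change events, so only deltas need to move downstream.
# (Hypothetical names; production CDC tools typically tail transaction
# logs instead of diffing snapshots.)

def capture_changes(old_snapshot: dict, new_snapshot: dict) -> list:
    """Return (operation, key, row) events describing what changed."""
    changes = []
    for key, row in new_snapshot.items():
        if key not in old_snapshot:
            changes.append(("insert", key, row))        # new row
        elif old_snapshot[key] != row:
            changes.append(("update", key, row))        # modified row
    for key in old_snapshot:
        if key not in new_snapshot:
            changes.append(("delete", key, None))       # removed row
    return changes

old = {1: {"name": "Ada"}, 2: {"name": "Bob"}}
new = {1: {"name": "Ada Lovelace"}, 3: {"name": "Cy"}}
events = capture_changes(old, new)
# Three events are produced here instead of re-shipping both tables,
# which is why delta-based replication scales to very large stores.
```

The payoff is that the volume of data moved is proportional to the change rate, not the table size.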
BMC – BMC’s automation and support for Big Data workflows through Control-M is an important step for translating Hadoop-based Big Data into serviceable and actionable data. Although BMC is not alone in this approach, both the global scale of BMC and existing relationships with related service and workload management personnel allow it to take a more business-based approach than many of its technology-focused competitors.
Cambridge Semantics is local to Amalgam Insights and stands out for its semantic layer across the “data lake.” This capability allows Cambridge Semantics to support a full knowledge graph, which is an important starting point for machine learning. One of the key gaps that Amalgam sees in end user inquiries is in having an incomplete categorization and contextualization of data that will be used for machine learning. In addition, graph analysis is still relatively uncommon in the enterprise. With the emergence of AI, businesses will need to catch up and Cambridge Semantics has a bright future.
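Since graph analysis is still unfamiliar territory for many enterprises, a quick sketch of what a knowledge graph actually is may be useful: a set of subject-predicate-object triples that can be queried by pattern. The class and the example triples below are hypothetical illustrations of the general technique, not Cambridge Semantics’ product or API.

```python
# Minimal knowledge-graph sketch: facts stored as
# (subject, predicate, object) triples, queryable by partial pattern.
# Names and triples are illustrative only.

class KnowledgeGraph:
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return all triples matching the non-None pattern fields."""
        return [
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        ]

kg = KnowledgeGraph()
kg.add("customer_table", "locatedIn", "data_lake")
kg.add("customer_table", "hasColumn", "email")
kg.add("orders_table", "references", "customer_table")

# Pattern queries answer "what do we know about X?" across the lake:
facts_about_customers = kg.query(subject="customer_table")
```

Layering this kind of contextual metadata over raw data lake contents is what makes the data discoverable and interpretable enough to feed machine learning.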
Cloudera – Amalgam believes that the Cloudera Data Science Workbench has a key advantage over other data science workbenches in its direct integration with Cloudera Enterprise Data Hub, support for Hadoop engines (an advantage where others are quickly catching up), and depth of governance and infrastructure management. Amalgam is exploring the Data Science Workbench market as we believe this space needs to be separated from the larger world of predictive and advanced analytic solutions. Data science projects are not the same as analytic workflows and require a more programmatic approach.
Databricks – Databricks has grown rapidly with its $140 million round raised in August and the addition of multiple experienced executives to support its Data Science platform that combines analytic workflows, cloud infrastructure, and productization capabilities. Does this sound like a repeat of what you just read? There is an interesting matchup coming up in the near future between Cloudera and Databricks: the battle has already started, but the market is just starting to be built in earnest.
Dataiku – Dataiku has been on Amalgam’s radar for the past couple of years due to its end-to-end approach for translating data into analytic outputs and the ease of handing off data science production across team members. Dataiku raised a $28 million Series B in September of this year to expand the commercialization of its collaborative data science capabilities.
DataKitchen is a Cambridge, MA-based startup that focuses on DataOps: bringing data agility and DevOps together in an integrated manner to support analytic outputs. Functionally, DataKitchen acts as a reusable “recipe”-based platform to support the orchestration of data workflow development, testing, and deployment. Its DataOps Manifesto is a series of best practices and concepts that should be standard in the enterprise.
Datameer – Datameer’s latest release supports ad-hoc self-service visualization and exploration of Big Data sources in seconds through a product called Visual Explorer. This capability is currently in private beta and reflects the expectations for data created by the likes of Tableau and Qlik, but supports these expectations on raw data rather than requiring model building, dimensional schemas, or transferring data to analytic marts or environments. Exploring data in the “lake” is a necessary step to open up Big Data for operational exploration.
DataRobot – DataRobot’s machine learning automation for model optimization is a unique capability in the machine learning world in democratizing predictive analytics. The combination of executive hirings and DataRobot’s partnership with Trifacta were front-and-center at Strata. Amalgam believes that DataRobot’s capabilities are going to accelerate machine learning and looks forward to seeing how the latest round of funding will accelerate DataRobot’s abilities to bring predictive models into production applications and provide more business visibility to the context and biases of models.
Domino Data Lab – Domino Data is a strong example of how financial services are driving technology innovation. Founded by Bridgewater Associates’ alumni, Domino Data provides a central environment for developing and deploying models. By providing an optimized computing environment based on data science jobs, experiments, and deployment demands, Domino supports a self-service environment at the individual level that also serves as a system of record for analytic coding.
HPE – Amalgam was interested in seeing what HPE would be discussing after spinning off Vertica and other software assets to Micro Focus. At Strata, HPE discussed its Hadoop solution in the context of the hybrid and diverse cloud that is now HPE’s primary focus. As HPE pursues its goal of being a next-generation infrastructure provider, Amalgam looks forward to seeing HPE’s hardware and services remain vendor-neutral from a software and distribution perspective.
IBM – IBM announced an Integrated Analytics System at Strata, which went GA on September 29th, to support deployment of data science capabilities from IBM’s Data Science Experience and Apache Spark. Currently, the System works with Netezza, PureData System for Analytics, DB2 Warehouse on Cloud, and Hortonworks, among other data sources. Amalgam is looking forward to IBM’s integration of this System with z Systems, which will provide IBM with an additional differentiated performance advantage.
Informatica – Informatica announced a GDPR data lake management solution, Cloudera Altus PaaS support, and data catalog support with Hortonworks Atlas. But what got my attention was a discussion with Murthy Mathiprakasam, director of product marketing, in which we discussed the potential role that Informatica can play in serving all of the data management roles on a single view of the data. This capability to serve the entire data team was largely missing on the show floor and in demonstrations as a whole, as most solutions in data science and Big Data are still focused on a specific aspect of data management and are building out their partner ecosystems to support an end-to-end capability. Since Informatica shifted its brand and focus from being “The Data Integration Company” to the “Enterprise Cloud Data Management” company, its advances and acquisitions in Big Data, cloud, data governance, security, and master data have taken advantage of Informatica’s foundational understanding of data lineage, metadata, and sourcing and continue to establish Informatica as a key leader in all aspects of defining business data value.
Kogentix – Kogentix was one of the biggest surprises I encountered at Strata. Founded in 2015, Kogentix provides an end-to-end solution for developing big data and machine learning applications and has quickly grown to over 200 employees through organic growth. Through a combination of software and consulting services as well as a well-rationalized application delivery capability, Kogentix has quickly established itself as a rising star in the world of machine learning by tackling the fundamental problem of rapidly implementing transparent machine learning into production environments. Amalgam will be interested in seeing how and when Kogentix chooses to take funding and push their growth into overdrive.
Logtrust – Over the past year, Logtrust has evolved its fast data and hybrid data analytics value proposition to solve a foundational problem in the enterprise: how to effectively analyze combinations of traditional data, Big Data, and streaming data without bringing in developer resources. Logtrust’s investors seem to agree with the value proposition, as the firm announced a $35 million Series B round. By conducting modeling at query time without touching the original data, Logtrust can help businesses look at all the data streams and sources that have been either discovered or monitored over the past few years. Amalgam believes that this new family of vendors, which conducts analytics on a variety of data sources without the need to model, plays an important role in secure and scalable analysis.
Looker provided a view of Looker 5, which it originally announced at Looker JOIN in mid-September. Amalgam is working on a separate analysis of Looker 5, as we believe that this version demonstrates Looker’s continued corporate shift from rapid analytic solutions to rapid application solutions.
Mesosphere – Containerization is taking a sudden twist with Mesosphere’s recent announcement to support Kubernetes and provide increased choice to operations teams, a position that was emphasized at Strata. Analyst Tom Petrocelli will be looking more deeply at software container strategy in the next couple of months, but it is interesting to see Kubernetes offered alongside Mesosphere’s Marathon as container orchestration options.
Micro Focus was showing off both its Voltage SecureData Cloud for AWS to secure hybrid IT and Vertica 9. It was interesting to see Vertica’s new branding and positioning as a hardware-neutral solution. Amalgam believes this will be a positive move for Vertica, which will no longer be hidden in the previous HPE “HAVEn” Big Data platform and will be able to resume its role as an in-database analytic and machine learning powerhouse, which was also emphasized in the Vertica 9 launch.
MicroStrategy was previewing its 10.9 version, which became generally available on October 2nd. In walking through the version, Amalgam saw Dossiers, which provide MicroStrategy customers with an interactive format for analytic content. In particular, the mobile-friendly interface, collaborative comments, and personalized library for users to manage their content got Amalgam’s attention. In addition, the data certification and MicroStrategy on AWS improvements show focus on both the cloud and peer-identified data. Wisely, MicroStrategy did not try to shove “machine learning” into its product just for the sake of doing so.
Paxata – As previously mentioned, Paxata and Alation announced a partnership at Strata Data. In addition, Paxata also announced improved support for Microsoft Azure and what it is calling “Intelligent Ingest” to automate the identification and transfer of data from any format to any other format. This any-to-any ingestion capability is an important step for data consumption, as data formats are still a surprisingly time-consuming bottleneck for data analysts seeking to understand basic trends.
Podium Data – At Strata, Podium Data announced Data Conductor, a capability to catalog, access, control, and prioritize data categories and metadata within an ecosystem while removing data duplication. The net-net is all about making all enterprise data contextualized, integrated, and useful, and represents Podium Data’s solution for providing any enterprise data in analytic-ready form based on demand. This capability is the second installment of a release coming out in three installments: the first focused on data identification, this installment focuses on cataloging, and a third will focus on optimizing cloud resources on any cloud environment.
Qubole – Qubole launched the Qubole Data Service, a set of autonomous and self-service data optimization capabilities focused on bringing all of the data processing capabilities together, including load and transfer, reporting, streaming, and machine learning. Qubole differentiates on its automation, removing data scut work and right-sizing tasks in Big Data management. Qubole also announced its integration with Microsoft Azure.
SAP – At Strata Data, SAP was focused on speaking about its Data Hub for end-to-end data governance, which brings together data quality, data lineage, metadata definitions, workflow definition and execution, content management, and access. Pricing will be tiered and based on the number of systems and nodes involved. End-to-end lineage and governance is increasingly becoming a must-have capability to make Big Data functional, trusted, and analytic.
Snowflake – Over the past year, Amalgam has been unable to escape Snowflake, which has excelled at providing a cost-sensitive and highly functional cloud data warehouse. Snowflake had no immediate announcements at Strata, but Amalgam noted that every analytics and BI player of note at Strata mentioned some sort of partnership with Snowflake as it has quickly emerged as a cloud data store that enterprises must consider to support high-performance analytics in the cloud.
Tableau – In preparation for next week’s Tableau Conference, Amalgam was interested in seeing what Tableau was emphasizing. We saw the progress that Tableau has made in end user data prioritization and in supporting a variety of visualization capabilities. However, the pre-TC announcements were fairly incremental in nature. Analysts are waiting to see updates to last year’s eagerly hyped Project Maestro, focused on data prep, as well as to the Hyper data engine. The stock market has been kinder to Tableau this year as the go-to-market messaging and productization around cloud offerings have been well-received by the market at large. Tableau’s data discovery and visualization capabilities are still Best-in-Class, but industry analysts always want more and we eagerly look forward to seeing next week’s announcements.
Tamr – Tamr’s data unification capability has proven to be a unique differentiator in the enterprise data market. By handling the contradictory challenge of keeping data in place while providing a unified and cleansed version of data for enterprise analysis, Tamr has established its role in Big Data and the future of machine learning prep with business results including GE’s identification of hundreds of millions of dollars in savings through improved procurement and supply chain management.
Tellius – Tellius is a natural language-based analytics solution somewhat comparable to IBM Watson Analytics and Thoughtspot. What stood out for Amalgam was that Tellius was cheaper than Thoughtspot, combined internal and external data easily, and also supported some predictive analytics capabilities across the enterprise data supported. The combination of natural language, BI, visualization, and analytics is an impressive combination. Amalgam believes that Tellius’ current combination of capabilities will eventually be standard. For now, Tellius’ current offering in production gives this vendor a feature-based advantage over the majority of BI solutions currently in place in terms of translating natural language-based queries into analytic insights.
Trifacta – Trifacta announced integration with DataRobot, which will be helpful to both organizations in supporting a broader range of data preparation. Together, data can be prepared from source to analytic model. In addition, Deutsche Boerse announced a strategic investment in Trifacta during the show. Amalgam’s take is that the data prep market that emerged a couple of years ago has quickly fragmented and evolved as each vendor has found its niche in unlocking value in the data pipeline. Trifacta has successfully launched the “data wrangling” use case and both scaled and expanded its offering more quickly than its competitors.
As you can tell from this summary, 24 hours at a tradeshow is a busy time for an industry analyst. This blog only represents the vendors I spoke to, and I know that there are a lot of companies that I didn’t visit, partially because I had already spoken with them prior to this month. But hopefully this summary helps show how the need to support machine learning and to develop end-to-end governed data pipelines is becoming increasingly important across the data and analytics ecosystem. If you’d like a deeper dive on any of these vendors or trends that showed up as attendees discussed their next steps for machine learning, please feel free to reach out to me or the rest of the Amalgam team at email@example.com.
(Note: DataRobot is a current Amalgam Insights client. Amalgam analysts have previously worked with Arcadia Data, IBM, Informatica, Looker, SAP, Trifacta)