Why is a Data Science and Machine Learning Analyst at Elastic’s road show when they’re best known for search? In early September, Amalgam Insights attended Elastic{ON} in Boston, MA. Prior to the show, my understanding of Elastic was that they were primarily a search engine company. Still, the inclusion of a deep dive into machine learning interested me, and I was also curious to learn more about their security analytics, which were heavily emphasized in the agenda.
In exploring Elastic’s machine learning capabilities, I got a deep dive with Rich Collier, the Senior Principal Solutions Architect and Machine Learning specialist. Elastic acquired Prelert, an incident management company with unsupervised machine learning capabilities in September 2016 with the goal of incorporating real-time behavioral analytics into the Elastic Stack; in the interim two years, integrating Prelert has grown Elastic’s abilities to act on time-series data anomalies found in the Elasticsearch data store, offering an extension to the Elastic Stack called “Machine Learning” as part of their Platinum-level SaaS offerings.
Elastic Machine Learning users no longer have to define rules to identify abnormal time-series data, nor do they even need to code their own models – the Machine Learning extension analyzes the data to understand what “normal” looks like in that context, including what kind of shifts can be expected over different periods of time from point-to-point all the way up to seasonal patterns. From that, it learns when to throw an alert on encountering atypical data in real time, whether that data is log data, metrics, analytics, or a sudden upsurge in search requests for “$TSLA.” Learning from the data rather than configuring blunt rules makes for a more granular precision that reduces alerts on false positives on anomalous data.
The configuration for the Machine Learning extension is simple and requires no coding experience; front-line business analysts can customize the settings via pull-down menus and other graphical form fields to suit their needs. To simplify the setup process even further, Elastic offers a list of “machine learning recipes” on their website for several common use cases in IT operations and security analytics; given how graphically oriented the Elastic stack is, I wouldn’t be surprised to see these “recipes” implemented as default configuration options in the future. Doing so would simplify the configuration from several minutes of tweaking individual settings to selecting a common profile in a click or two.
Elastic also stated that one long-term goal is to “operationalize data science for everyone.” At the moment, that’s a fairly audacious claim for a data science platform, let alone a company best known for search and search analytics. One relevant initiative Kearns mentioned in the keynote was the debut of the Elastic Common Schema, a common set of fields for ingesting data into Elasticsearch. Standardizing data collection makes it easier to correlate and analyze data through these relational touch points, and opens up the potential for data science initiatives in the future, possibly through partnerships or acquisitions. But they’re not trying to be a general-purpose data science company right now; they’re building on their core of search, logging, security, and analytics; machine learning ventures are likely to fall within this context. Currently, that offering is anomaly detection on time series data.
Providing users who aren’t data scientists with the ability to do anomaly detection on time series data may be just one step in one category of data modeling, but having that sort of specialized tool accessible to data and business analysts would help organizations better understand the “periodicity” of typical data. Retailers could track peaks and valleys in sales data to understand purchasing patterns, for example, while security analysts could focus on responding to anomalies without having to define what anomalous data looks like ahead of time as a collection of rules.
Elastic’s focus on making this one specific machine learning tool accessible to non-data-scientists reminded me of Google’s BigQuery ML initiative – take one very specific type of machine learning query, and operationalize it for easy use by data and business analysts to address common business queries. Then, once they’ve perfected that tool, they’ll move onto building the next one.
Improving the quality of data acquired and stored in Elasticsearch from search results will be key to improving on the user experience. I spoke with Steve Kearns, the Senior Director of Product Management at Elastic, who delivered the keynote speech with a sharp focus on “scale, speed, and relevance” for improving search results. Better search data can be used to optimize machine learning applied to that data. With how Elastic has created the Machine Learning extension focused on anomaly detection across time series data – data Elasticsearch specializes in collecting, such as log data – this can support more accurate data analysis and better business results for data-driven organizations.
Overall, It was intriguing to see how machine learning is being incorporated into IT solutions that aren’t directly supporting data science environments. Enabling growth in the use of machine learning tactics effectively spreads the use of data across an organization, bringing companies closer to the advantages of the data-driven ideal. Elastic’s Machine Learning capability potentially opens up a specific class of machine learning for a broader spectrum of Elastic users without requiring them to acquire coding and statistics backgrounds; this positions Elastic as a provider of a specific type of machine learning services in the present, and makes it more plausible to consider them as a company for providing machine learning services in the future.