On August 15, 2018, Oracle announced the availability of GraphPipe, a network protocol designed to transmit machine learning data between remote processes in a standardized manner, with the goal of simplifying machine learning model deployment. The spec is now available on Oracle's GitHub, along with clients and servers implementing the spec in Python and Go (with a Java client soon to come), and a TensorFlow plugin that allows remote models to be included inside TensorFlow graphs.
Deploying the machine learning models that data scientists produce continues to be a sluggish process. Most frameworks offer JSON or protocol buffers as their deployment serialization options, but each has a drawback. JSON is text-based, so encoding and parsing it isn't sufficiently performant under heavy demand, such as for custom applications requiring real-time data. Protocol buffers perform nearly as well as the flatbuffers that GraphPipe uses, but the primary deployment tool built on protocol buffers right now is TensorFlow Serving. Since TensorFlow Serving primarily supports TensorFlow models, this limits its applicability and adds the complex overhead of TensorFlow to the deployment process. If your model requires the use of multiple frameworks, the complexity of moving that model into production multiplies, often demanding custom code. GraphPipe aims to address these challenges by making the transmission of data more efficient while also permitting custom inputs and outputs to and from a given model. Oracle's goal with GraphPipe is to standardize the process of model deployment regardless of the frameworks utilized in the model creation stage.
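The serialization gap that motivates GraphPipe's choice of flatbuffers is easy to see in miniature. The sketch below is illustrative only, using Python's standard library rather than the actual GraphPipe wire format: it compares the size of a tensor-like float array encoded as JSON text against a packed binary encoding of the kind flatbuffer-based protocols transmit.

```python
import json
import struct

# A small batch of float values, as a model input tensor might contain.
values = [0.123456, 1.0, -2.5, 3.14159] * 256  # 1024 floats

# JSON: human-readable text, but each float costs many characters and
# must be parsed back from text on the receiving side.
json_bytes = json.dumps(values).encode("utf-8")

# Packed binary: exactly 4 bytes per float32, no text parsing required.
# Flatbuffer-style formats additionally allow reading without a copy step.
raw_bytes = struct.pack(f"{len(values)}f", *values)

print(f"JSON: {len(json_bytes)} bytes, binary: {len(raw_bytes)} bytes")
```

Beyond the smaller payload, binary tensor formats avoid the float-to-string round trip entirely, which is where much of JSON's cost under real-time load comes from.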
With the acquisition of DataScience.com earlier this year, Oracle now has a vested interest in ensuring that their customers have a “power user” option for model deployment. Enterprises develop software in unique ways, and their requirements demand configuration and customization above and beyond those of smaller businesses, which can adapt more easily to out-of-the-box software. Implementing the GraphPipe protocol in machine learning production pipelines offers larger organizations flexibility in deploying and querying machine learning models at speed.
Oracle also has a long history of participating in the open source community – their employees make contributions to household names like Java, MySQL, and Kubernetes, along with newer projects such as GraalVM and Wookiee. Numerous Oracle business services have been built around these projects to monetize them, which ensures Oracle plays a strong role in determining future innovation across a wide array of technologies. Even with this being an initial release in a still-experimental field, it makes sense for Oracle to dedicate resources to expediting machine learning model deployment across the board.
Organizations considering data science initiatives need to think about how they are going to get their machine learning models out into production. For data science to be useful to your organization, models will eventually need to be deployed. What does your current end-to-end data science workflow look like? (Do you have one, or are your models still stuck behind various obstacles to useful deployment?) How standardized is your model deployment process – and what constraints do you have in the form of preferred inputs (the frameworks your data scientists are using) and desired outputs (reports, dashboards, APIs, services, and embedded apps that may or may not exist yet)? Companies that want to use machine learning more widely need to standardize their organization's model deployment process in order to scale their machine learning efforts efficiently.
While GraphPipe marks an important step for Oracle into model deployment, Oracle is far from the only enterprise staking out territory in the competitive machine learning space; other companies are likely to recognize the opportunity in the relatively neglected model deployment market and follow suit. Even so, this initial release is limited to clients and servers in Python and Go (with a Java server coming soon), which restricts the immediate rollout to those with eventual endpoints coded in those languages. With that in mind, organizations putting models into production in Python, Go, or Java, or seeking to do so in the near future, should have their data scientists and software engineers set aside time to test GraphPipe in the next quarter or so. The pressure to bring models into production to drive business results is only going to increase as line-of-business executives expect to see results. In addition to testing GraphPipe, all data scientists and software engineers tasked with deploying models should look for upcoming opportunities to test other power-user options seeking to make machine learning more pervasive throughout their organizations.