Last week at Google Cloud Next ’18, Google announced a new beta capability in its BigQuery cloud data warehouse: BigQuery ML, which lets data analysts apply simple machine learning models to data residing in BigQuery.
Data analysts know databases and SQL, but generally don’t have much experience building machine learning models in Python or R. Moving data out of storage to send it through machine learning models is also expensive and time-consuming, and can risk regulatory violations. BigQuery ML aims to address both problems by letting data analysts run linear regression models (to predict a numeric value) or binary logistic regression models (to classify a value into one of two categories, such as “high” or “low”) in place on data stored in BigQuery, using simple extensions of SQL.
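To make the "simple extensions of SQL" concrete, here is a minimal sketch in Python that only assembles the SQL text for training a model and then scoring rows with it. The statement shapes (CREATE MODEL with a model_type option, and ML.PREDICT) follow the BigQuery ML documentation as I understand it in beta, so the syntax may change; the helper functions and the dataset, table, and column names are hypothetical and exist purely for illustration.

```python
# Sketch only: builds BigQuery ML statements as strings; nothing is sent to BigQuery.
# All dataset, table, and column names used with these helpers are hypothetical.

ALLOWED_MODEL_TYPES = {"linear_reg", "logistic_reg"}  # the two types offered in beta


def create_model_sql(model_name, model_type, label_column, source_table, feature_columns):
    """Return a CREATE MODEL statement that trains on source_table."""
    if model_type not in ALLOWED_MODEL_TYPES:
        raise ValueError(f"unsupported model type: {model_type}")
    columns = ", ".join(list(feature_columns) + [label_column])
    return (
        f"CREATE MODEL `{model_name}`\n"
        f"OPTIONS(model_type='{model_type}', input_label_cols=['{label_column}']) AS\n"
        f"SELECT {columns}\n"
        f"FROM `{source_table}`"
    )


def predict_sql(model_name, source_table, feature_columns):
    """Return a query that scores rows of source_table with a trained model."""
    columns = ", ".join(feature_columns)
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model_name}`,\n"
        f"  (SELECT {columns} FROM `{source_table}`))"
    )


if __name__ == "__main__":
    # Hypothetical churn example: a binary label, so logistic regression.
    print(create_model_sql(
        "mydataset.churn_model", "logistic_reg", "churned",
        "mydataset.customers", ["tenure_months", "monthly_spend"],
    ))
    print(predict_sql(
        "mydataset.churn_model", "mydataset.new_customers",
        ["tenure_months", "monthly_spend"],
    ))
```

The resulting text could be pasted into the BigQuery web interface or submitted through any of the access routes described later; the point is that both training and prediction are expressed as ordinary queries over data that never leaves the warehouse.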
Though BigQuery ML is in beta (which has a flexible definition given that this is Google) and is currently limited to the two predictive model types mentioned, those two types cover common business questions, such as predicting the cost of something or the likelihood of customer churn. Google Cloud may be the underdog in the cloud storage market compared to AWS or Azure, but for companies and departments trying to do relatively simple modeling of the scenarios mentioned above, BigQuery ML puts a fair bit of power in the hands of data analysts.
Recommendations for Organizations Considering BigQuery ML
If you’re interested in testing out BigQuery ML, you’ll first need data stored in Google BigQuery on Google Cloud. BigQuery is available on the Google Cloud Platform Free Tier, but charges apply beyond specific storage and processing limits. Once your data has been loaded into BigQuery, you can access and process it through BigQuery’s web interface, the command line, or BigQuery’s REST API, as well as through external tools such as Jupyter notebooks or BI platforms. Looker has already announced an integration with BigQuery ML and says it looks forward to building it into Looker Blocks. Look for other Google Cloud Platform partners to add similar functionality in the near future.
If you’re a data analyst looking to try out BigQuery ML, you’ll want to brush up on your statistics knowledge of linear regression and (binary) logistic regression to understand the results of your models; it’s time to dust off your stats textbook.
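As a quick refresher on what those two model types actually compute, here is a small, self-contained sketch in plain Python. It is not how BigQuery ML is implemented; the function names, the coefficients, and the 0.5 threshold are assumptions chosen purely for illustration.

```python
import math


def linear_predict(intercept, weights, features):
    """Linear regression: a weighted sum of the features plus an intercept."""
    return intercept + sum(w * x for w, x in zip(weights, features))


def logistic_predict(intercept, weights, features):
    """Binary logistic regression: squash the same weighted sum into a 0..1 probability."""
    z = linear_predict(intercept, weights, features)
    return 1.0 / (1.0 + math.exp(-z))


def classify(probability, threshold=0.5):
    """Turn a probability into one of two labels; the threshold is an assumption."""
    return "high" if probability >= threshold else "low"


if __name__ == "__main__":
    # Hypothetical coefficients, as if read from a trained model's weights.
    cost = linear_predict(intercept=10.0, weights=[2.5, 0.8], features=[4.0, 5.0])
    churn_probability = logistic_predict(intercept=-1.0, weights=[0.3], features=[2.0])
    print(cost, churn_probability, classify(churn_probability))
```

When reading a trained model, keep in mind that a linear regression weight is the change in the predicted value per unit of the feature, while a logistic regression weight w multiplies the odds of the positive class by exp(w) per unit of the feature.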
Finally, BigQuery ML is in beta. Google has put up warnings on the documentation pages that this product “might be changed in backward-incompatible ways and is not subject to any SLA or deprecation policy,” so treat it as the testing ground that it is and don’t put it into production just yet.