arrow_back

Predict Visitor Purchases with BigQuery ML

Get access to 700+ labs and courses

Predict Visitor Purchases with BigQuery ML

Lab 1 година 30 годин universal_currency_alt 5 кредитів show_chart Початковий
info This lab may incorporate AI tools to support your learning.
Get access to 700+ labs and courses

Overview

BigQuery Machine Learning (BigQuery ML) enables users to create and execute machine learning models in BigQuery using SQL queries. The goal is to democratise machine learning by enabling SQL practitioners to build models using their existing tools and to increase development speed by eliminating the need for data movement.

There is an ecommerce dataset that has millions of Google Analytics records for the Google Merchandise Store loaded into BigQuery. In this lab you will use this data to create a model that predicts whether a visitor will make a transaction.

What you'll learn

How to create, evaluate and use machine learning models in BigQuery

What you'll need

  • A Browser, such as Chrome or Firefox
  • Basic knowledge of SQL or BigQuery

Setup and requirements

For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.

  1. Sign in to Qwiklabs using an incognito window.

  2. Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
    There is no pause feature. You can restart if needed, but you have to start at the beginning.

  3. When ready, click Start lab.

  4. Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.

  5. Click Open Google Console.

  6. Click Use another account and copy/paste credentials for this lab into the prompts.
    If you use other credentials, you'll receive errors or incur charges.

  7. Accept the terms and skip the recovery resource page.

Open BigQuery Console

  1. In the Google Cloud Console, select Navigation menu > BigQuery.

The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and lists UI updates.

  1. Click Done.

Task 1. Create a dataset

  1. Create a new dataset within your project by clicking the three dots next to your project ID in the Explorer section, then clicking on Create dataset.

The Create dataset dialog opens.

  1. Enter bqml_lab for Dataset ID, and click on CREATE DATASET (accepting the other default values).

Task 2. Explore the data

The data we will use in this lab sits in the bigquery-public-data project, that is available to all. Let's take a look at a sample of this data.

  1. Add the query to Untitled query box, and click the Run button.
#standardSQL SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20160801' AND '20170631' LIMIT 10000;

The data tables have a lot of columns, but there are few of interest to us that we will use to create our ML model. Here the visitor's device's operating system is used, whether said device is a mobile device, the visitor's country or region and the number of page views will be used as the criteria for whether a transaction has been made. In this case, label is what you're trying to fit to (or predict).

This data will be the training data for the ML model you create. The training data is being limited to those collected from 1 August 2016 to 31 June 2017. This is done to save the last month of data for "prediction". It is further limited to 10,000 data points to save some time.

  1. Let's save this as the training data. Click on the Save and then select Save view from the dropdown to save this query as a view. In the popup, select Dataset as bqml_lab and type training_data as the Table Name and click Save.

Task 3. Create a model

  • Now replace the query with the following to create a model to predict whether a visitor will make a transaction:
#standardSQL CREATE OR REPLACE MODEL `bqml_lab.sample_model` OPTIONS(model_type='logistic_reg') AS SELECT * from `bqml_lab.training_data`;

In this case, bqml_lab is the name of the dataset, sample_model is the name of the model, training_data is the transactions data we looked at in the previous task. The model type specified is binary logistic regression.

Running the CREATE MODEL command creates a Query Job that will run asynchronously so you can, for example, close or refresh the BigQuery UI window.

[Optional] Model information & training statistics

If interested, you can get information about the model by clicking on bqml_lab dataset on the left-hand menu and then click the sample_model dataset in the UI. Under Details, you should find some basic model info and training options used to produce the model. Under Training, you should see a table similar to this:

A six-column table with 11 rows of data relating to iteration, data loss, learn rate, completion time

Task 4. Evaluate the Model

  • Now replace the query with the following:
#standardSQL SELECT * FROM ml.EVALUATE(MODEL `bqml_lab.sample_model`);

In this query, you use the ml.EVALUATE function to evaluate the predicted values against the actual data, and it shares some metrics of how the model performed. You should see a table similar to this:

Results table

Task 5. Use the model

  1. Now click SQL query and run the below query:
#standardSQL SELECT IF(totals.transactions IS NULL, 0, 1) AS label, IFNULL(device.operatingSystem, "") AS os, device.isMobile AS is_mobile, IFNULL(geoNetwork.country, "") AS country, IFNULL(totals.pageviews, 0) AS pageviews, fullVisitorId FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*` WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170801';

You'll realize the SELECT and FROM portions of the query is similar to that used to generate training data. There is the additional fullVisitorId column which you will use for predicting transactions by individual user.The WHERE portion reflects the change in time frame (July 1 to August 1 2017).

  1. Let's save this July data so we can use it in the next 2 steps to make predictions using our model. Click on the Save and then select Save view from the dropdown to save this query as a view. In the popup, select Dataset as bqml_lab and type july_data as the Table Name and click Save.

  2. Predict purchases per country/region

With this query you will try to predict the number of transactions made by visitors of each country or region, sort the results, and select the top 10 by purchases:

#standardSQL SELECT country, SUM(predicted_label) as total_predicted_purchases FROM ml.PREDICT(MODEL `bqml_lab.sample_model`, ( SELECT * FROM `bqml_lab.july_data`)) GROUP BY country ORDER BY total_predicted_purchases DESC LIMIT 10;

In this query, you're using ml.PREDICT and the BigQuery ML portion of the query is wrapped with standard SQL commands. For this lab you''re interested in the country and the sum of purchases for each country, so that's why SELECT, GROUP BY and ORDER BY. LIMIT is used to ensure you only get the top 10 results.

You should see a table similar to this:

A three column table with 10 rows of data relating to total predicted purchases per country

  1. Predict purchases per user

Here is another example. This time you will try to predict the number of transactions each visitor makes, sort the results, and select the top 10 visitors by transactions:

#standardSQL SELECT fullVisitorId, SUM(predicted_label) as total_predicted_purchases FROM ml.PREDICT(MODEL `bqml_lab.sample_model`, ( SELECT * FROM `bqml_lab.july_data`)) GROUP BY fullVisitorId ORDER BY total_predicted_purchases DESC LIMIT 10;

You should see a table similar to this:

A three column table with 10 rows of data relating to total predicted purchases per full visitor ID

End your lab

When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.

You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.

The number of stars indicates the following:

  • 1 star = Very dissatisfied
  • 2 stars = Dissatisfied
  • 3 stars = Neutral
  • 4 stars = Satisfied
  • 5 stars = Very satisfied

You can close the dialog box if you don't want to provide feedback.

For feedback, suggestions, or corrections, please use the Support tab.

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.

Before you begin

  1. Labs create a Google Cloud project and resources for a fixed time
  2. Labs have a time limit and no pause feature. If you end the lab, you'll have to restart from the beginning.
  3. On the top left of your screen, click Start lab to begin

Use private browsing

  1. Copy the provided Username and Password for the lab
  2. Click Open console in private mode

Sign in to the Console

  1. Sign in using your lab credentials. Using other credentials might cause errors or incur charges.
  2. Accept the terms, and skip the recovery resource page
  3. Don't click End lab unless you've finished the lab or want to restart it, as it will clear your work and remove the project

This content is not currently available

We will notify you via email when it becomes available

Great!

We will contact you via email if it becomes available

One lab at a time

Confirm to end all existing labs and start this one

Use private browsing to run the lab

Use an Incognito or private browser window to run this lab. This prevents any conflicts between your personal account and the Student account, which may cause extra charges incurred to your personal account.