
Product Discovery - Ingesting Retail Catalog and User Event Data


Lab · 1 hour 30 minutes · 5 Credits · Introductory
Note: This lab may incorporate AI tools to support your learning.

Overview

The Cloud Retail service and the Retail API enable customers to build end-to-end personalized recommendation systems without requiring a high level of expertise in machine learning, recommendation systems, or Google Cloud. Before you can use the Retail API product recommendation and product search services, you must create or import product catalog data and user event data related to that catalog.

In this lab, you will prepare an environment for the Retail Recommendations AI and Product Search services by uploading product catalog and user event data using a variety of techniques. You will explore some common data ingestion errors, and examine Retail catalog and event data using the Cloud Console and the Retail API.

This lab uses a subset of the Google Merchant Center dataset for the product catalog. Data exported directly from Google Merchant Center uses a schema that is not compatible with the Retail data ingestion API, so the dataset used in the lab has been modified to conform to the Retail Schema.

Objectives

In this lab, you will learn how to complete the following tasks:

  • Enable the Retail API.
  • Import Product Catalog and User Event data from BigQuery and Cloud Storage.
  • Examine data import events and errors.
  • Examine Product Catalog and User Event Data.
  • Upload user event data using the Retail API.

Setup and requirements

Qwiklabs setup

For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.

  1. Sign in to Qwiklabs using an incognito window.

  2. Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
    There is no pause feature. You can restart if needed, but you have to start at the beginning.

  3. When ready, click Start lab.

  4. Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.

  5. Click Open Google Console.

  6. Click Use another account and copy/paste credentials for this lab into the prompts.
    If you use other credentials, you'll receive errors or incur charges.

  7. Accept the terms and skip the recovery resource page.

Start Cloud Shell

While Google Cloud can be operated remotely from your own machine, this lab uses both the Google Cloud Console and Cloud Shell, a command-line environment running in Google Cloud.

  1. From the Cloud Console, click Activate Cloud Shell.

    Activate Cloud Shell icon highlighted

    Note: If you've never started Cloud Shell before, you are presented with an intermediate screen describing what it is. If that's the case, click Continue and you won't ever see it again.

    Here's what that one-time screen looks like:

    Cloud Shell dialog box

    It should only take a few moments to provision and connect to Cloud Shell.

    Cloud Shell provides you with terminal access to a virtual machine hosted in the cloud. The virtual machine includes all the development tools that you'll need. It offers a persistent 5GB home directory and runs in Google Cloud, greatly enhancing network performance and authentication. Much, if not all, of your work in this lab can be done through the Cloud Console and Cloud Shell using only a browser.

    Once connected to Cloud Shell, you should see that you are already authenticated and that the project is already set to your project ID.

  2. Run the following command in Cloud Shell to confirm that you are authenticated:

    gcloud auth list

    Output:

    Credentialed Accounts

    ACTIVE: *
    ACCOUNT: {{{user_0.username| Lab User Name}}}
  3. To set the active account, run:

    gcloud config set account {{{user_0.username| Lab User Name}}}

    Note: The gcloud command-line tool is the powerful and unified command-line tool in Google Cloud. It comes preinstalled in Cloud Shell. Among its features, gcloud offers tab completion in the shell. For more information, refer to the gcloud CLI overview guide.
  4. Run the following command to confirm that you are using the correct project for this lab:

    gcloud config list project

    Output:

    [core]
    project = {{{project_0.project_id | Project ID}}}
  5. If the correct project is not listed, you can set it with this command:

    gcloud config set project {{{project_0.project_id| Project ID}}}

    Output:

    Updated property [core/project].

Task 1. Enable the Retail API

Before you can begin using the Retail Recommendations AI or Retail Search APIs, you must enable the Retail API.

  1. On the Navigation menu, click View All Products, and then under the Artificial Intelligence section, select Search for Retail.

  2. Click Turn On API.

  3. Click Continue and accept the data terms by clicking Accept.

  4. Click Get Started.
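If you prefer working from the command line, the Retail API can typically also be enabled from Cloud Shell with the Service Usage command shown below. This is only a sketch of an alternative; the console flow above is the path this lab expects.

    # Alternative sketch: enable the Retail API for the current project from Cloud Shell
    gcloud services enable retail.googleapis.com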

Task 2. Import product catalog and user event data

In this task, you will import product catalog data from BigQuery and user event data from Cloud Storage.

Import Merchant Center products table schema data from BigQuery

The merchant_center.products table contains catalog data that has been exported from a test account on Google Merchant Center using the Google Merchant Center products table schema. This dataset can be imported as catalog data using the older Recommendations AI console or API. The Retail API, which replaces the Recommendations AI API, does not currently support importing data that uses the Merchant Center products table schema; all data imports must use the Retail Schema. You will still attempt to import this data using the Retail API so that you can see how to inspect data import errors.
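For context, a product record that conforms to the Retail Schema is a JSON object with fields such as id, title, categories, and priceInfo. The record below is only an illustrative sketch with made-up values, not a row from the lab dataset:

    {
      "id": "SKU-12345",
      "title": "Example Water Bottle",
      "categories": ["Drinkware"],
      "priceInfo": { "currencyCode": "USD", "price": 24.0 },
      "availability": "IN_STOCK",
      "uri": "https://example.com/products/sku-12345"
    }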

  1. In the GCP Console, click Search for Retail > Data to open the Retail Data management page.

  2. Make sure the Catalog tab is selected and click Import.

  3. Configure the import parameters as follows to import the product catalog:

    • For Import type, select Product Catalog
    • For Source of data, select BigQuery
    • For Import Branch, select Branch 0
  4. For BigQuery table, click Browse.

  5. Enter products in the search box and click Search.

  6. Select the radio button for products - Dataset: merchant_center table.

You are unable to proceed as the source table does not have an id field.

There are many more issues with the data because of the schema.

Import Retail products schema data from BigQuery

In this task, import product data into the catalog from a BigQuery table that uses the Retail products schema.

  1. In the GCP Console, on the Navigation menu, click View All Products, and then under the Artificial Intelligence section, select Search for Retail > Data to open the Retail Data management page.

  2. Make sure the Catalog tab is selected and click Import.

  3. Configure the import parameters as follows to import the product catalog:

    • For Import type, select Product Catalog
    • For Import Branch, select Branch 0
    • For Source of data, select BigQuery
  4. For BigQuery table, click Browse.

  5. Enter products in the search box and click Search.

  6. Select the radio button for products - Dataset: retail table.

  7. Click Select.

Note: If you click the table name you will open the Data Catalog page and will need to return to the Retail products import page.
  8. Click Import.

Wait for a pop-up message similar to the following to appear:

Successfully scheduled import operation import-products-6583047802807380211. It may take up to 5 minutes to see your new long running operation in the Integration Activity panel.

When the import task is scheduled, you will also see the details of a gcloud scheduler command that you can use to schedule a regular data import task.

  9. Click Cancel to close the import page and return to the Retail Data page to check the status of your catalog data import task.

  10. Click X to close the popup that appeared to tell you that the import operation was successfully scheduled.

  11. In the Search for Retail navigation menu, click Data and then click Activity Status to monitor the progress of the import task.

It will take a minute or two for the status of the import task in the Product catalog import activity section to change to Succeeded. A total of 1268 items will have been imported.
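The console import you just ran maps onto the Retail products:import REST method. The following curl call is only a sketch of how the same BigQuery import could be triggered programmatically, assuming the retail.products table used above and an access token with Retail permissions (such as the impersonated token you generate in Task 5); it is not a required lab step:

    # Sketch: import products into branch 0 from the BigQuery table retail.products
    curl -X POST \
      -H "Authorization: Bearer ${ACCESS_TOKEN}" \
      -H "Content-Type: application/json" \
      "https://retail.googleapis.com/v2/projects/${PROJECT_ID}/locations/global/catalogs/default_catalog/branches/0/products:import" \
      -d '{
        "inputConfig": {
          "bigQuerySource": {
            "datasetId": "retail",
            "tableId": "products",
            "dataSchema": "product"
          }
        }
      }'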

Import user event data from Cloud Storage

In this task, import user event data from a Cloud Storage bucket.

  1. In the GCP Console, on the Navigation menu, click View All Products, and then under the Artificial Intelligence section, select Search for Retail > Data to open the Retail Data management page.

  2. Make sure the Events tab is selected and click Import.

  3. Configure the import parameters as follows to import the user events:

    • For Import type, select User Events
    • For Source of data, select Google Cloud Storage
  4. For Google Cloud Storage location, click the Browse button.

  5. Navigate to the lab's Cloud Storage bucket and select the file recent_retail_events.json.

  6. Click the Filename to make sure it is selected.

  7. Click Select.

  8. Click Import.

Wait for a pop-up message similar to the following to appear:

Successfully scheduled import operation import-products-6583047802807380211. It may take up to 5 minutes to see your new long running operation in the Integration Activity panel

When the import task is scheduled, you will also see the details of a gcloud scheduler command that you can use to schedule a regular event import task.

  9. Wait for the import task to be scheduled with the gcloud scheduler command displayed.

  10. Click Cancel to close the import page and return to the Retail Data page to check the status of your event data import task.

  11. Click X to close the popup that appeared to tell you that the import operation was successfully scheduled.

  12. In the Search for Retail navigation menu, click Data and then click Activity Status to monitor the progress of the import task.

It will take a minute or two for the status of the import task in the User events import activity section to change to Succeeded. Approximately 32,000 items will have been imported and 5 items will have failed.
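Similarly, the user event import performed above corresponds to the Retail userEvents:import method. The following is a sketch only; the bucket path is a placeholder because the lab supplies its own bucket name, and an access token such as the one generated in Task 5 is assumed:

    # Sketch: import user events from a JSON file in Cloud Storage (bucket path is a placeholder)
    curl -X POST \
      -H "Authorization: Bearer ${ACCESS_TOKEN}" \
      -H "Content-Type: application/json" \
      "https://retail.googleapis.com/v2/projects/${PROJECT_ID}/locations/global/catalogs/default_catalog/userEvents:import" \
      -d '{
        "inputConfig": {
          "gcsSource": {
            "inputUris": ["gs://YOUR_BUCKET/recent_retail_events.json"],
            "dataSchema": "user_event"
          }
        }
      }'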

Task 3. Examine data import events and errors

In this task, you will examine the data import jobs and explore some of the errors logged by the import tasks when invalid data is encountered.

  1. In the Search for Retail navigation menu, click Data and then click Activity Status to monitor the progress of the import task.

  2. Click the User Events tab and then click View full error logs in the Detail column to examine the errors.

This will open the /error folder in the Cloud Storage bucket where your source data was located.

  3. Click the name of the file that corresponds to the event data file you imported. It will be about 1 kilobyte in size.

  4. Click Download to download the file and then open it on your machine to examine the error details. You will see five events that failed to import due to a variety of issues with the data schema in those events.

{ "code": 3, "message": "'userEvent.productDetails' is required for eventType add-to-cart.", "details": [{ "@type": "type.googleapis.com/google.protobuf.Struct", "value": { "line_number": 475 } }] } { "code": 3, "message": "link: Cannot find field.", "details": [{ "@type": "type.googleapis.com/google.protobuf.Struct", "value": { "line_number": 478 } }] }
  5. Return to the Cloud Console and close the Cloud Storage tab.

  6. Open the Retail Activity Status tab, and click Close to close the Activity Status pop-out.
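For comparison with the first error above, an add-to-cart event is only accepted when it includes productDetails. A single well-formed event line in the NDJSON source file might look roughly like this (the visitor ID and timestamp are hypothetical):

    {"eventType": "add-to-cart", "visitorId": "visitor-123", "eventTime": "2021-06-28T18:39:26Z", "productDetails": [{"product": {"id": "GGOEGDHB163199"}, "quantity": 1}]}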

Task 4. Examine Product Catalog and User Event Data

In this task, you will examine the product and event data that you have imported.

  1. In the Search for Retail navigation menu, click Data and then make sure the Catalog tab is selected.

  2. For Branch Name, leave the branch set to Branch 0 (Default).

  3. The Catalog product list displays the 1268 product records that were uploaded to the catalog, of which 746 are in stock.

  4. For Filter, enter GGOEGCBT136699.

This displays the product record for the Google Yellow YoYo. Note that the product is out of stock.

  5. Click the Link icon to try to open the link. The page that opens says Sorry, This Page is Not Available.

  6. Close the product tab that was opened to return to the Search for Retail Data page.

  7. For Filter, enter GGOECAEB163612.

This displays the product record for the Google Black Cloud Tee. Note that this product is in stock.

  8. Click the Link icon to open the link. The product page on the Google Merchandise store opens.

  9. Close the product tab that was opened to return to the Search for Retail Data page.
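The product records you just inspected in the console can also be read back through the API. The call below is a sketch of fetching a single product with the Retail products.get method, assuming an access token such as the one you generate in Task 5:

    # Sketch: retrieve one catalog product by ID from branch 0
    curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
      "https://retail.googleapis.com/v2/projects/${PROJECT_ID}/locations/global/catalogs/default_catalog/branches/0/products/GGOECAEB163612"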

Task 5. Upload user event data using the Retail API

You will now use curl and other command-line utilities to make calls to the Retail API, exploring how to authenticate requests and upload user event data.

Create an IAM service account to authenticate requests

  1. Create an environment variable to store the Project ID:

    export PROJECT_ID=$(gcloud config get-value core/project)
  2. Create an IAM service account for controlled access to the Retail API:

    export SA_NAME="retail-service-account"
    gcloud iam service-accounts create $SA_NAME --display-name $SA_NAME
  3. Bind the service account to the Retail Editor IAM role:

    gcloud projects add-iam-policy-binding ${PROJECT_ID} \
      --member="serviceAccount:$SA_NAME@${PROJECT_ID}.iam.gserviceaccount.com" \
      --role="roles/retail.editor"

Allow the lab user account to use impersonation with the new service account

Creating a role binding on the service account for the lab user with the Service Account Token Creator role allows the lab user to use service account impersonation to safely generate limited duration authentication tokens for the service account. These tokens can then be used to interactively test access to APIs and services.

  1. Create a role binding on the Retail API service account for your user account to permit impersonation:

    export USER_ACCOUNT=$(gcloud config list --format 'value(core.account)')
    gcloud iam service-accounts add-iam-policy-binding $SA_NAME@$PROJECT_ID.iam.gserviceaccount.com \
      --member "user:$USER_ACCOUNT" \
      --role roles/iam.serviceAccountTokenCreator
  2. Generate a temporary access token for the Retail API:

    export ACCESS_TOKEN=$(gcloud auth print-access-token --impersonate-service-account $SA_NAME@$PROJECT_ID.iam.gserviceaccount.com)
Note: This command may fail because it can take up to 10 minutes for the Service Account Token Creator role binding to propagate. If it fails, wait a minute and retry until it succeeds. You will also see a warning informing you that the command is using impersonation; this is expected.
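Optionally, you can sanity-check the impersonated token by listing the catalogs in the project; this is a sketch, not a required lab step:

    # Sketch: list Retail catalogs to confirm the impersonated access token works
    curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
      "https://retail.googleapis.com/v2/projects/${PROJECT_ID}/locations/global/catalogs"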

Submit a user event to the Retail API

You will upload a sample user event to the Retail API by passing user event data in JSON format to the Retail API userEvents:write method.

  1. Store sample user event JSON data in an environment variable:

    DATA='{ "eventType": "detail-page-view", "visitorId": "GA1.3.1260529204.1622654859", "productDetails": [{ "product": { "id": "GGOEGDHB163199" } }, { "product": { "id": "GGOEAAKQ137410" } } ] }'
  2. Store the REST API URL for writing user event data to your catalog using the Retail API userEvents.write method in an environment variable:

    URL="https://retail.googleapis.com/v2/projects/${PROJECT_ID}/locations/global/catalogs/default_catalog/userEvents:write?access_token=${ACCESS_TOKEN}"

    This is the REST API URL for writing user event data to the Retail API. Note that the URL includes bash environment variable substitutions for the project ID and for the inline access_token parameter. This token authenticates the request using the service account you generated previously using impersonation.

  3. Upload a user event using the REST API using curl:

    curl -H 'Content-Type: application/json' -X POST -d "${DATA}" $URL

    You have used curl to call the userEvents:write method passing the event data as a JSON data payload in the POST request.

    { "eventType": "detail-page-view", "visitorId": "GA1.3.1260529204.1622654859", "eventTime": "2021-06-28T18:39:26.691324Z", "productDetails": [ { "product": { "name": "projects/610724409905/locations/global/catalogs/default_catalog/branches/0/products/GGOEGDHB163199", "id": "GGOEGDHB163199", "type": "PRIMARY", "primaryProductId": "GGOEGDHB163199", "categories": [ "Drinkware" ], "title": "Google Chrome Dino Light Up Water Bottle", "priceInfo": { "currencyCode": "USD", "price": 24 }, "availability": "IN_STOCK", "uri": "https://shop.googlemerchandisestore.com/Google+Redesign/Google+Chrome+Dino+Light+Up+Water+Bottle", "images": [ { "uri": "https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGOEGDHB163199.jpg" } ] } }, { "product": { "name": "projects/610724409905/locations/global/catalogs/default_catalog/branches/0/products/GGOEAAKQ137410", "id": "GGOEAAKQ137410", "type": "PRIMARY", "primaryProductId": "GGOEAAKQ137410", "categories": [ "Apparel" ], "title": "Android Iconic Sock", "priceInfo": { "currencyCode": "USD", "price": 17 }, "availability": "IN_STOCK", "uri": "https://shop.googlemerchandisestore.com/Google+Redesign/Apparel/Android+Iconic+Sock", "images": [ { "uri": "https://shop.googlemerchandisestore.com/store/20160512512/assets/items/images/GGOEAAKQ137410.jpg" } ] } } ] }

If the upload was successful, the response displays the formatted event data, including the associated product details such as the product URL and image, as well as the event timestamp. Otherwise you will see an error. The most common errors relate to a malformed payload, an incorrect URL, or an invalid access token.
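As an aside, the same request can typically be authenticated by sending the token in an Authorization header instead of the access_token query parameter used above; a sketch of the equivalent call:

    # Sketch: pass the token as a bearer header rather than a query parameter
    URL="https://retail.googleapis.com/v2/projects/${PROJECT_ID}/locations/global/catalogs/default_catalog/userEvents:write"
    curl -H "Authorization: Bearer ${ACCESS_TOKEN}" \
      -H 'Content-Type: application/json' \
      -X POST -d "${DATA}" $URL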

Congratulations!

Congratulations, you've successfully imported Retail product catalog and user event data using a variety of techniques, explored some common data ingestion errors, and examined Retail catalog and event data using the Cloud Console and the Retail API.

End your lab

When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.

You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.

The number of stars indicates the following:

  • 1 star = Very dissatisfied
  • 2 stars = Dissatisfied
  • 3 stars = Neutral
  • 4 stars = Satisfied
  • 5 stars = Very satisfied

You can close the dialog box if you don't want to provide feedback.

For feedback, suggestions, or corrections, please use the Support tab.

