Vector search with BigQuery

Lab: 45 minutes, 5 credits, Introductory level
This lab may include AI tools that support the learning process.

Overview

Vector search allows you to find the most similar items in your dataset by comparing the mathematical representations of their features, known as embeddings, rather than relying on exact keyword matches.

Have you ever been shopping online and been impressed by how a website can recommend products that are incredibly similar to what you're searching for, even if you can't quite describe them with words? This is often powered by vector search, a technique that goes beyond simple keyword matching. Instead of just searching for text, vector search analyzes the underlying features and characteristics of an item, allowing it to find things that are conceptually or visually similar. In this lab, you'll get hands-on experience using the powerful and scalable vector search capabilities built directly into BigQuery.

For a global retailer like Cymbal E-commerce, this technology is a game-changer for customer experience. Imagine a customer searching for a "lightweight jacket for hiking." A traditional keyword search might miss the perfect product if its description uses "windbreaker" instead of "jacket." With vector search, Cymbal can analyze embeddings—numerical representations of product images and descriptions—to return a list of all relevant outerwear, regardless of the exact terminology. This helps customers discover the products they want faster, making them happier and boosting sales. Let's dive in and build this for Cymbal!
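The jacket and windbreaker example above can be made concrete with cosine similarity, one common way to measure how close two embeddings are. The following Python sketch is illustrative only: the three-dimensional vectors and their labels are made up, whereas real embeddings are produced by a model and have many more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings (not produced by any real model).
jacket = [0.9, 0.1, 0.3]
windbreaker = [0.8, 0.2, 0.4]
toaster = [0.1, 0.9, 0.0]

sim_close = cosine_similarity(jacket, windbreaker)
sim_far = cosine_similarity(jacket, toaster)
print(f"jacket vs windbreaker: {sim_close:.3f}")
print(f"jacket vs toaster:     {sim_far:.3f}")
```

With these toy values, the jacket and windbreaker vectors point in nearly the same direction and score close to 1, while the toaster vector scores much lower. No shared keywords are involved in the comparison.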

In this lab, you learn how to use BigQuery to perform vector searches.

Note: In this lab, you'll use a public database of patents instead of Cymbal's proprietary product listings and detailed descriptions for privacy reasons. The process is the same; you only need to change the dataset, table, and column names to match your source data.

What you'll do

  • Use an ML model to generate embeddings
  • Create a vector index
  • Use the VECTOR_SEARCH function against the embeddings you created

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

What you need

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
  • Time to complete the lab.
Note: If you have a personal Google Cloud account or project, do not use it for this lab.
Note: If you are using a Pixelbook, open an Incognito window to run this lab.

Log in to Google Cloud Console

  1. In the browser tab or window you are using for this lab session, copy the Username from the Connection Details panel and click the Open Google Console button.
Note: If you are asked to choose an account, click Use another account.
  2. Paste in the Username, and then the Password as prompted.
  3. Click Next.
  4. Accept the terms and conditions.

Since this is a temporary account, which will last only as long as this lab:

  • Do not add recovery options
  • Do not sign up for free trials
  5. Once the console opens, view the list of services by clicking the Navigation menu at the top-left.

Verify or enable required APIs

  1. In the Google Cloud Console, enter BigQuery API in the top search bar.

  2. Click on the result for BigQuery API under Marketplace.

  3. If the API is not already enabled, click Enable to enable the API.

  4. Repeat steps 1-3 for BigQuery Connection API and again for Agent Platform API.

Task 1. Create a remote model for text embedding generation

In this task, you create a remote model for Agent Platform text embedding generation. The model is required both to create the embeddings for the table and to run the vector search.

Create an AI model

  1. In the Google Cloud Console Navigation menu, navigate to BigQuery > Studio.

  2. Select the Untitled query tab.

  3. Enter the following code:

    CREATE OR REPLACE MODEL `bqml_lab.embedding_model`
      REMOTE WITH CONNECTION DEFAULT
      OPTIONS (ENDPOINT = '{{{project_0.startup_script.text_embedding_model_id | model_name | disablehighlight}}}');
  4. Click Run. The query takes a couple of minutes to run.

Note: If a "service account does not exist" error appears, wait a short time and rerun the same query, because the service account may still be provisioning.

Click Check my progress to verify the objective: Create an ML model.

  1. Replace the query with the following code:

    CREATE OR REPLACE TABLE `bqml_lab.embeddings` AS
    SELECT *
    FROM ML.GENERATE_EMBEDDING(
      MODEL `bqml_lab.embedding_model`,
      (
        SELECT title, url, abstract AS content
        FROM `bqml_lab.patent_data`
        LIMIT 200000
      )
    )
    -- Keep only rows whose embedding was generated without an error (empty status).
    WHERE LENGTH(ml_generate_embedding_status) = 0;
  2. Click Run. The query takes about 5 minutes to run.

Click Check my progress to verify the objective: Create a table named 'embeddings' by using a BigQuery ML model.

  1. Replace the query with the following code:

    CREATE OR REPLACE VECTOR INDEX my_index
    ON `bqml_lab.embeddings`(ml_generate_embedding_result)
    OPTIONS (
      index_type = 'IVF',
      distance_type = 'COSINE',
      ivf_options = '{"num_lists":500}'
    );
  2. Click Run. The query takes a few minutes to run.

Click Check my progress to verify the objective: Create a vector index on the 'embeddings' table.
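For intuition about what an IVF index does, the following Python sketch groups vectors into lists around centroids and then searches only the lists whose centroids are closest to the query. It is a toy illustration under stated assumptions, not BigQuery's implementation: the centroids are picked by hand rather than learned, the data is two-dimensional, and the names `build_ivf`, `ivf_search`, and `lists_to_probe` are hypothetical.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def build_ivf(vectors, centroids):
    """Put each vector's position into the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        nearest = min(range(len(centroids)),
                      key=lambda c: cosine_distance(vec, centroids[c]))
        lists[nearest].append(idx)
    return lists

def ivf_search(query, vectors, centroids, lists, top_k, lists_to_probe):
    """Search only the lists_to_probe lists whose centroids are closest to the query."""
    ranked = sorted(range(len(centroids)),
                    key=lambda c: cosine_distance(query, centroids[c]))
    candidates = [i for c in ranked[:lists_to_probe] for i in lists[c]]
    candidates.sort(key=lambda i: cosine_distance(query, vectors[i]))
    return candidates[:top_k]

# Hypothetical 2-D data: two clusters pointing in different directions.
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
centroids = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked, not learned
lists = build_ivf(vectors, centroids)
result = ivf_search([1.0, 0.15], vectors, centroids, lists,
                    top_k=2, lists_to_probe=1)
print(lists)
print(result)
```

Searching fewer lists is faster but can miss neighbors that sit in an unsearched list. The VECTOR_SEARCH query later in this lab exposes that trade-off through its `fraction_lists_to_search` option.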

  1. Enter the following code to check the status of the index:

    SELECT
      table_name,
      index_name,
      index_status,
      coverage_percentage,
      last_refresh_time,
      disable_reason
    FROM `{{{project_0.project_id | "Project ID"}}}.bqml_lab.INFORMATION_SCHEMA.VECTOR_INDEXES`;
  2. Click Run. The index is ready to be used when the coverage_percentage column value is greater than 0 and the last_refresh_time column value isn't NULL. If the index is not ready the first time you run the query, rerun it occasionally to check the index status. Continue the lab once the results indicate that the index is ready to use.
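The readiness rule above can be written as a small check. This Python sketch assumes a result row has already been fetched into a dictionary keyed by column name; the helper name `index_is_ready` and both sample rows (including the timestamp) are hypothetical.

```python
from datetime import datetime, timezone

def index_is_ready(row):
    """Apply the lab's rule: coverage above 0 and a non-NULL last refresh time."""
    coverage = row.get("coverage_percentage") or 0
    return coverage > 0 and row.get("last_refresh_time") is not None

# Hypothetical rows shaped like the INFORMATION_SCHEMA.VECTOR_INDEXES result.
still_building = {
    "index_name": "my_index",
    "coverage_percentage": 0,
    "last_refresh_time": None,
}
ready = {
    "index_name": "my_index",
    "coverage_percentage": 100,
    "last_refresh_time": datetime(2024, 1, 1, tzinfo=timezone.utc),
}

print(index_is_ready(still_building))
print(index_is_ready(ready))
```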

Perform a text similarity search using the vector index

You use the VECTOR_SEARCH function to search for entries relevant to your search term, which in this example is the phrase "improving online shopper search results". The model that generates the embedding for the search term must be the same model that generated the embeddings in the table you are searching; otherwise, the search results won't be accurate.

  1. In the BigQuery query tab, replace the code with the following:

    SELECT query.query, base.title, base.content
    FROM VECTOR_SEARCH(
      TABLE `bqml_lab.embeddings`,
      'ml_generate_embedding_result',
      (
        -- Embed the search phrase with the same model used for the table.
        SELECT ml_generate_embedding_result, content AS query
        FROM ML.GENERATE_EMBEDDING(
          MODEL `bqml_lab.embedding_model`,
          (SELECT 'improving online shopper search results' AS content)
        )
      ),
      top_k => 5,
      options => '{"fraction_lists_to_search": 0.01}'
    );
  2. Click Run.

  3. Optional: To try other searches, replace 'improving online shopper search results' in the code with a different search phrase.

Click Check my progress to verify the objective: Perform a text similarity search using the vector index.
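Conceptually, `top_k => 5` keeps the five candidates with the smallest distance to the query. A minimal Python sketch of that selection step, using made-up document names and distances (none of them come from the patent data) and a hypothetical helper `top_k_matches`:

```python
import heapq

def top_k_matches(distances, k):
    """Return the k (name, distance) pairs with the smallest distance."""
    return heapq.nsmallest(k, distances.items(), key=lambda item: item[1])

# Hypothetical cosine distances between one query and six rows.
distances = {
    "doc_a": 0.42,
    "doc_b": 0.10,
    "doc_c": 0.77,
    "doc_d": 0.25,
    "doc_e": 0.31,
    "doc_f": 0.58,
}

matches = top_k_matches(distances, 5)
for name, dist in matches:
    print(name, dist)
```

The row with the largest distance (`doc_c` in this toy data) is the one left out.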

Congratulations!

You created an AI model, generated embeddings for your data, built a vector index, and queried the index to find the items most closely related to your search term.
