Lab instructions and setup requirements

Protect your account and progress. Always use a private browsing window and the lab credentials to run this lab.

Vector search with BigQuery

Lab · 45 minutes · 5 credits · Introductory

This lab may incorporate AI tools to support your learning.



Overview

Vector search allows you to find the most similar items in your dataset by comparing the mathematical representations of their features, known as embeddings, rather than relying on exact keyword matches.

Have you ever been shopping online and been impressed by how a website can recommend products that are incredibly similar to what you're searching for, even if you can't quite describe them with words? This is often powered by vector search, a technique that goes beyond simple keyword matching. Instead of just searching for text, vector search analyzes the underlying features and characteristics of an item, allowing it to find things that are conceptually or visually similar. In this lab, you'll get hands-on experience using the powerful and scalable vector search capabilities built directly into BigQuery.

For a global retailer like Cymbal E-commerce, this technology can transform the customer experience. Imagine a customer searching for a "lightweight jacket for hiking." A traditional keyword search might miss the perfect product if its description uses "windbreaker" instead of "jacket." With vector search, Cymbal can compare embeddings (numerical representations of product images and descriptions) to return the most relevant outerwear, regardless of the exact terminology. This helps customers find the products they want faster, which keeps them happier and boosts sales. Let's dive in and build this for Cymbal!
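To make "comparing embeddings" concrete, here is a minimal Python sketch of exact (brute-force) nearest-neighbor search using cosine distance, defined as 1 minus cosine similarity. The three-dimensional vectors and product names are made up for illustration. Real text embeddings have hundreds of dimensions and come from a model, not from hand-written numbers.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance = 1 - cosine similarity; 0 means the vectors point the same way."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query: list[float], rows: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Score every row against the query and return the k closest (name, distance) pairs."""
    scored = [(name, cosine_distance(query, vec)) for name, vec in rows.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical embeddings: the first two products are semantically close to the query,
# even though "windbreaker" shares no keyword with "jacket".
catalog = {
    "windbreaker": [0.9, 0.1, 0.0],
    "rain jacket": [0.8, 0.3, 0.1],
    "coffee mug": [0.0, 0.1, 0.9],
}
query = [1.0, 0.2, 0.0]  # stand-in embedding for "lightweight jacket for hiking"

for name, distance in exact_top_k(query, catalog, k=2):
    print(f"{name}: {distance:.4f}")
```

Scoring every row like this gives exact results, but the cost grows linearly with the table size. A vector index trades a little accuracy for speed by scanning only part of the table.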

In this lab, you learn how to use BigQuery to perform vector searches.

Note: For privacy reasons, this lab uses a public dataset of patents instead of Cymbal's proprietary product listings and descriptions. The process is the same; you only need to change the dataset, table, and column names to match your source data.

What you'll do

Use an ML model to generate embeddings
Create a vector index
Use the VECTOR_SEARCH function against the embeddings you created

Setup and requirements

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources will be made available to you.

This hands-on lab lets you do the lab activities yourself in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials that you use to sign in and access Google Cloud for the duration of the lab.

What you need

To complete this lab, you need:

Access to a standard internet browser (Chrome browser recommended).
Time to complete the lab.

Note: If you have a personal Google Cloud account or project, do not use it for this lab.

Note: If you are using a Pixelbook, open an Incognito window to run this lab.

Log in to Google Cloud Console

In the browser tab or window you are using for this lab session, copy the Username from the Connection Details panel, and then click the Open Google Console button.

Note: If you are asked to choose an account, click Use another account.

Paste in the Username, and then the Password as prompted.
Click Next.
Accept the terms and conditions.

Because this is a temporary account that lasts only as long as this lab:

Do not add recovery options
Do not sign up for free trials

Once the console opens, view the list of services by clicking the Navigation menu at the top-left.

Verify or enable required APIs

In the Google Cloud Console, enter BigQuery API in the top search bar.
Click on the result for BigQuery API under Marketplace.
If the API is not already enabled, click Enable to enable the API.
Repeat these steps for the BigQuery Connection API and again for the Agent Platform API.

Task 1. Create a remote model for text embedding generation

In this task, you create a remote model that uses an Agent Platform text embedding model. You need it both to generate embeddings for the dataset and to embed your search terms when you run a vector search.

Create an AI model

In the Google Cloud Console Navigation menu, navigate to BigQuery > Studio.
Select the Untitled query tab.
Enter the following code:
-- Remote model that calls the text embedding endpoint through the default connection
CREATE OR REPLACE MODEL `bqml_lab.embedding_model`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (ENDPOINT = '{{{project_0.startup_script.text_embedding_model_id | model_name | disablehighlight}}}');
Click Run. The query takes a couple of minutes to run.

Note: If a "service account does not exist" error appears, wait a short time and then rerun the same query. The service account may still be provisioning.

Click Check my progress to verify the objective: Create an ML model.

Replace the query with the following code:
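-- Embed up to 200,000 patent abstracts and keep only rows that embedded successfully
CREATE OR REPLACE TABLE `bqml_lab.embeddings` AS
SELECT *
FROM ML.GENERATE_EMBEDDING(
  MODEL `bqml_lab.embedding_model`,
  (
    SELECT title, url, abstract AS content  -- the input text column must be named content
    FROM `bqml_lab.patent_data`
    LIMIT 200000
  )
)
WHERE LENGTH(ml_generate_embedding_status) = 0;  -- an empty status means the row succeeded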
Click Run. The query will take about 5 minutes to run.

Click Check my progress to verify the objective: Create a table named 'embeddings' by using a BigQuery ML model.
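In the embedding query, ML.GENERATE_EMBEDDING returns each input row along with an ml_generate_embedding_result column (the embedding) and an ml_generate_embedding_status column, which is empty when embedding succeeded. The WHERE clause therefore keeps only successful rows, so failed rows never reach the table or the index. Here is a minimal Python sketch of the same filter. The rows and the error text are hypothetical.

```python
from typing import TypedDict


# Hypothetical subset of the columns in an ML.GENERATE_EMBEDDING result row.
class EmbeddingRow(TypedDict):
    title: str
    content: str
    ml_generate_embedding_result: list[float]
    ml_generate_embedding_status: str


def keep_successful(rows: list[EmbeddingRow]) -> list[EmbeddingRow]:
    """Python equivalent of WHERE LENGTH(ml_generate_embedding_status) = 0."""
    return [row for row in rows if len(row["ml_generate_embedding_status"]) == 0]


# Made-up rows: the second one simulates a failed embedding call.
rows: list[EmbeddingRow] = [
    {
        "title": "Patent A",
        "content": "A method for ranking search results.",
        "ml_generate_embedding_result": [0.12, -0.05, 0.33],
        "ml_generate_embedding_status": "",
    },
    {
        "title": "Patent B",
        "content": "",
        "ml_generate_embedding_result": [],
        "ml_generate_embedding_status": "hypothetical error message",
    },
]

print([row["title"] for row in keep_successful(rows)])
```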

Replace the query with the following code:
-- IVF index with 500 lists, using cosine distance to compare embeddings
CREATE OR REPLACE VECTOR INDEX my_index
ON `bqml_lab.embeddings`(ml_generate_embedding_result)
OPTIONS (
  index_type = 'IVF',
  distance_type = 'COSINE',
  ivf_options = '{"num_lists":500}'
);
Click Run. The query will take a few minutes to run.

Click Check my progress to verify the objective: Create a vector index on the 'embeddings' table.
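The index_type = 'IVF' option builds an inverted file index. The embeddings are grouped into num_lists clusters (500 in this lab), and each embedding is stored in the list of its nearest cluster centroid. The Python sketch below is a toy version that runs a few rounds of k-means with cosine distance on four hypothetical 2-D vectors, using num_lists = 2. It illustrates the idea only: BigQuery builds and maintains its index internally, and its exact clustering procedure may differ.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity, the quantity behind distance_type = 'COSINE'."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def assign(vectors, centroids):
    """Inverted lists: centroid index -> ids of the vectors closest to that centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for row_id, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: cosine_distance(vec, centroids[c]))
        lists[nearest].append(row_id)
    return lists


def build_ivf(vectors, num_lists, iterations=10):
    """Toy IVF build: k-means centroids plus the final inverted lists."""
    # Deterministic start: spread the initial centroids across the input order.
    centroids = [list(vectors[i * len(vectors) // num_lists]) for i in range(num_lists)]
    for _ in range(iterations):
        lists = assign(vectors, centroids)
        for c, members in lists.items():
            if members:  # keep the old centroid if its list is empty
                dim = len(vectors[0])
                centroids[c] = [sum(vectors[m][d] for m in members) / len(members) for d in range(dim)]
    return centroids, assign(vectors, centroids)


points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids, lists = build_ivf(points, num_lists=2)
print(lists)
```

For scale: 500 lists over up to 200,000 rows works out to about 200,000 / 500 = 400 embeddings per list on average, assuming the clusters are reasonably balanced.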

Replace the query with the following code to check the status of the index:
-- Inspect the vector indexes in the bqml_lab dataset
SELECT table_name, index_name, index_status, coverage_percentage, last_refresh_time, disable_reason
FROM `{{{project_0.project_id | "Project ID"}}}.bqml_lab.INFORMATION_SCHEMA.VECTOR_INDEXES`;
Click Run. The index is ready to use when the coverage_percentage column value is greater than 0 and the last_refresh_time column value isn't NULL. If the index isn't ready the first time you run the query, rerun it periodically to check its status. Continue the lab when the results show that the index is ready to use.
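If you prefer to script the wait instead of rerunning the query by hand, you can express the readiness rule as a small polling loop. This Python sketch assumes a hypothetical fetch_status callable that runs the INFORMATION_SCHEMA.VECTOR_INDEXES query and returns the row for my_index as a dict. The callable is passed in as a parameter, so you can exercise the loop without a BigQuery connection. The poll interval and timeout are arbitrary example values.

```python
import time
from typing import Callable


def index_is_ready(row: dict) -> bool:
    """The lab's rule: coverage_percentage > 0 and last_refresh_time is not NULL (None)."""
    coverage = row.get("coverage_percentage")
    return coverage is not None and coverage > 0 and row.get("last_refresh_time") is not None


def wait_for_index(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the index is ready; raise TimeoutError once the total sleep time reaches timeout_seconds."""
    waited = 0.0
    while True:
        row = fetch_status()
        if index_is_ready(row):
            return row
        if waited >= timeout_seconds:
            raise TimeoutError(f"index not ready after {waited:.0f}s; disable_reason: {row.get('disable_reason')}")
        sleep(poll_seconds)
        waited += poll_seconds


# Usage with made-up status rows instead of a real query (sleep is stubbed out).
statuses = iter([
    {"coverage_percentage": 0, "last_refresh_time": None, "disable_reason": None},
    {"coverage_percentage": 100, "last_refresh_time": "2024-01-01 00:00:00 UTC", "disable_reason": None},
])
ready = wait_for_index(lambda: next(statuses), sleep=lambda seconds: None)
print(ready["coverage_percentage"])
```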

Perform a text similarity search using the vector index

You use the VECTOR_SEARCH function to find entries relevant to your search term, which in this example is the phrase "improving online shopper search results". The model that generates the embedding for your search term must be the same model that generated the embeddings in the table you are comparing against. Embeddings from different models don't share a vector space, so the distances between them aren't meaningful and the search results won't be accurate.
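Two hypothetical toy "models" show why the same model matters. Both count words from a tiny vocabulary but lay the counts out in different coordinate orders, so they produce vectors in different spaces. This Python sketch is deliberately simplified: real embedding models are neural networks, and two different models may not even produce vectors of the same dimension. The effect on ranking is similar in spirit, though.

```python
import math

VOCAB = ["shopper", "search", "results", "battery", "engine"]


def embed_model_a(text: str) -> list[float]:
    """Hypothetical model A: count of each VOCAB word, in VOCAB order."""
    words = text.lower().split()
    return [float(words.count(word)) for word in VOCAB]


def embed_model_b(text: str) -> list[float]:
    """Hypothetical model B: the same counts in reversed order, i.e. a different vector space."""
    return list(reversed(embed_model_a(text)))


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norms == 0 else 1.0 - dot / norms


def nearest(query_vec: list[float], table: dict[str, list[float]]) -> str:
    return min(table, key=lambda title: cosine_distance(query_vec, table[title]))


docs = {
    "Search ranking patent": "shopper search results ranking",
    "Engine patent": "battery engine cooling",
}
# The table is embedded once, with model A (like bqml_lab.embeddings).
table = {title: embed_model_a(text) for title, text in docs.items()}

query = "online shopper search"
print(nearest(embed_model_a(query), table))  # same model as the table: Search ranking patent
print(nearest(embed_model_b(query), table))  # different model, wrong neighbor: Engine patent
```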

In the BigQuery query tab, replace the code with the following:
-- Embed the search phrase with the same model, then return the 5 nearest patents
SELECT query.query, base.title, base.content
FROM VECTOR_SEARCH(
  TABLE `bqml_lab.embeddings`,
  'ml_generate_embedding_result',
  (
    SELECT ml_generate_embedding_result, content AS query
    FROM ML.GENERATE_EMBEDDING(
      MODEL `bqml_lab.embedding_model`,
      (SELECT 'improving online shopper search results' AS content)
    )
  ),
  top_k => 5,
  options => '{"fraction_lists_to_search": 0.01}'
);
Click Run.
Optional: If you want to try other searches, replace 'improving online shopper search results' in the code with a different search phrase.

Click Check my progress to verify the objective: Perform a text similarity search using the vector index.
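Two arguments in the search query balance accuracy against cost. top_k => 5 returns the five closest rows. fraction_lists_to_search = 0.01 tells the search to scan only 1% of the index's lists: with num_lists = 500, that is 500 × 0.01 = 5 lists, the ones whose centroids are closest to the query. The Python sketch below shows that probing step on a toy index with hypothetical vectors. The rounding rule (round up, probe at least one list) is an assumption of this sketch, not documented BigQuery behavior.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def lists_to_probe(num_lists: int, fraction: float) -> int:
    """Assumed rule: probe fraction * num_lists lists, rounded up, at least one."""
    return max(1, math.ceil(num_lists * fraction))


def ivf_search(query, vectors, centroids, lists, top_k, fraction_lists_to_search):
    """Approximate search: only rows stored in the closest lists are scored."""
    n_probe = lists_to_probe(len(centroids), fraction_lists_to_search)
    by_centroid = sorted(range(len(centroids)), key=lambda c: cosine_distance(query, centroids[c]))
    candidates = [row_id for c in by_centroid[:n_probe] for row_id in lists[c]]
    scored = sorted(
        ((row_id, cosine_distance(query, vectors[row_id])) for row_id in candidates),
        key=lambda pair: pair[1],
    )
    return scored[:top_k]


# Toy index: four hypothetical vectors already split into two lists.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids = [[0.95, 0.05], [0.05, 0.95]]
lists = {0: [0, 1], 1: [2, 3]}

print(lists_to_probe(500, 0.01))  # number of lists scanned by this lab's query
print(ivf_search([1.0, 0.2], vectors, centroids, lists, top_k=1, fraction_lists_to_search=0.5))
```

Raising fraction_lists_to_search scans more lists. That improves the chance of finding the true nearest neighbors at the cost of more computation, and probing every list is equivalent to an exact search.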

Congratulations!

You created an AI model, generated embeddings for your data, built a vector index, and used the index to find the items most closely related to your search term.
