Create your first DEA pipeline / 40
Customize the Data Engineering Agent with pipeline instructions / 40
Enhance the pipeline with data quality checks / 20
The traditional data engineering lifecycle—ingesting raw data, cleaning it, managing dependencies, and building analytical models—has historically required hundreds of lines of manual SQL and complex configuration files. We are now entering the era of Agentic Data Engineering, where the focus shifts from manual execution to high-level partnership.
An AI Agent in this context is not just a chatbot; it is a specialized collaborator designed to understand data schemas, identify file types, and translate complex business intent into production-ready architectures. In an agentic workflow, you provide the "intent"—the governance rules and business goals—while the agent handles the "execution," such as generating SQL, handling PII anonymization, and managing table dependencies.
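As an illustration of the PII anonymization the agent is asked to handle, here is a minimal Python sketch of one common approach, salted hashing. The lab does not state which technique the agent uses, so the approach, the salt, the column names, and the function names are all assumptions.

```python
import hashlib

# Hypothetical salt; in practice it would come from a secret store.
SALT = "example-salt"


def tokenize(value, salt=SALT):
    """Return a deterministic SHA-256 token for a PII value."""
    # Normalize so that " Ada@Example.com " and "ada@example.com" match.
    normalized = str(value).strip().lower()
    return hashlib.sha256((salt + normalized).encode("utf-8")).hexdigest()


def anonymize_row(row, pii_columns=("email",)):
    """Return a copy of row with the listed PII columns replaced by tokens."""
    return {
        key: tokenize(val) if key in pii_columns and val is not None else val
        for key, val in row.items()
    }


# Hypothetical input row.
row = {"user_id": 1, "email": " Ada@Example.com ", "country": "UK"}
safe = anonymize_row(row)
```

Because the token is deterministic, the same email always maps to the same value, so joins across tables still work after anonymization.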
In this lab, you act as a Lead Data Engineer partnering with an automated Data Engineering (DE) Agent to build a data pipeline for the marketing team of theLook, a fictional ecommerce company. Your goal is not to write the ETL scripts manually, but to prompt the agent to design, build, and verify a complex customer segmentation pipeline using diverse data sources and formats. You guide the agent through multi-format data ingestion (Parquet, CSV, Avro) and PII anonymization, adapt it to your team's needs, and have it create an RFM model.
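For readers new to RFM, the model scores each customer on recency, frequency, and monetary value. The following Python sketch shows the idea under stated assumptions: the input tuples, the function name, and the output keys other than average_monetary_value are hypothetical, and the SQL the agent generates may differ.

```python
from collections import defaultdict
from datetime import date


def rfm(orders, as_of):
    """Compute recency, frequency, and average monetary value per user.

    orders: iterable of (user_id, order_date, amount) tuples.
    as_of: the date from which recency is measured.
    """
    grouped = defaultdict(list)
    for user_id, order_date, amount in orders:
        grouped[user_id].append((order_date, amount))

    result = {}
    for user_id, rows in grouped.items():
        last_order = max(order_date for order_date, _ in rows)
        total = sum(amount for _, amount in rows)
        result[user_id] = {
            "recency_days": (as_of - last_order).days,
            "frequency": len(rows),
            "average_monetary_value": total / len(rows),
        }
    return result


# Hypothetical orders for two users.
orders = [
    ("u1", date(2024, 1, 10), 40.0),
    ("u1", date(2024, 3, 1), 60.0),
    ("u2", date(2024, 2, 15), 25.0),
]
scores = rfm(orders, as_of=date(2024, 3, 31))
# scores["u1"] -> recency 30 days, frequency 2, average value 50.0
```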
In this lab, you learn how to:
You should be generally familiar with the basics of data engineering and the basics of ETL or ELT pipelines. A basic understanding of config files is beneficial but not required.
Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources are made available to you.
This hands-on lab lets you do the lab activities in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials you use to sign in and access Google Cloud for the duration of the lab.
To complete this lab, you need:
Click the Start Lab button. If you need to pay for the lab, a dialog opens for you to select your payment method. On the right is the Lab setup and access panel with the following:
Note that the lab timer is located near the top of the page, showing the remaining time.
Click Open Google Cloud console (or right-click and select Open Link in Incognito Window if you are running the Chrome browser).
The lab spins up resources, and then opens another tab that shows the Sign in page.
Tip: Arrange the tabs in separate windows, side-by-side.
If necessary, copy the Username below and paste it into the Sign in dialog.
You can also find the Username in the Lab setup and access panel.
Click Next.
Copy the Password below and paste it into the Welcome dialog.
You can also find the Password in the Lab setup and access panel.
Click Next.
Click through the subsequent pages:
After a few moments, the Google Cloud console opens in this tab.
Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5 GB home directory and runs on Google Cloud. Cloud Shell provides command-line access to your Google Cloud resources.
Click Activate Cloud Shell at the top of the Google Cloud console.
Click through the following windows:
When you are connected, you are already authenticated, and the project is set to your Project_ID.
gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.
For full documentation of gcloud, refer to the gcloud CLI overview guide.
In this task, you create your first pipeline with the help of the Data Engineering Agent and use a natural language prompt to create a plan for the pipeline. Finally, you use your knowledge of data operations to assess what it generates.
The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and the release notes.
The BigQuery console opens.
At the top of the BigQuery console, next to the + icon, click the down arrow, and then select Pipeline.
Select Run with user credentials of
If Ask Agent is not selectable at the top of the screen, hover over it, and then enable the Gemini in Data Analytic API.
When it's enabled, a dialog box titled Ask Agent appears.
Enter the prompt:
Observe three things:
Press ENTER.
The agent generates a list of steps and assumptions. Read through them and validate that they match what you want. There are a few things to watch for:
Once everything is in working order, tell the agent you approve.
Afterwards a node diagram appears on the screen.
Click Check my progress to verify the objective.
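The node diagram is a dependency graph: each node can run only after the nodes it depends on have finished. A minimal Python sketch of that ordering follows. The node names are hypothetical and are not the ones the agent generates.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key depends on the nodes in its set.
dependencies = {
    "load_users": set(),
    "load_orders": set(),
    "clean_users": {"load_users"},
    "rfm_analysis": {"clean_users", "load_orders"},
}


def execution_order(deps):
    """Return one valid run order in which dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
```

Any order returned places load_users before clean_users, and both clean_users and load_orders before rfm_analysis. A cyclic graph raises graphlib.CycleError instead of returning an order.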
You created your first pipeline! But you are not quite done yet. Notice that, although the names are consistent and readable, you had no control over how the resources were created or what they were named. Your team has consistent naming conventions and rules for creating datasets. In the next section, you use pipeline instructions along with the Data Engineering Agent to adapt the agent to your team's needs.
Now that you have created your first pipeline DAG, you are ready to go a step further and learn how to customize the agent to adapt to your workflows.
At the top of the BigQuery console, next to the + icon, click the down arrow, and then select Pipeline.
Select Run with user credentials of
Click Ask Agent towards the top of the screen or the Gemini logo at the bottom.
Before entering anything in the box, click Pipeline Instructions.
Select Create instructions file to open a blank GEMINI.md file.
This is where you enter directives to the DE agent to create your pipeline and any naming conventions you have. Your team:
*) identifiers for granting table-level permissions for certain groups
Copy and paste the following into the GEMINI.md file:
After you paste the text, go back to the pipeline tab. Above Manage instructions, it now says "1 instruction file added".
Click Save.
Copy and paste the following prompt into the dialog box:
Press ENTER.
The agent generates a list of steps and assumptions. Read through them and validate that they match what you want. In particular, check the assumptions to ensure the agent captured all the correct data file names.
Once everything looks good, type approve.
After nodes appear on the page, click the top node.
It should say create_dataset. You can now read the query that this node contains: CREATE SCHEMA IF NOT EXISTS PROJECT_ID.theLook_marketing, where PROJECT_ID is your lab project ID. This shows that the DEA read your pipeline instructions and is incorporating them into your workflow.
Click Run > Run all tasks.
Once finished, select the prod_rfm_analysis node, and on the left hand side, select Data preview.
In this view, observe the column names. The bars represent the distribution of values in the table.
Click Check my progress to verify the objective.
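The manual check in this task, confirming that the create_dataset node targets the theLook_marketing dataset, can also be scripted. Here is a minimal Python sketch. The function name and the project ID my-project are hypothetical, and the regular expression covers only the simple statement form shown in this lab.

```python
import re

# Matches CREATE SCHEMA [IF NOT EXISTS] name, with optional backticks.
_CREATE_SCHEMA = re.compile(
    r"CREATE\s+SCHEMA\s+(?:IF\s+NOT\s+EXISTS\s+)?`?([\w.\-]+)`?",
    re.IGNORECASE,
)


def created_dataset(sql):
    """Return the dataset name from a CREATE SCHEMA statement, or None."""
    match = _CREATE_SCHEMA.search(sql)
    if match is None:
        return None
    # Drop an optional project qualifier: project.dataset -> dataset.
    return match.group(1).split(".")[-1]


# Hypothetical project ID; the dataset name is the one used in this lab.
sql = "CREATE SCHEMA IF NOT EXISTS my-project.theLook_marketing"
dataset = created_dataset(sql)
# dataset -> "theLook_marketing"
```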
Now that the RFM analysis pipeline is built, enhance it by adding a data quality check using the Data Engineering Agent. You want to ensure that all monetary values in your final analysis table are non-negative, as is typical of RFM analyses. This task demonstrates how you can use the DEA to iteratively refine pipelines and incorporate data governance best practices such as data quality assertions, either by having the DEA act on the pipeline or by acting yourself.
Return to the pipeline interface from Task 2.
Click Ask Agent.
Enter the following prompt:
Review the plan proposed by the agent. It should indicate modification of the script for the prod_rfm_analysis node.
Type approve to apply the changes.
Once updated, click on the prod_rfm_analysis node.
Examine the generated SQL code. You should see an added ASSERT statement within the pre_operations or post_operations block.
Click Apply at the top of the screen.
Click Run > Run all tasks again.
If any data violated this assertion, the pipeline would fail at this step, highlighting the data quality issue.
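The logic behind this assertion is simple: count the rows that break the rule and fail if there are any. Here is a minimal Python sketch of the same rule. The function names are hypothetical, and the average_monetary_value column is assumed to be the monetary column being checked.

```python
def find_negative_monetary(rows, column="average_monetary_value"):
    """Return the rows whose monetary value is negative."""
    return [
        row for row in rows
        if row[column] is not None and row[column] < 0
    ]


def assert_non_negative(rows, column="average_monetary_value"):
    """Raise ValueError if any row violates the non-negative rule."""
    bad = find_negative_monetary(rows, column)
    if bad:
        raise ValueError(f"{len(bad)} row(s) have a negative {column}")


# Hypothetical rows: this call passes silently.
assert_non_negative([
    {"average_monetary_value": 0.0},
    {"average_monetary_value": 12.5},
])
```

A single negative value makes assert_non_negative raise, which mirrors the pipeline failing at the assertion step.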
What happens if you add another assertion but make a mistake in it? How would you amend it? It is often easier to correct a mistake manually than to describe the change to the agent and have it apply the fix.
In the agent chatbox you have used in previous steps, insert this prompt and run:
After it creates the node, run it.
Navigate to the Execution tab of the pipeline.
Observe that this execution fails.
Why? The check aims to ensure the validity of your data by verifying that ages fall within an expected range. The website does not allow purchases from anyone younger than 12, and the oldest person ever to have lived was 122 years old. Look back at the prompt: the maximum value is 23 instead of 123. Rather than telling the agent to fix it, you fix it yourself.
Go back to the Pipeline tab.
Click the node you just created named 'check_reasonable_age'.
In the bottom panel, click Open > In new tab.
In the new tab that opens, change 23 to 123.
Return to the tab that contains the pipeline and look at the node's contents now.
Observe that it has updated here as well.
Run the task and go to the Execution tab once again.
Observe that now the execution succeeds.
Return to the pipeline tab and click Apply at the top of the page once again.
Click Check my progress to verify the objective.
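The effect of the typo in this task is easy to reproduce outside the pipeline. The Python sketch below applies the same range rule to a handful of hypothetical ages. The function name and the inclusive bounds are assumptions, not the SQL the agent generated.

```python
def ages_out_of_range(ages, min_age=12, max_age=123):
    """Return the ages that fall outside [min_age, max_age]."""
    return [age for age in ages if not (min_age <= age <= max_age)]


# Hypothetical customer ages.
ages = [18, 35, 64, 70]

# With the typo (23 instead of 123), valid adult ages are flagged.
flagged_with_typo = ages_out_of_range(ages, max_age=23)  # [35, 64, 70]

# With the corrected upper bound, nothing is flagged.
flagged_fixed = ages_out_of_range(ages)  # []
```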
You have already completed the lab tasks, which cover the bulk of what the Data Engineering Agent can do. This optional section shows how you can extend your work in BigQuery to get quick visualizations of the DEA's output.
In the BigQuery Explorer, find the theLook_marketing dataset and expand it.
Locate the prod_rfm_analysis table.
Click the menu icon (three dots) next to the table name and select Open In > Python Notebook. This opens a new Colab Enterprise notebook.
Run the first two cells either by clicking into the cell and using the CTRL + ENTER keyboard shortcut or by pressing the "▶" that appears when a cell is selected.
Under the second cell, a button labeled "Visualize with results" appears. Click it.
After the chart appears, find the section called Breakdown Dimension in the right-hand panel.
Click Add Dimension.
Select traffic_source_type.
A bar chart now shows average_monetary_value by traffic_source_type. In Task 2, you used Data preview for a quick look at the distribution of values. Here you build on that with a more in-depth visualization, which is useful for quick quality checks before you hand the data created by your pipelines to your data teams. Feel free to explore the other functionality in the chart.
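The aggregation behind this chart is a grouped average. Here is a minimal Python sketch of it. The column names come from this lab, while the function name, the row values, and the traffic source labels are hypothetical.

```python
from collections import defaultdict


def average_by(rows, group_key, value_key):
    """Return the mean of value_key for each distinct group_key."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for row in rows:
        bucket = totals[row[group_key]]
        bucket[0] += row[value_key]
        bucket[1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Hypothetical rows shaped like the prod_rfm_analysis table.
rows = [
    {"traffic_source_type": "Search", "average_monetary_value": 10.0},
    {"traffic_source_type": "Search", "average_monetary_value": 30.0},
    {"traffic_source_type": "Email", "average_monetary_value": 20.0},
]
averages = average_by(rows, "traffic_source_type", "average_monetary_value")
# averages -> {"Search": 20.0, "Email": 20.0}
```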
You have successfully built and deployed a sophisticated data pipeline using the Data Engineering Agent.
By completing this lab, you have demonstrated proficiency in:
You’ve seen how the Data Engineering Agent shrinks the gap between a business requirement and a production-ready pipeline. You are now ready to apply these automated patterns to your own data engineering challenges!
Manual Last Updated May 12, 2026
Lab Last Tested May 12, 2026
Copyright 2026 Google LLC. All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.