访问 700 多个实验和课程

Discover and Protect Sensitive Data Across Your Ecosystem: Challenge Lab

实验 1 小时 30 分钟 universal_currency_alt 5 个积分 show_chart 中级
info 此实验可能会提供 AI 工具来支持您学习。
访问 700 多个实验和课程

GSP522

Google Cloud self-paced labs logo

Overview

In a challenge lab you’re given a scenario and a set of tasks. Instead of following step-by-step instructions, you will use the skills learned from the labs in the course to figure out how to complete the tasks on your own! An automated scoring system (shown on this page) will provide feedback on whether you have completed your tasks correctly.

When you take a challenge lab, you will not be taught new Google Cloud concepts. You are expected to extend your learned skills, like changing default values and reading and researching error messages to fix your own mistakes.

To score 100% you must successfully complete all tasks within the time period!

This lab is recommended for students who have enrolled in the Discover and Protect Sensitive Data Across Your Ecosystem course. Are you ready for the challenge?

Challenge Scenario

You are a data engineer at Cymbal Cars and have been tasked with identifying and protecting sensitive data for your customers (car owners) across your organization's data ecosystem.

Your colleagues have previously completed some work to identify and redact sensitive data in your organization's Cloud Storage files and BigQuery tables (particularly US Social Security numbers) and in your organization's Gen AI model responses.

To ensure your Cloud Storage files and BigQuery assets continue to be periodically scanned and protected, you want to set up Sensitive Data Protection discovery and run jobs to identify and redact other sensitive data such as credit card numbers.

For your organization's Gen AI models, you also want to expand on your colleague's previous work to redact responses when credentials are identified in responses.

In this challenge, you use your knowledge of Sensitive Data Protection tools to implement discovery and protection for data in Cloud Storage and BigQuery and use the Python Client for Cloud Data Loss Prevention (DLP) API to identify and redact Gen AI model responses that contain credentials.

Topics tested

  • Creating and scheduling discovery scan configurations for Cloud Storage
  • Creating de-identify templates and running de-identify jobs on Cloud Storage files
  • Creating IAM tags for sensitive data and applying them to BigQuery data to grant conditional access
  • Writing Python functions to redact and block Gen AI model responses containing sensitive data as identified by the Cloud Data Loss Prevention (DLP) API

Setup and requirements

Throughout the lab, use the following details for this lab environment:

  • Log into the Google Cloud console as Username 1 ().
  • For Project ID, use:
  • For Location, use: (unless otherwise specified)

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources are made available to you.

This hands-on lab lets you do the lab activities in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

  • Access to a standard internet browser (Chrome browser recommended).
Note: Use an Incognito (recommended) or private browser window to run this lab. This prevents conflicts between your personal account and the student account, which may cause extra charges incurred to your personal account.
  • Time to complete the lab—remember, once you start, you cannot pause a lab.
Note: Use only the student account for this lab. If you use a different Google Cloud account, you may incur charges to that account.

Task 1. Enable sensitive data protection for Cloud Storage

Your team has a Cloud Storage bucket named gs://-car-owners that contains files for interactions with car owners. Most of these files have already had sensitive data redacted by your colleagues but there are some new CSV files (.csv) that have been added to bucket and contain credit card numbers (for example, sample-chat-log-data-10.csv).

Your goals are to identify and redact credit card numbers in the new CSV files and enable daily discovery for the bucket to monitor for new instances of sensitive data moving forward.

To help you achieve these goals, complete the following subtasks.

Expand the hints below for some helpful guidance to get started!

Create and schedule a discovery scan configuration to run daily for Cloud Storage

Helpful hint for discovery scan!

Property Value
Select scope Scan selected project
Managed schedules Edit Default schedule to specify Reprofile Daily for On a schedule and When inspect template changes
Select inspection template Create a new inspection template
Save data profile copies to BigQuery Set Dataset ID to cs_discovery and Table ID to cs_data_profiles in the current project
Set location to store configuration Multi_region > us (multiple regions in United States)
Display name for configuration Cloud Storage Daily Discovery

Create a de-identify template to redact credit card numbers in structured data (such as CSV files)

Helpful hint for de-identify template!

Property Value
Template ID us_ccn_deidentify
Data transformation type Record
Display name De-identify Credit Card Numbers
Location type Multi_region > global (Global)
Field for Transformation Rule message
Transformation type Match on infoType
Transformation Method Replace with infoType name

Use the de-identify template to run a de-identify job on the CSV files in the Cloud Storage bucket

Helpful hint for de-identify job!

Property Value
Job ID us_ccn_deidentify
Location type Multi_region > us (multiple regions in United States)
URL gs://-car-owners/
Scan recursively Enable this option
Sampling 100%
Sampling method No sampling
Structured de-identification template Specify the path to the de-identify template you created in step 2
Export transformation details to BigQuery Set Dataset ID to cs_transformations and Table ID to deidentify_ccn in the current project
Cloud Storage output location gs://-car-owners-transformed

Click Check my progress to verify the objective. Enable sensitive data protection for Cloud Storage.

Task 2. Enable sensitive data protection for BigQuery

Data on car owners and their purchases are also stored in BigQuery for analytics, and some of the datasets contain sensitive data. You have been tasked with creating a tag in IAM for sensitive personally identifiable information (SPII) and using it to grant conditional access for certain users to access only BigQuery datasets that have a tag of no SPII.

To help you achieve this goal, complete the following subtasks.

Expand the hints below for some helpful guidance to get started!

Create a tag in IAM for sensitive personally identifiable information (SPII)

Helpful hint for creating the tag!

Property Value
Tag key SPII
Tag key description Flag for sensitive personally identifiable information (SPII)
Tag key value 1 Yes
Tag key value 1 description Contains sensitive personally identifiable information (SPII)
Tag key value 2 No
Tag key value 2 description Does not contain sensitive personally identifiable information (SPII)

Grant conditional access for Username 2 to only BigQuery datasets that have a tag for no SPII

Helpful hint for granting conditional access!

  1. Update IAM settings for Username 2 () to add a condition (specifically access to only BigQuery datasets that have been tagged with a value of No for SPII).
Property Value
IAM Roles for Username 2 Replace Viewer with Browser, and keep BigQuery Data Viewer to add a condition.
Condition title No SPII Access Only
Condition type 1 and operator Select tag and has value
Value path for condition type 1 /SPII/No
  1. Tag the BigQuery dataset named orders with a value of No for SPII.

Unlike the car_owners dataset, the orders dataset does not contain SPII, but instead contains details on orders only.

Optional testing: If you would like to see this conditional access in action, you can log into the project as Username 2, and go to BigQuery. Refresh the page until the dataset named orders is the only dataset remaining in the Explorer list because Username 2 now only has access to datasets tagged with No for SPII.

Note that it may take a few minutes for the condition to be applied.

Click Check my progress to verify the objective. Enable sensitive data protection for BigQuery.

Task 3. Protect sensitive data in Gen AI model responses

Your team already has a Python function that identifies and redacts or blocks sensitive data types in Gen AI model responses. You have been asked to expand the function to block Gen AI model responses that contain US Vehicle Identification Numbers, which are sensitive data consisting of a unique 17-digit code assigned to every on-road motor vehicle in North America.

To help you achieve this goal, complete the following subtasks using the notebook provided in this lab environment:

  1. Update an existing Python function to block Gemini 2.0 Flash model responses when a US VIN has been included.
  2. Generate example text with the following prompt to test your updated function: Is 4Y1SL65848Z411439 an example of a US Vehicle Identification Number (VIN)?
    • When generating the response, be sure to set the temperature to 0, so that the highest probability results are returned for the progress check below.

Be sure to use the pre-created notebook named deidentify-model-response-challenge-lab.ipynb in the workbench instance named vertex-ai-jupyterlab.

  • For Project ID, use:
  • For Location, use:
Note: If you do not see notebooks in JupyterLab, please follow these additional steps to reset the instance:

1. Close the browser tab for JupyterLab, and return to the Workbench home page.

2. Select the checkbox next to the instance name, and click Reset.

3. After the Open JupyterLab button is enabled again, wait one minute, and then click Open JupyterLab.

Helpful hint for updating and testing the Python function!

Helpful hint for setting the temperature to 0!

Click Check my progress to verify the objective. Protect sensitive data in Gen AI model responses.

Congratulations!

In this lab, you created and scheduled a discovery scan configuration for Cloud Storage, and then you created a de-identify template and used it to run a de-identify job on Cloud Storage files. You also created IAM tags and applied them to BigQuery data to grant conditional access. Last, you updated a Python function to redact and block Gen AI model responses containing sensitive data as identified by the Cloud Data Loss Prevention (DLP) API.

Discover and Protect Sensitive Data Across Your Ecosystem skill badge

Google Cloud training and certification

...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.

Manual Last Updated September 10, 2025

Lab Last Tested September 10, 2025

Copyright 2025 Google LLC. All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.

准备工作

  1. 实验会创建一个 Google Cloud 项目和一些资源,供您使用限定的一段时间
  2. 实验有时间限制,并且没有暂停功能。如果您中途结束实验,则必须重新开始。
  3. 在屏幕左上角,点击开始实验即可开始

使用无痕浏览模式

  1. 复制系统为实验提供的用户名密码
  2. 在无痕浏览模式下,点击打开控制台

登录控制台

  1. 使用您的实验凭证登录。使用其他凭证可能会导致错误或产生费用。
  2. 接受条款,并跳过恢复资源页面
  3. 除非您已完成此实验或想要重新开始,否则请勿点击结束实验,因为点击后系统会清除您的工作并移除该项目

此内容目前不可用

一旦可用,我们会通过电子邮件告知您

太好了!

一旦可用,我们会通过电子邮件告知您

一次一个实验

确认结束所有现有实验并开始此实验

使用无痕浏览模式运行实验

请使用无痕模式或无痕式浏览器窗口运行此实验。这可以避免您的个人账号与学生账号之间发生冲突,这种冲突可能导致您的个人账号产生额外费用。