Discover and Protect Sensitive Data Across Your Ecosystem: Challenge Lab

Lab · 1 hour 30 minutes · 5 credits · Intermediate

This lab may provide AI tools to support your learning.

GSP522

Overview

In a challenge lab you’re given a scenario and a set of tasks. Instead of following step-by-step instructions, you will use the skills learned from the labs in the course to figure out how to complete the tasks on your own! An automated scoring system (shown on this page) will provide feedback on whether you have completed your tasks correctly.

When you take a challenge lab, you will not be taught new Google Cloud concepts. You are expected to extend your learned skills, like changing default values and reading and researching error messages to fix your own mistakes.

To score 100% you must successfully complete all tasks within the time period!

This lab is recommended for students who have enrolled in the Discover and Protect Sensitive Data Across Your Ecosystem course. Are you ready for the challenge?

Challenge Scenario

You are a data engineer at Cymbal Cars and have been tasked with identifying and protecting sensitive data for your customers (car owners) across your organization's data ecosystem.

Your colleagues have previously completed some work to identify and redact sensitive data in your organization's Cloud Storage files and BigQuery tables (particularly US Social Security numbers) and in your organization's Gen AI model responses.

To ensure your Cloud Storage files and BigQuery assets continue to be periodically scanned and protected, you want to set up Sensitive Data Protection discovery and run jobs to identify and redact other sensitive data such as credit card numbers.

For your organization's Gen AI models, you also want to expand on your colleague's previous work to redact responses when credentials are identified in responses.

In this challenge, you use your knowledge of Sensitive Data Protection tools to implement discovery and protection for data in Cloud Storage and BigQuery and use the Python Client for Cloud Data Loss Prevention (DLP) API to identify and redact Gen AI model responses that contain credentials.

Topics tested

Creating and scheduling discovery scan configurations for Cloud Storage
Creating de-identify templates and running de-identify jobs on Cloud Storage files
Creating IAM tags for sensitive data and applying them to BigQuery data to grant conditional access
Writing Python functions to redact and block Gen AI model responses containing sensitive data as identified by the Cloud Data Loss Prevention (DLP) API

Setup and requirements

Throughout the lab, use the following details for this lab environment:

Log into the Google Cloud console as Username 1 ().
For Project ID, use:
For Location, use: (unless otherwise specified)

Before you click the Start Lab button

Read these instructions. Labs are timed and you cannot pause them. The timer, which starts when you click Start Lab, shows how long Google Cloud resources are made available to you.

This hands-on lab lets you do the lab activities in a real cloud environment, not in a simulation or demo environment. It does so by giving you new, temporary credentials you use to sign in and access Google Cloud for the duration of the lab.

To complete this lab, you need:

Access to a standard internet browser (Chrome browser recommended).

Note: Use an Incognito (recommended) or private browser window to run this lab. This prevents conflicts between your personal account and the student account, which could otherwise cause extra charges to be incurred on your personal account.

Time to complete the lab—remember, once you start, you cannot pause a lab.

Note: Use only the student account for this lab. If you use a different Google Cloud account, you may incur charges to that account.

Task 1. Enable sensitive data protection for Cloud Storage

Your team has a Cloud Storage bucket named gs://-car-owners that contains files for interactions with car owners. Most of these files have already had sensitive data redacted by your colleagues, but some new CSV files (.csv) that contain credit card numbers have been added to the bucket (for example, sample-chat-log-data-10.csv).

Your goals are to identify and redact credit card numbers in the new CSV files and enable daily discovery for the bucket to monitor for new instances of sensitive data moving forward.

To help you achieve these goals, complete the following subtasks.

Expand the hints below for some helpful guidance to get started!

Create and schedule a discovery scan configuration to run daily for Cloud Storage

Helpful hint for discovery scan!

Property	Value
Select scope	Scan selected project
Managed schedules	Edit Default schedule to specify Reprofile Daily for On a schedule and When inspect template changes
Select inspection template	Create a new inspection template
Save data profile copies to BigQuery	Set Dataset ID to cs_discovery and Table ID to cs_data_profiles in the current project
Set location to store configuration	Multi_region > us (multiple regions in United States)
Display name for configuration	Cloud Storage Daily Discovery

Create a de-identify template to redact credit card numbers in structured data (such as CSV files)

Helpful hint for de-identify template!

Property	Value
Template ID	us_ccn_deidentify
Data transformation type	Record
Display name	De-identify Credit Card Numbers
Location type	Multi_region > global (Global)
Field for Transformation Rule	message
Transformation type	Match on infoType
Transformation Method	Replace with infoType name
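The template settings in the hint above can also be expressed as a request body for the Python Client for Cloud Data Loss Prevention (DLP) API. The following is a minimal sketch, not the lab's required method: it assumes the infoType to match is CREDIT_CARD_NUMBER, and the helper names build_ccn_deidentify_template and create_template are hypothetical. Only create_template needs the google-cloud-dlp package and credentials.

```python
def build_ccn_deidentify_template(field_name="message"):
    """Build a de-identify template body for structured (record) data.

    Assumption: credit card numbers are matched with the built-in
    CREDIT_CARD_NUMBER infoType and replaced with the infoType name.
    """
    return {
        "display_name": "De-identify Credit Card Numbers",
        "deidentify_config": {
            "record_transformations": {
                "field_transformations": [
                    {
                        # Field for Transformation Rule: message
                        "fields": [{"name": field_name}],
                        # Transformation type: Match on infoType
                        "info_type_transformations": {
                            "transformations": [
                                {
                                    "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
                                    # Transformation Method: Replace with infoType name
                                    "primitive_transformation": {
                                        "replace_with_info_type_config": {}
                                    },
                                }
                            ]
                        },
                    }
                ]
            }
        },
    }


def create_template(project_id, template_id="us_ccn_deidentify"):
    """Hypothetical helper: create the template in the global location."""
    from google.cloud import dlp_v2  # imported lazily; needs google-cloud-dlp

    client = dlp_v2.DlpServiceClient()
    return client.create_deidentify_template(
        request={
            "parent": f"projects/{project_id}/locations/global",
            "deidentify_template": build_ccn_deidentify_template(),
            "template_id": template_id,
        }
    )
```

Keeping the body in a plain dictionary makes it easy to compare against the console form field by field before creating anything.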

Use the de-identify template to run a de-identify job on the CSV files in the Cloud Storage bucket

Helpful hint for de-identify job!

Property	Value
Job ID	us_ccn_deidentify
Location type	Multi_region > us (multiple regions in United States)
URL	gs://-car-owners/
Scan recursively	Enable this option
Sampling	100%
Sampling method	No sampling
Structured de-identification template	Specify the path to the de-identify template you created in step 2
Export transformation details to BigQuery	Set Dataset ID to cs_transformations and Table ID to deidentify_ccn in the current project
Cloud Storage output location	gs://-car-owners-transformed
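For comparison, the job settings in the hint above map roughly onto an inspect job with a de-identify action in the DLP API. This is a sketch under stated assumptions: the bucket names, project ID, and template resource name are passed in as hypothetical parameters because their real values are specific to your lab environment, the inspected infoType is assumed to be CREDIT_CARD_NUMBER, and sampling fields are left at their defaults (scan everything). The helper name build_deidentify_job is not part of any library.

```python
def build_deidentify_job(project_id, source_bucket, output_bucket, template_name):
    """Build an inspect job body that de-identifies CSV files in a bucket.

    All four arguments are placeholders for lab-specific values.
    """
    return {
        "storage_config": {
            "cloud_storage_options": {
                # A trailing /** asks for a recursive scan of the bucket.
                "file_set": {"url": f"gs://{source_bucket}/**"},
            }
        },
        # Assumption: only credit card numbers are inspected for.
        "inspect_config": {"info_types": [{"name": "CREDIT_CARD_NUMBER"}]},
        "actions": [
            {
                "deidentify": {
                    "transformation_config": {
                        "structured_deidentify_template": template_name
                    },
                    "transformation_details_storage_config": {
                        "table": {
                            "project_id": project_id,
                            "dataset_id": "cs_transformations",
                            "table_id": "deidentify_ccn",
                        }
                    },
                    "cloud_storage_output": f"gs://{output_bucket}",
                    "file_types_to_transform": ["CSV"],
                }
            }
        ],
    }


def create_job(project_id, job_body, job_id="us_ccn_deidentify"):
    """Hypothetical helper: submit the job in the us multi-region."""
    from google.cloud import dlp_v2  # imported lazily; needs google-cloud-dlp

    client = dlp_v2.DlpServiceClient()
    return client.create_dlp_job(
        request={
            "parent": f"projects/{project_id}/locations/us",
            "inspect_job": job_body,
            "job_id": job_id,
        }
    )
```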

Click Check my progress to verify the objective. Enable sensitive data protection for Cloud Storage.

Task 2. Enable sensitive data protection for BigQuery

Data on car owners and their purchases are also stored in BigQuery for analytics, and some of the datasets contain sensitive data. You have been tasked with creating a tag in IAM for sensitive personally identifiable information (SPII) and using it to grant conditional access for certain users to access only BigQuery datasets that have a tag of no SPII.

To help you achieve this goal, complete the following subtasks.

Expand the hints below for some helpful guidance to get started!

Create a tag in IAM for sensitive personally identifiable information (SPII)

Helpful hint for creating the tag!

Property	Value
Tag key	SPII
Tag key description	Flag for sensitive personally identifiable information (SPII)
Tag key value 1	Yes
Tag key value 1 description	Contains sensitive personally identifiable information (SPII)
Tag key value 2	No
Tag key value 2 description	Does not contain sensitive personally identifiable information (SPII)

Grant conditional access for Username 2 to only BigQuery datasets that have a tag for no SPII

Helpful hint for granting conditional access!

Update IAM settings for Username 2 () to add a condition (specifically access to only BigQuery datasets that have been tagged with a value of No for SPII).

Property	Value
IAM Roles for Username 2	Replace Viewer with Browser, and keep BigQuery Data Viewer so that you can add a condition to it.
Condition title	No SPII Access Only
Condition type 1 and operator	Select tag and has value
Value path for condition type 1	/SPII/No
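Behind the console form, a tag-based condition like this is stored on the IAM role binding as a Common Expression Language (CEL) expression that uses resource.matchTag. The sketch below shows what such a binding could look like; the project ID and member address are hypothetical placeholders, and build_no_spii_binding is an illustrative helper rather than a library function.

```python
def build_no_spii_binding(project_id, member):
    """Build an IAM binding that grants BigQuery Data Viewer only on
    resources tagged SPII = No.

    project_id and member are placeholders for lab-specific values.
    """
    return {
        "role": "roles/bigquery.dataViewer",
        "members": [member],
        "condition": {
            "title": "No SPII Access Only",
            # matchTag takes the namespaced tag key and the value's short name.
            "expression": f"resource.matchTag('{project_id}/SPII', 'No')",
        },
    }


# Example with made-up values:
binding = build_no_spii_binding("my-project", "user:student-02@example.com")
print(binding["condition"]["expression"])
```

Because the condition is evaluated against each resource's tags, the binding grants nothing until at least one dataset carries the SPII tag with the value No.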

Tag the BigQuery dataset named orders with a value of No for SPII.

Unlike the car_owners dataset, the orders dataset does not contain SPII, but instead contains details on orders only.

Optional testing: If you would like to see this conditional access in action, you can log into the project as Username 2, and go to BigQuery. Refresh the page until the dataset named orders is the only dataset remaining in the Explorer list because Username 2 now only has access to datasets tagged with No for SPII.

Note that it may take a few minutes for the condition to be applied.

Click Check my progress to verify the objective. Enable sensitive data protection for BigQuery.

Task 3. Protect sensitive data in Gen AI model responses

Your team already has a Python function that identifies and redacts or blocks sensitive data types in Gen AI model responses. You have been asked to expand the function to block Gen AI model responses that contain US Vehicle Identification Numbers (VINs), which are sensitive data consisting of a unique 17-character code assigned to every on-road motor vehicle in North America.

To help you achieve this goal, complete the following subtasks using the notebook provided in this lab environment:

Update an existing Python function to block Gemini 2.0 Flash model responses when a US VIN has been included.
Generate example text with the following prompt to test your updated function: Is 4Y1SL65848Z411439 an example of a US Vehicle Identification Number (VIN)?
- When generating the response, be sure to set the temperature to 0, so that the highest probability results are returned for the progress check below.

Be sure to use the pre-created notebook named deidentify-model-response-challenge-lab.ipynb in the workbench instance named vertex-ai-jupyterlab.

For Project ID, use:
For Location, use:
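The general shape of a blocking check is sketched below. It is not the function from the lab notebook, whose contents are not shown here: the names blocked_findings, guard_model_response, and FakeDlpClient are hypothetical, and the sketch assumes the built-in VEHICLE_IDENTIFICATION_NUMBER infoType is the one to block. The client is passed in as an argument so that a real dlp_v2.DlpServiceClient() or a stand-in can be used.

```python
from types import SimpleNamespace

# Assumption: the built-in DLP infoType for VINs.
BLOCKED_INFO_TYPES = ("VEHICLE_IDENTIFICATION_NUMBER",)


def blocked_findings(dlp_client, project_id, location, text,
                     blocked=BLOCKED_INFO_TYPES):
    """Return the sorted names of blocked infoTypes that DLP finds in text."""
    response = dlp_client.inspect_content(
        request={
            "parent": f"projects/{project_id}/locations/{location}",
            "inspect_config": {"info_types": [{"name": n} for n in blocked]},
            "item": {"value": text},
        }
    )
    return sorted({f.info_type.name for f in response.result.findings})


def guard_model_response(dlp_client, project_id, location, text):
    """Return text unchanged, or a block notice if it contains blocked data."""
    hits = blocked_findings(dlp_client, project_id, location, text)
    if hits:
        return "[Response blocked: contains " + ", ".join(hits) + "]"
    return text


class FakeDlpClient:
    """Stand-in for dlp_v2.DlpServiceClient so the sketch runs offline.

    It flags only the sample VIN from the lab prompt; the real service
    performs actual detection.
    """

    def inspect_content(self, request):
        findings = []
        if "4Y1SL65848Z411439" in request["item"]["value"]:
            info_type = SimpleNamespace(name="VEHICLE_IDENTIFICATION_NUMBER")
            findings.append(SimpleNamespace(info_type=info_type))
        return SimpleNamespace(result=SimpleNamespace(findings=findings))


if __name__ == "__main__":
    client = FakeDlpClient()
    print(guard_model_response(client, "my-project", "us-central1",
                               "Yes, 4Y1SL65848Z411439 is a valid VIN."))
    print(guard_model_response(client, "my-project", "us-central1",
                               "Our showroom opens at 9 AM."))
```

In the notebook, the same check would wrap the text of the model's response, with the real DLP client in place of the stand-in and your lab's Project ID and Location as arguments.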

Note: If you do not see notebooks in JupyterLab, please follow these additional steps to reset the instance:

1. Close the browser tab for JupyterLab, and return to the Workbench home page.

2. Select the checkbox next to the instance name, and click Reset.

3. After the Open JupyterLab button is enabled again, wait one minute, and then click Open JupyterLab.

Helpful hint for updating and testing the Python function!

Helpful hint for setting the temperature to 0!

Click Check my progress to verify the objective. Protect sensitive data in Gen AI model responses.

Congratulations!

In this lab, you created and scheduled a discovery scan configuration for Cloud Storage, and then you created a de-identify template and used it to run a de-identify job on Cloud Storage files. You also created IAM tags and applied them to BigQuery data to grant conditional access. Last, you updated a Python function to redact and block Gen AI model responses containing sensitive data as identified by the Cloud Data Loss Prevention (DLP) API.


Google Cloud training and certification

...helps you make the most of Google Cloud technologies. Our classes include technical skills and best practices to help you get up to speed quickly and continue your learning journey. We offer fundamental to advanced level training, with on-demand, live, and virtual options to suit your busy schedule. Certifications help you validate and prove your skill and expertise in Google Cloud technologies.

Manual Last Updated September 10, 2025

Lab Last Tested September 10, 2025

Copyright 2025 Google LLC. All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
