Monitoring BigQuery Workloads

Lab · 1 hour 30 minutes · 5 credits · Introductory

Note: This lab may incorporate AI tools to support your learning.

Overview

Storing and querying massive datasets can be time-consuming and expensive without the right infrastructure. BigQuery is a serverless, fully managed enterprise data warehouse that enables fast and cost-effective queries using the processing power of Google's infrastructure. In BigQuery, storage and compute resources are decoupled, which gives you the flexibility to store and query your data according to your organization's needs.

BigQuery makes it easy to estimate query resource usage and costs using a variety of tools, including the BigQuery query validator in the Google Cloud console, the dry-run flag in the bq command-line tool, the Google Cloud Pricing Calculator, and the API and client libraries.

In this lab, you use the BigQuery query validator and the bq command-line tool to estimate the amount of data to be processed before running a query. You also use a SQL query and the API to determine resource usage after a query has run successfully.

Objectives

In this lab, you learn how to:

  • Use the BigQuery query validator to estimate the amount of data to be processed by a query.
  • Determine slot usage for executed queries using a SQL query and the API.
  • Complete a dry run of a query to estimate the amount of data to be processed by a query.

Setup and Requirements

Qwiklabs setup

For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.

  1. Sign in to Qwiklabs using an incognito window.

  2. Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
    There is no pause feature. You can restart if needed, but you have to start at the beginning.

  3. When ready, click Start lab.

  4. Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.

  5. Click Open Google Console.

  6. Click Use another account and copy/paste credentials for this lab into the prompts.
    If you use other credentials, you'll receive errors or incur charges.

  7. Accept the terms and skip the recovery resource page.

How to start your lab and sign in to the Console

  1. Click the Start Lab button. If you need to pay for the lab, a pop-up opens for you to select your payment method. On the left is a panel populated with the temporary credentials that you must use for this lab.

    Credentials panel

  2. Copy the username, and then click Open Google Console. The lab spins up resources, and then opens another tab that shows the Choose an account page.

    Note: Open the tabs in separate windows, side-by-side.
  3. On the Choose an account page, click Use Another Account. The Sign in page opens.

    Choose an account dialog box with Use Another Account option highlighted

  4. Paste the username that you copied from the Connection Details panel. Then copy and paste the password.

Note: You must use the credentials from the Connection Details panel. Do not use your Google Cloud Skills Boost credentials. If you have your own Google Cloud account, do not use it for this lab (avoids incurring charges).
  5. Click through the subsequent pages:
  • Accept the terms and conditions.
  • Do not add recovery options or two-factor authentication (because this is a temporary account).
  • Do not sign up for free trials.

After a few moments, the Cloud console opens in this tab.

Note: You can view the menu with a list of Google Cloud products and services by clicking the Navigation menu at the top-left.

Activate Google Cloud Shell

Google Cloud Shell is a virtual machine that is loaded with development tools. It offers a persistent 5 GB home directory and runs on Google Cloud.

Google Cloud Shell provides command-line access to your Google Cloud resources.

  1. In Cloud console, on the top right toolbar, click the Open Cloud Shell button.

    Highlighted Cloud Shell icon

  2. Click Continue.

It takes a few moments to provision and connect to the environment. When you are connected, you are already authenticated, and the project is set to your PROJECT_ID. For example:

Project ID highlighted in the Cloud Shell Terminal

gcloud is the command-line tool for Google Cloud. It comes pre-installed on Cloud Shell and supports tab-completion.

  • You can list the active account name with this command:
gcloud auth list

Output:

Credentialed accounts:
 - <myaccount>@<mydomain>.com (active)

Example output:

Credentialed accounts:
 - google1623327_student@qwiklabs.net
  • You can list the project ID with this command:
gcloud config list project

Output:

[core]
project = <project_ID>

Example output:

[core]
project = qwiklabs-gcp-44776a13dea667a6

Note: Full documentation of gcloud is available in the gcloud CLI overview guide.

Task 1. Use the query validator to estimate the amount of data to be processed

When you enter a query in the Google Cloud console, the BigQuery query validator verifies the query syntax and provides an estimate of the number of bytes to be processed by the query.

In this task, you query a public dataset (New York Citi Bikes) maintained by the BigQuery public datasets program. Using this dataset, you learn how to use the query validator to validate a SQL query and to estimate the amount of data to be processed by a query before you run it.

  1. In the Google Cloud console, in the Navigation menu (Navigation menu), under Analytics, click BigQuery.

The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and the release notes.

  2. Click Done.

  3. In the SQL Workspace toolbar, click the Editor tab to open the SQL query editor.


  4. In the BigQuery query editor, paste the following query but do not run the query:
SELECT COUNT(*)
FROM `bigquery-public-data.new_york_citibike.citibike_trips`
WHERE start_station_name LIKE '%Broadway%';

When executed, this query returns the number of rows in the citibike_trips table whose start_station_name value contains the text "Broadway".

  5. In the query editor toolbar, notice the circular check icon, which indicates that the query validator has verified the query and confirmed that it is valid.

BigQuery will automatically run the query validator when you add or modify the code in the query editor.

A green or red check displays above the query editor depending on whether the query is valid or invalid. If the query is valid, the validator also displays the amount of data to be processed if you choose to run the query.


According to the query validator, this query will process 1.06 GB when run.
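Because the validator reports the byte estimate before anything runs, you can also translate it into an approximate on-demand cost. A minimal sketch in Python; the $6.25-per-TiB rate is an illustrative assumption, so check current BigQuery pricing for your region:

```python
# Rough on-demand cost estimate from the query validator's byte count.
# The $6.25/TiB rate is an illustrative assumption, not a quoted price;
# confirm current BigQuery on-demand pricing for your region.
PRICE_PER_TIB_USD = 6.25

def estimate_query_cost(bytes_processed: int) -> float:
    """Return the approximate on-demand cost in USD for a query."""
    tib = bytes_processed / 2**40  # bytes -> TiB
    return tib * PRICE_PER_TIB_USD

# 1.06 GB, as reported by the query validator for the Citi Bike query.
cost = estimate_query_cost(1_060_000_000)
print(f"~${cost:.6f}")  # ~$0.006025
```

At this scale the cost is negligible, but the same arithmetic matters when a validator estimate runs into terabytes.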

  6. Click Run.

The query returns the number of records (5,414,611) that contain the text "Broadway" in the column named start_station_name.


Click Check my progress to verify the objective. Estimate the amount of data processed by a query

Task 2. Determine slot usage using a SQL query

BigQuery uses slots (virtual CPUs) to execute SQL queries, and it automatically calculates how many slots each query requires, depending on query size and complexity. After you run a query in the Google Cloud console, you receive both the results and a summary of the amount of resources that were used to execute the query.

In this task, you identify the job ID of the query executed in the previous task and use it in a new SQL query to retrieve additional information about the query job.

  1. In the Query results, click on the Job information tab.


  2. Identify the Job ID line, and use its value to determine the project ID and job ID.

For example, the value qwiklabs-gcp-01-5f4dee7a15a3:US.bquxjob_403a14df_185dd37737a begins with the project ID, followed by :US., which identifies the location where the job was executed, and ends with the job ID.

The project ID is the first part, qwiklabs-gcp-01-5f4dee7a15a3 (before :US.), while the job ID is the last part, bquxjob_403a14df_185dd37737a (after :US.).

Note: You can copy the full value to a text editor or document to make it easier to select the project ID and the job ID.
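If you'd rather not split the value by eye, the three parts can also be separated programmatically. A minimal Python sketch, using the example value from above:

```python
def parse_full_job_id(full_job_id: str) -> tuple[str, str, str]:
    """Split a 'project:LOCATION.job_id' value into its three parts."""
    project, rest = full_job_id.split(":", 1)   # project ID before ':'
    location, job_id = rest.split(".", 1)       # location before the first '.'
    return project, location, job_id

project, location, job_id = parse_full_job_id(
    "qwiklabs-gcp-01-5f4dee7a15a3:US.bquxjob_403a14df_185dd37737a"
)
print(project)   # qwiklabs-gcp-01-5f4dee7a15a3
print(location)  # US
print(job_id)    # bquxjob_403a14df_185dd37737a
```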


  3. In the query editor, copy and paste the following query, replacing 'YOUR_ID' with your job ID (such as 'bquxjob_403a14df_185dd37737a'):
SELECT
  query,
  reservation_id,
  CONCAT('*****@', REGEXP_EXTRACT(user_email, r'@(.+)')) AS user_email,
  total_bytes_processed,
  total_slot_ms,
  job_stages
FROM
  `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  job_id = 'YOUR_ID';


When executed, this query returns the slot usage of the query job previously executed on the Citi Bikes public dataset.

  4. Click Run.

The output of this query provides a table that shows the query stages and the associated slot usage for each stage.

Since an individual task in a query is executed by one slot, the sum of the values in the column named job_stages.completed_parallel_inputs gives the total number of tasks, which is also the maximum number of slots the query could have used.

However, after a single slot has completed the first task assigned to it, it can be reassigned to another task, so the query may use fewer slots than it has tasks.

This makes the total slot time used to run the query (the value in the column named total_slot_ms) equally important. Slot time (in milliseconds, or ms) is provided for the entire query job and for each stage of the query, where it represents the amount of slot time used to complete that stage.

For example, a query may complete 150 tasks, but if each task executes quickly, the query may actually use fewer slots, such as 100 rather than 150.
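One way to make this relationship concrete: dividing total_slot_ms by the job's wall-clock duration gives the average number of slots in use at once. A small sketch with hypothetical numbers (the 200,000 ms of slot time and 2,000 ms of elapsed time are made up for illustration):

```python
def average_slots(total_slot_ms: int, elapsed_ms: int) -> float:
    """Average concurrent slots = total slot time / wall-clock time."""
    return total_slot_ms / elapsed_ms

# Hypothetical job: 150 completed parallel inputs (tasks), 200,000 ms
# of total slot time, finished in 2,000 ms of wall-clock time.
print(average_slots(200_000, 2_000))  # 100.0
```

So even though the job completed 150 tasks, on average only about 100 slots were working at any moment, consistent with slots being reassigned as tasks finish.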


Click Check my progress to verify the objective. Determine slot usage using SQL query

Task 3. Determine slot usage using an API call

You can also retrieve information about a specific query job using the API. In BigQuery, you can use the API directly by making requests to the server, or you can use client libraries in your preferred language: C#, Go, Java, Node.js, PHP, Python, or Ruby.

In this task, you use the Google APIs Explorer to test the BigQuery API and retrieve the slot usage for the query that you ran in a previous task.

  1. In a new Incognito browser tab, navigate to the BigQuery API page for the jobs.get method.

  2. In the Try this method window, input your project ID and job ID that you identified in the previous task.

For example, qwiklabs-gcp-01-5f4dee7a15a3 for the project ID and bquxjob_403a14df_185dd37737a for the job ID.


  3. Click Execute.

If asked to confirm your login, select the student username that you used to sign in to Google Cloud for the previous tasks.

  4. Review the API response for each stage and for the entire query job.

To see the value for completed parallel inputs for the first stage, scroll down to statistics > query > queryPlan > name: S00 > completedParallelInputs.


To see the total slot time used for the entire query job, scroll down to the end of the results to review the value for totalSlotMs.

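The same arithmetic from Task 2 applies to the API response: summing completedParallelInputs across the stages under statistics > query > queryPlan gives the total parallel inputs for the job. A minimal sketch over a trimmed, hypothetical response (the stage names and numbers below are placeholders, not values from your lab):

```python
# Trimmed, hypothetical jobs.get response; a real response contains
# many more fields, and the numbers here are placeholders.
response = {
    "statistics": {
        "query": {
            "queryPlan": [
                {"name": "S00: Input", "completedParallelInputs": "149"},
                {"name": "S01: Output", "completedParallelInputs": "1"},
            ],
            "totalSlotMs": "1693",
        }
    }
}

stages = response["statistics"]["query"]["queryPlan"]
# The API returns numeric fields as JSON strings, so cast before summing.
total_inputs = sum(int(s["completedParallelInputs"]) for s in stages)
print(total_inputs)  # 150
print(response["statistics"]["query"]["totalSlotMs"])  # 1693
```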

Task 4. Complete a dry run of a query to estimate the amount of data processed

In the bq command-line tool, you can use the --dry_run flag to estimate the number of bytes read by the query before you run the query. You can also use the dryRun parameter when submitting a query job using the API or client libraries. Dry runs of queries do not use query slots, and you are not charged for performing a dry run.

In this task, you learn how to complete a dry run of a query using the bq command-line tool in Cloud Shell.

  1. In Cloud Shell, run the following command:
bq query \
--use_legacy_sql=false \
--dry_run \
'SELECT COUNT(*) FROM `bigquery-public-data`.new_york_citibike.citibike_trips WHERE start_station_name LIKE "%Lexington%"'

The output shows the estimated number of bytes to be processed by the query, before you run the query to retrieve any results.

Query successfully validated. Assuming the tables are not modified, running this query will process 1135353688 bytes of data.

Now that you know how many bytes will be processed by the query, you have the information needed to decide on your next steps for your workflow.
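The dry run reports raw bytes, while the query validator rounds to larger units. Converting makes the estimate easier to compare (1 GiB = 2^30 bytes):

```python
# Convert the dry run's raw byte count into a more readable unit.
bytes_processed = 1_135_353_688  # value reported by the dry run above

gib = bytes_processed / 2**30  # 1 GiB = 1,073,741,824 bytes
print(f"{gib:.2f} GiB")        # 1.06 GiB
```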

Click Check my progress to verify the objective. Complete a dry run of a query

End your lab

When you have completed your lab, click End Lab. Google Cloud Skills Boost removes the resources you’ve used and cleans the account for you.

You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.

The number of stars indicates the following:

  • 1 star = Very dissatisfied
  • 2 stars = Dissatisfied
  • 3 stars = Neutral
  • 4 stars = Satisfied
  • 5 stars = Very satisfied

You can close the dialog box if you don't want to provide feedback.

For feedback, suggestions, or corrections, please use the Support tab.

Copyright 2022 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.
