BigQuery is Google's fully managed, NoOps, low-cost analytics database. With BigQuery you can query terabytes of data without managing any infrastructure or needing a database administrator. BigQuery uses SQL and a pay-as-you-go pricing model, which lets you focus on analyzing data to find meaningful insights.
BigQuery Machine Learning (BQML, product in beta) is a BigQuery feature that lets data analysts create, train, evaluate, and predict with machine learning models with minimal coding.
In this lab, you will explore millions of New York City yellow taxi cab trips available in a BigQuery public dataset. Then you will create a machine learning model inside of BigQuery to predict the fare of the cab ride given your model inputs. Lastly, you will evaluate the performance of your model and make predictions.
In this lab, you will learn how to perform the following tasks:
For each lab, you get a new Google Cloud project and set of resources for a fixed time at no cost.
Sign in to Google Skills using an incognito window.
Note the lab's access time (for example, 1:15:00), and make sure you can finish within that time.
There is no pause feature. You can restart if needed, but you have to start at the beginning.
When ready, click Start lab.
Note your lab credentials (Username and Password). You will use them to sign in to the Google Cloud Console.
Click Open Google Console.
Click Use another account and copy/paste credentials for this lab into the prompts. If you use other credentials, you'll receive errors or incur charges.
Accept the terms and skip the recovery resource page.
The Welcome to BigQuery in the Cloud Console message box opens. This message box provides a link to the quickstart guide and lists UI updates.
Question: How many trips did Yellow taxis take each month in 2015?
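A query along the following lines could answer this question. It is a minimal sketch, not the lab's original query: the table name bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015 and the pickup_datetime column are assumptions about the public dataset's layout.

```sql
#standardSQL
-- Assumption: the 2015 yellow taxi trips are stored in this public table
-- and each row has a pickup_datetime column.
SELECT
  EXTRACT(MONTH FROM pickup_datetime) AS month,  -- 1 = January ... 12 = December
  COUNT(*) AS trips                              -- number of trips in that month
FROM
  `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015`
GROUP BY
  month
ORDER BY
  month;
```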
The result is output as follows.
Question: What was the average speed of Yellow taxi trips in 2015?
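One way to approach this is to compute speed per trip and average it by pickup hour. The sketch below assumes the same public table as above, that trip_distance is in miles, and that pickup_datetime and dropoff_datetime are TIMESTAMP columns (use DATETIME_DIFF instead if they are DATETIME). Its filters are illustrative, so the numbers may differ from the lab's.

```sql
#standardSQL
-- Assumptions: trip_distance is in miles; both datetime columns are TIMESTAMPs.
SELECT
  EXTRACT(HOUR FROM pickup_datetime) AS hour,
  -- miles per second * 3600 = miles per hour
  ROUND(AVG(trip_distance
            / TIMESTAMP_DIFF(dropoff_datetime, pickup_datetime, SECOND)) * 3600, 1)
    AS speed_mph
FROM
  `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015`
WHERE
  trip_distance > 0
  -- Drop zero-length or negative durations to avoid division by zero
  AND TIMESTAMP_DIFF(dropoff_datetime, pickup_datetime, SECOND) > 0
GROUP BY
  hour
ORDER BY
  hour;
```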
The result is output as follows.
During the day, the average speed is around 11 to 12 MPH, but at 5:00 AM the average speed almost doubles to 21 MPH. Intuitively this makes sense, since there is likely less traffic on the road at 5:00 AM.
You will now create a machine learning model in BigQuery to predict the price of a cab ride in New York City given the historical dataset of trips and trip data. Predicting the fare before the ride could be very useful for trip planning for both the rider and the taxi agency.
The New York City Yellow Cab dataset is a public dataset provided by the city and has been loaded into BigQuery for your exploration. Browse the complete list of fields and then preview the dataset to find useful features that will help a machine learning model understand the relationship between data about historical cab rides and the fare.
Your team decides to test whether the fields below are good inputs to your fare forecasting model:
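A training dataset query could be sketched as follows. The names params, taxitrips, total_fare, tolls_amount, and fare_amount come from this lab. The table name, the remaining feature columns (hour of day, pickup and dropoff coordinates, passenger count), and the hash-based split into 1000 buckets are assumptions made for illustration.

```sql
#standardSQL
WITH params AS (
  -- Bucket ids used to pick disjoint training and evaluation samples
  SELECT 1 AS TRAIN, 2 AS EVAL
),
taxitrips AS (
  SELECT
    (tolls_amount + fare_amount) AS total_fare,       -- label; tips are excluded
    EXTRACT(HOUR FROM pickup_datetime) AS hourofday,  -- assumed feature
    pickup_longitude AS pickuplon,                    -- assumed feature
    pickup_latitude AS pickuplat,                     -- assumed feature
    dropoff_longitude AS dropofflon,                  -- assumed feature
    dropoff_latitude AS dropofflat,                   -- assumed feature
    passenger_count AS passengers                     -- assumed feature
  FROM
    `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015`, params
  WHERE
    trip_distance > 0
    AND fare_amount > 0
    -- Hash each pickup time into one of 1000 buckets and keep the TRAIN bucket
    AND MOD(ABS(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING))), 1000) = params.TRAIN
)
SELECT
  *
FROM
  taxitrips;
```

Hashing on pickup_datetime keeps the split repeatable: the same trip always lands in the same bucket, so swapping params.TRAIN for params.EVAL yields a sample the model has not seen.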
Note a few things about the query:
taxitrips does the bulk of the extraction for the NYC dataset, with the SELECT containing your training features and label.
The sample results are as indicated below.
Question: What is the label (correct answer)?
In this case, total_fare is the label (that we will be predicting). You created this field out of tolls_amount and fare_amount, so you could ignore customer tips as part of the model, as they are discretionary.
Next, create a new BigQuery dataset which will also store your ML models.
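The console steps that follow are the lab's route. As an alternative sketch, the same dataset could be created with a DDL statement. The dataset name taxi matches the model path taxi.taxifare_model used later in this lab; the location and other settings are left at their defaults here, which is an assumption.

```sql
-- Creates the dataset only if it is not already present in the current project
CREATE SCHEMA IF NOT EXISTS taxi;
```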
In the Explorer pane, click the View actions icon next to your project ID, then click Create dataset.
In the Create dataset dialog:
For Dataset ID, enter taxi.
Now that you have selected your initial features, you are ready to create your first ML model in BigQuery.
There are two model types to choose from: linear_reg and logistic_reg.
| Model | Model Type | Label Data type | Example |
|---|---|---|---|
| Forecasting | linear_reg | Numeric value (typically an integer or floating point) | Forecast sales figures for next year given historical sales data. |
| Classification | logistic_reg | 0 or 1 for binary classification | Classify an email as spam or not spam given the context. |
| Multiclass Classification | logistic_reg | These models can be used to predict multiple possible values such as whether an input is "low-value", "medium-value", or "high-value". Labels can have up to 50 unique values. | Classify an email as spam, normal priority, or high importance. |
Question: Which model type should you choose?
Since you are predicting a numeric value (cab fare), you want to use linear regression.
Enter a CREATE MODEL query that sets the model options and uses the training dataset query you created earlier (omitting the #standardSQL line) as its input.
Run the query and wait for the model to train (about 5 to 10 minutes).
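The model definition described above could look like the following sketch, with a training query inlined after AS. The model name taxi.taxifare_model, the linear_reg model type, and the total_fare label come from this lab. The table name and feature columns are the same illustrative assumptions as before, and input_label_cols is used here to name the label column.

```sql
CREATE OR REPLACE MODEL taxi.taxifare_model
OPTIONS (
  model_type = 'linear_reg',          -- numeric label, so linear regression
  input_label_cols = ['total_fare']   -- column the model learns to predict
) AS
WITH params AS (
  SELECT 1 AS TRAIN, 2 AS EVAL
),
taxitrips AS (
  SELECT
    (tolls_amount + fare_amount) AS total_fare,
    EXTRACT(HOUR FROM pickup_datetime) AS hourofday,  -- assumed feature
    pickup_longitude AS pickuplon,                    -- assumed feature
    pickup_latitude AS pickuplat,                     -- assumed feature
    dropoff_longitude AS dropofflon,                  -- assumed feature
    dropoff_latitude AS dropofflat,                   -- assumed feature
    passenger_count AS passengers                     -- assumed feature
  FROM
    `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015`, params
  WHERE
    trip_distance > 0
    AND fare_amount > 0
    -- Train only on the TRAIN bucket of the hash-based split
    AND MOD(ABS(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING))), 1000) = params.TRAIN
)
SELECT
  *
FROM
  taxitrips;
```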
After your model is trained, you will see the message "This statement created a new model named <Project-ID>:taxi.taxifare_model", which indicates that your model has been successfully trained.
Look inside your taxi dataset and confirm taxifare_model now appears.
Next, you will evaluate the performance of the model against new unseen evaluation data.
For linear regression models, you want to use a loss metric like Root Mean Squared Error. You want to keep training and improving the model until it has the lowest RMSE.
In BQML, mean_squared_error is a queryable field when evaluating your trained ML model. Wrap it in SQRT() to get RMSE.
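For reference, the relationship between the two metrics can be written out as below. The symbols are introduced here for illustration: n is the number of rows in the evaluation set, y_i is the actual total_fare for row i, and \hat{y}_i is the model's predicted fare for that row.

```latex
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}}
```

Because of the square root, RMSE is expressed in the same unit as the label (dollars of fare), which makes it easier to interpret than MSE.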
Now that training is complete, you can evaluate how well the model performs with this query using ML.EVALUATE:
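An evaluation query could be sketched as follows. ML.EVALUATE, mean_squared_error, SQRT(), taxi.taxifare_model, and the params.EVAL filter come from this lab. The table name and feature columns are illustrative assumptions and must match whatever columns the model was trained on.

```sql
#standardSQL
SELECT
  SQRT(mean_squared_error) AS rmse   -- convert MSE to RMSE
FROM
  ML.EVALUATE(MODEL taxi.taxifare_model, (
    WITH params AS (
      SELECT 1 AS TRAIN, 2 AS EVAL
    ),
    taxitrips AS (
      SELECT
        (tolls_amount + fare_amount) AS total_fare,
        EXTRACT(HOUR FROM pickup_datetime) AS hourofday,  -- assumed feature
        pickup_longitude AS pickuplon,                    -- assumed feature
        pickup_latitude AS pickuplat,                     -- assumed feature
        dropoff_longitude AS dropofflon,                  -- assumed feature
        dropoff_latitude AS dropofflat,                   -- assumed feature
        passenger_count AS passengers                     -- assumed feature
      FROM
        `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015`, params
      WHERE
        trip_distance > 0
        AND fare_amount > 0
        -- Evaluate on the EVAL bucket, which was not used for training
        AND MOD(ABS(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING))), 1000) = params.EVAL
    )
    SELECT
      *
    FROM
      taxitrips
  ));
```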
You are now evaluating the model against a different set of taxi cab trips with your params.EVAL filter.
After the model runs, review your model results (your model RMSE value will vary slightly).
| Row | rmse |
|---|---|
| 1 | 9.477056435999074 |
After evaluating your model you get an RMSE of about $9.48.
Whether this loss metric is acceptable for putting your model into production depends entirely on your benchmark criteria, which are set before model training begins. Benchmarking means establishing a minimum level of model performance and accuracy that is acceptable.
You want to make sure that you aren't overfitting your model to your data. Overfitting your model will make it perform worse on new, unseen data.
Query ML.TRAINING_INFO to check for overfitting. This selects all the information from each iteration of model training, including the training iteration number, the training loss, and the evaluation loss.
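A minimal sketch of such a query, assuming the model was saved as taxi.taxifare_model as earlier in this lab:

```sql
#standardSQL
-- Returns one row per training iteration, including the iteration number,
-- the training loss (loss), and the evaluation loss (eval_loss).
SELECT
  *
FROM
  ML.TRAINING_INFO(MODEL taxi.taxifare_model)
ORDER BY
  iteration;
```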
To compare training and evaluation loss, let's explore the difference in the loss curves visually.
Click Open in > Looker Studio in the BigQuery Cloud Console. This opens Looker Studio with the data from your query connected as an input source.
When prompted, click the Get Started button.
Select Authorize when asked whether Looker Studio can access your data.
Click on Get Started and acknowledge the Terms of Service. Click Accept.
Select No, thanks for all in preferences and click Done.
Refresh the tab to load the data.
Once in Looker Studio, click on the Add Chart > Combo Chart icon.
The training loss matches the evaluation loss nearly identically, which indicates that we are not overfitting the model.
Excellent! Let's move on to prediction.
Now you will see the model's predictions for taxi fares alongside the actual fares and other features for those rides.
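A prediction query could be sketched with ML.PREDICT as shown below. It reuses the evaluation bucket so that predictions are made on trips the model was not trained on. The table name and feature columns are the same illustrative assumptions as in the earlier sketches and must match the columns used for training.

```sql
#standardSQL
-- The output adds a predicted_total_fare column next to the input columns,
-- so each row shows the predicted fare alongside the actual total_fare.
SELECT
  *
FROM
  ML.PREDICT(MODEL taxi.taxifare_model, (
    WITH params AS (
      SELECT 1 AS TRAIN, 2 AS EVAL
    ),
    taxitrips AS (
      SELECT
        (tolls_amount + fare_amount) AS total_fare,
        EXTRACT(HOUR FROM pickup_datetime) AS hourofday,  -- assumed feature
        pickup_longitude AS pickuplon,                    -- assumed feature
        pickup_latitude AS pickuplat,                     -- assumed feature
        dropoff_longitude AS dropofflon,                  -- assumed feature
        dropoff_latitude AS dropofflat,                   -- assumed feature
        passenger_count AS passengers                     -- assumed feature
      FROM
        `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2015`, params
      WHERE
        trip_distance > 0
        AND fare_amount > 0
        AND MOD(ABS(FARM_FINGERPRINT(CAST(pickup_datetime AS STRING))), 1000) = params.EVAL
    )
    SELECT
      *
    FROM
      taxitrips
  ));
```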
Tip: Add warm_start = true to your model options if you are retraining new data on an existing model for faster training times. Note that you cannot change the feature columns (this would necessitate a new model).
To explore modeling on other datasets, such as forecasting fares for Chicago taxi trips, you can use the Chicago taxi trips dataset in the bigquery-public-data project.
In this lab, you have successfully built an ML model in BigQuery to forecast taxi cab fare for New York City cabs.
When you have completed your lab, click End Lab. Google Skills removes the resources you’ve used and cleans the account for you.
You will be given an opportunity to rate the lab experience. Select the applicable number of stars, type a comment, and then click Submit.
The number of stars indicates the following:
You can close the dialog box if you don't want to provide feedback.
For feedback, suggestions, or corrections, please use the Support tab.
Copyright 2026 Google LLC All rights reserved. Google and the Google logo are trademarks of Google LLC. All other company and product names may be trademarks of the respective companies with which they are associated.