This lab may include built-in AI tools to support your learning.

Introduction

In this lab, you'll learn how to clean your data in Python. Data cleaning is the process of preparing and preprocessing raw data so that it is suitable for analysis or machine learning tasks. It involves identifying and handling issues such as missing values, outliers, duplicates, and formatting inconsistencies.
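As a rough illustration of the four issue types named above, the following minimal sketch uses pandas on a small made-up table. The column names and values are hypothetical and are not taken from the lab's dataset, and the 1.5 × IQR rule is only one common rule of thumb for flagging outliers.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical, for illustration only).
raw = pd.DataFrame(
    {
        "city": ["Paris", " paris ", "Lima", "Lima", "Oslo", "Rome", "Cairo"],
        "sales": [100.0, 120.0, np.nan, np.nan, 110.0, 130.0, 5000.0],
    }
)

# Formatting inconsistencies: trim whitespace and normalize capitalization.
cleaned = raw.copy()
cleaned["city"] = cleaned["city"].str.strip().str.title()

# Duplicates: drop rows that exactly repeat an earlier row.
cleaned = cleaned.drop_duplicates()

# Missing values: count them per column.
missing_per_column = cleaned.isna().sum()

# Outliers: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = cleaned["sales"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (cleaned["sales"] < q1 - 1.5 * iqr) | (cleaned["sales"] > q3 + 1.5 * iqr)

print(missing_per_column)
print(cleaned[is_outlier])
```

Each step here only detects or normalizes; whether to drop, fill, or keep a flagged value depends on the analysis at hand.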

Disclaimer: For optimal performance and compatibility, use Google Chrome or Mozilla Firefox to access the labs.

Start your lab

You'll need to start the lab before you can access the materials. To do this, click the green “Start Lab” button at the top of the screen.

Lab Start button

After you click the “Start Lab” button, a Jupyter Notebook opens, where you will perform the remaining steps in the lab. Your Jupyter Notebook should look like this:

Jupyter Notebook

What you'll do

In this lab, you'll complete the following:

  • Work through an annotated Jupyter guide that mirrors the module content and adds context and code explanations.
  • Explore a common data science scenario, missing data: you will practice finding missing values in your data, then removing them or determining an alternative path.

Accessing the notebooks within the Jupyter Notebook

To complete this lab, you will open a Jupyter Notebook and follow instructions to enter code and written responses where prompted. The Jupyter notebook will autosave as you work, or you can manually save it by clicking the Save and Checkpoint button or by selecting Save and Checkpoint from the File menu.

Save

Tips

As you complete the lab, note the following features:

  • Sections: Step-by-step instructions in each section lead you through the lab.
  • Code blocks: Code blocks allow you to practice key Python coding concepts. Add code where prompted and then click the Run button to execute your code and view any possible output.

Run

  • Questions: Thought questions offer moments to pause and think about concepts and your output as you move through the lab.
  • Hints: Hidden hints provide suggestions you can use to complete your work.

Note: The main.ipynb file is provided at the beginning of the lab. Use this file to complete the first task. After completing Task 1, click the files icon and double-click the notebook file for the next task, as indicated in the task instructions.

Access_Lab

Steps to download and upload a CSV file:

In this lab, you will work with the CSV data that accompanies each task. Download the CSV file attached to the task instructions and upload it to the Jupyter Notebook using the following steps:

  • Click the CSV file name specified in the task instructions. The file downloads to your default download directory.

  • In your lab's Jupyter Notebook, click the Upload File button, choose the CSV file or files you need, and then click Upload.

  • The upload starts, and progress indicators appear at the bottom of the Jupyter Notebook.

Upload CSV
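Once a CSV file has been uploaded, a quick way to confirm it arrived intact is to load it with pandas and inspect its shape and first rows. The sketch below is only an assumption about how you might do this: `load_uploaded_csv` is a hypothetical helper, and it reads a tiny stand-in file written to a temporary directory so the example runs anywhere. In the lab, you would pass the name of the file you uploaded instead.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_uploaded_csv(path):
    """Read a CSV into a DataFrame, failing early if the file is absent."""
    csv_path = Path(path)
    if not csv_path.exists():
        raise FileNotFoundError(f"{csv_path} not found; check that the upload finished")
    return pd.read_csv(csv_path)


# Stand-in file (hypothetical contents) so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "example.csv"
    demo_path.write_text("name,value\nA,1\nB,\n")
    df = load_uploaded_csv(demo_path)

print(df.shape)   # (rows, columns)
print(df.head())  # first few rows
```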

Task 1: Annotated follow-along guide: Work with missing data in a Python notebook

The follow-along guide is an annotated Jupyter notebook organized to match the content from each module. It contains the same code shown in the videos for that module. In addition to content that is identical to what is covered in the videos, you’ll often find additional information throughout the guide to explain the purpose of each concept covered, why the code is written in a certain way, and tips for running the code.

Use the following CSV data for this task:

  1. Click the files icon to access the Jupyter notebook file.
  2. Open the main.ipynb file by clicking the file name.

Task 2: Address missing data

In this task, you will explore a common data science scenario: missing data. You will be presented with a business scenario and a dataset with missing values that you need to remove in order to work through the scenario. You will practice finding missing values in your data, then removing them or determining an alternative path.

This task uses a dataset called Unicorn_Companies.csv. It represents a list of private companies with a value of over $1 billion as of March 2022. The data includes the name of the country where the company was founded, its current valuation, funding, industry, top investors, year it was founded, and the year it reached a $1 billion valuation.

Use the following CSV data for this task:

  1. Click the files icon to access the Jupyter notebook file.
  2. Open the task_2.ipynb file by clicking the file name.
  3. The marker ### YOUR CODE HERE ### indicates where you should write code. Replace it with your own code before running the code cell.
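The find, remove, or "alternative path" workflow described for this task can be sketched with pandas as follows. This is a minimal illustration, not the notebook's solution: the rows are made up, and the column names are placeholders rather than the actual headers of Unicorn_Companies.csv.

```python
import numpy as np
import pandas as pd

# Made-up rows; column names are placeholders, not the real dataset headers.
companies = pd.DataFrame(
    {
        "company": ["Company A", "Company B", "Company C", "Company D"],
        "city": ["City X", np.nan, "City Y", np.nan],
        "year_founded": [2010, 2015, 2012, 2018],
    }
)

# Find: count missing values per column and pull out the affected rows.
missing_counts = companies.isna().sum()
rows_with_missing = companies[companies.isna().any(axis=1)]

# Remove: drop every row that has at least one missing value.
without_missing = companies.dropna()

# Alternative path: keep all rows and fill the gaps with a placeholder instead.
filled = companies.fillna({"city": "Unknown"})

print(missing_counts)
print(len(companies), len(without_missing), len(filled))
```

Dropping rows is the simplest option but discards the other values in those rows. Filling keeps every row at the cost of introducing a placeholder that later analysis has to account for.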

End your lab

Before you end the lab, make sure you’re satisfied that you’ve completed all the tasks, and follow these steps:

  • Click End Lab and then click Submit. Ending the lab will remove your access to the Jupyter Notebook. You won’t be able to access the work you've completed in it again.

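Before you begin

  1. Labs give you access to a Google Cloud project and resources for a fixed amount of time.
  2. Labs are timed and cannot be paused. If you end the lab, you will have to start again from the beginning.
  3. Click “Start Lab” at the top left of the screen to begin.

Use incognito browsing

  1. Copy the username and password provided for the lab.
  2. Click “Open Console” in private mode.

Sign in to the console

  1. Sign in using your lab credentials. Using other credentials may cause errors or incur charges.
  2. Accept the terms and skip the recovery resource page.
  3. Do not click “End Lab” unless you have finished the lab or want to start over. Doing so clears your work and deletes the project.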

