Chapter 2 ML Setup

2.1 Google Colaboratory (Colab)

The easiest way to get started with Machine Learning (for FREE!) is using Google Colaboratory. Let us look at the easiest out-of-the-box setup to get you started.

  1. Link your Google Account to store notebooks.

  2. Setting up storage.

  • You can add files through the Colab UI: click the folder icon in the left pane of the notebook, then click the upload symbol. However, files uploaded this way are lost when your instance times out, which can easily happen depending on your network, your RAM usage, and whether you are using Colab Pro or the free version, so you might prefer a more permanent solution.

  • The permanent solution uses your Google Drive to host your files, especially larger datasets. There are also ways to connect Colab to external data storage such as AWS. However, for most research purposes and smaller projects, the Google Drive approach works well. You can check out other approaches at data storage connection. Let us now look at the Google Drive mounting method (reference):

from google.colab import drive
drive.mount('/content/drive')

# Access the URL after running the command above and get authorized

# To avoid slowdowns, copy the data from Google Drive to the Colab instance
zip_path = '/content/drive/My Drive/Data/example-dataset.zip'

!cp "{zip_path}" .

!unzip -q example-dataset.zip

# Remove the local copy of the .zip file after you unzip it
!rm example-dataset.zip

# Make sure it's there
!ls

Here, we mount the drive using the first two commands. Next, to speed up training, we copy the data from Drive to the Colab instance; this assumes the dataset is stored on Drive as a zip file. We need to rerun these commands every time we reconnect our Colab instance, but doing so can greatly speed up training and data management.
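The same copy, unzip, and clean-up steps can also be written in plain Python, which is handy if you want to reuse them outside a notebook cell. The following is a minimal sketch, not the only way to do it: the function name stage_dataset and its work_dir parameter are hypothetical, and it assumes the archive is a regular zip file that already exists at zip_path.

```python
import shutil
import zipfile
from pathlib import Path


def stage_dataset(zip_path, work_dir="."):
    """Copy a zip archive into work_dir, extract it there, and delete the local copy.

    Returns the sorted names of the entries found in work_dir afterwards.
    """
    zip_path = Path(zip_path)
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    local_zip = work_dir / zip_path.name
    shutil.copy(zip_path, local_zip)          # same role as: !cp "{zip_path}" .
    with zipfile.ZipFile(local_zip) as archive:
        archive.extractall(work_dir)          # same role as: !unzip -q ...
    local_zip.unlink()                        # same role as: !rm ...

    return sorted(p.name for p in work_dir.iterdir())


# Example call in Colab, after drive.mount('/content/drive') has succeeded:
# stage_dataset('/content/drive/My Drive/Data/example-dataset.zip', '/content')
```

Only the local copy of the archive is deleted; the original zip file on Drive is left untouched, so the function can be rerun after every reconnect.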

  3. Installing Python packages

!pip3 install package-name

This is straightforward, and most packages install out of the box. You can run other system commands too by prefixing them with ! (exclamation mark).
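Before installing anything, it can help to check which modules are already importable, since Colab ships with many common packages. A minimal sketch, assuming a hypothetical helper named missing_modules and an illustrative list of module names; note that the import name can differ from the name you pass to pip (for example, scikit-learn is imported as sklearn).

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# Hypothetical list of modules a notebook might need.
needed = ["json", "numpy"]
print("Not installed:", missing_modules(needed))
```

Anything reported as missing can then be installed with the pip3 command shown above.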

2.2 Visual Studio (VS) Code (Local Option)

  1. Extensions

If you have a really powerful machine, or if you just prefer running your files locally, I’d recommend using VS Code. Not only is it open-source, but it also has powerful extensions for notebooks, code completion, and a ton of Python-specific tools. My top recommended extensions are (in no specific order):

  • Python for VSCode (for syntax)
  • Visual Studio IntelliCode (for AI-assisted code completion)
  • Pylance (supercharges Python)
  • Code Runner (easily run code snippets)

  2. Installing Python packages

You will need to have python3 and pip3 installed.

pip3 install package-name
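To confirm that both commands are available before installing packages, you can look them up on your PATH. A minimal sketch using only the standard library; the helper name locate_tools is hypothetical, and on some systems (notably Windows) the commands may be named python and pip instead.

```python
import shutil
import sys


def locate_tools(tools=("python3", "pip3")):
    """Map each command name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


print("Running Python", sys.version.split()[0])
for tool, path in locate_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```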