
Import Python packages from GitHub into Colab

For today’s challenge, we will learn how to import a Python package from GitHub and use it in Colab.

Over the coming days, I will explore TensorFlow for at least one hour per day and post the notebooks, data, and models to this repository.

Today’s notebook is available here.

Why is it so important?

Learning how to import a package may seem like a secondary task, but sooner or later you will want to use a module or another piece of code you wrote yourself in a TensorFlow project.

You can upload files through the Colab interface. But when a session ends, those files are gone and you have to upload everything again by hand.

A simpler approach is to clone a GitHub repository and use the command line to automate this step.

Import and install

The first thing to do is clone the repository you want to use.

Then use %cd to navigate to the directory you want.

⚠️ If you have already run the clone command in this session, Git will report that the destination directory already exists and will not download anything. You can restart your session (everything will be erased) or use the pull command to fetch the changes made to the repository before installing the requirements and the package itself.
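The clone-or-pull decision described above can also be written in plain Python, so the same cell works on a fresh session and on a rerun. This is only a minimal sketch: the helper names build_sync_command and sync_repo are hypothetical, and it assumes that a .git folder inside the target directory means the repository was already cloned.

```python
import subprocess
from pathlib import Path


def build_sync_command(repo_url, target_dir):
    """Return the git command (as a list) that brings target_dir up to date."""
    if (Path(target_dir) / ".git").is_dir():
        # Already cloned in this session: only fetch and merge new commits.
        return ["git", "-C", str(target_dir), "pull"]
    # Fresh session: nothing on disk yet, so clone the repository.
    return ["git", "clone", repo_url, str(target_dir)]


def sync_repo(repo_url, target_dir):
    """Clone the repository, or pull if it is already present."""
    subprocess.run(build_sync_command(repo_url, target_dir), check=True)


# Example call (needs network access, so it is left commented out):
# sync_repo("https://github.com/vallantin/atalaia.git", "atalaia")
```

Keeping the decision in build_sync_command, separate from the call that runs Git, makes it easy to check which command would run before touching the network.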

The package I am trying to install depends on other packages to work. Let’s install these using the !pip command and the requirements.txt file.

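Then I will install the package itself with the command !python setup.py install. Recent setuptools releases deprecate calling setup.py directly, so if that command warns or fails, running !pip install . from the same directory is the usual replacement.

The last step is to import your package!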

# clone package repository
!git clone https://github.com/vallantin/atalaia.git

# navigate to atalaia directory
%cd atalaia

# get modifications made on the repo
!git pull origin master

# install packages requirements
!pip install -r requirements.txt

# install package
!python setup.py install

# import it
from atalaia.atalaia import Atalaia

Installing packages can have unexpected effects on your environment.

Different packages can have the same dependencies, but use different versions of them.

If you run into problems, try force-reinstalling the packages that are not working.

#!pip install --upgrade --force-reinstall some_package_here
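Before force-reinstalling anything, it can help to see which versions actually ended up installed. The sketch below uses the standard library's importlib.metadata. The helper name installed_versions is hypothetical, and the package names in the example call are only illustrations.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if it is missing."""
    report = {}
    for name in package_names:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            report[name] = None
    return report


# Example call: run it before and after installing your package and compare.
# print(installed_versions(["tensorflow", "numpy", "pandas"]))
```

Comparing the two reports shows which dependency was upgraded or downgraded by the install, which narrows down what to reinstall.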

Finally, import the other packages and pray for the best 🙈. Hopefully, you won’t spend one hour debugging as I did.

# do other imports to test if we have errors
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

import numpy as np
import pandas as pd

from pprint import pprint
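If one of these imports fails, the cell stops at the first error and you do not see the others. A small loop can report all broken imports at once. This is a sketch under my own assumptions: check_imports is a hypothetical helper, and it catches any exception because a broken install can raise more than ImportError.

```python
import importlib


def check_imports(module_names):
    """Try to import each module and return {name: error message} for the failures."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # version clashes may raise other error types
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


# Example call with the modules used in this notebook:
# print(check_imports(["tensorflow", "tensorflow_hub", "tensorflow_datasets",
#                      "numpy", "pandas", "atalaia"]))
```

An empty dictionary means every module imported cleanly. Otherwise, the keys tell you which packages to reinstall first.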

Conclusion: what we learned today

Importing a package from GitHub lets you use code you developed locally inside a Colab notebook.

Since not everyone has a GPU at home, training deep learning models on Colab is very handy.

Do you want to connect? It will be a pleasure to discuss Machine Learning with you. Drop me a message on LinkedIn.
