This post is part 1 of the "Making a Python Package" series:
- Making a Python Package
- Making a Python Package II - writing docstrings
- Making a Python Package III - making an installable package
- Making a Python Package IV - writing unit tests
- Making a Python Package V - Testing with Tox
- Making a Python Package VI - including data files
- Making a Python Package VII - deploying
- Making a Python Package VIII - summary
Note: To get the material for this blog post, visit the v0.1 tag of the Romans! GitHub project. To get it locally, run
    # get the repo and put it in roman_package
    $ git clone https://github.com/kiwidamien/roman.git roman_package
    $ cd roman_package
    # now get the version of the repo corresponding to the steps in this article
    $ git checkout tags/v0.1
Making a Python Package I: Making a Roman numerals module
You have written some Python code that you want to use in other projects. Maybe it is a way of recursively grabbing information off SoundCloud, or scraping people's dates of birth from their Wikipedia pages, or calculating the change in scores when permuting a single feature to estimate its importance. Maybe you have written an ETL (extract-transform-load) pipeline, and you want to be sure that everyone is using the same definitions and process.
To keep the example simple, we will take some code we have written that converts Roman numerals to integers (and vice versa) and turn it into a package that
- we can use anywhere on our system
- our colleagues can install
- anyone can install
If this is your first Python package, it may be the first time you are sharing code with the world at large. We will also go through some of the "best practices" you should follow, particularly when sharing your code with a wider audience.
By the end of this article, you will have
- Seen the functions we want to package
- Made a Python module (i.e. something you can import from the current directory only)
The original code
During one of our projects, we wrote the following code in roman.py to work with Roman numerals:
    # roman.py
    ROMAN_SYMBOLS = [
        ('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
        ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
        ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1)
    ]


    def roman_string_to_int(numeral_string):
        """
        Converts a Roman numeral string to integer form
        """
        total = 0
        for symbol, value in ROMAN_SYMBOLS:
            while numeral_string.startswith(symbol):
                total += value
                numeral_string = numeral_string[len(symbol):]
        return total


    def int_to_roman_string(number):
        """
        Converts a positive integer into a Roman numeral
        """
        result = ''
        for symbol, value in ROMAN_SYMBOLS:
            result += (number // value) * symbol
            number = number % value
        return result
If we open Python or a Jupyter notebook in this directory, we can import it without a problem:
    >>> import roman
    >>> roman.int_to_roman_string(22)
    'XXII'
If we had a different project somewhere else, Python would not be able to find roman.py! You don't want to copy and paste this file into every directory where it is going to be used, because over time you are likely to end up with several different versions of the file. After all, there could be bugs in the functions that we have written!
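To make the lookup problem concrete, here is a minimal sketch of how Python decides whether a module can be found. It assumes a throwaway module name (`demo_numerals_module`, hypothetical and not part of the Romans! project) written into a temporary directory. The module cannot be found until its directory is on `sys.path`, which is the same reason a project elsewhere on disk cannot see roman.py.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Hypothetical module name, chosen so it cannot clash with a real module.
MODULE_NAME = "demo_numerals_module"

with tempfile.TemporaryDirectory() as project_dir:
    # Write a tiny module into a directory that is NOT on sys.path.
    Path(project_dir, f"{MODULE_NAME}.py").write_text("ANSWER = 'XXII'\n")

    # Python cannot find the module: its directory is not searched.
    found_before = importlib.util.find_spec(MODULE_NAME) is not None

    # Adding the directory to sys.path makes the module findable.
    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()
    found_after = importlib.util.find_spec(MODULE_NAME) is not None

    # Clean up so the temporary directory does not linger on sys.path.
    sys.path.remove(project_dir)

print(found_before, found_after)
```

Editing `sys.path` by hand in every project is fragile, which is why this series works toward an installable package instead.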
We want to be able to install our "roman" functions so they can be accessed from anywhere. With this in mind, we create the following directory structure on our computer (it doesn't matter where):
    roman_project
    +-- roman
        +-- roman.py
    +-- README.md
README.md contains any information you want to use to describe the roman.py package. Ultimately you should be putting your project on GitHub, so it is accessible to the rest of the world (or at least your colleagues if you use private repos).
A module is a single *.py file that contains some code we would like to import.
A package is a collection of modules in a directory. Python recognizes a regular package by the presence of an __init__.py file in the directory. Even if that file is empty, it tells Python "this directory contains a collection of modules that are meant to be imported". Create an empty __init__.py file inside the roman subdirectory of the roman_project directory. Your directory structure should look like this:
    roman_project
    +-- roman
        +-- __init__.py
        +-- roman.py
    +-- README.md
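If you would rather create the empty file from Python than from your shell or editor, the following is one possible sketch using `pathlib`. It builds the skeleton inside a temporary directory (an assumption made so the example cleans up after itself); in practice you would point `package_dir` at your real roman_project/roman directory.

```python
import tempfile
from pathlib import Path

# A temporary directory stands in for wherever you keep roman_project.
with tempfile.TemporaryDirectory() as workspace:
    package_dir = Path(workspace, "roman_project", "roman")
    package_dir.mkdir(parents=True)

    # An empty __init__.py is enough to mark "roman" as a package.
    init_file = package_dir / "__init__.py"
    init_file.touch()

    # Record what was created before the temporary directory is removed.
    created = sorted(p.name for p in package_dir.iterdir())
    init_size = init_file.stat().st_size

print(created, init_size)
```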
Importing our module
From the roman_project directory, we can try importing our file:
    # Must launch Python from "roman_project"!
    >>> import roman
    # Success! Now try running one of the commands.....
    >>> roman.roman_string_to_int('V')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: module 'roman' has no attribute 'roman_string_to_int'
It turns out we need to tell Python which file to look in. The following does work:
    # Need to import from roman/roman.py? Use roman.roman
    >>> import roman.roman
    >>> roman.roman.roman_string_to_int('V')
    5
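The behavior above can be reproduced in isolation. The sketch below assumes a throwaway package (`demo_empty_pkg` with a `numerals` module, both hypothetical names) built in a temporary directory. Importing the package runs only its empty `__init__.py`, so the submodule is not visible until it is imported explicitly.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical package name, used only for this demonstration.
PACKAGE = "demo_empty_pkg"

with tempfile.TemporaryDirectory() as project_dir:
    package_dir = Path(project_dir, PACKAGE)
    package_dir.mkdir()
    (package_dir / "__init__.py").touch()  # empty, as in the article
    (package_dir / "numerals.py").write_text("def five():\n    return 'V'\n")

    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()
    try:
        package = importlib.import_module(PACKAGE)
        # Only the empty __init__.py has run, so the submodule is not loaded.
        visible_before = hasattr(package, "numerals")

        # Importing the submodule explicitly loads it and attaches it
        # to the package as an attribute.
        importlib.import_module(f"{PACKAGE}.numerals")
        visible_after = hasattr(package, "numerals")
        result = package.numerals.five()
    finally:
        sys.path.remove(project_dir)

print(visible_before, visible_after, result)
```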
Putting something useful in __init__.py
Typing import roman at the prompt goes to the roman directory and opens __init__.py (which is blank). Typing import roman.roman imports roman/roman.py. We don't really want to call roman.roman when we do imports. The way around this is to import our files in __init__.py itself. Change __init__.py to contain the following:
    from .roman import roman_string_to_int, int_to_roman_string
    # Note .XXXXXX means "import XXXXXX.py from current directory"

    __version__ = '0.1.0'
    __author__ = 'Damien Martin'
This imports our functions into __init__.py (which is read when we call import roman). The __author__ and __version__ values are used in the help docstring of the module, and __version__ is also used when checking the version of the package.
Go back to the roman_project directory (roman_package if you cloned the repository), and start the Python interpreter. Now run the following:
    >>> import roman
    # now this works!
    >>> roman.roman_string_to_int('V')
    5
    >>> help(roman)
You should see a help screen, with the author and version number set the way they were set in __init__.py.
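As a self-contained check of the same idea, the sketch below builds a throwaway package (`demo_reexport_pkg`, a hypothetical name) whose `__init__.py` re-exports one function with a relative import and sets `__version__`. The function and the metadata then become available directly on the package, with no need for the doubled `package.module` prefix.

```python
import importlib
import sys
import tempfile
from pathlib import Path

PACKAGE = "demo_reexport_pkg"  # hypothetical name for this demonstration

# Contents of __init__.py: a relative import plus some metadata.
INIT_SOURCE = (
    "from .numerals import five\n"
    "__version__ = '0.1.0'\n"
)

with tempfile.TemporaryDirectory() as project_dir:
    package_dir = Path(project_dir, PACKAGE)
    package_dir.mkdir()
    (package_dir / "numerals.py").write_text("def five():\n    return 'V'\n")
    (package_dir / "__init__.py").write_text(INIT_SOURCE)

    sys.path.insert(0, project_dir)
    importlib.invalidate_caches()
    try:
        package = importlib.import_module(PACKAGE)
        value = package.five()         # re-exported by __init__.py
        version = package.__version__  # metadata set in __init__.py
    finally:
        sys.path.remove(project_dir)

print(value, version)
```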
A package with multiple files
To demonstrate how multiple files work, the version of this project on GitHub also has roman/temperature.py. The temperature module contains functions that convert temperatures between Kelvin, Fahrenheit, and Celsius. While it doesn't really have anything to do with Roman numerals, it is a relatively easy example, and helps us understand how to deal with multiple modules in a package.
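To give a feel for what such a module might contain, here is a hypothetical sketch. It is not the code from the repository: the function name `convert_temperature`, its signature, and the single-letter unit codes are all assumptions made for illustration. It routes every conversion through Celsius so that each unit needs only two formulas.

```python
# Hypothetical sketch; NOT the contents of the repository's temperature.py.
KELVIN_OFFSET = 273.15


def _to_celsius(value, unit):
    """Convert a temperature in the given unit ('C', 'K' or 'F') to Celsius."""
    if unit == "C":
        return value
    if unit == "K":
        return value - KELVIN_OFFSET
    if unit == "F":
        return (value - 32) * 5 / 9
    raise ValueError(f"Unknown unit: {unit!r}")


def _from_celsius(value, unit):
    """Convert a Celsius temperature to the given unit ('C', 'K' or 'F')."""
    if unit == "C":
        return value
    if unit == "K":
        return value + KELVIN_OFFSET
    if unit == "F":
        return value * 9 / 5 + 32
    raise ValueError(f"Unknown unit: {unit!r}")


def convert_temperature(value, from_unit, to_unit):
    """Convert between units by going through Celsius (assumed design)."""
    return _from_celsius(_to_celsius(value, from_unit), to_unit)


print(convert_temperature(100, "C", "F"))  # 212.0
```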
This article ends with the directory structure
    roman_project
    +-- roman
        +-- __init__.py
        +-- roman.py
        +-- temperature.py
    +-- README.md
and the following contents in __init__.py:
    from .roman import int_to_roman_string, roman_string_to_int
    from .temperature import convert, convert_all

    __version__ = '0.1.0'
    __author__ = 'Damien Martin'
The downside to all of this work is that we can still only import our Python package from roman_project! We will correct that in article 3 of this series, but first we need to tidy up our docstrings!
Summary and next steps
So far we have made a local package, which is only importable from the current directory. To do this:
- We placed all the Python modules (*.py files) into the subdirectory roman
- We added __init__.py to the subdirectory roman (to make it a package)
- In __init__.py, we imported the functions we want access to
- In __init__.py, we added some metadata (namely the __version__ and __author__)
In the next article in this series, we look at writing good docstrings for our functions.