CNLearn - Let's Make a Package

My previous code in this project was ugly, truly ugly. Why? Well, I kept having import statements such as:

from src.search.textutils import extract_chinese_characters
from src.db.settings import SessionLocal

Terrible, absolutely terrible. Wouldn’t it be much nicer to have:

from cnlearn.db.settings import SessionLocal
from cnlearn.db.crud import (
    get_simplified_character,
    get_simplified_word_containing_char,
    get_simplified_word,
    get_word_and_character,
)

Ah, much more beautiful. So how do we accomplish that? We create a package! This is a big big topic, so we’ll just focus on what I’m doing for the current package. Please note that the packaging might change later on depending on what else we add. We’ll mostly follow the official tutorial at https://packaging.python.org/en/latest/tutorials/packaging-projects/, using the static setup.cfg plus pyproject.toml “method” with setuptools as the build tool.

pyproject.toml

What is in this file? Well, for basic use we need to have:

[build-system]
requires = ["setuptools", "wheel"]
build-backend = "setuptools.build_meta"

What do those lines mean? Well, [build-system] is just one section (table) of the pyproject.toml file (there can be others, as we will see when we add black and mypy). It tells tools such as pip and build what they need in order to build our package from source. PEP 518 introduced the requires key, which lists the build dependencies that used to go in the setup_requires field of setup.py/setup.cfg. PEP 517 added build-backend, which names the Python object that the build frontend calls to actually produce the sdist and wheel; for setuptools that is setuptools.build_meta. There are other build backends as well, such as the ones provided by flit and poetry.

setup.cfg

What’s in here? There’s some basic metadata,

[metadata]
name = CNLearn
version = 0.0.2
author = Yours Truly
author_email = realEmail@fake.com
description = GUI/TUI maybe Chinese dictionary
long_description = file: README.md
long_description_content_type = text/markdown
license_files = LICENSE

with the name, version, author, email, description etc.

We then have some build options,

[options]
package_dir=
    =src
packages=find:
install_requires =
    sqlalchemy
    pydantic
    jieba

including package_dir, which maps package names to directories. The empty key stands for the root package, so this line tells setuptools that all of our packages live under the src directory.

packages is the list of Python import packages that we want to include inside the distribution package. The find: directive tells setuptools to discover all packages and subpackages automatically, using the options given in the [options.packages.find] section.

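[options.packages.find]
where=src

Since we only have cnlearn in there, that’s what the list of packages will be. (I originally had include_package_data = True in this section as well, but that is an [options] key; setuptools only reads where, include and exclude here, so it had no effect.)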
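To get a feel for what the discovery step does, here is a rough standard-library approximation. It is not setuptools’ actual implementation, and the db and search subpackages in the demo tree are my assumption about the layout, based on the imports at the top of this post.

```python
import tempfile
from pathlib import Path
from typing import List, Union


def discover_packages(where: Union[str, Path], prefix: str = "") -> List[str]:
    """Return dotted names of directories under `where` that contain __init__.py."""
    found = []
    for entry in sorted(Path(where).iterdir()):
        # Only directories with an __init__.py count, and we only descend into those.
        if entry.is_dir() and (entry / "__init__.py").is_file():
            name = prefix + entry.name
            found.append(name)
            found.extend(discover_packages(entry, prefix=name + "."))
    return found


# Demo on a throwaway tree that mirrors the assumed src/ layout.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    for package in ("cnlearn", "cnlearn/db", "cnlearn/search"):
        (src / package).mkdir(parents=True)
        (src / package / "__init__.py").touch()
    (src / "scratch").mkdir()  # no __init__.py, so it is skipped
    print(discover_packages(src))  # ['cnlearn', 'cnlearn.db', 'cnlearn.search']
```

The where=src option plays the role of the `where` argument here: it is the directory the search starts from, and it does not become part of the package names.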

The install_requires directive declares the dependencies our project needs at install time. Since our project (so far) relies on SQLAlchemy, Pydantic and Jieba, I have included those. I should really add version constraints as well, but I’ll do that later on when the project is a bit more complete.

What about our database file, you might say? Well, I would actually prefer not to ship the database file with the installation (later on I will perhaps have it downloaded dynamically), but for now the program needs it. If we built our package as is, the file would not get included and our program would crash. How do we fix that?

[options.package_data]
* = *.db

This is saying that for all the packages, include any *.db files.

Finally, I also want to create a wheel for our setuptools-based project. So let’s add:

[bdist_wheel]
universal=1

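Universal means the wheel gets tagged py2.py3, i.e. it claims to be pure Python that runs on both Python 2 and Python 3. Pure Python it certainly is, but CNLearn only targets Python 3, so strictly speaking this flag isn’t needed: without it, setuptools would produce a py3-none-any wheel instead.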
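Those compatibility tags end up in the wheel’s filename, so you can read them back from it. Below is a minimal sketch of that; wheel_tags is a hypothetical helper of mine, and it ignores corner cases that a proper parser (such as the one in the packaging library) handles.

```python
from typing import Dict, List, Union


def wheel_tags(filename: str) -> Dict[str, Union[str, List[str]]]:
    """Split a wheel filename into its name, version and compatibility tags."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    # The last three dash-separated fields are the Python, ABI and platform tags.
    rest, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    # An optional build tag may follow the version, so only take the first two fields.
    name, version = rest.split("-")[:2]
    return {
        "name": name,
        "version": version,
        "python": python_tag.split("."),  # e.g. "py2.py3" means two supported tags
        "abi": abi_tag,
        "platform": platform_tag,
    }


tags = wheel_tags("CNLearn-0.0.2-py2.py3-none-any.whl")
print(tags["python"])    # ['py2', 'py3']
print(tags["platform"])  # any
```

"none" and "any" are what make it a pure Python wheel: no particular ABI and no particular platform.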

Now let’s create a virtual environment in which we will build and install our project.

python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install setuptools
pip install wheel
pip install build
python -m build

Successfully built CNLearn-0.0.2.tar.gz and CNLearn-0.0.2-py2.py3-none-any.whl

Yay!
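Since the whole point of the package_data section was to get the database into the build, it is worth checking that it really is in there. A wheel is just a zip archive, so a quick sketch with the standard library is enough; wheel_members is a name I made up, and the path in the comment is what I would expect given where the database sits, not output I am quoting.

```python
import zipfile
from pathlib import Path
from typing import List, Union


def wheel_members(wheel_path: Union[str, Path], suffix: str = ".db") -> List[str]:
    """List the files inside a wheel whose names end with `suffix`."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return [name for name in wheel.namelist() if name.endswith(suffix)]


wheel = Path("dist/CNLearn-0.0.2-py2.py3-none-any.whl")
if wheel.exists():
    # Should list the database, e.g. something like cnlearn/db/dictionary.db
    print(wheel_members(wheel))
```

If the list comes back empty, the [options.package_data] pattern did not match anything.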

Installing the package and running the tests

pip install dist/CNLearn-0.0.2-py2.py3-none-any.whl
pip install pytest
cd tests
pytest

All good :)
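If you want to double-check from Python that the install went through, importlib.metadata can look up the installed distribution. A small sketch follows; installed_version is my own helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version from setup.cfg once the wheel is installed, None otherwise.
print(installed_version("CNLearn"))
```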

Appendix A: Database Location Change

By the way, in src/cnlearn/db/settings.py I had to change how the path to the database file is built: it now has to be computed relative to settings.py itself, wherever the package ends up installed.

import os
path = os.path.dirname(os.path.abspath(__file__))
db = os.path.join(path, 'dictionary.db')
# Three slashes only: the absolute path already starts with "/" on Unix
SQLALCHEMY_DATABASE = "sqlite+pysqlite:///{db}".format(db=db)
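Here is a pathlib variant of the same idea, as a sketch rather than what is in the repository. The sqlite_url helper is hypothetical; the URL form it produces (three slashes followed by the absolute path) is the one SQLAlchemy documents for absolute paths.

```python
from pathlib import Path


def sqlite_url(db_file: Path) -> str:
    """Build a SQLAlchemy SQLite URL that points at db_file."""
    # Three slashes, then the absolute path. On Unix the path starts with "/",
    # which is why such URLs end up with four slashes in a row.
    return "sqlite+pysqlite:///" + db_file.absolute().as_posix()


# In settings.py this would be called as (dictionary.db sits next to the module):
#   SQLALCHEMY_DATABASE = sqlite_url(Path(__file__).parent / "dictionary.db")
print(sqlite_url(Path("dictionary.db")))
```

Using as_posix() keeps forward slashes on Windows too, which SQLAlchemy accepts.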

Next time we’ll add the build process to the GitHub Actions workflow (the YAML files under .github) and continue with the GUI.