CNLearn FastAPI - Logging, Logging and More Logging

We are going to add some logging to our project. How do we log? Well, we read the Logging HOWTO first. Then we read it again and again until we really understand the concepts.

There are 3 things we are going to do in this (VERY LONG) post:

  1. Understand logging
  2. Add structured logging to our project (using structlog)
  3. Understand how to configure the logging used by uvicorn
  4. Profit (wait what?)

Logging Concepts

We are going to dive straight into the deep end and look at what the various parts of logging are in Python’s logging library. There are four main things to understand: loggers, handlers, filters and formatters. The overall log flow is very well illustrated here in the Python documentation.

Loggers

The logger objects are what we interact with in our code. If we were to do:

import logging

logger = logging.getLogger()
special = logging.getLogger("special")

We would have two loggers: the root logger and a special logger with the name of “special”. Please note that if no name is passed to the getLogger function, the root logger is returned. Both of these variables, logger and special, are references to two Logger objects. One thing to note is that loggers are never instantiated directly; they are always “created”/returned through the logging.getLogger function. That is because calling that function with the same logger name always returns a reference to the same Logger object, regardless of where it gets called (e.g. in different modules).
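A quick way to convince yourself of this (a small sketch you can run in any REPL):

import logging

# getLogger with the same name always returns the very same object
assert logging.getLogger("special") is logging.getLogger("special")
assert logging.getLogger() is logging.getLogger()  # likewise for the root logger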

Loggers can also be hierarchical, with dot-separated names, the same way a Python package would be:

parentLogger = logging.getLogger("abc")
childLogger = logging.getLogger("abc.def")

A recommended convention for Python loggers is to use the module name when getting the reference to the logger object:

logger = logging.getLogger(__name__)

Ok, let’s use our root logger (the logger variable) and the special logger. I will be doing this in a Python REPL (can be in IPython, doesn’t matter) first to illustrate a few concepts:

import logging

logger = logging.getLogger()
special = logging.getLogger("special")

logger.debug("debugging root logger")  # no output
special.debug("debugging special logger")  # no output

logger.info("root information")  # no output
special.info("special information")  # no output

logger.warning("root warning")  # outputs: root warning
special.warning("special warning")  # outputs: special warning

Ok, so the first four calls didn’t output anything but the last two did. Why? We’ll come back to that. We need to look at logging levels first and then at handlers.

Log Levels

Not all log records are critical. Some you might want to see when debugging your application, some you might want to always see.

╔══════════╦═══════╗
║  Level   ║ Value ║
╠══════════╬═══════╣
║ NOTSET   ║     0 ║
║ DEBUG    ║    10 ║
║ INFO     ║    20 ║
║ WARNING  ║    30 ║
║ ERROR    ║    40 ║
║ CRITICAL ║    50 ║
╚══════════╩═══════╝

When we use logger.warning(message) we are saying that the LogRecord created has a level of WARNING (value 30). The following things can then also have levels:

  • the logger -> if the logger level is higher than the LogRecord level, the record will be ignored by the logger
  • the handler -> if the handler level is higher than the LogRecord level, the record will be ignored by the handler

If we look at the levels of our two loggers above:

print(logger.level)  # outputs: 30 (i.e. WARNING)
print(special.level)  # outputs: 0 (i.e. NOTSET)

we see that the root logger has an explicit level set (WARNING by default) but the special logger has a value of 0 (i.e. NOTSET). We saw that when we used the root logger to emit debug and info logs, nothing was shown (levels 10 and 20 are below 30).
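The special logger’s level was 0 though. When a logger’s level is NOTSET, it defers to its nearest ancestor that has an explicit level, which getEffectiveLevel shows (a quick sketch in the same REPL):

import logging

special = logging.getLogger("special")

# NOTSET (0) means: walk up the ancestors until one has an explicit level
print(special.getEffectiveLevel())  # outputs: 30
print(logging.getLevelName(special.getEffectiveLevel()))  # outputs: WARNING

So the special logger effectively filters at WARNING as well. Why did we not see DEBUG and INFO but we did see WARNING? To answer that, we need to learn about handlers.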

Handlers

Handlers send the log records (from the loggers) to a destination (could be your stdout/stderr, console, file, network, etc.). They look at the log record’s severity and then decide whether to continue with emitting the log message. If the severity is greater than or equal to the handler’s level, it gets emitted (please note we are ignoring filters for now, we’ll cover those at the end). Incidentally, this resolves our earlier puzzle: when a record reaches the top of the logger hierarchy and no handlers are configured anywhere, Python falls back to a “handler of last resort” whose level is WARNING, which is why the warning calls produced output while the debug and info calls did not.

There are a bunch of handlers defined in the standard library for your convenience. The one you’re most likely to use is the StreamHandler and it sends your log output to stdout, stderr or any other file-like object. There’s also a useful one for library developers, the NullHandler which sends your log messages to…nowhere (or you can close your eyes!?)
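As a quick sketch of that library pattern (mylib is a hypothetical package name):

import logging

# typical pattern at the top of a library's __init__.py: attach a NullHandler
# so the library's records go nowhere unless the application using the
# library configures its own handlers
logging.getLogger("mylib").addHandler(logging.NullHandler())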

How do handlers “emit” the log messages? Well, the LogRecords must first be formatted by….Formatters!

Formatters

Formatters take a log record and format it to the format you specify. For example, if you initialise a formatter as

import logging

formatter = logging.Formatter(fmt="%(name)s %(levelname)s %(message)s")

and you set it as the formatter for a handler, add that handler to a logger, and then log a message (assuming the levels of both the logger and the handler are less than or equal to the level of the message):

logger.info("For your information")

the output will be root INFO For your information. The root part is the name of the logger (remember, if there’s no name, it’s the root logger). INFO is the level name and For your information is of course the message we logged. Where did I come up with %(name)s, %(levelname)s and %(message)s? These are all LogRecord attributes and there are a lot more. Here I used the %-style format, %(NAME)CONVERSIONTYPE, where NAME refers to one of the LogRecord attributes and CONVERSIONTYPE is one of the Python string conversion types (the style can also be { for {NAME} placeholders, or $ for string template substitutions). Most often you’ll be using the % style, and fear not, they tell you what the conversion type should be in the LogRecord attributes table. Some of the things I wrote just now will make more sense when we go through a complete logging example.
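For instance, here’s the same formatter written in the { style (a small sketch):

import logging

# equivalent formatter using str.format-style placeholders
formatter = logging.Formatter(fmt="{name} {levelname} {message}", style="{")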

Filters

I’ve ignored filters this entire time! Mostly because I just haven’t used them much (i.e. at all) but for completeness, let’s cover them as well. In an earlier section, I mentioned that we can add levels (and as such, level filtering) to both loggers and handlers. What if we want to do some further filtering? Maybe we want to filter based on whether the log was created on a line with an even/odd line number, or perhaps we want to filter based on if it occurred outside of midnight-6am (doctors recommend log handlers get at least 6 hours of sleep every night).

Filters (despite their name and what that implies) can be used for a lot more. Filters can actually modify the LogRecord that they receive, which means we could add extra attributes to them. For example, in an HTTP request, we might want to add the IP of the user, or if they’re logged in, their username. Those extra attributes can then be accessed by the Formatter (e.g. %(ip)s, %(what_the_user_had_for_lunch)s, %(lunch_money_amount)f etc). The Filter is actually a lot more powerful than just filtering and (I assume) some libraries might take advantage of that (although this is something I’ll need to confirm.)
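To make that concrete, here’s a sketch of a record-enriching filter (the ip attribute and its value are made up for illustration; in a real application it would come from the request):

import logging


class ContextFilter(logging.Filter):
    """A "filter" that filters nothing: it enriches every record instead."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.ip = "203.0.113.7"  # hypothetical value, e.g. from a request
        return True  # returning True keeps the record


handler = logging.StreamHandler()
handler.addFilter(ContextFilter())
# the formatter can now use the attribute the filter added
handler.setFormatter(logging.Formatter("%(ip)s %(levelname)s %(message)s"))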

Logging Example (and configuring logging)

There are three main ways you can configure logging (maybe four if we count basicConfig, although technically that’s still programmatic): programmatically, via a config file, or via a dictionary.
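Since we mentioned basicConfig, here is the quick version as a minimal sketch (the format string is just an example): it configures the root logger with a single handler in one call.

import logging

# one-call setup: root logger at DEBUG with a StreamHandler and this format
logging.basicConfig(level=logging.DEBUG, format="%(name)s %(levelname)s %(message)s")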

Let’s look at one example and how it would be set up in each of the cases. We have the following requirements:

  • loggers in each of the modules we have (easy, we’ll do logger = logging.getLogger(__name__))
  • if they are INFO and above, we want them to be written to a file for recording in one format (asctime, levelname, logger name and message)
  • for any that are DEBUG and above, we want to write them to stderr (i.e. the console) in another format (asctime, levelname, logger name, lineno and message). The difference is that the console format also includes the line number.

Ok, so it sounds like we need two formatters and two handlers (one console, one file). Let’s start by doing it programmatically. Let’s have a secret_functions.py file and a main.py file. In the secret_functions.py file, we will have the following:

import logging

logger = logging.getLogger(__name__)


def secret_message():
    logger.debug("Secret function is about to run and we are debugging.")
    print("hi")
    logger.info("Secret function just ran and we are informing.")

We will import this function in our main.py file and call the function. Notice we are also getting a reference to the ‘secret_functions’ logger by using logger = logging.getLogger(__name__). Note that we’re not configuring anything with this logger: no handlers, no formatters, no anything. In fact, if we look at its attributes (using logger.__dict__), we get the following:

{
  'filters': [],
  'name': 'secret_functions',
  'level': 0,
  'parent': <RootLogger root (DEBUG)>,
  'propagate': True,
  'handlers': [],
  'disabled': False,
  '_cache': {10: True},
  'manager': <logging.Manager object at 0x102fccf90>
}

We see its parent is the RootLogger (with a DEBUG level, we’ll look at its setup in just a moment). The important thing to note is that it has no level set and no filters/handlers AND that propagate is True. As per the documentation:

Child loggers propagate messages up to the handlers associated with their ancestor loggers. Because of this, it is unnecessary to define and configure handlers for all the loggers an application uses. It is sufficient to configure handlers for a top-level logger and create child loggers as needed. (You can, however, turn off propagation by setting the propagate attribute of a logger to False.)

This means that we can just create/reference loggers in different modules as we wish, let them propagate to their parent/ancestor loggers, and let those handle the …handling.
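If, however, we set this logger’s propagate attribute to False, its records would never reach the root logger’s handlers, and since it has no handlers of its own, nothing would get logged. A quick sketch of that (assuming the root logger is configured with handlers, as in the example that follows):

import logging

secret_logger = logging.getLogger("secret_functions")
secret_logger.propagate = False  # records stop travelling up the hierarchy
secret_logger.warning("you will not see this anywhere")

Ok, now let’s look at our root logger and how we configure it.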

Programmatically (is that the right word?)

Programmatically means we create the loggers, handlers and formatters in code and then combine them together. We set up the root logger by calling getLogger with no argument, which returns a reference to the root logger. We set its level to DEBUG so that only records of DEBUG or higher are passed through to its handlers.

import logging
from secret_functions import secret_message

# get a reference to the root logger and set its level
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

We then create our two handlers:

  • a file handler that will write to application.log and have a level of INFO
  • a console stream handler that will output to stderr/out and have a level of DEBUG

# create file handler which logs INFO messages
fh = logging.FileHandler("application.log")
fh.setLevel(logging.INFO)
# create console handler with a debug log level
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

We then create two formatters, formatter and formatter_with_line_no, the latter containing the line number as well. We use some of the LogRecord attributes we mentioned earlier; the full list can be found here.

# create formatter and add it to the handlers
formatter = logging.Formatter("Time: %(asctime)s - Level: %(levelname)s - LoggerName: %(name)s - Message: %(message)s")
formatter_with_line_no = logging.Formatter(
    "Time: %(asctime)s - Level: %(levelname)s - LoggerName: %(name)s - LineNumber: %(lineno)s - Message: %(message)s"
)

We set the formatters on the correct handlers and add the handlers to the root logger. We then use the root logger to log some messages: one with debug level and one with info level. We also call our secret message function which will also log a message with debug level and one with info level.

fh.setFormatter(formatter)
ch.setFormatter(formatter_with_line_no)
# add the handlers to the logger
logger.addHandler(fh)
logger.addHandler(ch)

logger.info("This is an info message from main")
logger.debug("This is a debug message from main")

secret_message()

Ok so to summarise we have:

  • a root logger with level DEBUG with two handlers
  • the console handler will handle messages with DEBUG and above
  • the file handler will handle messages with INFO and above
  • the console and file handlers have slightly different formatters (the console handler also includes a line number)

Now let’s run this. A simple python main.py will suffice.

We will get the following in our terminal:

Time: 2023-08-26 20:36:32,914 - Level: INFO - LoggerName: root - LineNumber: 24 - Message: This is an info message from main
Time: 2023-08-26 20:36:32,914 - Level: DEBUG - LoggerName: root - LineNumber: 25 - Message: This is a debug message from main
Time: 2023-08-26 20:36:32,914 - Level: DEBUG - LoggerName: secret_functions - LineNumber: 7 - Message: Secret function is about to run and we are debugging.
hi
Time: 2023-08-26 20:36:32,914 - Level: INFO - LoggerName: secret_functions - LineNumber: 9 - Message: Secret function just ran and we are informing.

and we will have the following in our application.log file:

Time: 2023-08-26 20:36:32,914 - Level: INFO - LoggerName: root - Message: This is an info message from main
Time: 2023-08-26 20:36:32,914 - Level: INFO - LoggerName: secret_functions - Message: Secret function just ran and we are informing.

So what happened? Let’s start with the file logs. We only see the INFO level logs. One of them is from the root logger (in our main.py file) and one from our secret_functions logger in the secret_functions.py file. In the console, we saw the same two INFO level messages BUT we also saw the DEBUG ones.

Nice! Ok, we can move on to configuring the exact same loggers, but this time in a config file.

Config File

According to the documentation on the configuration file format (which we all read every day when we wake up and before we sleep, of course), the expected configuration file format is based on ConfigParser files. While we won’t get into the details of that too much, we’ll look at the parts that we need. It must contain the following mandatory sections: [loggers], [handlers], [formatters]. For each logger, handler or formatter that we list in those sections, there must be a corresponding section. For example, if we specify there is a logger named hello, there must be a [logger_hello] section. If we reference a consoleHandler handler, there must be a [handler_consoleHandler] section. If we reference a formatter_with_line_no formatter, there must be a [formatter_formatter_with_line_no] section. Ok, let’s start with the loggers in a logging.ini file.

[loggers]
keys=root

Since we have a root logger in the keys for the loggers, we will need a corresponding [logger_root] section. For our handlers, we have:

[handlers]
keys=consoleHandler,fileHandler

Since we have those two handlers, we will need a [handler_consoleHandler] section and a [handler_fileHandler] section respectively. For our formatters, we have:

[formatters]
keys=formatter,formatter_with_line_no

Since we have those two formatters, we will need a [formatter_formatter] section and a [formatter_formatter_with_line_no] section respectively. Now let’s finish defining the logger, the handlers and the formatters.

[logger_root]
level=DEBUG
handlers=consoleHandler,fileHandler

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=formatter_with_line_no
args=(sys.stdout,)

[handler_fileHandler]
class=FileHandler
level=INFO
formatter=formatter
args=("application.log",)

[formatter_formatter]
format=Time: %(asctime)s - Level: %(levelname)s - LoggerName: %(name)s - Message: %(message)s

[formatter_formatter_with_line_no]
format=Time: %(asctime)s - Level: %(levelname)s - LoggerName: %(name)s - LineNumber: %(lineno)s - Message: %(message)s

Please note we specified the arguments for the two handlers. Otherwise, it’s pretty much the same as doing it programmatically. Let’s use it!

# main_file.py
import logging.config
from secret_functions import secret_message

logging.config.fileConfig("logging.ini")

logger = logging.getLogger()

logger.info("This is an info message from main")
logger.debug("This is a debug message from main")

secret_message()

We do python main_file.py and we get:

Time: 2023-09-05 19:02:09,725 - Level: INFO - LoggerName: root - LineNumber: 8 - Message: This is an info message from main
Time: 2023-09-05 19:02:09,726 - Level: DEBUG - LoggerName: root - LineNumber: 9 - Message: This is a debug message from main
hi

Uh..what happened? Where did the logs from our secret_functions module go? Well, sometimes it’s good to read the documentation. One of the arguments to fileConfig(…) is:

disable_existing_loggers – If specified as False, loggers which exist when this call is made are left enabled. The default is True because this enables old behaviour in a backward-compatible way. This behaviour is to disable any existing non-root loggers unless they or their ancestors are explicitly named in the logging configuration.

By default, that’s True. Since we had already imported the secret_message function from the secret_functions module, that logger already existed when this call was made. The default configuration disables it, so we don’t see its messages. If we change it to:

logging.config.fileConfig("logging.ini", disable_existing_loggers=False)

you’ll get the same output we got earlier. We’ll see the following in our terminal:

Time: 2023-08-26 20:36:32,914 - Level: INFO - LoggerName: root - LineNumber: 24 - Message: This is an info message from main
Time: 2023-08-26 20:36:32,914 - Level: DEBUG - LoggerName: root - LineNumber: 25 - Message: This is a debug message from main
Time: 2023-08-26 20:36:32,914 - Level: DEBUG - LoggerName: secret_functions - LineNumber: 7 - Message: Secret function is about to run and we are debugging.
hi
Time: 2023-08-26 20:36:32,914 - Level: INFO - LoggerName: secret_functions - LineNumber: 9 - Message: Secret function just ran and we are informing.

and we will have the following in our application.log file:

Time: 2023-08-26 20:36:32,914 - Level: INFO - LoggerName: root - Message: This is an info message from main
Time: 2023-08-26 20:36:32,914 - Level: INFO - LoggerName: secret_functions - Message: Secret function just ran and we are informing.

Dict Config

Now let’s see how we configure it in a dictionary.

logging_config_dict = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "formatter": {
            "format": "Time: %(asctime)s - Level: %(levelname)s - LoggerName: %(name)s - Message: %(message)s",
        },
        "formatter_with_line_no": {
            "format": "Time: %(asctime)s - Level: %(levelname)s - LoggerName: %(name)s - LineNumber: %(lineno)s - Message: %(message)s"
        },
    },
    "handlers": {
        "console_handler": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",
            "formatter": "formatter_with_line_no",
        },
        "file_handler": {
            "level": "INFO",
            "class": "logging.FileHandler",
            "formatter": "formatter",
            "filename": "application.log",
        },
    },
    "loggers": {
        "root": {
            "level": "DEBUG",
            "handlers": ["console_handler", "file_handler"],
        },
    },
}
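Using it is a single call to dictConfig (a sketch; main_dict.py is a hypothetical file name):

# main_dict.py
import logging.config

from secret_functions import secret_message

logging.config.dictConfig(logging_config_dict)  # the dictionary defined above, pasted into this file

logger = logging.getLogger()
logger.info("This is an info message from main")
logger.debug("This is a debug message from main")

secret_message()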

It’s very similar to the file one, just without the section naming. Instead, there’s a hierarchy as you’d expect: a formatters key, a handlers key, a loggers key. Within each one, you can have other keys to specify whatever you need. There’s not much else we’re going to cover here EXCEPT for the following part. Although we’re not using it in these examples, you can specify a custom handler or a custom formatter. Quoting the documentation,

where a ‘configuring dict’ is mentioned, it will be checked for the special ‘()’ key to see if a custom instantiation is required.

The User-defined objects section gives an example of a custom formatter specified in a dict config for logging.

{
  '()' : 'my.package.customFormatterFactory',
  'bar' : 'baz',
  'spam' : 99.9,
  'answer' : 42
}

If the special ‘()’ key contains a callable, e.g. '()': SomeClass, then the instantiation is done directly: SomeClass(bar='baz', spam=99.9, answer=42). If, however, a string is specified, normal import mechanisms are used to locate the callable, which is then instantiated. The example above is equivalent to my.package.customFormatterFactory(bar='baz', spam=99.9, answer=42).
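A sketch of what such a factory might look like (my.package.customFormatterFactory is the documentation’s placeholder, so everything here is illustrative):

import logging


def customFormatterFactory(bar: str, spam: float, answer: int) -> logging.Formatter:
    # receives the remaining keys of the configuring dict as keyword arguments
    # and must return the object being configured (here, a Formatter)
    return logging.Formatter(fmt=f"[{bar}|{spam}|{answer}] %(levelname)s %(message)s")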

(Extra) YAML Config

Since we looked at dict config, let’s look at yet another way of specifying it, in a YAML file. Please note you need to install pyyaml to use that.

version: 1
disable_existing_loggers: false
formatters:
  formatter:
    format: 'Time: %(asctime)s - Level: %(levelname)s - LoggerName: %(name)s - Message: %(message)s'
  formatter_with_line_no:
    format: 'Time: %(asctime)s - Level: %(levelname)s - LoggerName: %(name)s - LineNumber: %(lineno)s - Message: %(message)s'
handlers:
  consoleHandler:
    class: logging.StreamHandler
    level: DEBUG
    formatter: formatter_with_line_no
    stream: ext://sys.stdout
  fileHandler:
    class: logging.FileHandler
    level: INFO
    formatter: formatter
    filename: application.log
loggers:
  root:
    level: DEBUG
    handlers: [consoleHandler,fileHandler]
    propagate: no

Won’t say too much about this; the format is similar to the dict format. In fact, that’s exactly how it gets used: the logging is configured as follows.

import logging.config

import yaml
from secret_functions import secret_message

with open("logging.yaml", "r") as f:
    log_cfg = yaml.safe_load(f.read())
    logging.config.dictConfig(log_cfg)

The YAML is parsed into a dictionary and then passed to dictConfig.

So now you know how logging works in Python. There is sooo much more to explore, and to be honest, you could write a book on this. Or you know, read the documentation! (I should too…)

Adding structlog (structured logging) to CNLearn

What is structured logging?

You’ve probably seen logs like:

Time: 2023-08-26 20:36:32,914 - Level: INFO - LoggerName: secret_functions - Message: Secret function just ran and we are informing.

Now imagine there were a billion of those. And you wanted to find that one message from that one time on, what was it, August 26th? 27th? 255th? What was the exact logger name? Was it an INFO or a WARNING? I hope you can see where I’m going with this. Having them in this format is great for seeing them live as they happen, but not that great for searching and finding. A structured log is a log whose structure allows easier searching and filtering. It can be in any structured format (XML, JSON, if I can think of another one I’d write it here, etc.) but normally you’ll see them in JSON format.
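For example, the same log line as a JSON document might look something like this (the field names here are illustrative, not a specific library’s output):

{
  "timestamp": "2023-08-26T20:36:32.914",
  "level": "info",
  "logger": "secret_functions",
  "event": "Secret function just ran and we are informing."
}

Every entry is now trivially machine-parseable: you can filter by logger, by level, by time range, or by any other field. OK, so let’s add that to our project.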

Now that you’re an expert in everything logging, we’re going to use Structlog to give our application structured (JSON) logs. We’re not going to go over everything in structlog, as they have thorough documentation, but we’ll still explain some of the major concepts.

Bound Loggers

When we did logger = logging.getLogger(), remember we said that what you’re getting back is a reference to a logger; you don’t “instantiate it” directly. We then use that to log various messages that get formatted and handled by handlers.

In structlog, you will use logger = structlog.get_logger(). I do wonder why they didn’t stick to the same method name (and, while wondering, I found this in the documentation: If you prefer CamelCase, there’s an alias for your reading pleasure: structlog.getLogger.) but anyway, that returns a bound logger. It’s called that because you bind key/value pairs to it; think of it as progressively building up a dictionary. In fact, a bound logger is made up of three parts:

  • a context dictionary that is received by each log entry logged from this logger specifically. I.e. anything that’s in the common context dictionary gets merged with whatever else you add
  • a list of processors that each log entry goes through. Each one takes the previous processor’s return value and passes its own result to the next one in the list.
  • a wrapped logger: this is responsible for outputting whatever log entry was returned by the last processor.

The original bound logger and its context dictionary are never mutated. Every time you bind values, a copy of the bound logger is returned, with a new context containing whatever you’ve added.

Finally, the method of the wrapped logger is called: if we called logger.info, the info method of the wrapped logger will be called. Normally, unless configured otherwise, there will be a FilteringBoundLogger (which filters by level) wrapped around a PrintLogger (which prints events to a file; keep in mind that by default that is sys.stdout, i.e. whatever was configured for your process, which is 99.999% your console). When one of those methods (i.e. debug, info, warning, error, critical) is called, the bound logger makes a copy of its context, which becomes the event dictionary. If you passed any keyword arguments, those get added to the event dict. An “event” key is added to the event dict, whose value is the first positional argument of the method call (i.e. the “message”). If you have a list of processors, they run successively on the event dictionary, with each one receiving the previous processor’s result. Once it passes the final processor, the method with the same name on the wrapped logger is called.

Let’s look at a quick example together with the default configuration.

import structlog

logger = structlog.get_logger()

logger.info("hi")
logger.debug("debugging")
logger.warning("uh oh warning")
logger.error("there is an error")
logger.critical("shutting down")

The output will be as follows:

2023-10-02 21:53:41 [info     ] hi
2023-10-02 21:53:41 [debug    ] debugging
2023-10-02 21:53:41 [warning  ] uh oh warning
2023-10-02 21:53:41 [error    ] there is an error
2023-10-02 21:53:41 [critical ] shutting down

Notice that by default, they are all shown; there is no filtering done. Didn’t I say the default bound logger is a FilteringBoundLogger? Yes, but by default, no filtering is actually done.

What if we did want to add some level filtering so that only warning and above went through? Well, we’d have to configure structlog.

import logging
import structlog

structlog.configure(wrapper_class=structlog.make_filtering_bound_logger(logging.WARNING))
logger = structlog.get_logger()

logger.info("hi")
logger.debug("debugging")
logger.warning("uh oh warning")
logger.error("there is an error")
logger.critical("shutting down")

Now, when we run this file, we will only see the warning, error and critical logs.

We also mentioned binding additional context data to the bound logger using the bind method. Let’s see it in action.

import logging
import structlog

logger = structlog.get_logger()
logger = logger.bind(writing="the post")

logger.info("hi")
logger.debug("debugging")

The output we see will also contain whatever we bound into the context dictionary.

2023-10-02 22:35:04 [info     ] hi                             writing=the post
2023-10-02 22:35:04 [debug    ] debugging                      writing=the post

Now that we’ve seen structlog in action, let’s look at how it gets configured.

Structlog Configuration

How do we configure structlog and what aspects can we configure? We need to call structlog.configure() when our application starts.

Wrapper Classes

The wrapper class is the class used for wrapping loggers. Remember, the bound logger we get when we call get_logger doesn’t do any I/O itself: it manages the context but only proxies log calls to the underlying wrapped logger. By default, the type of bound logger that we get is structlog.BoundLogger. If you choose another one, note they are all subtypes of BindableLogger (implemented as Protocols); some of the options include:

  • FilteringBoundLogger - BindableLogger that filters by level (we saw that earlier, can be instantiated with make_filtering_bound_logger)
  • structlog.stdlib.BoundLogger - the standard library version of structlog.BoundLogger, for convenience when integrating with standard library logging
  • a Twisted-specific version, structlog.twisted.BoundLogger -> won’t go into detail on this one, I haven’t really used Twisted myself

We’ll look at using FilteringBoundLogger and maybe the stdlib.BoundLogger.

Logger Factories

The wrapped logger is the one wrapped by the bound logger, which proxies its calls to it. The logger factory returns the logger that gets wrapped. By default, it’s just a function that returns a logger. When we call structlog.get_logger("name of my logger"), that logger factory is used to instantiate the wrapped logger. By default, it is the PrintLoggerFactory, which returns a PrintLogger. Just like the wrapper class, it can be swapped out to return a stdlib Logger or a Twisted logger. There are other factories too, including WriteLoggerFactory and BytesLoggerFactory.
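For example, to have the default PrintLogger write to stderr instead of stdout (a small sketch):

import sys

import structlog

# PrintLoggerFactory accepts the file the wrapped PrintLogger writes to
structlog.configure(logger_factory=structlog.PrintLoggerFactory(file=sys.stderr))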

Processors

Finally, we look at processors. They are just callables, and structlog uses them extensively to process the event dictionaries. Each processor passes its return value to the next processor, until the final one passes its return value to the wrapped logging method. It’s a chain of functions (callables), each of which receives three positional parameters: logger, method_name and event_dict. The logger parameter is the wrapped logger object, method_name is the log method called (i.e. debug, info, warning, etc.) and event_dict is the current context combined with the current event. It’s worth noting that the wrapped logger only gets the return value of the last processor. That means that regardless of what we do to the event_dict in the earlier ones, if the last processor decides to do its own thing, only its output will appear. Some processor chaining examples can be seen here.

The last processor has an extra job: it needs to transform the event_dict into a format that the wrapped logger knows how to process. That could be a str, bytes or bytearray, but it can also be an (args, kwargs) tuple that gets passed as log_method(*args, **kwargs), or just a dictionary passed as log_method(**kwargs). We’ll mostly be using the JSONRenderer.
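Here’s a small chain sketch: one custom processor of our own (add_app_name is a made-up example) followed by the JSONRenderer as the final processor:

import structlog


def add_app_name(logger, method_name, event_dict):
    # every processor receives (logger, method_name, event_dict) and
    # returns the event_dict for the next processor in the chain
    event_dict["app"] = "cnlearn"
    return event_dict


structlog.configure(
    processors=[
        add_app_name,
        structlog.processors.JSONRenderer(),  # final processor: returns a str
    ]
)

logger = structlog.get_logger()
logger.info("hello")  # prints something like: {"event": "hello", "app": "cnlearn"}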

CNLearn Logging

With that short introduction, let’s first discuss how we added logging to CNLearn and then look at the code implementation. It’s worth mentioning that uvicorn is used as the server for now. For the Development environment, we use a ConsoleRenderer, whereas for Staging/Production we use the JSONRenderer. We configure the logging at startup. For the access logging uvicorn normally does (showing you which endpoints were accessed and when), we use a middleware that calls structlog to log the same things (mostly because I still couldn’t get the uvicorn logger to work the way I wanted with structlog). Summary done, let’s look at the implementation.

def configure_logging() -> structlog.BoundLogger:
    common_processors: list[Processor] = [
        structlog.contextvars.merge_contextvars,
        structlog.processors.add_log_level,
        structlog.stdlib.add_logger_name,
        structlog.stdlib.PositionalArgumentsFormatter(),
    ]
    logging.basicConfig(
        format="%(message)s",
    )
    if settings.ENVIRONMENT == "Development":
        logging.getLogger().setLevel(logging.DEBUG)
        structlog.configure(
            processors=common_processors
            + [
                structlog.processors.TimeStamper(fmt="%Y-%m-%d %H:%M.%S"),
                structlog.processors.StackInfoRenderer(),
                structlog.dev.ConsoleRenderer(),
            ],
            context_class=dict,
            logger_factory=structlog.stdlib.LoggerFactory(),
            wrapper_class=structlog.make_filtering_bound_logger(logging.DEBUG),
            cache_logger_on_first_use=False,
        )
    else:
        logging.getLogger().setLevel(logging.INFO)
        structlog.configure(
            processors=common_processors
            + [
                structlog.processors.TimeStamper(fmt="iso"),
                structlog.processors.JSONRenderer(serializer=orjson.dumps),
            ],
            logger_factory=structlog.stdlib.LoggerFactory(),
            context_class=dict,
            wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
            cache_logger_on_first_use=False,
        )

    log: structlog.BoundLogger = structlog.get_logger()
    log.info("Starting logging")

    return log

We start by defining what our common processors will be, regardless of the environment we’re in. They will later be supplemented by a few more. These include:

  • merging the context vars (we didn’t discuss this in detail, more info here, but the gist is that it gives us context local to the current execution context. For our web application, think of it as context local to the current request. We use this in our middleware to add a request id for log tracing)
  • adding the log level
  • adding the logger name
  • processing and formatting any positional arguments passed to the log method

Since our logger_factory is a standard library logger factory (I will still attempt to pipe uvicorn through structlog at some point) and our wrapper class is a filtering bound logger which only logs at some minimum level or higher, we also specify our standard library logging configuration. I leave it with a basic config that just outputs the message (since that’s what our final processor will send to it).

For Development, I set the root logger level to DEBUG; for the others, it’s set to INFO. For Development, we add an easily readable timestamper processor (by specifying the time format) as well as any stack info, and render through a console renderer. For Production, we add an ISO timestamper and a JSON renderer using orjson, the (self-proclaimed) fastest JSON library in Python.

Finally, I get the logger, log a starting message and return it. In our create_application:

def create_application() -> FastAPI:
  ...
  logger = configure_logging()

  ...
  app.add_middleware(AccessLogger, logger=logger)
  ...
  return app

Let’s look at the AccessLogger middleware and the logger we passed to it.

import time
from uuid import uuid4

import structlog
from starlette.middleware.base import (
    BaseHTTPMiddleware,
    DispatchFunction,
    RequestResponseEndpoint,
)
from starlette.requests import Request
from starlette.responses import Response
from starlette.types import ASGIApp


class AccessLogger(BaseHTTPMiddleware):
    def __init__(
        self, app: ASGIApp, dispatch: DispatchFunction | None = None, *, logger: structlog.BoundLogger
    ) -> None:
        self.logger: structlog.BoundLogger = logger
        super().__init__(app, dispatch)

    async def dispatch(self, request: Request, call_next: RequestResponseEndpoint) -> Response:
        structlog.contextvars.clear_contextvars()
        structlog.contextvars.bind_contextvars(
            request_id=str(uuid4()),
        )
        start_time = time.time()
        response = await call_next(request)
        duration = time.time() - start_time
        method = request.method
        path = request.url.path
        port = request.url.port
        scheme = request.url.scheme
        query_params = str(request.query_params)
        path_params = request.path_params
        client_host = request.client.host if request.client is not None else ""
        client_port = request.client.port if request.client is not None else ""
        http_version = request.scope["http_version"]
        status_code = response.status_code
        url: str = path
        if request.scope.get("query_string"):
            url += f"?{request.scope['query_string'].decode('ascii')}"

        logged_dict = {
            "duration": duration,
            "url": str(request.url),
            "method": method,
            "path": path,
            "scheme": scheme,
            "port": port,
            "query_params": query_params,
            "path_params": path_params,
            "client_host": client_host,
            "client_port": client_port,
            "http_version": http_version,
            "status_code": status_code,
        }
        self.logger.info(
            f'{client_host}:{client_port} - "{method} {url} HTTP/{http_version}" {status_code}',
            request=logged_dict,
        )
        return response

The first thing to note is that it’s a middleware inheriting from starlette’s BaseHTTPMiddleware. It has a dispatch method we need to override. In that method, we:

  • clear any context vars we have
  • bind a UUID4 as the request’s unique id
  • get a bunch of attributes from the request object
  • if there’s a query string, add it to the url
  • info-log a message in uvicorn’s log style, passing an additional dictionary to our logger to be used as context.

And thaaat’s it. What does this look like? Well, in the Development environment, we get some example logs like:

2023-11-23 06:51.41 [info     ] 127.0.0.1:50061 - "GET /api/v1/vocabulary/get-character?character=%E8%B4%BC&words=false HTTP/1.1" 200 [app.middleware.logger] request={'duration': 0.0049211978912353516, 'url': 'http://localhost:8000/api/v1/vocabulary/get-character?character=%E8%B4%BC&words=false', 'method': 'GET', 'path': '/api/v1/vocabulary/get-character', 'scheme': 'http', 'port': 8000, 'query_params': 'character=%E8%B4%BC&words=false', 'path_params': {}, 'client_host': '127.0.0.1', 'client_port': 50061, 'http_version': '1.1', 'status_code': 200} request_id=298ef303-3815-4de1-bac6-7d00007c531d

Finally, the commit for this post is here.