Logging 101

Logs?

Applications should record information/events to help make debugging (and understanding) what a program is doing easier.

Typically this information is recorded into a log file.

But in some cases it is preferred to send this information to stdout because it then becomes the responsibility of the operating system/environment to determine where and how those logs are stored.

To quote 12factor:

A (service) never concerns itself with routing or storage of its output stream. It should not attempt to write to or manage log files. Instead, each running process writes its event stream, unbuffered, to stdout.

Applications should be conscious of the volume and quality of the logs they emit. Logging isn’t free, and high volume logs aren’t necessarily more useful than fewer, thoughtful, log messages. Include important context with your log messages to be able to better understand the state of the system at that time.

You may discover that some log information is better off not logged but recorded as time series metrics so they can be provided and analysed by tools such as Datadog. The benefit of doing this is that you can more easily measure those metrics (and gain more insights) over time.

It can also, depending on the tools you use to analyse your log data (e.g. external log analysis system, such as Papertrail), be better to log data in a more structured format in order to provide better contextual information and to make filtering logs easier. For more information on structured logging, I recommend this article inspired from “The Art of Monitoring”.

Levels

When recording log data, there are various ‘levels’ you can utilise that indicate different severities.

Below are some common log levels along with a short description that describes when you would/should expect to use that particular level…

Note: I saw these descriptions in a tweet a long time ago, and saved them off, and then months later when I’ve come to write this post I have since lost the link to the original tweet. If someone knows what it is, could you ping me on twitter and I’ll get the author credited appropriately.

Fatal

Wake me up at 4am on a Sunday.

Error

Apologize to the user and raise a ticket.

Warn

Make a note in case it happens again.

Info

Everything's fine, just checking in.

Debug

Fill my hard-drive with stack traces.

Quality

There are many “logging best practice” articles, below is a short bullet list pulled from Splunk.

I recommend reading through their article for the full details.

Use clear key-value pairs
Create events that humans can read
Use timestamps for every event
Use unique identifiers
Log in text format
- e.g. log an image’s attributes, but not the binary data itself
Use developer-friendly formats
- e.g. json
Log more than just debugging events
Use categories/levels
- e.g. info, warn, error, debug
Identify the source
Keep multi-line events to a minimum