Security with Python

Introduction

I recently implemented a Python library which acts as an abstraction layer on top of an existing security algorithm (in this case scrypt).

The motivation was to give teams a consistent experience when using encryption (and hashing) in their applications and services, without necessarily having to know the ins and outs of salts, key lengths, and so on.

Note: I always encourage people to understand what it is they’re doing, but that isn’t always practical.

The library provides three functions:

  1. generate_digest
  2. decrypt_digest
  3. validate_digest

Before we start looking at the three functions provided by this library/interface, let’s very briefly talk about KDF and PBKDF2.

A KDF (Key Derivation Function) accepts a message and a key, and produces a digest as its output. KDFs are designed to be more computationally intensive than standard hashing functions, which makes dictionary and rainbow table style attacks harder: each guess costs the attacker far more time and memory, so the attack becomes much less feasible.

By default the KDF will generate a random salt (so the output is non-deterministic) and have a maximum computation time of 0.5 seconds (although this can be overridden using a maxtime argument, as we’ll see later).
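To make the effect of a random salt concrete, here is a minimal sketch. It uses the standard library's hashlib.scrypt rather than the scrypt package this library wraps, so there is no maxtime argument here. The derive_key helper and the n, r and p cost values are my own illustrative assumptions, not something taken from the library.

```python
import hashlib
import os


def derive_key(password: bytes, salt: bytes) -> bytes:
    """Derive a 64-byte key with scrypt (hypothetical helper, illustrative cost values)."""
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=64)


password = b"my-password"

# A fresh random salt per call means the two outputs will not match.
key1 = derive_key(password, os.urandom(16))
key2 = derive_key(password, os.urandom(16))
print(key1 != key2)

# Reusing the same salt reproduces the same key.
salt = os.urandom(16)
print(derive_key(password, salt) == derive_key(password, salt))
```

The only thing that changes between the first two calls is the salt, which is why a randomly salted digest can't be compared by simply regenerating it.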

PBKDF2, on the other hand, provides deterministic output (and lets you specify an explicit salt value). Internally it repeats its process many times, which reduces the feasibility of automated password cracking attempts (similar to a KDF).
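As a rough illustration of that deterministic behaviour, the sketch below calls the standard library's hashlib.pbkdf2_hmac directly. The hash name, the iteration count and the variable names are assumptions chosen for the example; they are not what the library uses internally.

```python
import hashlib

message = b"my-message"
salt = b"my-salt-is-long-enough"
ITERATIONS = 100_000  # illustrative value; more iterations make each guess slower

# Same message, salt and iteration count on both calls.
digest1 = hashlib.pbkdf2_hmac("sha256", message, salt, ITERATIONS, dklen=64)
digest2 = hashlib.pbkdf2_hmac("sha256", message, salt, ITERATIONS, dklen=64)

# Identical inputs give an identical digest.
print(digest1 == digest2)
print(digest1.hex())
```

Changing any one of the inputs (message, salt or iteration count) produces a different digest, so all three need to be kept consistent if you want to reproduce a value later.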

I mention both of these (KDF and PBKDF2) because the generate_digest function I’ve written is a multi-arity function that switches implementation based upon the arguments it is given.

Originally I had two separate functions to distinguish them a bit more clearly, but I realised that if this library is to make life easier for developers who don’t understand encryption or hashing concepts, then it needs to provide a single function that intelligently handles things internally.

Because KDF accepts a key and is able to return the original message (given the same key) it’s acting as a form of symmetrical encryption, whereas a PBKDF2 is more like a one-way hash function. Hence I named the function in this library generate_digest rather than something like encrypt_message which wouldn’t have made sense when dealing with PBKDF2.

generate_digest

This is a multi-arity function that will generate a digest using either a password-based key derivation function (KDF) or a PBKDF2 depending on the input given.

If a password argument is provided, then KDF will be used (along with a random salt) to generate a non-deterministic digest.

If a salt is provided, then a PBKDF2 will be used to generate a deterministic digest.

Note: salts should be a minimum of 128 bits (~16 characters) in length. Also, when specifying a maxtime with generate_digest, ensure you pass that same value when decrypting with decrypt_digest or validating via validate_digest.
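If you need a salt that meets the 128 bit minimum, one option is to generate it with the standard library's secrets module. This is a minimal sketch: new_salt and MIN_SALT_BYTES are hypothetical names I've made up for the example, and the library itself doesn't generate salts for you.

```python
import secrets

MIN_SALT_BYTES = 16  # 16 bytes == 128 bits


def new_salt(num_bytes: int = MIN_SALT_BYTES) -> str:
    """Return a random salt as hex text (two hex characters per byte)."""
    if num_bytes < MIN_SALT_BYTES:
        raise ValueError("salt must be at least 128 bits (16 bytes)")
    return secrets.token_hex(num_bytes)


salt = new_salt()
print(len(salt))  # 32 hex characters for the default 16 bytes
```

Because the salted digest is deterministic, you'd need to store the salt somewhere so that you can reproduce the digest later.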

decrypt_digest and validate_digest

The decrypt_digest and validate_digest functions only apply to digests that have been generated using a password (i.e. KDF). Given the right password, decrypt_digest will return the original message, which makes it more a form of symmetric encryption than a straight one-way hash function. The validate_digest function returns True if the given password was able to decrypt the message, and False otherwise.
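Digests generated with a salt have no decrypt counterpart, so a common way to check one is to recompute it and compare the result. The sketch below is not part of the library; it uses the standard library's PBKDF2 and hmac.compare_digest, and the salted_digest and matches helpers (plus the iteration count) are assumptions for illustration only.

```python
import hashlib
import hmac

ITERATIONS = 100_000  # illustrative value


def salted_digest(message: bytes, salt: bytes) -> bytes:
    """Deterministic digest of message for a given salt (hypothetical helper)."""
    return hashlib.pbkdf2_hmac("sha256", message, salt, ITERATIONS)


def matches(message: bytes, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    return hmac.compare_digest(salted_digest(message, salt), expected)


salt = b"my-salt-is-long-enough"
stored = salted_digest(b"my-message", salt)

print(matches(b"my-message", salt, stored))
print(matches(b"other-message", salt, stored))
```

Using hmac.compare_digest instead of == avoids leaking, via timing differences, how many leading bytes of the digest matched.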

Dependencies

This abstraction library requires scrypt, which itself requires the following dependencies to be installed within the context of your service: build-essential, libssl-dev, and python-dev. If your service has a Dockerfile, adding these dependencies should be as simple as adding a line like the following:

RUN apt-get update && apt-get install -y build-essential libssl-dev python-dev

Usage

I suggest looking at the test suite (see below) to get an idea of how you would use the functions in this library.

Note: for a glossary of security terms, refer to this document.

Tests

Before we look at the implementation of the library, let’s take a moment to sift through its test suite.

Note: I named the library secure and have it running on a private PyPI instance. This code is made available via GitHub.

import pytest

from secure.interface import ArgumentError, generate_digest, validate_digest, decrypt_digest


message = "my-message"
password = "my-password"
salt = "my-salt-is-long-enough"


def test_generate_digest_with_both_a_password_and_a_salt():
    """Providing both a password and a salt should raise an exception."""

    with pytest.raises(ArgumentError):
        generate_digest(message, salt=salt, password=password)


def test_generate_digest_with_a_password():
    """Generating a digest with a password should be non-deterministic."""

    digest1 = generate_digest(message, password=password)
    digest2 = generate_digest(message, password=password)
    digest3 = generate_digest(message, password=password, maxtime=1.5)
    digest4 = generate_digest(message, password=password, maxtime=1.5)
    digest5 = generate_digest(message, password=password, maxtime=int(1))
    digest6 = generate_digest(message, password=password, maxtime=int(1))

    assert digest1 != digest2
    assert digest3 != digest4
    assert digest5 != digest6


def test_generate_digest_without_a_password():
    """Generating a digest without a password should be deterministic."""

    digest1 = generate_digest(message)
    digest2 = generate_digest(message)
    digest3 = generate_digest(message, salt=salt)
    digest4 = generate_digest(message, salt=salt)
    digest5 = generate_digest(message, length=128)
    digest6 = generate_digest(message, length=128)

    assert digest1 == digest2
    assert digest3 == digest4
    assert len(digest5) == len(digest6)


def test_generate_digest_with_different_salt_lengths():
    """Salts should be at least 128 bits (~16 characters) in length."""

    generate_digest(message, salt=salt)

    with pytest.raises(ArgumentError):
        generate_digest(message, salt="too-short")


def test_validate_digest():
    """Validation only applies to digests generated with a password."""

    digest1 = generate_digest(message, password=password)
    digest2 = generate_digest(message, password=password)
    digest3 = generate_digest(message, password=password, maxtime=1.5)
    digest4 = generate_digest(message, password=password, maxtime=1.5)
    digest5 = generate_digest(message, password=password, maxtime=int(1))
    digest6 = generate_digest(message, password=password, maxtime=int(1))

    assert not validate_digest(digest1, 'incorrect-password')
    assert validate_digest(digest1, password)
    assert validate_digest(digest3, password, maxtime=1.5)
    assert validate_digest(digest5, password, maxtime=int(1))


def test_decrypt_digest():
    """Decryption is possible given the right password."""

    digest = generate_digest(message, password=password)

    assert decrypt_digest(digest, password) == message

Implementation

OK, time to see the library code itself.

Note: I like to use mypy for checking type hints.

from typing import Optional, Union

import scrypt


class ArgumentError(Exception):
    """Raised when generate_digest is given an invalid combination of arguments."""


def generate_digest(message: str,
                    password: Optional[str] = None,
                    maxtime: Union[float, int] = 0.5,
                    salt: str = "",
                    length: int = 64) -> bytes:
    """Multi-arity function for generating a digest.

    Use KDF symmetric encryption given a password.
    Use deterministic hash function given a salt (or lack of password).
    """

    if password and salt:
        raise ArgumentError("only provide a password or a salt, not both")

    if salt != "" and len(salt) < 16:
        raise ArgumentError("salts need to be a minimum of 128 bits (~16 characters)")

    if password:
        return scrypt.encrypt(message, password, maxtime=maxtime)
    else:
        return scrypt.hash(message, salt, buflen=length)


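def decrypt_digest(digest: bytes,
                   password: str,
                   maxtime: Union[float, int] = 0.5) -> str:
    """Decrypt digest using given password and return the original message."""

    # On Python 3, scrypt.decrypt decodes the result as UTF-8 text by default.
    return scrypt.decrypt(digest, password, maxtime=maxtime)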


def validate_digest(digest: bytes,
                    password: str,
                    maxtime: Union[float, int] = 0.5) -> bool:
    """Validate digest using given password."""

    try:
        scrypt.decrypt(digest, password, maxtime)
        return True
    except scrypt.error:
        return False

Conclusion

Let me know what you think on Twitter. Have fun.