Hashing, Encryption and Encoding
Introduction
I’ve written previously (and in-depth) on the subject of security basics, using tools such as GPG, OpenSSH, OpenSSL, and Keybase. But this time I wanted to focus in on the differences between encryption and hashing, whilst also providing a slightly more concise reference point for those already familiar with these concepts.
Terminology
OK, so using the correct terminology is essential and helps us to be explicit and clear with what we really mean.
hash function:
calculates a deterministic, irreversible, fixed-size alphanumeric string (based on input).
message:
a message is the data (e.g. the ‘input’ provided to a hash function).
digest:
the hexadecimal output generated by a hash function (contextually referred to as “checksum” or “fingerprint”).
symmetric algorithm:
a cryptographic algorithm that uses the same key to encrypt and decrypt data.
asymmetric algorithm:
a form of encryption where keys come in pairs (what one key encrypts, only the other can decrypt).
integrity:
the message transported has not been tampered with or altered.
confidentiality:
the communication between trusted parties is confidential.
authenticity:
the communication is with who you expect it to be (not a man-in-the-middle).
For a longer “Security Glossary”, please see this Google doc I created.
Hashing vs Encryption
In essence:
- hashing: provides integrity.
- encryption: provides confidentiality.
Often cryptographic primitives need to be combined. For example, a typical secure channel uses RSA (a slow, asymmetric algorithm) to exchange a shared key securely, AES (a faster, symmetric algorithm †) to encrypt the data with that shared key, and a hash function to generate a message digest so both parties can verify the integrity of the payload sent/received.
† symmetric encryption is only ‘less secure’ in the sense that you have to share a secret key with the person you wish to communicate with, but that’s what public-key cryptography helps to secure.
Why use a hash function?
Hash functions (or more specifically their output: digests) can be used for many things, like indexing data in a hash table, fingerprinting (i.e. detecting duplicate data or uniquely identifying files), or as a checksum (i.e. detecting data corruption).
Message authentication (i.e. verifying a message’s integrity and origin) involves hashing the message to produce a digest and encrypting the digest with the sender’s private key to produce a digital signature.
In order to verify this ‘signature’ the recipient would need to compute a hash of the message, decrypt the signature using the signer’s public key, and then compare the computed digest against the decrypted digest.
If the digest you generated is the same as the decrypted digest, then you can be sure the message was not modified whilst in transit (e.g. by a ‘man-in-the-middle’).
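The sign-then-verify flow can be sketched in Python. This is a toy illustration only: it assumes textbook RSA with tiny example primes (61 and 53), no padding, and a digest reduced to fit the tiny modulus, none of which is safe for real use. The names digest_as_int, sign and verify are hypothetical helpers, not part of any library.

```python
import hashlib

# Toy RSA key pair built from tiny textbook primes (assumption: illustration only).
P, Q = 61, 53
N = P * Q   # public modulus (3233)
E = 17      # public exponent
D = 2753    # private exponent, chosen so (E * D) % ((P - 1) * (Q - 1)) == 1


def digest_as_int(message: bytes) -> int:
    """Hash the message, then squeeze the digest into the toy modulus range."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % N


def sign(message: bytes) -> int:
    """'Encrypt' the digest with the private key to produce a signature."""
    return pow(digest_as_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """'Decrypt' the signature with the public key and compare the digests."""
    return pow(signature, E, N) == digest_as_int(message)


signature = sign(b"hello world")
print(verify(b"hello world", signature))            # the untouched pair verifies
print(verify(b"hello world", (signature + 1) % N))  # an altered signature does not
```

In practice you would rely on a tool such as GPG or OpenSSL (both covered later) rather than hand-rolled arithmetic like this.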
Base64 Encoding
Base64 is a way of taking binary data and transforming it into a text-based format. It is commonly used when there is a need to transfer the binary data over a medium that only supports textual data (e.g. you can Base64 encode images so they can be inlined into HTML).
How it works: Base64 encoding takes three bytes, each consisting of eight bits, and represents them as four printable characters in the ASCII standard.
Note: Base64 encoded strings are NOT secure.
Remember, it encodes data; it does not encrypt it.
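As a small sketch of the ‘three bytes in, four characters out’ behaviour, here is Python’s standard base64 module applied to a short byte string (the sample inputs are arbitrary):

```python
import base64

# Three bytes (24 bits) become four Base64 characters (6 bits each).
encoded = base64.b64encode(b"Man")
print(encoded)  # b'TWFu'

# Input that is not a multiple of three bytes is padded with '='.
print(base64.b64encode(b"Ma"))  # b'TWE='

# Anyone can reverse the encoding without a key, which is why it is not encryption.
print(base64.b64decode(encoded))  # b'Man'
```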
MAC vs HMAC
A ‘MAC’ (Message Authentication Code) uses symmetrical cryptography (i.e. a shared secret key) to verify the integrity of a message. Some MACs are built on an encryption algorithm (such as AES †), whereas an ‘HMAC’ uses a hash function (such as SHA256) internally instead of an encryption algorithm.
† encryption algorithms: AES (Advanced Encryption Standard), Blowfish, DES (Data Encryption Standard), Triple DES, Serpent, and Twofish.
Below is an example HMAC written in Bash and using the OpenSSL command-line tool.
function hmac {
  # usage: hmac <digest> <data> <key> [extra 'openssl dgst' flags]
  digest="$1"
  data="$2"
  key="$3"
  shift 3
  echo -n "$data" | openssl dgst "-$digest" -hmac "$key" "$@"
}
The way you would use it is as follows:
hmac sha256 "message to be hashed" secret-key
Note: you can swap sha256 for any supported digest algorithm (see openssl dgst -h for details).
Which would generate the digest output:
44db14fe496c4bc4af5e8e6e3683e5db7acffa555897cf4b2b4345abaaf1ace3
Now because the implementation is using the openssl command, you can also choose to output the digest as binary (rather than hexadecimal) and then Base64 encode that binary output, like so:
hmac sha256 "message to be hashed" secret-key -binary | base64
Which outputs:
RNsU/klsS8SvXo5uNoPl23rP+lVYl89LK0NFq6rxrOM=
You don’t have to use an abstraction around the command obviously, you can just use:
cat plaintext.txt | openssl dgst -sha512 -binary | base64
Note: base64 could be replaced with openssl’s base64 encoding command: openssl enc -base64 -A
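For comparison, here is a rough Python equivalent of the hmac shell function, using the standard library’s hmac and hashlib modules. It is a sketch rather than a drop-in replacement: the key and message are the same example values used earlier, and is_valid is a hypothetical helper name introduced purely for illustration.

```python
import base64
import hashlib
import hmac

key = b"secret-key"
message = b"message to be hashed"

# Compute the HMAC using SHA256 as the internal hash function.
mac = hmac.new(key, message, hashlib.sha256)
print(mac.hexdigest())                          # hexadecimal form
print(base64.b64encode(mac.digest()).decode())  # Base64 of the raw bytes


def is_valid(key: bytes, message: bytes, received_hex: str) -> bool:
    """Recompute the HMAC and compare it in constant time."""
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_hex)


print(is_valid(key, message, mac.hexdigest()))      # same key and message
print(is_valid(key, b"tampered", mac.hexdigest()))  # altered message
```

Using hmac.compare_digest rather than == avoids leaking timing information when checking a digest supplied by someone else.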
Random Password Generation
Generating random passwords that are complex enough to make automated attacks difficult can be a bit tedious, yet important. But if you install a program such as pwgen (brew install pwgen) you’ll be able to generate random and complex passwords very easily.
Once installed, add the following alias to your shell:
alias psw="pwgen -sy 20 1"
Now when you execute psw you’ll get output that looks something like the following:
|93<3(M;r?~40c$A@>{\
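If you’d rather not install anything, a similar result can be approximated with Python’s secrets module. This is a minimal sketch, not a reimplementation of pwgen: generate_password and ALPHABET are hypothetical names, and unlike pwgen -sy it does not guarantee that at least one symbol appears.

```python
import secrets
import string

# Letters, digits and punctuation, roughly what 'pwgen -sy' draws from.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 20) -> str:
    """Pick each character using a cryptographically secure random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_password())
```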
Hash Functions
There are many different ways of accessing a hash function. The two options we’ll look at are the executable shasum (provided by macOS) and the hashlib package provided by the Python programming language.
shasum
Let’s generate a hexadecimal digest of the message foobar using the SHA512 hash algorithm:
echo -n foobar | shasum -a 512
Note: see shasum -h for all available algorithms.
Which outputs:
0a50261ebd1a390fed2bf326f2673c145582a6342d523204973d0219337f81616a8069b012587cf5635f6925f1b56c360230c19b273500ee013e030601bf2425
hashlib
Let’s again generate a hexadecimal digest of the message foobar using the SHA512 hash algorithm, now using Python:
import hashlib
message = hashlib.sha512()
message.update(b"foobar")
print(message.hexdigest())
Which outputs the same digest as the shasum command produced:
0a50261ebd1a390fed2bf326f2673c145582a6342d523204973d0219337f81616a8069b012587cf5635f6925f1b56c360230c19b273500ee013e030601bf2425
cksum
Remember hash functions generate a digest of some message input, and one such use of that digest output is detecting data corruption (i.e. a checksum).
macOS also provides a cksum command which lets you generate a checksum, like so:
echo foobar | cksum
Which outputs:
857691210 7
The first number is the checksum and the second number is the amount of data in bytes.
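To show how a checksum catches corruption, here is a short Python sketch using zlib.crc32. Note the assumption: zlib’s CRC-32 is a different CRC variant from the one the cksum command computes, so the numbers will not match the output above; only the principle is the same.

```python
import zlib

original = b"foobar\n"   # what 'echo foobar' writes, including the newline
corrupted = b"foobaz\n"  # one byte changed

# Compute the checksum alongside the size in bytes, much like cksum reports.
checksum = zlib.crc32(original)
print(checksum, len(original))

# Recomputing over intact data matches; the corrupted copy does not.
print(zlib.crc32(original) == checksum)   # True
print(zlib.crc32(corrupted) == checksum)  # False
```

A CRC is good at catching accidental corruption, but it is not a cryptographic hash, so it offers no protection against deliberate tampering.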
OpenSSH
OpenSSH provides secure and encrypted tunneling capabilities and is typically used to enable secure shell connections from your machine to external servers.
In order to generate a cryptographically secure key pair, execute the following command:
ssh-keygen -t rsa -b 4096 -C "your.email@domain.com"
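This uses the RSA algorithm (which was the default at the time of writing, so the -t could be omitted; newer OpenSSH releases default to Ed25519 instead) along with a key size of 4096 bits (the default at the time of writing was 2048).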
The output of this command will be a public and private key pair.
It’s usually best to generate these keys (or at least move them when generated) within the ~/.ssh directory.
SSH Agent
One thing that catches me out all the time is when I open a new terminal tab or shell instance and I go to push up some code changes to a remote server only to discover an error saying I’m not authenticated. This is because the new terminal/shell instance doesn’t have the SSH agent running which is what makes my SSH key pair available.
This happens so often I’ve created an alias to make starting up the SSH agent and loading my SSH private key very quick and easy:
alias sshagent='eval "$(ssh-agent -s)" && ssh-add -K ~/.ssh/github_rsa'
Note: the use of the -K flag is macOS specific; it means it’ll add the key into the macOS keychain program.
OpenSSL
OpenSSL is designed to provide a method for securing web based communication (think HTTPS/TLS/SSL).
Note: for a full list of commands see: openssl -h and openssl <command> -h.
Key Exchanges
There are two popular key exchange algorithms:
- RSA
- Diffie-Hellman
For the specific details of each I recommend you read this post on the differences. In short RSA uses the person’s public key to encrypt the secret, while Diffie-Hellman uses a mathematical function to ensure only those two people communicating can calculate the secret based on the information that’s publicly available.
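The ‘mathematical function’ behind Diffie-Hellman is modular exponentiation, and it can be sketched with deliberately tiny numbers. The values below are an assumption chosen for readability (real exchanges use very large primes), and the variable names are purely illustrative.

```python
# Publicly agreed values (tiny numbers, for illustration only).
p = 23  # prime modulus
g = 5   # generator

# Each party picks a private value and never shares it.
alice_secret = 6
bob_secret = 15

# Each party publishes g^secret mod p.
alice_public = pow(g, alice_secret, p)  # 8
bob_public = pow(g, bob_secret, p)      # 19

# Each side combines the other's public value with its own secret.
alice_shared = pow(bob_public, alice_secret, p)
bob_shared = pow(alice_public, bob_secret, p)
print(alice_shared, bob_shared)  # 2 2
```

Both sides arrive at the same shared secret even though only p, g and the two public values ever crossed the wire.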
Generating a key pair
In order to generate an RSA based public/private key pair, execute the following commands:
# generate a private key
openssl genrsa -out private_key.pem 4096
# generate a public key, from the private key
openssl rsa -pubout -in private_key.pem -out public_key.pem
Encrypting and Decrypting
The following examples use symmetric encryption, and so you’ll be asked for a secret key when encrypting and decrypting (although you could also use the -pass flag like so: -pass pass:<your_password>; yeah, the syntax is odd, and it’s the same for decrypting):
# symmetric encryption (you'll be asked for a key)
echo foobar | openssl enc -aes-256-cbc -out message.enc
# decrypt that encrypted message
openssl enc -aes-256-cbc -in message.enc -d
Note: .enc is a commonly used file extension to indicate a file is encrypted (.asc is typically used for ASCII-armoured output, such as that produced by gpg --armor).
I’m passing in the message via stdin and writing the result to a file (when encrypting), then reading that file back in and printing the result to stdout (when decrypting), but you could use files for both by explicitly specifying the -in and -out flags.
Annoyingly with openssl the same thing can be done a million different ways, so (for example) you might also find that you can do the above without the enc portion of the command (and thus removing the - prefix from the selected algorithm):
# symmetric encryption
echo foobar | openssl aes-256-cbc -out message.enc
# decrypt that encrypted message
openssl aes-256-cbc -in message.enc -d
Encoding
You can also generate Base64 output of the encrypted data, by using the -a flag like so:
$ echo foobar | openssl aes-256-cbc -a
U2FsdGVkX19/L0WtkvCNlpMiQnvD1SWGM19lm4m6xK4=
Note: see man enc for details
Salts
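It’s also worth mentioning that the default behaviour for OpenSSL is to use a ‘salt’ when encrypting the message (the salt is mixed in when deriving the encryption key from your passphrase). The same idea is used when storing passwords: a salt is random data combined with your already hashed password, and then that combination is hashed itself. In pseudo-code it would look like this: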
$pwd = hash(hash($password) + salt)
You would then store the value of $pwd in your database along with the salt itself.
The security doesn’t come from obfuscating the salt, but from the fact that a rainbow table attack can no longer simply look up a hash in its precomputed collection of hashed passwords. An attacker would need to incorporate your (per-user) unique salt value into their check against a predetermined list of hashes, and they also wouldn’t know if the salt was prefixed or suffixed to the password itself, making the attack computationally very expensive and time consuming to attempt.
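A per-user salt can be sketched in Python as follows. Note that this swaps in a different technique from the pseudo-code: rather than hash(hash(password) + salt), it uses PBKDF2 from hashlib, which applies the salt and repeats the hashing many times. The function names and the iteration count are assumptions made for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, tune for your own hardware


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is created when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(password, salt, stored_digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)


# Store both the salt and the digest (e.g. in your database).
salt, stored = hash_password("correct horse")
print(check_password("correct horse", salt, stored))  # matching password
print(check_password("wrong guess", salt, stored))    # non-matching password
```

Because every user gets a different salt, two users with the same password end up with different stored digests.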
You can also see that a salt is used by trying to read an encrypted file (cat message.enc):
Salted__MJin¨MàÍ£?è,random¡:~randomW!5µõ
Asymmetrical Encryption
If you need to, you can use a public key to encrypt data (i.e. asymmetrical encryption) by utilising the openssl rsautl command, which stands for “RSA Utility” and is commonly used to sign, verify, encrypt and decrypt data using the RSA algorithm.
In the following example we have a file plaintext.txt that we encrypt using a public key. It will now only be possible to decrypt the secret.enc file if you have the corresponding private key:
# encrypting
openssl rsautl -encrypt -pubin -inkey public_key.pem -in plaintext.txt -out secret.enc
# decrypting
openssl rsautl -decrypt -inkey private_key.pem -in secret.enc
Randomness
OpenSSL also offers a way to generate random binary data which you can then export as either hexadecimal or base64 formats:
Note: in the following examples, 64 is the number of bytes to be generated.
$ openssl rand 64
RR_wK[=q5}VrdMܾj{8(Ty]7;file://Integralist-MBPr/tmp
$ openssl rand -hex 64
660baf33c189ced722a07c6a29d35a7e4584bb954c8c86f2cfd4ea8d892bff32fc188b0c56cbe0a56d60b628cdee697308b0cf3806cd95052b743bec5ccc5240
$ openssl rand -base64 64
JIPU5SiCgKP3XVrnef1gY+PxjBvjdQgSN+OJoBAdWmCa/cRvDdFl01GQiSwFimQ5
1lVa/7hfYIK6Z5jjHNauaQ==
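The same kind of random data can be produced from Python’s secrets module, which draws from the operating system’s secure random source. A minimal sketch, again using 64 as the number of bytes:

```python
import base64
import secrets

# 64 cryptographically secure random bytes.
raw = secrets.token_bytes(64)

# The same bytes rendered as hexadecimal (128 characters)...
print(raw.hex())

# ...and as Base64 (88 characters, including padding).
print(base64.b64encode(raw).decode())
```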
GPG
GPG is a tool which provides encryption and signing capabilities. It supports both symmetrical and asymmetrical encryption, as well as digital signing of your encrypted content to ensure its integrity.
Generating a key pair
To generate a new GPG key pair you would execute the following command and interactively fill in the details:
gpg --gen-key
Automate
If you prefer to automate this you can create a file to contain the details and pass that into the command-line instead. The following code generates a new batch_file that will contain the information we would otherwise have to enter manually:
$ cat > batch_file <<EOF
%echo Generating a basic OpenPGP key
Key-Type: RSA
Key-Length: 4096
Subkey-Type: Default
Name-Real: Your Name
Name-Comment: Integralist testing
Name-Email: foo@example.com
Expire-Date: 0
Passphrase: foobar
%commit
%echo done
EOF
Once we have this file we can pass it along with the --gen-key command:
$ gpg --gen-key --batch batch_file
gpg: Generating a basic OpenPGP key
gpg: key 4BCAEAAD199B5FE8 marked as ultimately trusted
gpg: directory '/Users/Integralist/.gnupg/openpgp-revocs.d' created
gpg: revocation certificate stored as '/Users/Integralist/.gnupg/openpgp-revocs.d/\
CFE96536285D83C990567BF64BCAEAAD199B5FE8.rev'
gpg: done
Now if we check our list of keys we’ll see the new one we just generated:
$ gpg --list-keys
/Users/Integralist/.gnupg/pubring.gpg
---------------------------------------
pub rsa4096 2018-02-17 [SCEA]
CFE96536285D83C990567BF64BCAEAAD199B5FE8
uid [ultimate] Your Name (Integralist testing) <foo@example.com>
sub rsa2048 2018-02-17 [E]
Revocation
When you generate a new key pair, if you intend on publishing your public key online, then you’ll want to generate a revocation certificate. Doing this will mean you can revoke your original key pair if your private key becomes compromised (or you just want to decommission it):
gpg --output revocation.cert --gen-revoke your.email@domain.com
When you’re ready to decommission it, just import the certificate into your keyring:
gpg --import revocation.cert
You can then also push up your key identifier to a key server to force it to recognise the key has been revoked:
gpg --keyserver pgp.mit.edu --send-keys <key_id>
Asymmetrical Encryption and Decryption
In order to encrypt some data using someone else’s public key (i.e. so only they can decrypt the data) you first need access to their public key and have it imported to your gpg keyring:
gpg --import public.key
If you want to verify the integrity of the public key you have acquired, then you should speak securely with the recipient who owns the public key and ask them to give you their digital ‘fingerprint’. You can then verify it matches what you have using the following command:
gpg --fingerprint <pub_key_id>
You’ll then look for the fingerprint in the gpg output. The fingerprint should look something like this:
FDFB E9B5 24BA 6972 A3AA 44B9 A1B1 7E6F DD86 E7F5
The command for encrypting a file plaintext.txt using their public key would be:
gpg --encrypt -u "Sender User Name" -r "Receiver User Name" plaintext.txt
As you’ve encrypted the file using that person’s public key, it means they can decrypt the file simply with:
gpg -d plaintext.txt.gpg
Symmetrical Encryption and Decryption
By default gpg uses the AES algorithm for its symmetrical encryption. The command to use is (you’ll be asked to provide a passphrase):
gpg --symmetric plaintext.txt
You can specify a different algorithm, as the default isn’t as secure as it could be. Let’s use a 256-bit encryption key:
gpg --symmetric --cipher-algo AES256 plaintext.txt
Note: see gpg --version for all available ciphers
Signing keys
If you want to explicitly trust a public key you have imported, you can ‘sign’ it. You do this using the --sign-key flag. Doing this can also be beneficial for the owner of that public key (Bob), because if a friend of yours (Alice) trusts you and they see you’ve signed Bob’s public key, then Alice is more likely to trust Bob as well.
In order for Bob to benefit from this ‘web of trust’ you need to send him back his public key which you signed. Bob would need to import that version of his public key back into his gpg keyring, so that he can then republish it online for others to see that you trust him.
The following example demonstrates how you would export Bob’s public key, which you previously imported and signed:
gpg --export --armor bob@example.org
Note: --armor simply outputs the binary data as ASCII
Signing encrypted files
It can be useful to sign a file that you encrypt, so that the person who will decrypt the file can verify it was you who sent it to them, and also check that the integrity of the file is still intact.
Note: this provides a combination of authenticity and integrity (as defined within the terminology section)
You do this by using the --sign flag:
gpg --local-user Bob --encrypt --recipient Alice --sign plaintext.txt
Note: I’m using --local-user because I have many different key pairs setup for testing.
This will generate a plaintext.txt.gpg encrypted file.
The recipient (Alice) can either decrypt the file using her own private key, in which case gpg will also verify the signature against Bob’s public key, or Alice could just use the --verify flag if she only wanted to check the signature.
$ gpg --verify plaintext.txt.gpg
gpg: Signature made Mon Feb 19 10:16:38 2018 GMT
gpg: using RSA key F2G91BE243E405E5B64B08A1CB5EBDB2561C861B
gpg: Good signature from "Bob <bob@example.com>" [ultimate]
Keybase
Keybase is a public-key directory that maps social media identities to encryption keys in a publicly auditable manner. Keybase offers an end-to-end encrypted chat and cloud storage system, called Keybase Chat and the Keybase filesystem.
In order to use the command-line tool keybase you’ll need to register for an account on their website.
To install keybase on macOS:
brew install keybase
Once installed you’ll need to login:
keybase login
At this point you can either generate a fresh key pair or select an existing gpg key pair:
# generate new key pair
keybase pgp gen
# select existing key pair
keybase pgp select
You can search for other keybase users:
keybase search sthulb
You can then encrypt data for another keybase user, like so:
keybase encrypt -i info.txt -o info.txt.asc sthulb
If you receive an encrypted file you can decrypt it, like so:
keybase decrypt -i info.txt.asc -o info.txt
If you receive a file (info.txt.gpg) that was encrypted using your keybase public key, but the sender’s not using keybase (e.g. they’ve signed the file using their own gpg private key), then you’ll need to have their public key in your gpg keyring:
keybase pgp decrypt -i info.txt.gpg