RSA 2018 - Poison Pixels!

Steganography is data with a double meaning - one overt, the other hidden - where it's not possible to detect the presence of the hidden meaning using normal tools, and in many cases it is not even possible using specialist tools.

The hidden meaning can be encoded in three ways:


Steganography has traditionally been discussed in the context of hidden communication - it might be used by people who want to exchange information without any observer being able to tell that they are doing so. Recently, however, cyber attackers have used steganography to make elements of their attacks stealthy. This is Stegware, and it's dangerous because it can be undetectable.


This sample file is a GIF image that's a small blue square:

Here is a link to the same file, but with the MIME type set to text/plain, so it opens in your text editor: image. You'll see a lot of binary characters, which are the GIF image's content, but following them is some plain text - an appended PowerShell script. This script is not part of the GIF image, and it is invisible when normal image tools are used to handle the data. Only when we open the image in a text editor is the hidden message revealed. So this is steganography - not very good steganography though, because it is fairly easy to discover the presence of the hidden message.

The script isn't dangerous, so if you have a Windows computer to hand, you can download it and run it. You'll notice that some errors are reported, but these do not stop execution of the script. Eventually the PowerShell interpreter gets to the end and the "unsafe" script runs.
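Appended data like this can be found by walking the GIF block structure to its trailer byte and looking at whatever follows. The sketch below is a minimal illustration, not a hardened parser: the function names are hypothetical, the one-pixel GIF is built by hand for the demonstration, and truncated or malformed files are not handled gracefully.

```python
def skip_sub_blocks(data: bytes, pos: int) -> int:
    """Skip a chain of GIF data sub-blocks; return the offset after the terminator."""
    while data[pos] != 0:          # each sub-block starts with its length
        pos += 1 + data[pos]
    return pos + 1                 # step over the zero-length terminator


def gif_end_offset(data: bytes) -> int:
    """Return the offset just past the GIF trailer byte (0x3B)."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    packed = data[10]              # flags byte of the logical screen descriptor
    pos = 13                       # 6-byte header + 7-byte screen descriptor
    if packed & 0x80:              # global colour table present
        pos += 3 * (2 << (packed & 0x07))
    while True:
        block = data[pos]
        pos += 1
        if block == 0x3B:          # trailer: the image ends here
            return pos
        if block == 0x21:          # extension: label byte, then sub-blocks
            pos += 1
        elif block == 0x2C:        # image descriptor
            packed = data[pos + 8]
            pos += 9
            if packed & 0x80:      # local colour table present
                pos += 3 * (2 << (packed & 0x07))
            pos += 1               # LZW minimum code size
        else:
            raise ValueError(f"unexpected block type 0x{block:02x}")
        pos = skip_sub_blocks(data, pos)


def trailing_data(data: bytes) -> bytes:
    """Return any bytes that follow the end of the GIF image."""
    return data[gif_end_offset(data):]


# A hand-built 1x1 GIF used only for this demonstration.
TINY_GIF = bytes.fromhex(
    "474946383961 0100 0100 80 00 00"   # header + logical screen descriptor
    "000000 ffffff"                     # two-entry global colour table
    "21f9 04 01000000 00"               # graphic control extension
    "2c 00000000 01000100 00"           # image descriptor
    "02 02 4401 00"                     # LZW data
    "3b"                                # trailer
)

if __name__ == "__main__":
    print(trailing_data(TINY_GIF + b"Write-Host 'hello'"))
```

An image tool stops reading at the trailer, which is why the appended script never shows up when the file is handled as an image.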


This sample file is a BMP image that's a small red square:

The colours in the BMP are held in a palette, and each pixel is a one-byte index into this palette. But all the colours in the palette are the same red colour. That means that whatever data is encoded as a pixel index, it will always show as red. So we can hide all kinds of data in the pixel indexes and the image just looks red. Unlike the previous example, there's no extra data in this file. The message is not well hidden in the pixel array, so you can see it with a text editor: image. The binary characters you can see at the beginning are the colour values in the palette. The image is 16x16 pixels, so the hidden message can be up to 256 bytes long.
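The palette trick can be reproduced in a few lines. The sketch below builds a 16x16, 8-bit BMP whose 256 palette entries are all red and stores the message directly in the pixel indexes. The function names, the zero-byte padding of short messages and the resolution values in the header are assumptions made for illustration; the sample file on this page may have been built differently.

```python
import struct

WIDTH = HEIGHT = 16                      # 256 one-byte pixel indexes


def build_red_bmp(message: bytes) -> bytes:
    """Build an 8-bit BMP that looks solid red but carries `message` as pixel data."""
    capacity = WIDTH * HEIGHT
    if len(message) > capacity:
        raise ValueError(f"message is longer than {capacity} bytes")
    pixels = message.ljust(capacity, b"\x00")     # pad with zero indexes
    palette = b"\x00\x00\xff\x00" * 256           # every entry is blue=0, green=0, red=255
    data_offset = 14 + 40 + len(palette)
    file_header = struct.pack(
        "<2sIHHI", b"BM", data_offset + len(pixels), 0, 0, data_offset
    )
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, WIDTH, HEIGHT,       # header size, width, height
        1, 8,                    # colour planes, bits per pixel
        0, len(pixels),          # no compression, size of the pixel array
        2835, 2835,              # nominal resolution in pixels per metre
        256, 0,                  # palette entries, "important" colours
    )
    return file_header + info_header + palette + pixels


def extract_message(bmp: bytes) -> bytes:
    """Read the pixel indexes back out and strip the zero padding."""
    (data_offset,) = struct.unpack_from("<I", bmp, 10)
    return bmp[data_offset:data_offset + WIDTH * HEIGHT].rstrip(b"\x00")


if __name__ == "__main__":
    bmp = build_red_bmp(b"This text is stored in the pixel indexes.")
    print(extract_message(bmp))
```

Each row is 16 bytes, already a multiple of four, so no row padding is needed and the pixel array is exactly the 256 bytes of message capacity.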


In a full colour image, each pixel is (usually) represented by a 24-bit colour value - 8 bits each for the level of red, green and blue. In a JPEG these values are compressed by converting blocks of 8x8 pixels into the coefficients of a wave equation (a discrete cosine transform) that approximates the 64 colours. The clever maths involved means that the coefficients of the wave equation are small integers and so they compress much better than the original colour values. Interestingly, the exact values of the higher order coefficients don't make much difference to the appearance of the image. That means the least significant bits of these coefficients can be used to encode a hidden message.

This image is an original JPEG photograph of some Morgan sports cars (the Morgan factory is in Malvern, round the corner from Deep Secure's offices): Morgan Motors - original. Here's a modification of that image: Morgan Motors - modified. The images look pretty much the same (tip - open both in two tabs side by side and flip between them - the eye sees movement if there are any differences), but the modified image has the entire text of Jane Austen's "Pride and Prejudice" encoded in it. The text is 724 kBytes, though it was compressed and encrypted before being encoded in the image.

It is a truth universally acknowledged, that a single attacker in possession of good Stegware, is not in want of a good target.

The text is encoded in a simplistic way. Each bit of the compressed and encrypted text is set as the least significant bit of one of the JPEG wave coefficients. This has an effect on the statistical distribution of the coefficient values. A feature of JPEG compression is that a histogram of the coefficient values has a nice normal shape, but modifying the least significant bits with encrypted data effectively randomises them, leaving the histogram distorted.
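The bit-by-bit encoding described above can be sketched on a plain list of integers standing in for the JPEG coefficients. This is an illustration only: reading and writing real coefficients needs a JPEG library, the compression and encryption steps are omitted, and the function names are hypothetical.

```python
def embed_lsb(coefficients: list[int], payload: bytes) -> list[int]:
    """Return a copy of `coefficients` with the payload bits written into the LSBs."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(coefficients):
        raise ValueError("payload is too large for this many coefficients")
    stego = list(coefficients)
    for i, bit in enumerate(bits):
        # Clear the least significant bit, then set it to the payload bit.
        # Python's integer operators make this work for negative values too.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract_lsb(coefficients: list[int], n_bytes: int) -> bytes:
    """Rebuild `n_bytes` of payload from the LSBs of the first 8 * n_bytes coefficients."""
    out = bytearray()
    for i in range(n_bytes):
        byte = 0
        for value in coefficients[8 * i:8 * i + 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    cover = [5, -3, 0, 12, 7, -8, 1, 2] * 4      # made-up coefficient values
    stego = embed_lsb(cover, b"Hi")
    print(extract_lsb(stego, 2))
```

No coefficient changes by more than one, which is why the picture looks the same, but every touched value now has an LSB dictated by the payload rather than by the image.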


Encoding a message in the Least Significant Bit of the pixel coefficients of a JPEG image has little effect on the visual look of an image, but it upsets the natural statistical properties of the coefficient values. You can try this out here. You can upload a JPEG image and a data file. You get back a modified image that contains the data and histograms of the two images.

Note, if you choose a message that's too large to fit into the image's Least Significant Bits, you get an error and must try a smaller message or bigger image.

If your message is small relative to the image, changes to the histograms might not be easy to see with the eye, but generally you will see the normal distribution become irregular - something that can be detected using the Chi-squared test.
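One common form of the Chi-squared test for this kind of embedding compares the counts of each pair of values that differ only in their least significant bit (2k and 2k+1). Random LSBs push the two counts in each pair towards equality, so a statistic close to zero is suspicious. The sketch below assumes this "pairs of values" variant, which may not be the exact test used by the demo, and it runs on made-up data rather than real JPEG coefficients.

```python
import random
from collections import Counter


def pair_chi_squared(values: list[int]) -> float:
    """Chi-squared statistic measuring how unequal the counts in each (2k, 2k+1) pair are."""
    counts = Counter(values)
    even_members = {v & ~1 for v in counts}       # one representative per pair
    statistic = 0.0
    for even in even_members:
        n_even, n_odd = counts[even], counts[even + 1]
        expected = (n_even + n_odd) / 2           # count expected if the LSBs were random
        statistic += (n_even - expected) ** 2 / expected
    return statistic


# Made-up cover data in which even values are three times as common as odd ones.
cover = [v for k in range(4) for v in [2 * k] * 300 + [2 * k + 1] * 100]

# Overwrite every LSB with a pseudo-random bit, as encrypted data would.
rng = random.Random(0)
stego = [(v & ~1) | rng.getrandbits(1) for v in cover]

if __name__ == "__main__":
    print("cover:", pair_chi_squared(cover))
    print("stego:", pair_chi_squared(stego))
```

A large statistic means the pairs are unequal, as in a natural image; a small one means they have been evened out. Turning the statistic into a probability needs the Chi-squared distribution with one fewer degrees of freedom than the number of pairs, which is left out here.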


Pixel data can be redundant, giving an opportunity to hide information. For example, each scan line in a monochrome Bitmap file is padded out to a multiple of 32 bits, so you can have up to 31 bits of hidden data in each row. Another example is that run length encoding compression is supported by the Bitmap format, and with this it is possible to draw off the canvas. So it doesn't matter what the pixel data is - it will never be seen. This redundancy is exploited in this image:

The image is a valid Bitmap with no extraneous data, but it contains a hidden copy of part of the Bitmap format specification. You can see it here by viewing the image as text.
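The row padding figure is easy to check. The sketch below covers only the padding case for a monochrome (1 bit per pixel) Bitmap, not the run length encoding trick, and the function names are hypothetical.

```python
def mono_row_padding_bits(width_px: int) -> int:
    """Unused bits at the end of each row of a 1-bit-per-pixel Bitmap."""
    stride_bits = (width_px + 31) // 32 * 32     # rows are padded to a multiple of 32 bits
    return stride_bits - width_px


def mono_padding_capacity_bits(width_px: int, height_px: int) -> int:
    """Total padding bits across all rows, i.e. the space available for hidden data."""
    return mono_row_padding_bits(width_px) * height_px


if __name__ == "__main__":
    for width in (1, 32, 33, 100):
        print(width, mono_row_padding_bits(width))
```

The maximum of 31 bits per row occurs when the width is one more than a multiple of 32, and there is no padding at all when the width is an exact multiple of 32.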