Steganography - FAQ

Design and implement a program that encodes messages and embeds them in BMP image files, and that decodes messages embedded by the encoding portion of the program.

Purpose of exercise

Activities

Input

The encoding operation reads two input files. One file, the plaintext file, contains some encoding coefficients and a message to be encoded, and the other, the image file, contains an image in which the encoded message is to be embedded. Here are some input files for testing: tiny, small, medium, large, gigantic.

The decoding operation reads an image file containing an encoded message. 

The names of the plaintext input file, the image input file, and the output image file are parameters of the encoding function. If the encoding operation writes a plaintext output file (for an error report), name the file "error.txt".

Output

The encoding operation writes an image file containing an encoding of the message from the input plaintext file if the input image file was big enough for the message. If not, it writes a plaintext file reporting the problem (instead of writing an image file with the message embedded in it).

The decoding operation writes a plaintext file containing the encoding coefficients and the message extracted from the input image file, following the same format as the input plaintext file.

Plaintext file format

The plaintext file contains the message to be encoded. It is a Unix-style text file. That is, its line terminators are newline characters (the character whose ASCII code is 10), and all of its other characters are visible characters or spaces (that is, characters whose ASCII codes are in the range 32 to 126). For encoding purposes, the newline characters are interpreted as spaces.
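The normalization step described above can be sketched as follows. This is an illustrative Python sketch, not the DrACuLa code the exercise asks for. It assumes the file has already been read into a string, and the function name normalize_plaintext is hypothetical. It turns every newline into a space and rejects any other character outside the range 32 to 126.

```python
def normalize_plaintext(text: str) -> str:
    """Replace each newline with a space and check the character range.

    Hypothetical helper: assumes `text` holds the whole plaintext file.
    """
    out = text.replace("\n", " ")  # newlines are treated as spaces
    for i, ch in enumerate(out):
        if not 32 <= ord(ch) <= 126:
            raise ValueError(f"character {ch!r} at position {i} is not in 32..126")
    return out


if __name__ == "__main__":
    print(repr(normalize_plaintext("attack at\ndawn\n")))  # prints 'attack at dawn '
```

The trailing newline of the file also becomes a space. That matches the rule that every newline is interpreted as a space.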

Image file format

The image file is a BMP file containing an image that is not compressed. This is a standard file format for which you can find documentation on the web (or here). The key things to find out are how to extract the file size and the offset to the image data from the header.
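The two header fields mentioned above are stored in the 14-byte BMP file header, and both are little-endian 32-bit unsigned integers. The file size is at byte offsets 2 to 5, and the offset to the image data is at byte offsets 10 to 13. The Python sketch below reads them using plain arithmetic (b0 + 256·b1 + 256²·b2 + 256³·b3), which translates directly into a list-of-bytes function. The names le32 and bmp_header_info are hypothetical.

```python
from __future__ import annotations


def le32(data: bytes, i: int) -> int:
    """Unsigned 32-bit integer stored little-endian in data[i..i+3]."""
    return (data[i]
            + 256 * data[i + 1]
            + 256 ** 2 * data[i + 2]
            + 256 ** 3 * data[i + 3])


def bmp_header_info(data: bytes) -> tuple[int, int]:
    """Return (file_size, image_data_offset) from a BMP file header.

    BMP file header layout (little-endian):
      bytes 0-1    signature "BM"
      bytes 2-5    file size in bytes
      bytes 6-9    reserved
      bytes 10-13  offset from the start of the file to the image data
    """
    if len(data) < 14 or data[:2] != b"BM":
        raise ValueError("not a BMP file")
    return le32(data, 2), le32(data, 10)
```

To use it, read the whole file in binary mode (for example, `open(path, "rb").read()`) and pass the resulting bytes. Every byte from the returned offset onward is image data.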

Information recorded in image file

Information is encoded in an image file by using the low-order bit in each byte of image data to record part of the encoded message. The input image file will contain a sequence of image bytes, and the output file will contain the same sequence of image bytes, except that the low-order bit in some of the bytes will be changed to record the encoded message. (Generally, changing the low-order bits does not affect the way the image looks to a person viewing it on a monitor. That is, if a person uses a BMP file viewer to look at an image file containing an encoded message and then at the original image file without the message, the two images will look the same.)

Encoding messages

The encoded information consists of the sequence of numbers produced by the encoding process described in the cryptography project. Each number in the encoded message is in the set {0, 1, 2, …, 94} and is embedded in the image by putting the bits of its binary numeral in the low-order bits of a block of seven consecutive image-data bytes. (Seven bits suffice because every number involved, including the terminator 95, is less than 2^7 = 128.) Terminate the message by putting the binary numeral for 95 in the seven-byte block just after the block containing the last number of the encoded information.
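The embedding step can be sketched in Python. The sketch assumes that the numbers have already been produced by the cryptography-project encoder (not shown) and that bits go into each seven-byte block most significant bit first. The specification does not fix the bit order, so treat that as an assumption. It also treats every byte after the header offset as usable image data. A message of n numbers needs 7·(n + 1) image-data bytes, counting the terminator block. If the image is too small, the sketch returns None so that the caller can write "error.txt" instead of an image. The names embed_numbers and TERMINATOR are hypothetical.

```python
from __future__ import annotations

TERMINATOR = 95  # value of the block that marks the end of the message


def bi7(v: int) -> list[int]:
    """Low-order seven bits of v, most significant bit first (assumed order)."""
    return [(v >> k) & 1 for k in range(6, -1, -1)]


def embed_numbers(image: bytes, offset: int, numbers: list[int]) -> bytes | None:
    """Embed numbers (each in 0..94) plus the terminator into the image data.

    `offset` is the image-data offset from the BMP header. Returns the new
    file contents, or None if the image data is too small for the message.
    """
    for n in numbers:
        if not 0 <= n <= 94:
            raise ValueError(f"{n} is outside 0..94")
    bits = [b for n in numbers + [TERMINATOR] for b in bi7(n)]
    if offset + len(bits) > len(image):
        return None  # caller reports the problem in error.txt
    out = bytearray(image)
    for i, b in enumerate(bits):
        # Clear the low-order bit, then set it to the message bit.
        out[offset + i] = (out[offset + i] & 0xFE) | b
    return bytes(out)
```

Header bytes and image bytes beyond the last block are copied unchanged. Only the low-order bits of the first 7·(n + 1) image-data bytes can differ from the input.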

Decoding messages

Extract the information from an image file containing an encoded message by inverting the encoding process, and write a plaintext file containing this information in the same format as the input plaintext file. Since the newline characters from the plaintext file were interpreted as spaces in the encoding process, you will need to insert occasional newlines to make the file readable with an ordinary text-file viewer. You can decide how you want to intersperse these newline characters.
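Decoding reverses the embedding: read seven low-order bits at a time from the image data until the terminator 95 appears. The Python sketch below assumes the same most-significant-bit-first order as the encoder. Turning the numbers back into characters belongs to the cryptography project and is not shown. The newline placement in add_newlines is only one possible choice. It replaces a space with a newline once a line reaches a chosen width. Because the encoder reads newlines as spaces, re-encoding the decoded file gives the same sequence of numbers. Both function names are hypothetical.

```python
from __future__ import annotations


def extract_numbers(image: bytes, offset: int) -> list[int]:
    """Read 7-bit blocks of low-order bits from image[offset:] until 95."""
    numbers = []
    pos = offset
    while pos + 7 <= len(image):
        v = 0
        for byte in image[pos:pos + 7]:
            v = 2 * v + (byte & 1)  # most significant bit first (assumed)
        if v == 95:
            return numbers
        numbers.append(v)
        pos += 7
    raise ValueError("no terminator (95) found in image data")


def add_newlines(text: str, width: int = 72) -> str:
    """Replace a space with a newline whenever the current line reaches `width`."""
    out = []
    col = 0
    for ch in text:
        if ch == " " and col >= width:
            out.append("\n")
            col = 0
        else:
            out.append(ch)
            col += 1
    return "".join(out)
```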

Required functions

  1. (bury bs cs): delivers a list of bytes like cs, which is a list of numbers in the range 0...255, but with its low-order bits replaced by bs, which is a list of 0/1 values (assume that the list cs has at least as many elements as the list bs)
  2. (unbury cs): delivers the sequence of low-order bits from cs (a sequence of numbers in the range 0...255)
  3. (bin v): delivers (in the form of a list of 0s and 1s) the binary numeral of the non-negative integer v
  4. (num bs): delivers the non-negative integer whose binary numeral is bs (a list of 0s and 1s)
  5. (bi7 v): delivers (in the form of a list of exactly seven 0s and 1s) the low-order seven bits of the binary numeral for the non-negative integer v, padded with leading zeros, if necessary

Properties

  1. (unbury (bury bs cs)) reproduces bs
  2. (num (bin v)) reproduces v
  3. (num (bi7 v)) delivers v mod 2^7
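The required functions and the three properties above can be sketched in Python to make the intended behavior concrete. These are not the DrACuLa definitions the exercise asks for. The sketch rests on several assumptions. Binary numerals are lists with the most significant bit first, as the phrase "padded with leading zeros" suggests. (bin 0) is taken to be [0]. bury returns exactly as many elements as bs, which is one reading of the specification that makes property 1 hold exactly. If your bury keeps the remaining bytes of cs, property 1 holds for the first len(bs) bits. The Python name bin_digits stands in for (bin v) to avoid shadowing Python's built-in bin. The Python versions use loops. The ACL2 versions would need to be tail recursive, as the next section explains.

```python
from __future__ import annotations

import random


def bury(bs: list[int], cs: list[int]) -> list[int]:
    """Bytes from cs with their low-order bits replaced by bs (len(bs) results)."""
    return [c - c % 2 + b for b, c in zip(bs, cs)]


def unbury(cs: list[int]) -> list[int]:
    """Low-order bits of the bytes in cs."""
    return [c % 2 for c in cs]


def bin_digits(v: int) -> list[int]:
    """Binary numeral of v >= 0, most significant bit first; [0] for 0."""
    if v == 0:
        return [0]
    digits = []
    while v > 0:
        digits.append(v % 2)
        v //= 2
    digits.reverse()
    return digits


def num(bs: list[int]) -> int:
    """Integer whose binary numeral (most significant bit first) is bs."""
    v = 0
    for b in bs:
        v = 2 * v + b
    return v


def bi7(v: int) -> list[int]:
    """Low-order seven bits of v, padded with leading zeros to length 7."""
    d = bin_digits(v % 2 ** 7)
    return [0] * (7 - len(d)) + d


def check_properties(trials: int = 1000) -> None:
    """Randomly test the three properties listed above."""
    rng = random.Random(0)
    for _ in range(trials):
        n = rng.randint(0, 50)
        bs = [rng.randint(0, 1) for _ in range(n)]
        cs = [rng.randint(0, 255) for _ in range(n + rng.randint(0, 5))]
        assert unbury(bury(bs, cs)) == bs        # property 1
        v = rng.randint(0, 10 ** 6)
        assert num(bin_digits(v)) == v           # property 2
        assert num(bi7(v)) == v % 2 ** 7         # property 3
        assert len(bi7(v)) == 7
```

In DrACuLa, the same properties can serve as theorems or randomized tests for your ACL2 definitions.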

Functions and commands useful in this exercise

Avoid stack overflow

You will be inductively defining functions that construct lists with many thousands of elements, character by character or byte by byte. These functions must be tail recursive to avoid stack overflow. The function zap-img in the test suite supplied with the binary-io-utilities book (a DrACuLa teachpack) illustrates how to read, transform, and write a large file without triggering a stack overflow.