Equivariance of JPEG steganography to geometric transforms

Image steganography is the task of hiding information in innocent-looking images. The task of detecting whether an image contains hidden information is called steganalysis.

Today, detecting state-of-the-art image steganography is usually approached by training a learning-based binary classifier to distinguish between cover and stego images. We recently found that learning-based steganography detectors are sensitive to unseen image orientations. This may be surprising, because a change in orientation only rearranges the pixels without altering their values. The observation indicates that horizontal pixel sequences have different statistical properties than vertical pixel sequences. We call this phenomenon directionality.

Directionality in images has multiple causes. It can be inherent to the image content or introduced at several stages of the image acquisition. An in-depth study of several causes can be found in this paper. In our follow-up study, we investigated the effect of two causes of directionality on JPEG steganography and steganalysis. These are the scene content and the JPEG quantization table. Due to space constraints, however, this study skipped over another potential cause of directionality in JPEG steganography, namely the JPEG steganography methods themselves. This blog post fills this gap by analyzing the equivariance of modern JPEG steganography methods to flipping and rotation.

JPEG steganography methods revisited

This section starts with a brief summary of the three popular JPEG steganography methods nsF5, UERD, and J-UNIWARD. Recall that JPEG steganography methods embed into the quantized DCT coefficients. J-UNIWARD and UERD embed in two steps. The first step is to assign an embedding cost to each coefficient. The result is a cost map. The second step uses a coding method to embed the desired payload while minimizing the cost. While nsF5 does not produce a cost map by design, we can obtain a cost map by treating all changeable coefficients as having the same cost and treating unchangeable coefficients as wet (infinite cost).
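The nsF5 cost map described above can be sketched in a few lines of NumPy. The function name nsf5_cost_map and the array layout (block rows, block columns, 8, 8) are assumptions made for illustration and are not taken from any reference implementation.

```python
import numpy as np

def nsf5_cost_map(dct_coeffs):
    """Uniform-cost map for nsF5-style embedding (illustrative sketch).

    dct_coeffs: integer array of shape (block_rows, block_cols, 8, 8)
    holding quantized DCT coefficients. The layout is an assumption.
    """
    # All changeable coefficients share the same finite cost.
    cost = np.ones(dct_coeffs.shape, dtype=float)
    # Zero coefficients are unchangeable, hence wet (infinite cost).
    cost[dct_coeffs == 0] = np.inf
    # DC coefficients are never used for embedding.
    cost[:, :, 0, 0] = np.inf
    return cost
```

Because the cost depends only on whether a coefficient is zero and whether it is the DC coefficient, the map does not change when coefficient signs flip.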

nsF5

No-shrinkage F5 (nsF5) is an improved variant of the steganography method F5. nsF5 embeds by decreasing the absolute values of DCT coefficients. It embeds neither into the DC coefficient nor into zero AC coefficients. nsF5 is usually not counted as content adaptive, but the fact that it does not embed into zeros makes it somewhat content adaptive.

UERD

UERD and J-UNIWARD are content-adaptive steganography methods. Content-adaptive steganography methods favor embedding into textured image regions, which are difficult to model for the steganalyst. This is done by first calculating an embedding cost per DCT coefficient and then embedding with an advanced coding method while minimizing the embedding cost.

The UERD distortion is a ratio of two components: the quantization table in the numerator and the image content in the denominator. The numerator scales the cost of embedding by the DCT coefficient’s quantization factor. As a result, UERD prefers embedding into DCT coefficients with lower quantization factors. The block energy in the denominator scales the embedding cost by the block’s flatness. The block energy is the sum of de-quantized DCT coefficients per 8x8 block, averaged over a local neighborhood. As a result, the denominator controls which blocks receive embedding changes but treats both directions equally. Hence, asymmetry in UERD-embedded images can only arise from asymmetries in the quantization table.
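To make the ratio structure concrete, here is a simplified UERD-like cost that follows the description above. It is a sketch under several assumptions, not the published formula: the energy uses absolute values of the de-quantized AC coefficients, the neighborhood is a plain 3x3 mean with replicated edges, and a small eps avoids division by zero. Details such as neighbor weighting and the treatment of DC coefficients may differ in the actual method. The function name and array layout are hypothetical.

```python
import numpy as np

def uerd_like_cost(dct_coeffs, qt, eps=1e-9):
    """Simplified UERD-like cost (illustrative, not the published formula).

    dct_coeffs: (block_rows, block_cols, 8, 8) quantized DCT coefficients.
    qt: (8, 8) quantization table.
    """
    # Block energy: sum of absolute de-quantized AC coefficients per block.
    dequantized = np.abs(dct_coeffs * qt)
    dequantized[:, :, 0, 0] = 0  # assumption: DC does not count as energy
    energy = dequantized.sum(axis=(2, 3))

    # Average the energy over a 3x3 block neighborhood (edges replicated).
    rows, cols = energy.shape
    padded = np.pad(energy, 1, mode="edge")
    smoothed = np.zeros((rows, cols), dtype=float)
    for dy in range(3):
        for dx in range(3):
            smoothed += padded[dy:dy + rows, dx:dx + cols]
    smoothed /= 9.0

    # Numerator: quantization table. Denominator: smoothed block energy.
    return qt[None, None, :, :] / (smoothed[:, :, None, None] + eps)
```

In this sketch the denominator is a single scalar per block, so it cannot distinguish horizontal from vertical frequencies. Any directional preference has to come from qt in the numerator.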

J-UNIWARD

Similar to UERD, J-UNIWARD embeds into all DCT coefficients while minimizing the distortion caused by the embedding. The UNIWARD distortion is the sum of relative changes in three directional wavelet domains (horizontal, vertical, and diagonal). The embedding cost is influenced by the quantization factor and by the image content. Higher quantization factors lead to higher embedding cost. When the image content is smoother in one direction than the other, this increases the cost of embedding along orthogonal edges. Asymmetry in J-UNIWARD-embedded images arises from asymmetric quantization tables and asymmetries in the scene content.

Equivariance to geometric transforms

An embedding method is equivariant to a transformation if applying the transformation to the cost map of an image yields the same result as generating the cost map from the transformed image.

img --------------------> cost map 
 |                           |
 |                           |
 v                           v
transformed img ---> transformed cost map
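The diagram can be turned into a small test: compute the cost map along both paths and compare. The sketch below uses a toy cost function (uniform cost for nonzero coefficients, infinite cost for zeros) and horizontal flipping. All function names and the (block rows, block columns, 8, 8) layout are assumptions for illustration.

```python
import numpy as np

def is_equivariant(cost_fn, transform_coeffs, transform_cost, coeffs):
    """Check that both paths through the diagram give the same cost map."""
    cost_then_transform = transform_cost(cost_fn(coeffs))
    transform_then_cost = cost_fn(transform_coeffs(coeffs))
    return np.array_equal(cost_then_transform, transform_then_cost)

def toy_cost(coeffs):
    # Uniform cost for nonzero coefficients, infinite cost for zeros.
    return np.where(coeffs == 0, np.inf, 1.0)

def hflip_coeffs(coeffs):
    # Horizontal flip in the DCT domain: reverse the block order and
    # flip the sign of coefficients with odd horizontal frequency.
    flipped = coeffs[:, ::-1].copy()
    flipped[..., 1::2] *= -1
    return flipped

def hflip_cost(cost):
    # Costs carry no sign, so only the block order is reversed.
    return cost[:, ::-1]

rng = np.random.default_rng(0)
coeffs = rng.integers(-2, 3, size=(2, 3, 8, 8))
print(is_equivariant(toy_cost, hflip_coeffs, hflip_cost, coeffs))
```

The toy cost passes the check because it only depends on whether a coefficient is zero. A cost function that looks at neighboring blocks in a direction-dependent way would fail it.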

Transforming the cost map can be implemented in the same way as transforming images in the DCT domain:

  • Horizontal flipping reverses the order of blocks in the horizontal direction.
  • Vertical flipping reverses the order of blocks in the vertical direction.
  • A rotation by 90 degrees in the counter-clockwise direction comprises two steps: First, each block is transposed. Second, the block order is rotated.
  • A rotation by 180 degrees reverses the order of blocks in both the horizontal and the vertical direction.
  • A rotation by 270 degrees in the counter-clockwise direction comprises two steps: First, each block is transposed. Second, the block order is rotated three times.

A convenient tool to transform JPEG images losslessly is jpegtran. Alternatively, the geometric transforms can also be implemented as described above. Depending on the transformation, you also need to transpose the quantization table and flip signs of DCT coefficients in specific rows and columns.
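For readers who want to implement the transforms themselves, the following sketch applies flipping, transposition, and a 90-degree counter-clockwise rotation directly to blockwise DCT coefficients, including the sign flips. It works on unquantized coefficients computed with a hand-written orthonormal DCT so that the result can be compared against transforming the pixels first. The function names and the (block rows, block columns, 8, 8) layout are assumptions for illustration; with quantized coefficients, the quantization table has to be transposed whenever the blocks are transposed.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    mat = np.sqrt(2.0 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    mat[0] /= np.sqrt(2.0)
    return mat

def blockwise_dct(img):
    """(H, W) image -> (H/8, W/8, 8, 8) unquantized DCT coefficients."""
    h, w = img.shape
    blocks = img.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    d = dct_matrix()
    return d @ blocks @ d.T

def hflip_dct(coeffs):
    out = coeffs[:, ::-1].copy()   # reverse block order horizontally
    out[..., :, 1::2] *= -1        # flip signs in odd columns of each block
    return out

def vflip_dct(coeffs):
    out = coeffs[::-1].copy()      # reverse block order vertically
    out[..., 1::2, :] *= -1        # flip signs in odd rows of each block
    return out

def transpose_dct(coeffs):
    # Transpose the block grid and each individual block.
    return coeffs.transpose(1, 0, 3, 2)

def rot90_dct(coeffs):
    # A counter-clockwise rotation equals a transpose followed by a vertical flip.
    return vflip_dct(transpose_dct(coeffs))

rng = np.random.default_rng(0)
img = rng.normal(size=(16, 24))
coeffs = blockwise_dct(img)
print(np.allclose(rot90_dct(coeffs), blockwise_dct(np.rot90(img))))
```

Rotations by 180 and 270 degrees follow by applying rot90_dct two or three times.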

Steganography method   Equivariant to flipping   Equivariant to rotation
nsF5                   yes                       yes
UERD                   yes                       yes
J-UNIWARD              no                        no

nsF5 is equivariant to flipping and rotation.

UERD is equivariant to flipping and rotation.

J-UNIWARD is equivariant to neither flipping nor rotation. There are three potential reasons. First, the wavelet filter kernels are not symmetric. Second, when the 16x16 filter kernels are correlated with an even-sized input image, Matlab’s imfilter and scipy’s correlate2d pad 7 zeros before and 8 zeros after the image along each axis to maintain the input resolution. When the image is transformed, the padding of the transformed image may not match that of the original. Third, the original implementation accidentally shifts the distortion by one pixel to the bottom right.
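The padding effect can be isolated with a one-dimensional analog. The sketch below is not the J-UNIWARD code: it pads explicitly with 7 zeros before and 8 zeros after the signal, as described above for a kernel of length 16, and uses an all-ones kernel so that kernel asymmetry plays no role. The helper names are hypothetical.

```python
import numpy as np

def correlate_same(signal, kernel):
    """Zero-padded correlation that keeps the input length.

    Pads (k - 1) // 2 zeros before and k // 2 zeros after the signal,
    which gives 7 and 8 for a kernel of length k = 16.
    """
    k = len(kernel)
    padded = np.pad(signal, ((k - 1) // 2, k // 2))
    return np.correlate(padded, kernel, mode="valid")

def flip_equivariant(signal, kernel):
    # Compare "flip, then filter" against "filter, then flip".
    flip_then_filter = correlate_same(signal[::-1], kernel)
    filter_then_flip = correlate_same(signal, kernel)[::-1]
    return np.allclose(flip_then_filter, filter_then_flip)

rng = np.random.default_rng(0)
signal = rng.normal(size=32)
print(flip_equivariant(signal, np.ones(16)))  # even kernel length
print(flip_equivariant(signal, np.ones(15)))  # odd kernel length
```

With the even-length kernel the check prints False, because the 7/8 padding shifts the filter window by one sample after flipping. With the odd-length kernel the padding is symmetric and the check prints True.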

Conclusion

The sensitivity of steganography detectors to rotated images can be explained by several causes of directionality. Two major causes are the scene content and asymmetries in the JPEG quantization table. Additionally, our equivariance analysis shows that J-UNIWARD’s embedding locations depend on the image’s orientation.

In prior work, we found that training steganography detectors with rotation augmentation alleviates their sensitivity to unseen orientations. However, rotation augmentation can also decrease the detection accuracy on images of the original orientation. J-UNIWARD’s lack of equivariance to flipping calls into question the common practice of flipping during data augmentation. However, we deem the effect to be rather small and hypothesize that CNN training benefits more from the exposure to augmented variants of the training images.