How to debug jpeglib
Did you know that different JPEG libraries do not always produce the same output images? My colleagues at the University of Innsbruck systematically investigated differences between versions of the popular JPEG implementation libjpeg. For this purpose, they developed a convenient Python wrapper that enables switching between different versions of libjpeg and its two popular forks, libjpeg-turbo and MozJPEG. The Python wrapper is called jpeglib.
Recently, we noticed a small discrepancy between JPEG images created through the Python wrapper and the command line tool cjpeg from MozJPEG. MozJPEG was out of the scope of the previous study and therefore this discrepancy does not affect the study results.
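One simple way to spot such a discrepancy is to compare the two output files byte by byte. The sketch below is only an illustration, not the procedure we used: the helper names are hypothetical and the file paths are up to you. Note that a byte-level difference does not necessarily mean the decoded pixels differ, since header or metadata segments can differ as well.

```python
from pathlib import Path


def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # One input is a prefix of the other: they differ where the shorter ends.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_files(path_a, path_b):
    """Compare two files and return the offset of the first difference."""
    return first_difference(Path(path_a).read_bytes(), Path(path_b).read_bytes())
```

For example, `compare_files("/tmp/from_jpeglib.jpg", "/tmp/from_cjpeg.jpg")` (both paths are placeholders) returns `None` when the files are identical and the offset of the first mismatch otherwise.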
To trace what is causing the difference, it is convenient to debug into the C library below the Python interface. This is quite straightforward, and this blog post describes how to do it.
We recommend working in a virtual environment (not shown here).
- Clone the repository. We usually work on the dev branch.

    git clone git@github.com:martinbenes1996/jpeglib.git
    cd jpeglib
    git checkout dev

- Compile and install jpeglib with debug symbols.

    python setup.py build_ext --debug install

The build_ext command builds the external C modules, --debug passes the debug flag to the compiler, and install finally installs the package into your virtual environment. On a side note, the current setup.py already passes the -g flag to the compiler.

Optionally, you can verify that the built shared libraries contain debug symbols. Running the file command should show with debug_info.

    # The shared libraries can be found in your environment's site-packages folder.
    cd <venv>/lib/python3.10/site-packages/jpeglib-0.13.2-py3.10-linux-x86_64.egg/jpeglib/cjpeglib
    file cjpeglib_mozjpeg403.abi3.so
    > cjpeglib_mozjpeg403.abi3.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=f3ece56613f3bc34840c0de74882362b5b900b39, with debug_info, not stripped

You can also view the symbols using objdump.

    objdump --syms cjpeglib_mozjpeg403.abi3.so
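
If you want to check all compiled libraries at once, the same idea can be approximated in Python. This is a rough sketch under stated assumptions: the helper names are hypothetical, and instead of parsing the ELF file properly it only searches the raw bytes for the section name .debug_info, so treat the result as a hint and fall back to file or objdump when in doubt. The cjpeglib subfolder is taken from the path shown above and may differ in your installation.

```python
import importlib.util
from pathlib import Path


def has_debug_info(library_path):
    """Heuristic: an ELF file whose bytes contain the name '.debug_info'."""
    data = Path(library_path).read_bytes()
    return data.startswith(b"\x7fELF") and b".debug_info" in data


def report_debug_info(directory):
    """Map each *.so file name in `directory` to its has_debug_info result."""
    libraries = sorted(Path(directory).glob("*.so"))
    return {lib.name: has_debug_info(lib) for lib in libraries}


def installed_library_dir(package="jpeglib", subdir="cjpeglib"):
    """Locate the folder with the compiled libraries (assumed layout)."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None  # package is not installed in this environment
    return Path(spec.origin).parent / subdir
```

With jpeglib installed, `report_debug_info(installed_library_dir())` returns a dictionary with one entry per shared library.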

For tracing down our bug, we are particularly interested in the method write_jpeg_spatial. The method can also be found in the table of exported symbols. The source code can be found in src/jpeglib/cjpeglib/cjpeglib_spatial.cpp.

- Launch gdb.

    gdb
    # Tell gdb the execution target
    > target exec python
    # Set breakpoint
    > b write_jpeg_spatial
    # Launch the Python interpreter
    > run
    # Put your Python code here
    >>> ... Python code

Because the shared library is not loaded when the breakpoint is set, gdb will typically ask whether to make the breakpoint pending on a future shared library load. Answer y.

If your Python code is longer, it is convenient to store it in a file and call run script.py instead of just run. In my example, I used the following Python code. The final im.write_spatial() internally calls write_jpeg_spatial.

    from PIL import Image
    import jpeglib
    import numpy as np
    import os

    # Input path (expanduser resolves the leading ~, which Image.open does not do)
    uncompressed_filepath = os.path.expanduser("~/data/alaska2/ALASKA_v2_TIFF_512_COLOR/00001.tif")
    # Output path
    compressed_filepath = "/tmp/00001.jpg"
    # JPEG quality
    qf = 90

    # Compress with jpeglib
    im = np.array(Image.open(uncompressed_filepath))
    with jpeglib.version("mozjpeg403"):
        im = jpeglib.from_spatial(im)
        im.write_spatial(compressed_filepath, qt=qf)

The program execution pauses inside the write_jpeg_spatial method. You can now use the gdb commands to debug the library code.
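
If you repeat this session often, the interactive steps can be collapsed into a single gdb invocation. The sketch below assembles such a command line in Python. The helper gdb_command is hypothetical and not part of jpeglib. It relies on the standard gdb options -ex (run a gdb command at startup) and --args (pass the program and its arguments), and on the gdb setting "set breakpoint pending on", which accepts the pending breakpoint without asking.

```python
import shlex


def gdb_command(script, function="write_jpeg_spatial", python="python"):
    """Build a gdb command line that breaks in `function` while running `script`."""
    return [
        "gdb",
        "-ex", "set breakpoint pending on",  # library is loaded later, at import
        "-ex", f"break {function}",
        "-ex", "run",
        "--args", python, script,
    ]


# Print the command so it can be pasted into a terminal.
print(shlex.join(gdb_command("script.py")))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run to start the debugging session directly from Python.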
This concludes this short tutorial. While stepping through the C code with gdb, we spotted the bug that was causing the difference between jpeglib and cjpeg. The bug was resolved and the fix will be part of the next jpeglib release.