How to debug jpeglib

Did you know that different JPEG libraries do not always produce the same output images? My colleagues at the University of Innsbruck systematically investigated the differences between versions of the popular JPEG implementation libjpeg. For this purpose, they developed a convenient Python wrapper that enables switching between different versions of libjpeg and its two popular forks, libjpeg-turbo and MozJPEG. The wrapper is called jpeglib.

Recently, we noticed a small discrepancy between JPEG images created through the Python wrapper and those created with the command-line tool cjpeg from MozJPEG. MozJPEG was outside the scope of the previous study, so this discrepancy does not affect the study results.

To trace the cause of the difference, it is convenient to step into the C library underneath the Python interface with a debugger. This is quite straightforward, and this blog post describes how to do it.

We recommend working in a virtual environment (not shown here).

  1. Clone the repository. We usually work on the dev branch.

     git clone git@github.com:martinbenes1996/jpeglib.git
     cd jpeglib
     git checkout dev
    
  2. Compile and install jpeglib with debug symbols.

     python setup.py build_ext --debug install 
    

    Here, build_ext builds the C/C++ extension modules, --debug tells the compiler to include debug information, and install installs the package into your virtual environment. As a side note, the current setup.py already passes the -g flag to the compiler.

    Optionally, you can verify that the built shared libraries contain debug symbols. The output of the file command should include "with debug_info".

     # The shared libraries can be found in your environment's site-packages folder.
     cd <venv>/lib/python3.10/site-packages/jpeglib-0.13.2-py3.10-linux-x86_64.egg/jpeglib/cjpeglib
     file cjpeglib_mozjpeg403.abi3.so
     > cjpeglib_mozjpeg403.abi3.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=f3ece56613f3bc34840c0de74882362b5b900b39, with debug_info, not stripped
    

    You can also view the symbols using objdump.

     objdump --syms cjpeglib_mozjpeg403.abi3.so
    

    To track down our bug, we are particularly interested in the function write_jpeg_spatial, which also appears in the table of exported symbols. Its source code is in src/jpeglib/cjpeglib/cjpeglib_spatial.cpp.
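A quick way to confirm from Python that a shared library exports a given function is to load it with ctypes and look the symbol up. This is only a sketch: the helper name exports_symbol is hypothetical, and it assumes the function is exported under its plain, unmangled name.

```python
import ctypes


def exports_symbol(library_path, symbol_name):
    """Return True if the shared library exports symbol_name.

    Hypothetical helper for illustration. Passing None as library_path
    inspects the symbols already loaded into the current process (POSIX only).
    """
    library = ctypes.CDLL(library_path)
    # ctypes resolves symbols lazily on attribute access and raises
    # AttributeError when the lookup fails; hasattr turns that into False.
    return hasattr(library, symbol_name)
```

For example, exports_symbol("./cjpeglib_mozjpeg403.abi3.so", "write_jpeg_spatial") should return True when called from the cjpeglib folder, provided the assumption above holds. The leading ./ matters, because a bare file name is searched for in the system library paths rather than in the current directory.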

  3. Launch gdb, set a breakpoint, and run Python.

     gdb
    
     # Tell gdb the execution target
     > target exec python
    
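     # Set a breakpoint. The library is not loaded yet, so answer "y" when gdb
     # asks whether to make the breakpoint pending on a future shared library load.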
     > b write_jpeg_spatial
    
     # Launch the Python interpreter
     > run
    
     # Put your Python code here
     >>> ... Python code
    

    If your Python code is longer, it is convenient to store it in a file and call run script.py instead of just run. In my example, I used the following Python code. The final im.write_spatial() call internally invokes write_jpeg_spatial.

     from PIL import Image
     import jpeglib
     import numpy as np
     import os
    
     # Input path (expand "~" explicitly, because Image.open does not do it)
     uncompressed_filepath = os.path.expanduser("~/data/alaska2/ALASKA_v2_TIFF_512_COLOR/00001.tif")
     # Output path
     compressed_filepath = "/tmp/00001.jpg"
     # JPEG quality
     qf = 90
    
     # Compress with jpeglib
     im = np.array(Image.open(uncompressed_filepath))
     with jpeglib.version("mozjpeg403"):
         im = jpeglib.from_spatial(im)
         im.write_spatial(compressed_filepath, qt=qf)
    

    Program execution pauses inside the write_jpeg_spatial function. You can now use the usual gdb commands, such as next, step, print, and backtrace, to debug the library code.
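The interactive session above can also be scripted. The following is a minimal sketch, not part of jpeglib: the helper name build_gdb_command is hypothetical, and script.py stands for a file containing your Python code. It only assembles a gdb command line that sets a pending breakpoint, runs the script, and prints a backtrace once the breakpoint is hit.

```python
import shlex
import sys


def build_gdb_command(script_path, breakpoint="write_jpeg_spatial",
                      python=sys.executable):
    """Build a gdb command line that runs a Python script up to a breakpoint.

    Hypothetical helper for illustration; it does not launch gdb itself.
    """
    return [
        "gdb",
        "-batch",  # exit once all commands have been executed
        # The extension module is loaded only when Python imports it, so the
        # breakpoint has to be allowed to stay pending when it is set.
        "-ex", "set breakpoint pending on",
        "-ex", f"break {breakpoint}",
        "-ex", "run",
        "-ex", "backtrace",  # print the call stack at the breakpoint
        "--args", python, script_path,
    ]


command = build_gdb_command("script.py")
# Print the command so that it can be inspected or pasted into a shell.
print(shlex.join(command))
```

To actually launch the session, pass the list to subprocess.run(command). Dropping -batch and the backtrace entry leaves you at the gdb prompt inside write_jpeg_spatial, as in the manual session.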

This concludes this short tutorial. While stepping through the C code with gdb, we spotted the bug that caused the difference between jpeglib and cjpeg. The bug has been resolved, and the fix will be part of the next jpeglib release.