Sieun Park · Oct 18, 2021
Real-world images commonly accumulate various degradations over their lifetime. For example, images taken with older devices contain various kinds of noise and artifacts, and further artifacts are added as images are repeatedly edited, compressed, and transferred over the internet.
However, recovering from such degradation and bringing these images back to life is very challenging, mostly because the degradation process is unknown and differs for every image. How can we learn the inverse of an undefined degradation mapping? Let’s have a short overview of previous approaches to this problem:
- Explicit degradation prediction: Simulating the degradation process with a combination of classic operations (e.g. blur, downsample, noise, JPEG compression) and creating a dataset by applying explicit degradations.
Problem: Real-world degradations can be too complex to model with such a simple set of operations.
- Constructing a dataset with real-world degradations, e.g. pairs of images taken using old and modern cameras, or unpaired degraded and high-quality images.
- Degradation prediction: Estimating the degradation mapping of the dataset mentioned above using GANs either separately or jointly with the restoration network.
Problem: Limited to the degradations of the dataset the network was trained on, and fails to generalize to out-of-distribution images.
As we can see, modeling the degradation mapping is crucial for blind SR: the problem of synthesizing high-resolution images from noisy low-resolution real-world images. Despite the importance of this problem, the results of previous works are not comparable to real high-quality images. The authors of Real-ESRGAN nail the problem and show amazing results. They propose to address the degradation modeling problem and improve multiple aspects of the synthesis network.
In particular, the paper suggests:
- High-order degradations to better simulate real-world degradations.
- sinc filters to address the ringing and overshoot artifacts from degradation.
- A more powerful discriminator using the U-Net architecture and spectral normalization (SN) regularization.
Original paper: Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data
Degradation operations
We will first look at the details of how each degradation effect is typically simulated. Feel free to skip parts of this section.
Blur
Blur is simulated as a convolution with a blur filter, a Gaussian kernel (a.k.a. Gaussian blur kernel) in most cases. It is calculated as the Gaussian distribution in 2D space. To make the effects more realistic, the paper adopts generalized Gaussian blur kernels. According to the abstract of the paper proposing generalized Gaussian (GG) functions, the function is theoretically and practically accurate for modeling out-of-focus blur degradation and can be simplified to a single-parameter model.
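As a rough illustration, a generalized Gaussian kernel can be built in a few lines of NumPy. This is a minimal sketch; the function name and default parameters are illustrative, not taken from the paper:

```python
import numpy as np

def generalized_gaussian_kernel(size=21, sigma=2.0, beta=1.0):
    """Generalized Gaussian blur kernel; beta=1 recovers the standard Gaussian,
    beta != 1 flattens or sharpens the profile (the extra shape parameter)."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    kernel = np.exp(-((r2 / (2.0 * sigma ** 2)) ** beta))
    return kernel / kernel.sum()  # normalize so blurring preserves brightness
```

Convolving an image with this kernel (e.g. via `scipy.signal.convolve2d`) produces the blur degradation; randomizing `sigma` and `beta` during training varies the blur.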
More about generalized Gaussian blur kernels: Estimating generalized gaussian blur kernels for out-of-focus image deblurring.
Noise
Noise is applied by adding either Gaussian noise or Poisson noise. In additive Gaussian noise, the noise is sampled from a Gaussian distribution and its intensity is controlled by the standard deviation. Poisson noise is sampled from the Poisson distribution and approximately models the sensor noise caused by statistical quantum fluctuations. The intensity of Poisson noise is proportional to the image intensity.
Noise can be either “color noise”, where each channel’s noise is sampled independently, or “gray noise”, where the three channels share the same noise.
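The two noise types and the color/gray distinction can be sketched as follows (a simplified illustration on images in [0, 1]; parameter names are mine, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, sigma=0.05, gray=False):
    """Additive Gaussian noise; 'gray' shares one noise map across channels."""
    if gray:
        noise = rng.normal(0.0, sigma, img.shape[:2])[..., None]  # shared map
    else:
        noise = rng.normal(0.0, sigma, img.shape)                 # per channel
    return np.clip(img + noise, 0.0, 1.0)

def add_poisson_noise(img, scale=255.0):
    """Poisson (shot) noise: the variance grows with pixel intensity."""
    return np.clip(rng.poisson(img * scale) / scale, 0.0, 1.0)
```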
A great explanation about the properties of the Poisson distribution.
Downsampling
There are several resize methods with unique characteristics. Area resizing, bilinear interpolation, and bicubic interpolation are considered.
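Of the three, area resizing is the simplest to show from scratch: each output pixel is the mean of an r × r block of input pixels. A minimal NumPy sketch for integer scale factors (bilinear and bicubic would typically come from a library such as OpenCV or Pillow):

```python
import numpy as np

def area_downsample(img, r):
    """Area resize by integer factor r: average each r x r block."""
    h, w = img.shape[:2]
    h, w = h - h % r, w - w % r          # crop so dimensions divide evenly
    img = img[:h, :w]
    return img.reshape(h // r, r, w // r, r, -1).mean(axis=(1, 3)).squeeze()
```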
JPEG compression
JPEG compression is a lossy compression method commonly used when saving and transferring digital images. A quality factor ranging from 0 to 100 trades off the quality of the compressed image against the compression rate.
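Simulating this degradation is just a round trip through a JPEG encoder at a chosen quality factor. A sketch using Pillow (in practice Pillow recommends quality values roughly in the 1–95 range):

```python
import io

import numpy as np
from PIL import Image

def jpeg_degrade(img_uint8, quality=30):
    """Round-trip an RGB uint8 array through JPEG at the given quality factor."""
    buf = io.BytesIO()
    Image.fromarray(img_uint8).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))
```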
sinc Filter
Ringing artifacts are artifacts that appear near sharp transitions in an image. They are often accompanied by overshoot artifacts, an exaggerated jump at the edge transition (e.g. the black ring in the bottom right figure). Because these artifacts are so common in real images, there is a need to simulate them.
The sinc filter is a kernel that mimics these artifacts, expressed as k(i, j) = ω_c / (2π √(i² + j²)) · J₁(ω_c √(i² + j²)), where (i, j) are kernel coordinates, J₁ is the first-order Bessel function of the first kind, and ω_c is the cutoff frequency. These kernels look like the figures on the left of each example in the figure above when plotted.
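A direct translation of that formula into code might look like the sketch below, using `scipy.special.j1` for the Bessel function and handling the singularity at the kernel center, where the limit of the expression is ω_c² / (4π):

```python
import numpy as np
from scipy.special import j1  # first-order Bessel function of the first kind

def sinc_kernel(cutoff, size=21):
    """2D sinc (circular low-pass) kernel:
    k(i, j) = cutoff / (2*pi*r) * J1(cutoff * r), with r = sqrt(i^2 + j^2),
    and the r -> 0 limit cutoff**2 / (4*pi) at the center."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r = np.sqrt(xx ** 2 + yy ** 2)
    with np.errstate(divide="ignore", invalid="ignore"):
        k = cutoff / (2 * np.pi * r) * j1(cutoff * r)
    k[size // 2, size // 2] = cutoff ** 2 / (4 * np.pi)  # limit at r = 0
    return k / k.sum()
```

Convolving an image that has a sharp edge with this kernel produces the characteristic ringing and overshoot near the edge.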
High-order degradation
A typical degradation model would involve multiple degradation operations for synthesizing low-resolution images similar to real-world samples. For example, given the ground-truth image y, the degradation process D could be a combination of a blur kernel k, downsampling with scale factor r, adding noise n, and JPEG compression. We could express this process as x = D(y) = [(y ⊛ k) ↓ᵣ + n]_JPEG, where ⊛ denotes convolution and ↓ᵣ denotes downsampling by scale factor r.
These classic degradation models can generate synthetic training pairs to some degree. However, models with basic degradation, which the paper refers to as “first-order” modeling, cannot capture complex real-world degradations, which are diverse and usually comprise a series of multiple procedures. The figure below describes examples of both cases.
Specifically, the original image might be taken with a cellphone many years ago, which inevitably contains degradations such as camera blur, sensor noise, low resolution, and JPEG compression. The image was then edited with sharpening and resize operations, bringing in overshoot and blur artifacts. After that, it was uploaded to some social media application, which introduces further compression and unpredictable noise. (example from Real-ESRGAN)
A simple solution is to repeat the random degradations multiple times. An n-order model refers to repeating the classic degradation model n times.
The figure above illustrates the overall pipeline of the degradation simulation process. In detail,
- Each operation is selected randomly from the set of options available.
- The paper suggests a second-order degradation process is sufficient for modeling real-world degradations.
- Downsampling is replaced with random resizing to control the image resolution.
- The sinc filter is applied in the initial blurring process and at the final stage. The order of the JPEG compression and final sinc filter is random.
- Details and hyper-parameters of the degradation operations are provided in the paper.
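The repetition idea itself is simple enough to sketch. The snippet below uses deliberately crude stand-ins (a box blur and additive Gaussian noise on a grayscale image) for the paper's much richer, randomly parameterized operation set; only the n-fold repetition structure reflects the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_first_order(img):
    """One pass of randomly parameterized degradations (illustrative stand-ins:
    a 3x3 box blur applied with some probability, then additive noise)."""
    if rng.random() < 0.8:
        pad = np.pad(img, 1, mode="edge")
        img = sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    img = img + rng.normal(0.0, rng.uniform(0.0, 0.05), img.shape)
    return np.clip(img, 0.0, 1.0)

def high_order_degrade(img, n=2):
    """n-order model: repeat the first-order pipeline n times (n=2 in the paper)."""
    for _ in range(n):
        img = random_first_order(img)
    return img
```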
The figure above demonstrates the effectiveness of the sinc filter and second-order degradation modeling. We observe that ringing artifacts are removed as expected when the sinc filter is applied.
Network
The authors also suggest multiple improvements to the synthesis network. The generator backbone with residual-in-residual dense blocks (RRDB) is the same as in the original ESRGAN. However, the input layer is adjusted to accept images of multiple resolutions by applying pixel-unshuffle (the inverse of pixel shuffling), which reduces the spatial size and increases the number of channels for efficiency.
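Pixel-unshuffle is just a reshaping: an s × s spatial block becomes s² extra channels. A NumPy sketch for HWC arrays (note the channel ordering here is illustrative and may differ from PyTorch's `pixel_unshuffle` on NCHW tensors):

```python
import numpy as np

def pixel_unshuffle(img, s):
    """Inverse of pixel shuffle: (H, W, C) -> (H/s, W/s, C*s*s),
    trading spatial resolution for channel depth."""
    h, w, c = img.shape
    assert h % s == 0 and w % s == 0
    x = img.reshape(h // s, s, w // s, s, c)
    return x.transpose(0, 2, 4, 1, 3).reshape(h // s, w // s, c * s * s)
```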
The authors argue that the discriminator for Real-ESRGAN requires a greater discriminative power for complex training outputs because Real-ESRGAN aims to address a much larger degradation space than ESRGAN.
I agree that GANs typically benefit from powerful discriminators and the proposals of the paper are valid. However, I don’t understand why the authors suggest that the discriminative space of Real-ESRGAN is larger than ESRGAN’s. I believe the problem complexity for the generator increases, since it has to learn the inverse of the degradation process on top of the original upsampling problem, but the discriminator’s task seems the same: distinguishing synthesized images from high-resolution images. I want to know your thoughts 🧐.
The authors improve the original VGG-style discriminator to a U-Net structure. To stabilize training, they also apply spectral normalization. These modifications bring surprising improvements to the synthesized images.
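Spectral normalization divides each weight matrix by its largest singular value, estimated cheaply with power iteration, which constrains the layer's Lipschitz constant. A NumPy sketch of the core computation (in PyTorch this is available ready-made as `torch.nn.utils.spectral_norm`):

```python
import numpy as np

def spectral_normalize(W, n_iter=30):
    """Divide W by an estimate of its largest singular value (spectral norm),
    obtained by power iteration on W @ W.T."""
    u = np.random.default_rng(0).normal(size=W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # spectral norm estimate
    return W / sigma
```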
More about Spectral Normalization: Spectral normalization for generative adversarial networks
The authors also assess the effect of post-processing algorithms, unsharp masking in particular, as Real-ESRGAN+. They find that this algorithm introduces overshoot artifacts, and instead propose a trick: sharpening the ground-truth images during training, which balances sharpness against overshoot artifacts.
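Unsharp masking itself amplifies the difference between an image and a blurred copy of it; pushing `amount` too high is exactly what causes overshoot at edges. A minimal grayscale sketch with a 3×3 box blur (the paper's exact filter and parameters may differ):

```python
import numpy as np

def unsharp_mask(img, amount=1.0):
    """Unsharp masking: img + amount * (img - blurred).
    A large 'amount' overshoots at edges, the artifact discussed above."""
    pad = np.pad(img, 1, mode="edge")
    blurred = sum(pad[i:i + img.shape[0], j:j + img.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    return np.clip(img + amount * (img - blurred), 0.0, 1.0)
```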
There are minor changes in training hyper-parameters in terms of the weights of each loss.
Discussions
We clearly observe that the visual quality of Real-ESRGAN outperforms previous approaches. Real-ESRGAN effectively removes artifacts in the image and generates photo-realistic high-resolution textures.
However, the authors also find limitations of Real-ESRGAN. While the proposed high-order degradation model can simulate more complex degradations, it is not perfect. The model still struggles with twisted lines and out-of-distribution degradations, and produces occasional GAN artifacts.
Conclusion
We looked at the challenges of modeling the degradation process for real-world image synthesis. The authors propose sinc filters to remove the ringing and overshooting artifacts and high-order degradation models to better express the complex real-world degradation. The authors also suggest improvements to the network architectures.
However, I believe there must be more investigation on whether this pipeline can truly generalize to other real-world degradations apart from the samples suggested by the author. Still, the contributions in terms of how to represent the unknown degradation process were very interesting. The strategies discussed in this paper can be applied to other image synthesis problems or even supervised vision problems. A data augmentation that involves the described pipeline could improve the robustness of the network.