ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression
Two separate research teams have proposed new schemes for learned image compression (LIC), ChWDTA and HyperVQ, each aiming to improve rate-distortion performance.
The ChWDTA scheme, described in a paper on arXiv[1], combines channel-wise wavelet transforms with transformer attention and entropy modeling. It introduces a channel-wise wavelet-domain transformer attention mechanism and a channel-wise wavelet packet decomposition, and reports BD-rates of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively (negative values indicate bitrate savings at equal quality)[1]. In a separate arXiv paper, another research team proposed HyperVQ, a framework that enables hyperprior entropy modeling for VQ-based generative image compression[2]. According to the researchers, existing VQ codecs lack efficient content-adaptive entropy modeling and rely on static frequencies[2]. HyperVQ instead predicts a high-dimensional continuous multivariate Gaussian distribution for continuous latents, and achieves an average bitrate saving of 18.5% across diverse VQ architectures[2]. The two works address different parts of the compression pipeline: ChWDTA brings wavelet transforms into CNN-transformer-based LIC schemes, while HyperVQ adds content-adaptive entropy modeling to VQ-based codecs.
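To make the BD-rate figures above easier to interpret, the following is a minimal sketch of the classic Bjøntegaard delta-rate calculation: fit a cubic polynomial to log-bitrate as a function of PSNR for each codec, integrate both fits over the shared quality range, and convert the average gap into a percentage. The function name `bd_rate` and the rate-distortion points are hypothetical and chosen only for illustration; they are not taken from either paper, and the papers may use a different interpolation variant.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of a test codec relative to an anchor codec.

    Negative values mean the test codec needs fewer bits at equal PSNR.
    Hypothetical helper following the cubic-fit Bjontegaard procedure.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Cubic fit of log-rate as a function of quality (PSNR) for each codec.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate only over the PSNR interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    anti_a = np.polyint(fit_a)
    anti_t = np.polyint(fit_t)
    int_a = np.polyval(anti_a, hi) - np.polyval(anti_a, lo)
    int_t = np.polyval(anti_t, hi) - np.polyval(anti_t, lo)

    # Mean log-rate difference, converted to a percentage change in rate.
    avg_diff = (int_t - int_a) / (hi - lo)
    return float((np.exp(avg_diff) - 1.0) * 100.0)

# Made-up rate-distortion points (bits per pixel, PSNR in dB).
anchor_bpp = [0.20, 0.40, 0.70, 1.10]
anchor_psnr = [30.0, 33.0, 36.0, 39.0]
# Illustrative test codec that uses 20% fewer bits at every quality level.
test_bpp = [0.8 * r for r in anchor_bpp]
test_psnr = anchor_psnr

print(round(bd_rate(anchor_bpp, anchor_psnr, test_bpp, test_psnr), 2))
```

Because the illustrative test codec spends exactly 0.8 times the anchor's bits at every PSNR, the computed BD-rate comes out at about -20%, which matches the sign convention of the figures reported for ChWDTA.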