ChWDTA: Channel-wise Wavelet-Domain Transformer Attention and Entropy Modeling for Learned Image Compression

32d ago · Global · primary source: export.arxiv.org

Multi-source synthesis by The Embedding Report from 2 sources. Every numeric and quoted claim traces to a cited source body (see methodology).

Researchers have proposed two new image compression schemes, ChWDTA and HyperVQ, which aim to improve rate-distortion performance in learned image compression (LIC).

The ChWDTA scheme, described in a paper on arXiv[1], combines channel-wise wavelet transforms with transformer attention and entropy modeling. It introduces a channel-wise wavelet-domain transformer attention mechanism and a channel-wise wavelet packet decomposition, and reports BD-rates of -17.82%, -19.15%, and -22.56% on the Kodak, CLIC Professional Validation, and Tecnick test sets, respectively[1]. A negative BD-rate indicates a bitrate saving at equal quality.

A separate research team proposed HyperVQ, a framework that enables hyperprior entropy modeling for VQ-based generative image compression, also on arXiv[2]. According to the researchers, existing VQ codecs lack efficient content-adaptive entropy modeling and rely on static frequencies[2]. HyperVQ instead predicts a high-dimensional continuous multivariate Gaussian distribution for continuous latents and achieves an average bitrate saving of 18.5% across diverse VQ architectures[2]. Both papers target rate-distortion efficiency: the first by bringing wavelet transforms into CNN-transformer-based LIC schemes, the second by adding content-adaptive entropy modeling to VQ codecs.
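For readers unfamiliar with the metric behind the ChWDTA figures, the sketch below shows one common way a Bjøntegaard delta rate (BD-rate) is computed: fit a cubic polynomial to log-bitrate as a function of PSNR for each codec, integrate both fits over the shared PSNR range, and convert the average log-rate gap to a percentage. The function name `bd_rate` and the rate-distortion points are hypothetical illustrations, not data or code from either paper, and the cited papers may use a different interpolation variant.

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec versus the anchor at equal PSNR."""
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))

    # Cubic fit of log-rate as a function of quality, one curve per codec.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Compare only over the PSNR interval covered by both curves.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())

    # Definite integral of each fitted polynomial over [lo, hi].
    int_a = np.polyval(np.polyint(fit_a), hi) - np.polyval(np.polyint(fit_a), lo)
    int_t = np.polyval(np.polyint(fit_t), hi) - np.polyval(np.polyint(fit_t), lo)

    # Mean log-rate difference, converted back to a percentage change.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_diff) - 1.0) * 100.0

# Hypothetical rate-distortion points (bits per pixel, PSNR in dB).
anchor_bpp = [0.25, 0.5, 1.0, 2.0]
anchor_psnr = [30.0, 33.0, 36.0, 39.0]
# Made-up test codec that needs 20% fewer bits at every quality level.
test_bpp = [0.8 * r for r in anchor_bpp]
test_psnr = anchor_psnr

# By construction the result is approximately -20 (a 20% bitrate saving).
print(f"BD-rate: {bd_rate(anchor_bpp, anchor_psnr, test_bpp, test_psnr):.2f}%")
```

Under this convention, a reported value such as -17.82% means the test codec needs about 17.82% fewer bits than the anchor, averaged over the compared quality range.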

Sources cited (2)

  1. arxiv.org
  2. arxiv.org