Towards Resolving Optimization Conflicts Between Image- and Text-Based Person Re-Identification

32d ago · Global · primary source: export.arxiv.org

Multi-source synthesis by The Embedding Report from 2 sources. Every numeric and quoted claim traces to a cited source body (see methodology).

Two recent studies propose methods for person re-identification: one addresses optimization conflicts between image- and text-based retrieval, and the other targets retrieval across multiple scenarios that vary over time.

A recent study[1] highlights the difficulty of jointly optimizing image-based (I2I) and text-based (T2I) person re-identification (ReID), which it attributes to modality discrepancies and conflicting training objectives. I2I ReID focuses on identity-level invariance across images of the same person, while T2I ReID is driven by instance-specific textual descriptions tied to unique visual traits[1]. The study notes that the two tasks are often studied separately, and that loss functions optimized for one retrieval setting may degrade the representation quality the other requires. To address this, the researchers propose a decoupled two-stage training pipeline for learning a shared representation across image and text modalities. The pipeline uses a single vision encoder that supports both I2I and T2I retrieval while avoiding cross-task interference during training. In their experiments, I2I ReID pre-training improved generalization to T2I data, and incorporating textual supervision during vision encoder training improved both I2I and T2I performance[1].

Another study[2] introduces the task of Anytime Person Re-identification (AT-ReID), which aims to achieve effective retrieval in multiple scenarios based on variations in time. The researchers built a dataset, AT-USTC, containing 403k images of individuals wearing multiple outfits, captured by RGB and IR cameras over 21 months; its 270 volunteers were each photographed on average 29.1 times across different dates or scenes[2]. They also propose a model, Uni-AT, with three components: a multi-scenario ReID (MS-ReID) framework for scenario-specific feature learning, a Mixture-of-Attribute-Experts (MoAE) module to alleviate inter-scenario interference, and a Hierarchical Dynamic Weighting (HDW) strategy to ensure balanced training across all scenarios.
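To make the shared-representation idea from the first study[1] concrete, the sketch below ranks one gallery of image embeddings against either an image query (I2I) or a text query (T2I). It is a minimal illustration and not the authors' method. The embedding values, the `rank_gallery` helper, and the use of cosine similarity are all assumptions, as is the premise that text queries are already projected into the same space as the vision encoder's output.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def rank_gallery(query_embedding, gallery):
    """Return gallery ids sorted by descending similarity to the query."""
    scored = [(cosine(query_embedding, emb), pid) for pid, emb in gallery.items()]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pid for _, pid in scored]


# Hypothetical gallery embeddings, standing in for the output of a single
# vision encoder. The values are invented for illustration only.
gallery = {
    "person_a": [0.9, 0.1, 0.0],
    "person_b": [0.1, 0.9, 0.1],
    "person_c": [0.0, 0.2, 0.9],
}

# Assumed query embeddings: one from an image, one from a text description
# that has been mapped into the same space as the gallery.
image_query = [0.8, 0.2, 0.1]
text_query = [0.1, 0.1, 0.95]

i2i_ranking = rank_gallery(image_query, gallery)  # image-to-image retrieval
t2i_ranking = rank_gallery(text_query, gallery)   # text-to-image retrieval

print("I2I:", i2i_ranking)
print("T2I:", t2i_ranking)
```

Because both query types are scored against the same gallery embeddings, any change to the vision encoder affects I2I and T2I rankings together, which is one way to see why conflicting training objectives matter in this setting.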

research-paper

Sources cited (2)

  1. arxiv.org
  2. arxiv.org