When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges

11d ago · Global · primary source: export.arxiv.org

Abhishek Divekar

Multi-source synthesis by The Embedding Report from 2 sources. Every numeric and quoted claim traces to a cited source body (see methodology).

Two new studies examine the reliability and evaluation of large language models in specialized settings: one proposes a framework for assessing the trustworthiness of LLM-generated medical reports, and the other analyzes failure modes in optimizing LLM judges across multiple evaluation criteria.

A study introduces a Multi-Dimensional Credibility Assessment (MDCA) framework designed to enhance the trustworthiness of Chinese LLM-generated liver MRI reports ^[1]. The research notes that while LLMs can generate diagnostic conclusions from imaging findings, a comprehensive framework for assessing the trustworthiness of such radiology reports has been lacking ^[1]. The study applies the MDCA framework to evaluate multiple advanced LLMs using the SiliconFlow platform ^[1].

Separately, another study identifies two failure modes in multi-objective prompt optimization for LLM judges: gradient dilution and instruction interference ^[2]. The research finds that optimizing an LLM judge across multiple evaluation criteria requires handling textual gradients differently than single-criterion optimization ^[2]. Combining per-task instructions into a single prompt was found to degrade performance, with a change of -5.3% as measured by Spearman's rho ^[2]. Furthermore, gradient specificity drops by 59% when the gradient LLM processes multiple criteria jointly ^[2]. The researchers tested five decomposition modes of textual gradient optimizers, and six out of ten configurations showed no improvement over the initial prompt ^[2].
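For readers unfamiliar with the metric, the following is a minimal sketch of how Spearman's rho can be computed per evaluation criterion to compare an LLM judge's scores against human ratings. It is not the study's evaluation code: the criterion names (`fluency`, `accuracy`), the score lists, and the helper names `average_ranks` and `spearman_rho` are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: human ratings and judge scores for five responses,
# recorded separately for each evaluation criterion.
human_scores = {"fluency": [1, 2, 3, 4, 5], "accuracy": [2, 1, 4, 3, 5]}
judge_scores = {"fluency": [1, 3, 2, 4, 5], "accuracy": [1, 2, 3, 4, 5]}

# One rho per criterion, so a regression on one criterion is not hidden
# by an improvement on another.
per_criterion_rho = {
    name: spearman_rho(human_scores[name], judge_scores[name])
    for name in human_scores
}

for name, rho in per_criterion_rho.items():
    print(f"{name}: rho = {rho:.3f}")
```

With this toy data the judge agrees with the human rankings at a rho of 0.9 for `fluency` and 0.8 for `accuracy`. Comparing such per-criterion values before and after a prompt change is one way a drop like the one reported above could be measured, though the study's exact protocol is not described here.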

research-paper · infrastructure

Sources cited (2)

[1] arxiv.org E · research: https://arxiv.org/abs/2510.23008
[2] arxiv.org E · research: https://arxiv.org/abs/2605.26046
