When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges
- Abhishek Divekar
Two new studies propose frameworks to improve the reliability and evaluation of large language models in specialized domains, addressing trustworthiness in medical reporting and optimization for multi-criteria assessment.
A study introduces a Multi-Dimensional Credibility Assessment (MDCA) framework designed to enhance the trustworthiness of Chinese LLM-generated liver MRI reports [1]. The research notes that while LLMs can generate diagnostic conclusions from imaging findings, a comprehensive framework for assessing the trustworthiness of such radiology reports has been lacking [1]. The study applies the MDCA framework to evaluate multiple advanced LLMs using the SiliconFlow platform [1].
Separately, another study identifies two failure modes in multi-objective prompt optimization for LLM judges: gradient dilution and instruction interference [2]. The research finds that optimizing an LLM judge across multiple evaluation criteria requires handling textual gradients differently than in single-criterion optimization [2]. Combining per-task instructions into a single prompt degraded performance by 5.3% as measured by Spearman's rho [2]. Furthermore, gradient specificity drops by 59% when the gradient LLM processes multiple criteria jointly [2]. The researchers tested five decomposition modes of textual gradient optimizers, and six out of ten configurations showed no improvement over the initial prompt [2].
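For readers unfamiliar with the metric, Spearman's rho measures how well a judge's ranking of items agrees with a reference ranking. The following is a minimal sketch in plain Python, not the study's evaluation code: the score lists (`human`, `judge_separate`, `judge_combined`) are made-up illustrative values, and the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Illustrative (invented) scores for five items on one criterion.
human = [1, 2, 3, 4, 5]            # reference ratings
judge_separate = [1, 2, 3, 5, 4]   # judge prompted with one criterion
judge_combined = [2, 1, 3, 5, 4]   # judge prompted with all criteria merged

rho_separate = spearman_rho(human, judge_separate)
rho_combined = spearman_rho(human, judge_combined)
print(f"separate: {rho_separate:.2f}, combined: {rho_combined:.2f}")
```

Comparing `rho_combined` against `rho_separate` gives the kind of difference the study reports. The summary does not say whether its 5.3% figure is an absolute or a relative change in rho, so this sketch leaves that choice open.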
research-paper, infrastructure
Sources cited (2)
- arxiv.org (research): https://arxiv.org/abs/2510.23008
- arxiv.org (research): https://arxiv.org/abs/2605.26046