Prominence-Stratified Failure Modes in Retrieval-Augmented Commercial Recommendation: A 37,000-Run Audit

12d ago · Global · primary source: export.arxiv.org

Multi-source synthesis by The Embedding Report from 2 sources. Every numeric and quoted claim traces to a cited source body (see methodology).

An audit of AI recommendation engines found that brand visibility and conversion rates vary sharply by prominence tier: category leaders are retrieved almost everywhere yet win only a minority of the recommendation slots they reach, while roughly half of specialist and regional brands never surface at all.

The audit, which analyzed 37,000 runs across four model configurations and 215 commercially-framed prompts, found that L1 category leaders appeared in nearly every relevant retrieval but secured only 25-41% of the recommendation slots they reached^[1]. L2 challenger brands achieved the highest conversion rates, at 37-52%. For L3 mid-market brands, aggregate coverage fell to 88% and conversion rates to 34-40%. L4 specialists and L5 regional players fared worse: 48-52% never surfaced in any of the 37,000 runs.

Minor changes in a buyer's phrasing also substantially altered which brands AI assistants recommended. Recommendation-set similarity between two paraphrases of the same buying intent was 0.288 for cosmetic rewordings and 0.135 for constraint-adding rewordings^[2]. This suggests that the prompt string, rather than the underlying buyer intent, is the dominant factor in determining brand visibility.
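The two quantities reported above can be illustrated with a minimal sketch. This summary does not say how the sources define recommendation-set similarity, so the code assumes Jaccard similarity (shared brands divided by distinct brands across both sets). It also assumes conversion rate means recommended appearances divided by retrieved appearances. The function names and brand names are hypothetical.

```python
from typing import Iterable


def conversion_rate(retrieved_runs: int, recommended_runs: int) -> float:
    """Share of retrievals that ended in a recommendation slot.

    Assumed definition; the cited sources may compute this differently.
    """
    if retrieved_runs == 0:
        return 0.0
    return recommended_runs / retrieved_runs


def jaccard_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity between two recommendation sets.

    Assumed metric; the sources report a 'recommendation-set similarity'
    without the formula being stated in this summary.
    """
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0  # two empty sets are treated as identical
    return len(set_a & set_b) / len(union)


# Hypothetical recommendation sets returned for two paraphrases of one intent.
original = ["BrandA", "BrandB", "BrandC", "BrandD"]
paraphrase = ["BrandA", "BrandE", "BrandF", "BrandG"]

# One shared brand out of seven distinct brands.
similarity = jaccard_similarity(original, paraphrase)

# A brand retrieved in 100 runs and recommended in 25 of them.
rate = conversion_rate(retrieved_runs=100, recommended_runs=25)
```

Under this assumed metric, a score of 0.135 would mean the two paraphrases share only a small fraction of the brands they surface between them, which is consistent with the reading that the prompt string drives visibility.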

research-paper

Sources cited (2)

[1] arxiv.org
[2] arxiv.org
