Disagreeing Rationales: Rethinking Classification and Explainability Evaluation in Hate Speech Detection

This story has moved to /story/attribute-based-diagnosis-of-llm-alignment-with-hate-speech-annotations/.