Reasoning Comparisons — Our Model vs. Qwen2Audio‑CoT vs. Qwen2.5‑Omni‑CoT

Text‑only demo page for comparing predicted labels and rationales (open/closed form) with majority vs annotator‑aware scoring.
Emotion Extraction Prompt below:

          You are an expert in emotion and answer extraction.
          Your task is to extract the emotion answer from the model prediction.
          The answer must be present in the model prediction, and only one emotion should be extracted.
          Do not provide any explanation and do not add comments about the prediction.
          
          Model Prediction: {$model_pred}
          
          Return the emotion answer from the model prediction; if the model fails to predict an emotion, return "None".
      
          Answer:
        
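The extraction prompt above is a template with a `{$model_pred}` slot. A minimal sketch of how it might be filled at evaluation time, using Python's `string.Template` (the helper name `build_extraction_prompt` and the truncated template body are assumptions for illustration):

```python
from string import Template

# Abbreviated stand-in for the extraction prompt above;
# ${model_pred} mirrors the prompt's {$model_pred} slot.
EXTRACTION_PROMPT = Template(
    "You are an expert in emotion and answer extraction.\n"
    "Your task is to extract the emotion answer from the model prediction.\n\n"
    "Model Prediction: ${model_pred}\n\n"
    "Answer:"
)

def build_extraction_prompt(model_pred: str) -> str:
    """Fill the extraction template with a raw model prediction."""
    return EXTRACTION_PROMPT.substitute(model_pred=model_pred)

prompt = build_extraction_prompt("The speaker sounds quite angry here.")
```

`Template.substitute` raises `KeyError` if the slot is missing, which surfaces template/argument mismatches early.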
Emotion Grouping Prompt below:

          You are an expert in emotion theory. According to the emotion wheel and related theory, 
          group each of the following emotions into one of the following eight emotion categories: 
          [Angry, Sad, Happy, Surprise, Fear, Disgust, Contempt, Neutral]. 
          If an item is not an emotion or cannot be grouped into these categories, put it in Others. 
          The output should be a JSON object with each emotion category as a key and all emotions in that group as the value. 
          Input:

          $ALL_EMOTIONS_EXTRACTED
        
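The grouping prompt asks for a JSON object keyed by category. A minimal sketch of parsing that reply and filtering to the expected keys (the function name `parse_grouping` and the sample reply are assumptions, not part of the pipeline):

```python
import json

# The eight categories from the prompt, plus the Others bucket.
VALID_CATEGORIES = {"Angry", "Sad", "Happy", "Surprise", "Fear",
                    "Disgust", "Contempt", "Neutral", "Others"}

def parse_grouping(raw: str) -> dict:
    """Parse the grouping model's JSON reply, keeping only known categories."""
    data = json.loads(raw)
    return {k: v for k, v in data.items() if k in VALID_CATEGORIES}

reply = '{"Angry": ["furious", "irritated"], "Others": ["monotone"]}'
groups = parse_grouping(reply)
```

Dropping unknown keys (rather than raising) keeps the pipeline robust when the model invents a category name.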
Win-Rate Prompt for pairwise judging below:
## Instructions:

You are an impartial adjudicator for speech-emotion reasoning. Given the inputs and two anonymous candidates (A, B), score each, pick a winner (or Tie), and return only the JSON schema below.

## Inputs:

    - Audio
    - Reasoning A: {pred_reasoning1}
    - Reasoning B: {pred_reasoning2}

## Evaluation Criteria:

1. Audio Grounding: Uses concrete acoustic/prosodic evidence (pitch/energy/tempo/pauses/timbre) tied to the audio.
2. Coherence & correctness: Internally consistent; no hallucinations (e.g., inventing content not in audio/transcript).
3. Specificity vs. vagueness: Prefers precise, evidence-backed observations over generic statements.

## Output Format:

```json
{{
    "winner": "A | B | Tie",
    "rationale_short": "<1-2 sentences comparing A vs B>",
    "flags": ["missing_audio" | "low_evidence" | "hallucination_A" | "hallucination_B"]
}}
```
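Since the judge is instructed to return only that JSON schema, its reply can be validated before tallying win rates. A minimal sketch under the schema above (the helper name `parse_verdict` is an assumption):

```python
import json

# Allowed values taken from the output schema above.
ALLOWED_WINNERS = {"A", "B", "Tie"}
ALLOWED_FLAGS = {"missing_audio", "low_evidence",
                 "hallucination_A", "hallucination_B"}

def parse_verdict(raw: str) -> dict:
    """Parse and validate the judge's JSON verdict against the schema."""
    verdict = json.loads(raw)
    if verdict.get("winner") not in ALLOWED_WINNERS:
        raise ValueError(f"unexpected winner: {verdict.get('winner')!r}")
    unknown = set(verdict.get("flags", [])) - ALLOWED_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {unknown}")
    return verdict

raw = ('{"winner": "A", '
       '"rationale_short": "A cites pitch rises; B stays generic.", '
       '"flags": ["low_evidence"]}')
verdict = parse_verdict(raw)
```

Rejecting out-of-schema replies early keeps malformed judgments from silently skewing the win-rate counts.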