RAGAS Evaluation Guide

Overview

RAGAS (Retrieval Augmented Generation Assessment) is a framework for evaluating the quality of RAG-generated answers. The RagasMetrics block evaluates a single QA pair against multiple quality metrics.
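
For intuition, here is a minimal sketch of the same kind of evaluation done directly with the open-source ragas Python package (0.1-style API). The block wraps equivalent functionality; the code below is an assumption for illustration, not the block's implementation:

from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (answer_relevancy, faithfulness,
                           context_precision, context_recall)

# A single QA pair in the column layout ragas expects.
data = Dataset.from_dict({
    "question":     ["When was the Eiffel Tower completed?"],
    "answer":       ["It was completed in 1889."],
    "contexts":     [["The Eiffel Tower was completed in 1889."]],
    "ground_truth": ["The Eiffel Tower was completed in 1889."],
})

result = evaluate(data, metrics=[answer_relevancy, faithfulness,
                                 context_precision, context_recall])
print(result)  # each metric scored between 0.0 and 1.0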

Metrics

1. Answer Relevancy

What it measures: How relevant the answer is to the question.

Range: 0.0 - 1.0 (higher is better)

Requires: question, answer

Example:
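
A hypothetical pair for intuition; the scores in the comments are illustrative, not computed values:

question  = "What is the capital of France?"
on_topic  = "The capital of France is Paris."            # answer_relevancy ~ high
off_topic = "France is well known for wine and cheese."  # answer_relevancy ~ low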

2. Faithfulness

What it measures: Whether the answer is factually consistent with the provided context.

Range: 0.0 - 1.0 (higher is better)

Requires: answer, contexts

Example:
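
A hypothetical case for intuition; faithfulness checks the answer against the retrieved context, not against world knowledge (scores illustrative):

contexts   = ["Company X reported revenue of $3M in Q2 2024."]
faithful   = "Company X's Q2 2024 revenue was $3M."   # faithfulness ~ 1.0
unfaithful = "Company X's Q2 2024 revenue was $30M."  # faithfulness ~ 0.0, contradicts context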

3. Context Precision

What it measures: Whether the relevant context chunks appear earlier in the context list.

Range: 0.0 - 1.0 (higher is better)

Requires: question, contexts, ground truth

Example:

If the most relevant context appears first in the list -> high score.
If relevant context is buried at the end of the list -> low score.
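
The same idea as a sketch; identical chunks, different ranking (scores illustrative):

question = "When was the Eiffel Tower completed?"
well_ranked   = ["The Eiffel Tower was completed in 1889.",  # relevant chunk first
                 "Paris is home to many museums."]           # context_precision ~ high
poorly_ranked = ["Paris is home to many museums.",
                 "The Eiffel Tower was completed in 1889."]  # context_precision ~ lower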

4. Context Recall

What it measures: Whether all information needed to answer the question is present in the contexts.

Range: 0.0 - 1.0 (higher is better)

Requires: question, contexts, ground truth

Example:
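
A hypothetical case for intuition; recall asks whether the contexts cover every claim in the ground truth (scores illustrative):

ground_truth = "The Eiffel Tower was completed in 1889 and is 330 m tall."
full_contexts    = ["The Eiffel Tower was completed in 1889.",
                    "The tower stands 330 m tall."]             # context_recall ~ 1.0
partial_contexts = ["The Eiffel Tower was completed in 1889."]  # ~ 0.5, height unsupported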

Configuration

Field References

The block uses field references to locate the question, answer, contexts, and ground truth in the pipeline state.

These are dropdowns populated from the available pipeline fields. You can use the FieldMapper block to rename or create fields as needed (e.g. to extract fields from nested structures), as in the sketch below.
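
A conceptual sketch of the kind of remapping a FieldMapper step performs before RagasMetrics runs; the state layout here is invented for illustration, not the block's actual config:

# Invented state layout, for illustration only.
state = {"qa": {"q": "When was it built?", "a": "In 1889."},
         "retrieval": {"docs": ["Completed in 1889."]}}

# Pull nested values up into the flat fields the metric block references.
state["question"] = state["qa"]["q"]
state["answer"]   = state["qa"]["a"]
state["contexts"] = state["retrieval"]["docs"]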

Selecting Metrics

Use the metrics multi-select to choose which of the four metrics to compute: answer_relevancy, faithfulness, context_precision, and context_recall.

Score Threshold

The score_threshold field sets the minimum value each selected metric must reach to be considered passing. The block outputs a boolean passed that is true only if every selected metric meets or exceeds this threshold.
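
A sketch of that pass/fail rule, using the scores from the output example below (the aggregation logic is assumed from this description, not taken from the block's source):

scores = {"answer_relevancy": 0.92, "faithfulness": 0.88,
          "context_precision": 0.95, "context_recall": 0.85}
score_threshold = 0.8

# passed is true only if every selected metric clears the threshold.
passed = all(score >= score_threshold for score in scores.values())  # True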

Model Configuration

Output Format

The block outputs a single ragas_scores object:

{
  ...
  "ragas_scores": {
    "answer_relevancy": 0.92,
    "faithfulness": 0.88,
    "context_precision": 0.95,
    "context_recall": 0.85,
    "passed": true
  }
}
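
Downstream blocks can branch on passed, for example to retry retrieval when a pair fails. A sketch, with hypothetical helper names and access syntax assumed:

if state["ragas_scores"]["passed"]:
    publish(state)           # hypothetical downstream step
else:
    retry_retrieval(state)   # hypothetical fallback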