RAGAS (Retrieval Augmented Generation Assessment) is a framework for evaluating the quality of RAG-generated answers. The RagasMetrics block evaluates a single QA pair against multiple quality metrics.
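For reference, the sketch below shows roughly what scoring a single QA pair looks like when calling the ragas Python library directly, outside the pipeline. It assumes a ragas 0.1-style API with configured LLM and embedding credentials; column names and imports may differ between ragas versions, and the data values are illustrative.

```python
# Minimal sketch: scoring one QA pair with the ragas library directly.
# Assumes a ragas 0.1-style API; credentials for the LLM / embedding model
# must already be configured in the environment.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_precision,
    context_recall,
)

dataset = Dataset.from_dict({
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["Paris is the capital and largest city of France."]],
    "ground_truth": ["Paris is the capital of France."],
})

result = evaluate(
    dataset,
    metrics=[answer_relevancy, faithfulness, context_precision, context_recall],
)
print(result)  # per-metric scores in the 0.0 - 1.0 range
```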
answer_relevancy
What it measures: How relevant the answer is to the question.
Range: 0.0 - 1.0 (higher is better)
Requires: the question and answer fields, plus an embedding model.
Example: If the answer directly addresses the question -> High score. If the answer is off-topic or only partially addresses it -> Low score.
faithfulness
What it measures: Whether the answer is factually consistent with the provided context.
Range: 0.0 - 1.0 (higher is better)
Requires: the answer and contexts fields.
Example: If every claim in the answer is supported by the retrieved contexts -> High score. If the answer contains claims not found in the contexts -> Low score.
context_precision
What it measures: Whether the relevant context chunks appear earlier in the context list.
Range: 0.0 - 1.0 (higher is better)
Requires: the question, contexts, and ground truth fields.
Example: If the most relevant context appears first in the list -> High score. If relevant context is buried at the end -> Low score. (See the sketch below.)
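To make the ordering effect concrete, here is an illustrative precision-at-k style computation over relevance flags for the retrieved chunks. It is not the exact RAGAS implementation, but it captures the same intuition: relevant chunks placed earlier raise the score.

```python
def ordering_score(relevant_flags: list[bool]) -> float:
    """Illustrative precision-at-k average: rewards relevant chunks that appear early.

    Not the exact RAGAS formula, but it reflects the same intuition.
    """
    score, hits = 0.0, 0
    for k, is_relevant in enumerate(relevant_flags, start=1):
        if is_relevant:
            hits += 1
            score += hits / k  # precision@k at each relevant position
    return score / hits if hits else 0.0

# One relevant chunk, placed first vs. last in a list of four contexts.
print(ordering_score([True, False, False, False]))  # 1.0  -> high score
print(ordering_score([False, False, False, True]))  # 0.25 -> low score
```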
context_recall
What it measures: Whether all information needed to answer the question is present in the contexts.
Range: 0.0 - 1.0 (higher is better)
Requires: the contexts and ground truth fields.
Example: If the contexts contain every fact stated in the ground truth answer -> High score. If key facts are missing from the contexts -> Low score.
The block uses field references to locate data in the pipeline state. These are dropdowns populated from the available pipeline fields; you can use the FieldMapper block to rename or create fields as needed (e.g. to extract fields from nested structures), as in the sketch below.
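As an illustration, the hypothetical state below shows nested values being surfaced as flat, top-level fields that the metric field references can point at. The field names are illustrative, and the exact FieldMapper configuration syntax is not shown here.

```python
# Hypothetical pipeline state before and after a FieldMapper-style step.
state_before = {
    "qa_pair": {
        "question": "What is the capital of France?",
        "answer": "Paris is the capital of France.",
    },
    "retrieval": {
        "contexts": ["Paris is the capital and largest city of France."],
    },
    "ground_truth": "Paris is the capital of France.",
}

# After extracting the nested values, the RagasMetrics field references
# can point directly at flat, top-level fields.
state_after = {
    "question": state_before["qa_pair"]["question"],
    "answer": state_before["qa_pair"]["answer"],
    "contexts": state_before["retrieval"]["contexts"],
    "ground_truth": state_before["ground_truth"],
}
```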
Use the metrics multi-select to choose which metrics to compute: answer_relevancy, faithfulness, context_precision, and context_recall.
Note that answer_relevancy requires an embedding model. The score_threshold field is the minimum value each metric must reach to be considered passing. The block outputs a boolean passed indicating whether all selected metrics meet or exceed this threshold (see the sketch below).
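As a rough sketch of the pass/fail logic (not the block's exact implementation), the check amounts to comparing every selected metric score against score_threshold:

```python
def all_metrics_pass(scores: dict[str, float], score_threshold: float) -> bool:
    """Return True only if every selected metric meets or exceeds the threshold."""
    return all(value >= score_threshold for value in scores.values())

# Example with the scores shown in the output below.
scores = {
    "answer_relevancy": 0.92,
    "faithfulness": 0.88,
    "context_precision": 0.95,
    "context_recall": 0.85,
}
print(all_metrics_pass(scores, score_threshold=0.8))  # True
print(all_metrics_pass(scores, score_threshold=0.9))  # False (faithfulness and context_recall fall short)
```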
The block outputs a single ragas_scores object:
{
  ...
  "ragas_scores": {
    "answer_relevancy": 0.92,
    "faithfulness": 0.88,
    "context_precision": 0.95,
    "context_recall": 0.85,
    "passed": true
  }
}
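Downstream steps (or plain code reading the pipeline state) can then gate on the passed flag. A hypothetical filtering step might look like this; the record shapes are illustrative:

```python
# Hypothetical downstream filtering on the RagasMetrics output.
records = [
    {"id": 1, "ragas_scores": {"faithfulness": 0.88, "passed": True}},
    {"id": 2, "ragas_scores": {"faithfulness": 0.41, "passed": False}},
]

accepted = [r for r in records if r["ragas_scores"]["passed"]]
print([r["id"] for r in accepted])  # [1]
```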