Q&A Generation Template

Overview
Pipeline Architecture
Seed Format
Output Format
Use Cases
Markdown File Upload Feature
Customization
Example Workflow
Performance Tips
Related Documentation

Overview

Complexity: Advanced (4 blocks with multiplier)
Use Case: Generate question-answer pairs from markdown documents

This template converts markdown documentation into question-answer pairs. It automatically chunks long documents and generates 3-5 comprehension questions with answers for each chunk.

Special Feature: Supports markdown file upload in the UI. One file can generate multiple Q&A records through automatic chunking.

Pipeline Architecture

┌──────────────┐   ┌──────────────┐   
│   Markdown   │──►│     Text     │──►
│  Multiplier  │   │  Generator   │   
└──────────────┘   └──────────────┘   

┌──────────────┐   ┌──────────────┐
│  Structured  │──►│     JSON     │
│  Generator   │   │  Validator   │
└──────────────┘   └──────────────┘


Input: file_content
  ↓
+ chunk_text (multiplies: 1 file → N chunks)
  ↓
+ assistant (questions)
  ↓
+ generated (qa_pairs array)
  ↓
+ valid, parsed_json

Blocks:

MarkdownMultiplierBlock - Chunks markdown into processable sections (512 tokens, no overlap)
TextGenerator - Generates 3-5 comprehension questions per chunk
StructuredGenerator - Answers questions based strictly on chunk content
JSONValidatorBlock - Validates Q&A structure

Key Concept: The multiplier block returns a list, so 1 seed with 2 repetitions and 5 chunks creates 10 records (1 × 2 × 5).
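
The multiplication can be sketched in a few lines of Python. This is an illustration only: `expand_seed` and the `repetition` and `chunk_index` fields are hypothetical names, not the pipeline's actual API, and the real block may order or label records differently.

```python
def expand_seed(chunks, repetitions):
    """Return one record per (repetition, chunk) pair.

    Hypothetical helper: mimics how a multiplier block fans one seed
    out into several records. Field names other than chunk_text are
    illustrative only.
    """
    return [
        {"repetition": rep, "chunk_index": idx, "chunk_text": chunk}
        for rep in range(repetitions)
        for idx, chunk in enumerate(chunks)
    ]


chunks = [f"chunk {i}" for i in range(5)]  # pretend the file split into 5 chunks
records = expand_seed(chunks, repetitions=2)
print(len(records))  # 1 seed x 2 repetitions x 5 chunks = 10
```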

Seed Format

⚠️ Important: This template uses file_content (not content).

Single markdown file:

{
  "repetitions": 2,
  "metadata": {
    "file_content": "# Python Basics\n\nPython is a high-level programming language...\n\n## Variables\n\nVariables store data values..."
  }
}
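
If the seed is built by a script rather than by hand, a minimal sketch could look like the following. `build_seed` is a hypothetical helper, not part of the template; letting `json.dumps` serialize the text avoids escaping newlines manually.

```python
import json


def build_seed(markdown_text, repetitions=1):
    """Wrap markdown text in the seed structure shown above."""
    return {
        "repetitions": repetitions,
        # The key must be file_content, not content.
        "metadata": {"file_content": markdown_text},
    }


markdown_text = "# Python Basics\n\nPython is a high-level programming language."
seed_json = json.dumps(build_seed(markdown_text, repetitions=2), indent=2)
print(seed_json)
```

To load the text from disk instead, something like `pathlib.Path("doc.md").read_text(encoding="utf-8")` would work, where `doc.md` is a placeholder path.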

Typical workflow:

Upload a .md file in the UI (the MarkdownMultiplierBlock handles this)
Set repetitions (how many times to process each chunk)
Generate - creates multiple records from one file

Output Format

Schema:

{
  "qa_pairs": [
    {
      "question": "string - comprehension question",
      "answer": "string - answer based on content"
    }
  ]
}

Example output (from one chunk):

{
  "qa_pairs": [
    {
      "question": "What is photosynthesis?",
      "answer": "Photosynthesis is the process by which plants convert sunlight into energy using chlorophyll."
    },
    {
      "question": "What role does chlorophyll play in photosynthesis?",
      "answer": "Chlorophyll in leaves absorbs light, which triggers chemical reactions that produce glucose."
    },
    {
      "question": "What is the end product of photosynthesis?",
      "answer": "The end product of photosynthesis is glucose."
    }
  ]
}
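
For a quick sanity check of generated records outside the pipeline, the schema above can be verified with plain Python. This sketch does not reproduce what JSONValidatorBlock actually checks; `check_qa_pairs` is a hypothetical helper, and the 3-5 bounds are taken from the question count described in the Overview.

```python
import json


def check_qa_pairs(raw_json, min_items=3, max_items=5):
    """Return (valid, errors) for a JSON string holding a qa_pairs array."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return False, [f"invalid JSON: {exc}"]

    pairs = data.get("qa_pairs") if isinstance(data, dict) else None
    if not isinstance(pairs, list):
        return False, ["qa_pairs must be an array"]

    errors = []
    if not min_items <= len(pairs) <= max_items:
        errors.append(f"expected {min_items}-{max_items} pairs, got {len(pairs)}")
    for i, pair in enumerate(pairs):
        for key in ("question", "answer"):
            value = pair.get(key) if isinstance(pair, dict) else None
            if not isinstance(value, str) or not value.strip():
                errors.append(f"qa_pairs[{i}].{key} must be a non-empty string")
    return not errors, errors


sample = json.dumps({
    "qa_pairs": [
        {"question": "What is photosynthesis?",
         "answer": "The process by which plants convert sunlight into energy."},
        {"question": "What role does chlorophyll play?",
         "answer": "It absorbs light."},
        {"question": "What is the end product of photosynthesis?",
         "answer": "Glucose."},
    ]
})
valid, errors = check_qa_pairs(sample)
print(valid, errors)
```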

Use Cases

Perfect for:

Converting technical documentation to training datasets
Creating educational Q&A from tutorials
Building comprehension tests from long articles
Processing multi-section markdown documents efficiently

Not ideal for:

Short single-paragraph text (use simpler templates)
Non-markdown formats (requires preprocessing)
Open-ended questions (this generates factual Q&A)

Markdown File Upload Feature

The MarkdownMultiplierBlock is a seeder block that enables markdown file upload:

Navigate to the Generator page
Click “Choose File” and select a .md file
The block automatically:
- Parses markdown structure (headers, sections)
- Chunks by size (512 tokens default)
- Creates one record per chunk

Result: 1 uploaded file → Multiple Q&A records (one per chunk)
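
The parse-then-chunk behaviour can be approximated as follows. This is a rough sketch, not the block's implementation: it assumes sections start at `#` header lines, counts whitespace-separated words as a stand-in for tokens, and all function names are hypothetical.

```python
import re


def split_sections(markdown_text):
    """Split markdown into sections, each starting at a '#' header line.

    Simplification: header-like lines inside code blocks are not skipped.
    """
    parts = re.split(r"(?m)^(?=#{1,6} )", markdown_text)
    return [part.strip() for part in parts if part.strip()]


def chunk_section(section, chunk_size=512):
    """Split one section into pieces of at most chunk_size words.

    Words stand in for tokens; joining with spaces drops line breaks.
    """
    words = section.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]


def chunk_markdown(markdown_text, chunk_size=512):
    """Chunk every section in turn, with no overlap between chunks."""
    chunks = []
    for section in split_sections(markdown_text):
        chunks.extend(chunk_section(section, chunk_size))
    return chunks


doc = "# Intro\n\nShort intro.\n\n## Details\n\n" + "word " * 600
chunks = chunk_markdown(doc, chunk_size=512)
print(len(chunks))  # 3: one chunk for Intro, two for the long Details section
```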

Customization

Modify the template in lib/templates/qa_generation.yaml:

Adjust chunk size:

blocks:
  - type: MarkdownMultiplierBlock
    config:
      chunk_size: 1024  # Larger chunks
      chunk_overlap: 50 # Add overlap for context

Change number of questions:

# In StructuredGenerator json_schema:
minItems: 5  # Minimum 5 questions
maxItems: 10 # Maximum 10 questions

Modify question style:

# In TextGenerator system_prompt:
system_prompt: |
  Generate analytical questions that require critical thinking about the text.

Adjust answer detail level:

# In StructuredGenerator user_prompt:
user_prompt: |
  Provide detailed answers with examples from the content.

Example Workflow

Input: Technical documentation file (5000 words)

Processing:

MarkdownMultiplierBlock splits the file into ~10 chunks (at the default 512 tokens per chunk)
Set repetitions = 2
Total records created: 10 chunks × 2 reps = 20 Q&A records

Output: 20 records, each with 3-5 question-answer pairs
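
The arithmetic behind this estimate can be checked with a small helper. `estimate_records` is hypothetical, and the default of one token per word is an assumption chosen to match the ~10 chunks quoted above; real tokenizers often produce more tokens per word.

```python
import math


def estimate_records(word_count, repetitions, chunk_size=512, tokens_per_word=1.0):
    """Estimate (chunks, records) for a document chunked without overlap."""
    tokens = word_count * tokens_per_word
    chunks = max(1, math.ceil(tokens / chunk_size))
    return chunks, chunks * repetitions


chunks, records = estimate_records(word_count=5000, repetitions=2)
print(chunks, records)  # 10 20
```

With `tokens_per_word=1.3`, the same file would yield 13 chunks and 26 records, so treat the figures as rough.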

Performance Tips

Large files: Increase chunk_size to reduce the number of chunks
More coverage: Add chunk_overlap (e.g., 50 tokens)
Quality control: Review first few generated records before processing entire file
Repetitions: Use 1-2 for variety without duplication
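
To see how chunk_size and chunk_overlap trade off, the sketch below counts sliding windows over a 5000-token document. It assumes overlap works as a simple sliding window with stride `chunk_size - chunk_overlap`; the block's actual overlap handling is not documented here, and `window_starts` is a hypothetical helper.

```python
def window_starts(total_tokens, chunk_size, chunk_overlap=0):
    """Start offsets of sliding windows that together cover total_tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    stride = chunk_size - chunk_overlap
    starts = []
    start = 0
    while True:
        starts.append(start)
        if start + chunk_size >= total_tokens:
            break  # this window reaches the end of the document
        start += stride
    return starts


for size, overlap in [(512, 0), (512, 50), (1024, 50)]:
    print(size, overlap, len(window_starts(5000, size, overlap)))
# 512 0 10
# 512 50 11
# 1024 50 6
```

Under these assumptions, adding overlap slightly increases the number of chunks (and therefore records), while doubling chunk_size roughly halves it.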

Related Documentation

Templates Overview - All available templates
How to Use - Running pipelines with templates
Custom Blocks - Understanding multiplier blocks
