JSON Extraction Template
Table of Contents
- Overview
- Pipeline Architecture
- Seed Format
- Output Format
- Use Cases
- Customization
- Related Documentation
Overview
Complexity: Simple (2 blocks)
Use Case: Extract structured information from unstructured text as JSON
This template converts free-form text into structured JSON with title and description fields, which makes it well suited to building structured datasets from raw text content.
Pipeline Architecture
┌──────────────────┐     ┌──────────────────┐
│    Structured    │ ──► │       JSON       │
│    Generator     │     │    Validator     │
└──────────────────┘     └──────────────────┘
Input: content
↓
+ generated (title, description)
↓
+ valid, parsed_json
Blocks:
- StructuredGenerator - Extracts key information as structured JSON
- JSONValidatorBlock - Validates the JSON structure and required fields
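The data flow above can be sketched in Python as two functions that each return a copy of the record with new fields added. This is a minimal illustration under stated assumptions, not the template's implementation: structured_generator is a hypothetical stub that fakes the LLM reply instead of calling a model, and json_validator assumes that generated holds the raw JSON text and that title and description are the required fields.

```python
import json
from typing import Any

# Assumed from the Output Format section: both fields must be present.
REQUIRED_FIELDS = ("title", "description")


def structured_generator(record: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for StructuredGenerator.

    A real block would send record["content"] to an LLM; this stub fakes the
    reply so the sketch runs offline.
    """
    content = record["content"]
    fake_reply = json.dumps({"title": content.split(".")[0][:60], "description": content})
    return {**record, "generated": fake_reply}


def json_validator(record: dict[str, Any]) -> dict[str, Any]:
    """Parse the generated text and require each field to be a non-empty string."""
    try:
        parsed = json.loads(record["generated"])
    except json.JSONDecodeError:
        return {**record, "valid": False, "parsed_json": None}
    valid = isinstance(parsed, dict) and all(
        isinstance(parsed.get(name), str) and bool(parsed[name].strip())
        for name in REQUIRED_FIELDS
    )
    return {**record, "valid": valid, "parsed_json": parsed if valid else None}


# Each block returns a copy of the record with its own fields added.
record = {"content": "Python is a high-level programming language. It is widely used."}
result = json_validator(structured_generator(record))
print(result["valid"], result["parsed_json"]["title"])  # True Python is a high-level programming language
```

Returning a new record from each step keeps the original content available alongside generated, valid and parsed_json, matching the "+ field" notation in the diagram.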
Seed Format
Each seed provides the source text in the content field of its metadata:
{
  "repetitions": 3,
  "metadata": {
    "content": "Python is a high-level programming language known for readability and versatility. It's widely used in web development, data science, and automation."
  }
}
Multiple seeds example:
[
  {
    "repetitions": 2,
    "metadata": {
      "content": "Electric cars reduce emissions but require charging infrastructure."
    }
  },
  {
    "repetitions": 1,
    "metadata": {
      "content": "Machine learning algorithms learn patterns from data without explicit programming."
    }
  }
]
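One way to read such a seed file is sketched below. It accepts either a single seed object or a list, and it assumes (this page does not state it explicitly) that repetitions means the pipeline runs once per repetition, so each seed expands into that many jobs. expand_seeds is a hypothetical helper, not part of the project.

```python
import json
from typing import Any


def expand_seeds(raw: str) -> list[dict[str, Any]]:
    """Turn seed JSON (one object or a list) into one metadata dict per pipeline run."""
    data = json.loads(raw)
    seeds = data if isinstance(data, list) else [data]
    jobs: list[dict[str, Any]] = []
    for seed in seeds:
        # Assumption: each repetition is an independent run on the same metadata.
        for _ in range(seed["repetitions"]):
            jobs.append(dict(seed["metadata"]))  # copy so runs don't share state
    return jobs


seed_file = """
[
  {"repetitions": 2, "metadata": {"content": "Electric cars reduce emissions but require charging infrastructure."}},
  {"repetitions": 1, "metadata": {"content": "Machine learning algorithms learn patterns from data without explicit programming."}}
]
"""
jobs = expand_seeds(seed_file)
print(len(jobs))  # 3
```

Under this reading, the multiple-seeds example yields three runs: two for the electric-car text and one for the machine-learning text.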
Output Format
Schema:
{
  "title": "string - concise title summarizing the content",
  "description": "string - detailed description of the content"
}
Example output:
{
  "title": "Introduction to Python",
  "description": "Python is a high-level programming language known for its readability and versatility, widely used in web development, data science, and automation."
}
Use Cases
Perfect for:
- Creating structured datasets from raw text
- Converting blog posts/articles to metadata
- Extracting key information from descriptions
- Building content catalogs with titles and summaries
Not ideal for:
- Complex multi-field extraction (use custom blocks)
- Binary classification tasks (use Text Classification template)
- Question-answer pairs (use Q&A Generation template)
Customization
You can modify the template in lib/templates/json_generation.yaml:
Change output fields:
json_schema:
  type: object
  properties:
    headline: {type: string}
    summary: {type: string}
    category: {type: string}
  required: ["headline", "summary"]
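To see what this schema demands of a model's output, here is a minimal checker for the subset of JSON Schema used above (type: object, properties with type: string, and required). It is an illustration only, not how the template validates; a full validator such as the jsonschema package covers far more of the specification. check_object and TYPE_MAP are hypothetical names.

```python
from typing import Any

# The custom schema above, written as a Python dict.
SCHEMA = {
    "type": "object",
    "properties": {
        "headline": {"type": "string"},
        "summary": {"type": "string"},
        "category": {"type": "string"},
    },
    "required": ["headline", "summary"],
}

# Only the two JSON Schema types this example needs.
TYPE_MAP = {"string": str, "object": dict}


def check_object(value: Any, schema: dict[str, Any]) -> list[str]:
    """Return a list of error messages; an empty list means the value passes."""
    if not isinstance(value, TYPE_MAP[schema["type"]]):
        return [f"expected {schema['type']}"]
    errors = [f"missing required field: {name}"
              for name in schema.get("required", []) if name not in value]
    for name, sub in schema.get("properties", {}).items():
        # Optional fields (here: category) are only type-checked when present.
        if name in value and not isinstance(value[name], TYPE_MAP[sub["type"]]):
            errors.append(f"{name}: expected {sub['type']}")
    return errors


print(check_object({"headline": "EVs", "summary": "Cleaner but need chargers."}, SCHEMA))  # []
print(check_object({"headline": "EVs", "category": 5}, SCHEMA))
# ['missing required field: summary', 'category: expected string']
```

Because category is listed under properties but not under required, an output may omit it; if it is present, it must still be a string.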
Adjust LLM parameters:
config:
  temperature: 0.5  # Lower = more deterministic
  max_tokens: 512   # Limit response length
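The comment "Lower = more deterministic" follows from how sampling temperature is conventionally applied: the logits are divided by the temperature before the softmax, so a smaller value concentrates probability on the highest-scoring token. The sketch below uses made-up logits for three candidate tokens; the exact sampling code of whichever LLM provider the block calls is not specified on this page.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # illustrative scores for three candidate tokens
for t in (1.0, 0.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

With these logits the top token's probability rises from about 0.63 at temperature 1.0 to about 0.84 at 0.5, so repeated runs are more likely to produce the same output.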
Customize the prompt:
user_prompt: "Extract the main topic and a brief summary from: {{ content }}"
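The {{ content }} placeholder is presumably filled from the content field of each seed's metadata. The {{ ... }} syntax resembles Jinja2, but this page does not name the engine, so the sketch below uses a small regex stand-in (render_prompt, hypothetical) that only handles plain {{ name }} substitutions and raises KeyError for unknown names.

```python
import re
from typing import Any

# Matches {{ name }} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, metadata: dict[str, Any]) -> str:
    """Replace each {{ name }} with str(metadata[name]); raise KeyError if a name is missing."""
    return PLACEHOLDER.sub(lambda m: str(metadata[m.group(1)]), template)


prompt = render_prompt(
    "Extract the main topic and a brief summary from: {{ content }}",
    {"content": "Electric cars reduce emissions but require charging infrastructure."},
)
print(prompt)
```

Failing loudly on a missing name surfaces a typo in the prompt (for example {{ contnet }}) instead of silently sending an empty placeholder to the model.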
Related Documentation
- Templates Overview - All available templates
- How to Use - Running pipelines with templates
- Custom Blocks - Creating your own blocks