Contributing to DataGenFlow
Thank you for your interest in contributing! This document provides guidelines for contributing to the project.
Getting Started
- Fork the repository
- Clone your fork:
    git clone https://github.com/YOUR_USERNAME/DataGenFlow.git
    cd DataGenFlow
- Set up the development environment:
    make setup
    make dev
- Create a feature branch:
    git checkout -b feature/your-feature-name
  Tip: Click "Create a new branch" on GitHub when making a PR to have a good branch name.
Development Workflow
Running the application
# development mode (both servers with hot reload)
make run-dev # starts backend (:8000) and frontend (:5173)
# or run separately in different terminals
make dev-backend # backend on :8000 with auto-reload
make dev-ui # frontend on :5173 with hot reload
# production mode
make run # builds frontend and runs backend
Code quality
Before submitting a PR, run the following checks:
make format # format code with ruff
make lint # check for issues
make typecheck # run mypy
make test # run all tests
All checks must pass before merging.
Pull Request Conventions
We use icons and prefixes to make PRs easy to scan and understand at a glance.
Using PR Templates
When creating a pull request, GitHub will prompt you to choose a template:
- π Feature - For new functionality
- 🧩 Fix - For bug fixes
- π Refactor - For code improvements
- π Docs - For documentation updates
Select the appropriate template to get a pre-filled PR description with the right sections.
PR Title Format
<icon> <type>: <short description>
PR Types
| Type | Icon | Description | Example |
|---|---|---|---|
| Feature | π | New functionality or capability | π feat: add JSON validation block |
| Fix | 🧩 | Bug fixes or corrections | 🧩 Fix: block configuration not visible in edit mode |
| Epic | πΈ | Large feature requiring multiple PRs | πΈ EPIC: complex workflow and branching support |
| Refactor | π | Code improvements without behavior change | π Refactor: simplify block renderer logic |
| Docs | π | Documentation updates | π Docs: add block creation guide |
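The title format above lends itself to a quick local check before opening a PR. The sketch below is illustrative only: `parse_pr_title`, `PRTitle`, and `PR_TYPES` are hypothetical names, not part of DataGenFlow, and the accepted type prefixes are inferred from the Example column of the table. The icon is matched as any run of non-space characters rather than a specific emoji.

```python
import re
from typing import NamedTuple, Optional

# type prefixes inferred from the examples in the table above (assumption)
PR_TYPES = {"feat", "fix", "epic", "refactor", "docs"}

# <icon> <type>: <short description>
_TITLE_RE = re.compile(r"^(?P<icon>\S+) (?P<kind>[A-Za-z]+): (?P<description>\S.*)$")


class PRTitle(NamedTuple):
    icon: str
    kind: str
    description: str


def parse_pr_title(title: str) -> Optional[PRTitle]:
    """Return the parsed title, or None if it does not match the format."""
    match = _TITLE_RE.match(title.strip())
    if match is None:
        return None
    # compare case-insensitively because the examples mix "Fix" and "EPIC"
    if match["kind"].lower() not in PR_TYPES:
        return None
    return PRTitle(match["icon"], match["kind"], match["description"])
```

A title with no icon, or with a type that is not in the table, yields `None`, so the function can back a simple pre-submit check.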
PR Description Guidelines
Every PR should include:
- Description: What does this PR do?
- Proposed solution: What changed, and why is this change needed?
- Testing: How was this tested?
For Fix PRs, also include:
- Reproduction steps: How to reproduce the bug
- Expected vs Actual: What should happen vs what happens
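The checklist above can also be checked mechanically. This is a minimal sketch, assuming the PR body uses one markdown heading per item, as the example PRs below do. `missing_sections` and the two constant names are hypothetical and exist only for illustration.

```python
import re
from typing import List

# headings every PR is expected to carry (names taken from the list above)
REQUIRED = ["Description", "Proposed Solution", "Testing"]
# extra headings expected in Fix PRs
REQUIRED_FOR_FIX = ["Reproduction Steps", "Expected vs Actual"]


def missing_sections(body: str, is_fix: bool = False) -> List[str]:
    """Return the required section names that have no heading in body."""
    # collect heading text from lines such as "### Description"
    found = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}\s+(.+)$", body, flags=re.MULTILINE)
    }
    wanted = REQUIRED + (REQUIRED_FOR_FIX if is_fix else [])
    return [name for name in wanted if name.lower() not in found]
```

An empty list means every expected section is present; passing `is_fix=True` adds the two Fix-specific sections to the check.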
Example PR (Fix)
Title: 🧩 Fix: block configuration not visible in edit mode
### Description
Fixed an issue where the block configuration panel doesn't appear when editing existing pipelines.
### Reproduction Steps
1. Create a new pipeline with a TextGenerator block
2. Configure the block with a custom system prompt
3. Save the pipeline
4. Click "Edit" on the saved pipeline
5. Click on the TextGenerator block
6. **Bug**: Configuration panel doesn't appear
### Expected vs Actual
- **Expected**: Configuration panel should appear on the right side
- **Actual**: Nothing happens when clicking the block
### Proposed Solution
- Updated `Pipelines.tsx` to properly load block config when entering edit mode
- Fixed state initialization in `useEffect` hook
- Added null check for block config before rendering
### Testing
- Manually tested edit flow with all block types
- Added test case in `test_api.py` for pipeline update endpoint
- All existing tests passing
Example PR (Feature)
Title: π Feat: add retry logic to TextGenerator block
### Description
Added configurable retry logic to TextGenerator for handling transient API failures. LLM API calls can fail due to rate limits, network issues, or temporary outages. Without retries, entire pipeline executions fail, wasting compute and time.
### Proposed Solution
- Added `max_retries` and `retry_delay` config options to TextGenerator
- Implemented exponential backoff strategy
- Updated TextGenerator schema and validation
- Added tests for retry behavior
### Testing
- Unit tests for retry logic with mocked failures
- Integration test with actual API (using test endpoint)
- Tested with rate-limited endpoint to verify backoff
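To make the retry behaviour described in this example PR concrete, here is a minimal sketch of exponential backoff. It is not DataGenFlow's TextGenerator code: `call_with_retries` is a hypothetical helper, `max_retries` and `retry_delay` only borrow the option names from the example, and the doubling schedule and injectable `sleep` are assumptions made so the behaviour can be unit-tested with mocked failures.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 3,
    retry_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on any exception with exponentially growing delays."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            # give up once the retry budget is spent and surface the last error
            if attempt >= max_retries:
                raise
            # delay doubles on each retry: retry_delay, then 2x, then 4x, ...
            sleep(retry_delay * (2 ** attempt))
            attempt += 1
```

In a unit test, passing a list's `append` method as `sleep` records the delays without actually waiting, which is one way to verify the backoff schedule against a mocked failing call.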
Code Style
- Follow existing code patterns
- Write comments that explain why, not what
- Use lowercase for comments
- Return early instead of nesting `else` statements
- Create a minimal number of functions
- Code should be self-explanatory
Example:
# ❌ Bad
def process_data(data):
    # Process the data
    if data is not None:
        result = data.upper()
        return result
    else:
        return None

# ✅ Good
def process_data(data):
    # early return avoids unnecessary nesting
    if data is None:
        return None
    return data.upper()
Commit Messages
- Keep commits focused and atomic
- Write clear, descriptive commit messages
- Squash WIP commits before merging
Questions?
Feel free to open an issue or reach out to maintainers if you have questions about contributing.