Generate high-quality synthetic data with AI while preserving referential integrity
SYDA generates realistic synthetic test data (structured tables, unstructured text, PDF, and HTML documents) using AI and large language models. It preserves referential integrity, maintains privacy compliance, and speeds up development workflows. Teams in highly regulated industries such as healthcare and banking, as well as in non-regulated settings like software testing, research, and analytics, can use it to simulate diverse data scenarios without exposing sensitive information.
For detailed documentation, examples, and API reference, visit: https://python.syda.ai/
pip install syda
Create a .env file:
# .env
ANTHROPIC_API_KEY=your_anthropic_api_key_here
# OR
OPENAI_API_KEY=your_openai_api_key_here
# OR
GEMINI_API_KEY=your_gemini_api_key_here
# OR
GROK_API_KEY=your_grok_api_key_here
"""
Syda 30-Second Quick Start Example
Demonstrates AI-powered synthetic data generation with perfect referential integrity
"""
from syda import SyntheticDataGenerator, ModelConfig
from dotenv import load_dotenv
# Load environment variables from .env file
load_dotenv()
print("🚀 Starting Syda Quick Start...")
# Configure AI model
generator = SyntheticDataGenerator(
model_config=ModelConfig(
provider="anthropic",
model_name="claude-haiku-4-5-20251001"
)
)
# Define schemas with rich descriptions for better AI understanding
schemas = {
# Categories schema with table and column descriptions
'categories': {
'__table_description__': 'Product categories for organizing items in the e-commerce catalog',
'id': {
'type': 'number',
'description': 'Unique identifier for the category',
'primary_key': True
},
'name': {
'type': 'text',
'description': 'Category name (Electronics, Home Decor, Sports, etc.)'
},
'description': {
'type': 'text',
'description': 'Detailed description of what products belong in this category'
}
},
# Products schema with table and column descriptions and foreign keys
'products': {
'__table_description__': 'Individual products available for purchase with pricing and category assignment',
'__foreign_keys__': {
'category_id': ['categories', 'id'] # products.category_id references categories.id
},
'id': {
'type': 'number',
'description': 'Unique product identifier',
'primary_key': True
},
'name': {
'type': 'text',
'description': 'Product name and title'
},
'category_id': {
'type': 'foreign_key',
'description': 'Reference to the category this product belongs to'
},
'price': {
'type': 'number',
'description': 'Product price in USD'
}
}
}
# Generate data with perfect referential integrity
print("📊 Generating categories and products...")
results = generator.generate_for_schemas(
schemas=schemas,
sample_sizes={"categories": 5, "products": 20},
output_dir="data"
)
print("✅ Generated realistic data with perfect foreign key relationships!")
print("📂 Check the 'data' folder for categories.csv and products.csv")
# Check data/ folder for categories.csv and products.csv
| Feature | Benefit | Example |
|---|---|---|
| Multi-AI Provider | No vendor lock-in | Claude, GPT, Gemini, Grok, Ollama, and any OpenAI-compatible API |
| Zero Orphaned Records | Perfect referential integrity | product.category_id → category.id ✅ |
| SQLAlchemy Native | Use existing models directly | Customer, Contact classes → CSV data |
| Multiple Schema Formats | Flexible input options | SQLAlchemy, YAML, JSON, Dict |
| Document Generation | AI-powered PDFs linked to data | Product catalogs, receipts, contracts |
| Custom Generators | Complex business logic | Tax calculations, pricing rules, arrays |
| Large Dataset Support | Thousands to millions of rows | Code-gen mode: 10,000 rows with ~3 LLM calls |
| Privacy-First | Protect real user data | GDPR/CCPA compliant testing |
| Database Integration | Any SQLAlchemy-compatible database | DatabaseSchemaLoader("postgresql://...") → generate → write back |
| CLI | No Python required | syda generate --schema patients.yaml --rows 1000 --large-dataset |
| Cost Tracking | Know what you're spending | Per-table & per-column cost breakdown in every run report |
| Developer Experience | Just works | Type hints, great docs, HTML run reports |
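The "Zero Orphaned Records" row can be checked independently of Syda once the CSV files exist. The following is a minimal sketch using only the standard library; the helper names `find_orphans` and `read_csv_text` are hypothetical, and the inline CSV text stands in for `data/categories.csv` and `data/products.csv` (one row is deliberately broken to show what a failure looks like).

```python
import csv
import io


def find_orphans(child_rows, parent_rows, fk_column, pk_column="id"):
    """Return child rows whose foreign key has no matching parent key."""
    parent_keys = {row[pk_column] for row in parent_rows}
    return [row for row in child_rows if row[fk_column] not in parent_keys]


def read_csv_text(text):
    """Parse CSV text into a list of dicts (all values stay as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


# Tiny inline stand-ins for data/categories.csv and data/products.csv.
categories = read_csv_text("id,name\n1,Electronics\n2,Sports\n")
products = read_csv_text(
    "id,name,category_id,price\n"
    "1,Headphones,1,59.99\n"
    "2,Yoga Mat,2,24.50\n"
    "3,Mystery Item,9,5.00\n"  # deliberately broken row for the demo
)

orphans = find_orphans(products, categories, fk_column="category_id")
print([row["name"] for row in orphans])  # ['Mystery Item']
```

To run the same check on real output, open each generated file and pass `csv.DictReader(f)` rows to `find_orphans`; an empty result means every foreign key resolves.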
category_schema.yml:
__table_name__: Category
__description__: Retail product categories
id:
  type: integer
  description: Unique category ID
  constraints:
    primary_key: true
    not_null: true
    min: 1
    max: 1000
name:
  type: string
  description: Category name
  constraints:
    not_null: true
    length: 50
    unique: true
parent_id:
  type: integer
  description: Parent category ID for hierarchical categories; set to 0 for a top-level category
  constraints:
    min: 0
    max: 1000
description:
  type: text
  description: Detailed category description
  constraints:
    length: 500
active:
  type: boolean
  description: Whether the category is active
  constraints:
    not_null: true
product_schema.yml:
__table_name__: Product
__description__: Retail products
__foreign_keys__:
  category_id: [Category, id]
id:
  type: integer
  description: Unique product ID
  constraints:
    primary_key: true
    not_null: true
    min: 1
    max: 10000
name:
  type: string
  description: Product name
  constraints:
    not_null: true
    length: 100
    unique: true
category_id:
  type: integer
  description: Category ID for the product
  constraints:
    not_null: true
    min: 1
    max: 1000
sku:
  type: string
  description: Stock Keeping Unit - unique product code
  constraints:
    not_null: true
    pattern: '^P[A-Z]{2}-\d{5}$'
    length: 10
    unique: true
price:
  type: float
  description: Product price in USD
  constraints:
    not_null: true
    min: 0.99
    max: 9999.99
    decimals: 2
stock_quantity:
  type: integer
  description: Current stock level
  constraints:
    not_null: true
    min: 0
    max: 10000
is_featured:
  type: boolean
  description: Whether the product is featured
  constraints:
    not_null: true
Python code:
from syda import SyntheticDataGenerator, ModelConfig
from dotenv import load_dotenv
import os
# Load environment variables from .env file
load_dotenv()
# Configure your AI model
config = ModelConfig(
provider="anthropic",
model_name="claude-haiku-4-5-20251001"
)
# Create generator
generator = SyntheticDataGenerator(model_config=config)
# Define your schemas (structured data only)
schemas = {
"categories": "category_schema.yml",
"products": "product_schema.yml"
}
# Generate synthetic data with relationships intact
results = generator.generate_for_schemas(
schemas=schemas,
sample_sizes={"categories": 5, "products": 20},
output_dir="output",
prompts = {
"Category": "Generate retail product categories with hierarchical structure.",
"Product": "Generate retail products with names, SKUs, prices, and descriptions. Ensure a good variety of prices and categories."
}
)
# Perfect referential integrity guaranteed! 🎯
print("✅ Generated realistic data with perfect foreign key relationships!")
Output:
output/
├── categories.csv # 5 product categories with hierarchical structure
└── products.csv # 20 products, all with valid category_id references
To generate AI-powered documents along with your structured data, add the product catalog schema and update your code:
product_catalog_schema.yml (Document Template):
__template__: true
__description__: Product catalog page template
__name__: ProductCatalog
__depends_on__: [Product, Category]
__foreign_keys__:
  product_name: [Product, name]
  category_name: [Category, name]
  product_price: [Product, price]
  product_sku: [Product, sku]
__template_source__: templates/product_catalog.html
__input_file_type__: html
__output_file_type__: pdf
# Product information (linked to Product table)
product_name:
  type: string
  length: 100
  description: Name of the featured product
category_name:
  type: string
  length: 50
  description: Category this product belongs to
product_sku:
  type: string
  length: 10
  description: Product SKU code
product_price:
  type: float
  decimals: 2
  description: Product price in USD
# Marketing content (AI-generated)
product_description:
  type: text
  length: 500
  description: Detailed marketing description of the product
key_features:
  type: text
  length: 300
  description: Bullet points of key product features
marketing_tagline:
  type: string
  length: 100
  description: Catchy marketing tagline for the product
availability_status:
  type: string
  enum: ["In Stock", "Limited Stock", "Out of Stock", "Pre-Order"]
  description: Current availability status
Create the Jinja HTML template (templates/product_catalog.html):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>{{ product_name }} - Product Catalog</title>
<style>
body {
font-family: 'Arial', sans-serif;
max-width: 800px;
margin: 0 auto;
padding: 40px;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: #333;
}
.catalog-page {
background: white;
padding: 40px;
border-radius: 15px;
box-shadow: 0 10px 30px rgba(0,0,0,0.2);
}
.product-header {
text-align: center;
margin-bottom: 30px;
border-bottom: 3px solid #667eea;
padding-bottom: 20px;
}
.product-name {
font-size: 36px;
font-weight: bold;
color: #2c3e50;
margin-bottom: 10px;
}
.category-sku {
font-size: 16px;
color: #7f8c8d;
margin-bottom: 15px;
}
.price {
font-size: 32px;
color: #e74c3c;
font-weight: bold;
}
.tagline {
font-style: italic;
font-size: 18px;
color: #34495e;
text-align: center;
margin: 20px 0;
padding: 15px;
background: #ecf0f1;
border-radius: 8px;
}
.description {
font-size: 16px;
line-height: 1.6;
margin: 25px 0;
text-align: justify;
}
.features {
background: #f8f9fa;
padding: 20px;
border-radius: 8px;
margin: 25px 0;
}
.features h3 {
color: #2c3e50;
margin-top: 0;
}
.availability {
text-align: center;
font-size: 18px;
font-weight: bold;
padding: 15px;
border-radius: 8px;
margin-top: 30px;
}
.in-stock { background: #d4edda; color: #155724; }
.limited-stock { background: #fff3cd; color: #856404; }
.out-of-stock { background: #f8d7da; color: #721c24; }
.pre-order { background: #d1ecf1; color: #0c5460; }
</style>
</head>
<body>
<div class="catalog-page">
<div class="product-header">
<div class="product-name">{{ product_name }}</div>
<div class="category-sku">{{ category_name }} Category | SKU: {{ product_sku }}</div>
<div class="price">${{ "%.2f"|format(product_price) }}</div>
</div>
<div class="tagline">"{{ marketing_tagline }}"</div>
<div class="description">
{{ product_description }}
</div>
<div class="features">
<h3>KEY FEATURES:</h3>
{{ key_features }}
</div>
<div class="availability {{ availability_status.lower().replace(' ', '-') }}">
Availability: {{ availability_status }}
</div>
</div>
</body>
</html>
Updated Python code (with document generation):
# Same setup as before...
from syda import SyntheticDataGenerator, ModelConfig
from dotenv import load_dotenv
load_dotenv()
config = ModelConfig(provider="anthropic", model_name="claude-haiku-4-5-20251001")
generator = SyntheticDataGenerator(model_config=config)
# Define your schemas (structured data)
schemas = {
"categories": "category_schema.yml",
"products": "product_schema.yml",
# 🆕 Add document templates
"product_catalogs": "product_catalog_schema.yml"
}
# Generate both structured data AND documents
results = generator.generate_for_schemas(
schemas=schemas,
sample_sizes={
"categories": 5,
"products": 20,
"product_catalogs": 10 # 🆕 Add this line
},
output_dir="output",
prompts = {
"Category": "Generate retail product categories with hierarchical structure.",
"Product": "Generate retail products with names, SKUs, prices, and descriptions. Ensure a good variety of prices and categories.",
"ProductCatalog": "Generate compelling product catalog pages with marketing descriptions, key features, and sales copy." # 🆕 Add this line
}
)
print("✅ Generated structured data + AI-powered product catalogs!")
Enhanced Output:
output/
├── categories.csv # 5 product categories with hierarchical structure
├── products.csv # 20 products, all with valid category_id references
└── product_catalogs/ # AI-generated marketing documents
    ├── catalog_1.pdf # Product names match products.csv
    ├── catalog_2.pdf # Prices match products.csv
    ├── catalog_3.pdf # Perfect data consistency!
    ├── ...
    └── catalog_10.pdf
Categories Table:
id,name,parent_id,description,active
1,Electronics,0,Electronic devices and accessories,true
2,Smartphones,1,Mobile phones and accessories,true
3,Laptops,1,Portable computers and accessories,true
4,Clothing,0,Apparel and fashion items,true
5,Men's Clothing,4,Men's apparel and accessories,true
Products Table (with matching category_id):
id,name,category_id,sku,price,stock_quantity,is_featured
1,iPhone 15 Pro,2,PSM-12345,999.99,50,true
2,MacBook Air M3,3,PLA-67890,1299.99,25,true
3,Samsung Galaxy S24,2,PSA-11111,899.99,75,false
4,Dell XPS 13,3,PDE-22222,1099.99,30,false
5,Men's Cotton T-Shirt,5,PMC-33333,24.99,200,false
Generated Product Catalog PDF Content:
IPHONE 15 PRO
Smartphones Category | SKU: PSM-12345
$999.99
Revolutionary Performance, Unmatched Design
Experience the future of mobile technology with the iPhone 15 Pro.
Featuring the powerful A17 Pro chip, this device delivers unprecedented
performance for both work and play. The titanium design combines
durability with elegance, while the advanced camera system captures
professional-quality photos and videos.
KEY FEATURES:
• A17 Pro chip with 6-core GPU
• Pro camera system with 3x optical zoom
• Titanium design with Action Button
• USB-C connectivity
• All-day battery life
"Innovation that fits in your pocket"
Availability: In Stock
🎯 Perfect Integration: The PDF catalog contains actual product names, SKUs, and prices from the CSV data, plus AI-generated marketing content - zero inconsistencies!
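One way to spot-check that consistency yourself is to compare the field values used for each catalog page against the product rows. The sketch below assumes you already have those values as plain dicts (how you extract them from the rendered PDFs is not covered here); `check_catalog_entry` is a hypothetical helper, and the SKU pattern is the one from product_schema.yml.

```python
import re

SKU_PATTERN = re.compile(r"^P[A-Z]{2}-\d{5}$")  # pattern from product_schema.yml


def check_catalog_entry(entry, products_by_sku):
    """Return a list of problems for one catalog record (empty if consistent)."""
    problems = []
    sku = entry["product_sku"]
    if not SKU_PATTERN.match(sku):
        problems.append(f"SKU {sku!r} does not match the schema pattern")
    product = products_by_sku.get(sku)
    if product is None:
        problems.append(f"SKU {sku!r} not found in products")
        return problems
    if entry["product_name"] != product["name"]:
        problems.append("product_name differs from products.csv")
    if abs(entry["product_price"] - product["price"]) > 0.005:
        problems.append("product_price differs from products.csv")
    return problems


# Two rows from the sample Products table, keyed by SKU.
products_by_sku = {
    "PSM-12345": {"name": "iPhone 15 Pro", "price": 999.99},
    "PLA-67890": {"name": "MacBook Air M3", "price": 1299.99},
}

good = {"product_name": "iPhone 15 Pro", "product_sku": "PSM-12345", "product_price": 999.99}
bad = {"product_name": "iPhone 15 Pro", "product_sku": "PSM-12345", "product_price": 899.99}

print(check_catalog_entry(good, products_by_sku))  # []
print(check_catalog_entry(bad, products_by_sku))   # ['product_price differs from products.csv']
```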
For advanced scenarios requiring custom calculations or complex business rules, you can add custom generator functions:
# Define custom generator functions
def calculate_tax(row, parent_dfs=None, **kwargs):
    """Calculate tax amount based on subtotal and tax rate"""
    subtotal = row.get('subtotal', 0)
    tax_rate = row.get('tax_rate', 8.5) # Default 8.5%
    return round(subtotal * (tax_rate / 100), 2)
def calculate_total(row, parent_dfs=None, **kwargs):
    """Calculate final total: subtotal + tax - discount"""
    subtotal = row.get('subtotal', 0)
    tax_amount = row.get('tax_amount', 0)
    discount = row.get('discount_amount', 0)
    return round(subtotal + tax_amount - discount, 2)
def generate_receipt_items(row, parent_dfs=None, **kwargs):
    """Generate receipt items based on actual transactions"""
    items = []
    if parent_dfs and 'Product' in parent_dfs and 'Transaction' in parent_dfs:
        products_df = parent_dfs['Product']
        transactions_df = parent_dfs['Transaction']
        # Get customer's transactions
        customer_id = row.get('customer_id')
        customer_transactions = transactions_df[
            transactions_df['customer_id'] == customer_id
        ]
        # Build receipt items from actual transaction data
        for _, tx in customer_transactions.iterrows():
            product = products_df[products_df['id'] == tx['product_id']].iloc[0]
            items.append({
                "product_name": product['name'],
                "sku": product['sku'],
                "quantity": int(tx['quantity']),
                "unit_price": float(product['price']),
                "item_total": round(tx['quantity'] * product['price'], 2)
            })
    return items
# Add custom generators to your generation
custom_generators = {
"ProductCatalog": {
"tax_amount": calculate_tax,
"total": calculate_total,
"items": generate_receipt_items
}
}
# Generate with custom business logic
results = generator.generate_for_schemas(
schemas=schemas,
sample_sizes={"categories": 5, "products": 20, "product_catalogs": 10},
output_dir="output",
custom_generators=custom_generators, # 🆕 Add this line
prompts={
"Category": "Generate retail product categories with hierarchical structure.",
"Product": "Generate retail products with names, SKUs, prices, and descriptions.",
"ProductCatalog": "Generate compelling product catalog pages with marketing copy."
}
)
print("✅ Generated data with custom business logic!")
🎯 Custom generators let you:
- Calculate fields based on other data (taxes, totals, discounts)
- Access related data from other tables via parent_dfs
- Implement complex business rules (pricing logic, inventory rules)
- Generate structured data (arrays, nested objects, JSON)
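Because a custom generator is an ordinary function of a row, it can be exercised with a plain dict before it is handed to Syda. The sketch below repeats the tax and total logic from the example and runs it through `apply_generators`, a hypothetical test helper that fills columns in dict order; the order in which Syda itself evaluates custom generators is an assumption not confirmed here.

```python
def calculate_tax(row, parent_dfs=None, **kwargs):
    """Tax amount from subtotal and a percentage tax rate."""
    return round(row.get("subtotal", 0) * (row.get("tax_rate", 8.5) / 100), 2)


def calculate_total(row, parent_dfs=None, **kwargs):
    """Final total: subtotal + tax - discount."""
    return round(
        row.get("subtotal", 0) + row.get("tax_amount", 0) - row.get("discount_amount", 0),
        2,
    )


def apply_generators(row, generators):
    """Hypothetical test helper: fill columns one by one, in dict order."""
    filled = dict(row)
    for column, func in generators.items():
        filled[column] = func(filled)
    return filled


row = {"subtotal": 200.0, "tax_rate": 8.5, "discount_amount": 10.0}
result = apply_generators(row, {"tax_amount": calculate_tax, "total": calculate_total})
print(result["tax_amount"], result["total"])  # 17.0 207.0
```

Note that `total` reads `tax_amount`, so the tax generator has to run first; listing dependent columns after the columns they read keeps that ordering explicit in this helper.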
Already using SQLAlchemy? Syda works directly with your existing models - no schema conversion needed!
from sqlalchemy import Column, Integer, String, Float, ForeignKey, Boolean
from sqlalchemy.orm import declarative_base, relationship
from syda import SyntheticDataGenerator, ModelConfig
from dotenv import load_dotenv
load_dotenv()
Base = declarative_base()
# Your existing SQLAlchemy models
class Customer(Base):
    __tablename__ = 'customers'
    id = Column(Integer, primary_key=True)
    name = Column(String(100), nullable=False, comment='Customer organization name')
    industry = Column(String(50), comment='Industry sector')
    annual_revenue = Column(Float, comment='Annual revenue in USD')
    status = Column(String(20), comment='Active, Inactive, or Prospect')
    # Relationships work perfectly
    contacts = relationship("Contact", back_populates="customer")
class Contact(Base):
    __tablename__ = 'contacts'
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey('customers.id'), nullable=False)
    first_name = Column(String(50), nullable=False)
    last_name = Column(String(50), nullable=False)
    email = Column(String(100), nullable=False, unique=True)
    position = Column(String(100), comment='Job title')
    is_primary = Column(Boolean, comment='Primary contact for customer')
    customer = relationship("Customer", back_populates="contacts")
# Generate data directly from your models
config = ModelConfig(provider="anthropic", model_name="claude-haiku-4-5-20251001")
generator = SyntheticDataGenerator(model_config=config)
results = generator.generate_for_sqlalchemy_models(
sqlalchemy_models=[Customer, Contact],
sample_sizes={"Customer": 10, "Contact": 25},
output_dir="crm_data"
)
print("✅ Generated CRM data with perfect foreign key relationships!")
Output:
crm_data/
├── customers.csv # 10 companies with realistic industry data
└── contacts.csv # 25 contacts, all with valid customer_id references
🎯 Zero Configuration: Your SQLAlchemy comments become AI generation hints, ForeignKey relationships are automatically maintained, and nullable=False constraints are respected!
Already have a database? DatabaseSchemaLoader connects to it, infers all table schemas (columns, types, primary keys, foreign keys), generates synthetic data, and writes it back — no manual schema definition needed.
Supports any SQLAlchemy-compatible database — SQLite, PostgreSQL, MySQL, MariaDB, MS SQL Server, Oracle, and more. Pass any valid SQLAlchemy connection string and it works.
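If you do not have a database at hand, a small SQLite file is enough to try schema inference. The following sketch uses only the standard library to create three empty tables with foreign keys; the table names follow the patient/provider/claim example in this section, while the columns and the `create_sample_db` helper are made up for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE patient (
    id INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL
);
CREATE TABLE provider (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE claim (
    id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(id),
    provider_id INTEGER NOT NULL REFERENCES provider(id),
    amount REAL NOT NULL
);
"""


def create_sample_db(path):
    """Create the three empty tables and return the open connection."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


# ":memory:" keeps the demo free of side effects; pass "mydb.db" instead to
# create a file that the sqlite:///mydb.db connection string can point at.
conn = create_sample_db(":memory:")
tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)  # ['claim', 'patient', 'provider']
```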
pip install syda sqlalchemy
# PostgreSQL: pip install psycopg2-binary
# MySQL: pip install pymysql
# Option A: infer schemas in memory, generate, and write back
from syda import SyntheticDataGenerator, DatabaseSchemaLoader, ModelConfig
from dotenv import load_dotenv
load_dotenv()
loader = DatabaseSchemaLoader("sqlite:///mydb.db")
schemas = loader.load_schemas() # infer schemas as dicts
generator = SyntheticDataGenerator(model_config=ModelConfig(
provider="anthropic", model_name="claude-haiku-4-5-20251001"
))
results = generator.generate_for_schemas(
schemas=schemas,
sample_sizes={"patient": 10, "claim": 20},
output_dir="output"
)
loader.write_to_database(results) # write generated rows back
# Option B: save the inferred schemas as YAML files first
loader = DatabaseSchemaLoader("postgresql+psycopg2://user:pass@localhost/mydb")
# Save one YAML file per table — edit them before generating if needed
schema_files = loader.save_schemas("schemas/", format="yaml")
results = generator.generate_for_schemas(schemas=schema_files, output_dir="output")
loader.write_to_database(results)
Output:
output/
├── patient.csv # generated rows
├── provider.csv
└── claim.csv # all foreign keys reference valid parent rows
schemas/ # (Option B only) editable YAML schema files
├── patient.yaml
└── claim.yaml
🎯 FK-safe writes: write_to_database() inserts rows in topological order (parents before children) so referential integrity is preserved in the target database.
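The idea of a parents-before-children insert order can be illustrated with the standard library's `graphlib`. This is a sketch of the general technique, not Syda's implementation; the `insertion_order` helper and the foreign-key layout for the three tables are assumptions made for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def insertion_order(foreign_keys):
    """Order tables so every parent comes before the tables that reference it.

    foreign_keys maps each table to the set of parent tables it references.
    """
    return list(TopologicalSorter(foreign_keys).static_order())


# Hypothetical FK layout for the patient/provider/claim tables.
foreign_keys = {
    "patient": set(),
    "provider": set(),
    "claim": {"patient", "provider"},
}

order = insertion_order(foreign_keys)
print(order)  # claim always comes after patient and provider
```

A circular reference between tables makes `static_order()` raise `graphlib.CycleError`, which is a useful early signal that no simple insert order exists.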
Generate synthetic data directly from the terminal without writing a single line of Python.
pip install syda
export ANTHROPIC_API_KEY=your_key # or OPENAI_API_KEY / GEMINI_API_KEY
syda validate --schema schemas/
# [OK] patient
# [OK] provider
# [OK] appointment
# Single table → CSV
syda generate --schema patients.yaml --rows 50 --output patients.csv
# Single table → JSON
syda generate --schema patients.yaml --rows 50 --output patients.json
# Multi-table directory → FK-safe CSV output
syda generate --schema schemas/ --rows 100 --output-dir ./data
# Large dataset — chunked direct mode (3 LLM calls of 50 rows each)
syda generate --schema schemas/product.yml --rows 150 --batch-size 50 --output-dir ./data
# Large dataset — code-gen mode (auto-triggered above 500 rows)
syda generate --schema schemas/ --rows 2000 --output-dir ./data
# Force code-gen for any row count
syda generate --schema schemas/ --rows 50 --large-dataset --output-dir ./data
# Infer schemas from a live database
syda db infer --db-url sqlite:///mydb.db --output-dir schemas/
# Generate data from a database schema
syda db generate --db-url sqlite:///mydb.db --rows 50 --output-dir ./data
# Generate and write directly back into the database
syda db generate --db-url postgresql://user:pass@localhost/mydb \
  --rows 100 --write-back --if-exists replace
- name: Validate schemas
  run: syda validate --schema schemas/
- name: Generate test fixtures
  run: syda generate --schema schemas/ --rows 20 --output-dir tests/fixtures
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
📖 Full CLI reference: python.syda.ai/deep_dive/cli
Syda has two modes for large datasets, chunked direct mode and code-gen mode, that scale to thousands or millions of rows without exhausting your token budget.
Chunked direct mode splits generation into chunks of batch_size rows, with automatic retry on transient errors:
generator = SyntheticDataGenerator(
model_config=ModelConfig(
provider="anthropic",
model_name="claude-haiku-4-5-20251001",
generation_mode="auto", # direct ≤500 rows, codegen >500
batch_size=50, # rows per LLM call
max_retries=3,
)
)
results = generator.generate_for_schemas(schemas=schemas, sample_sizes={"products": 300}, output_dir="output")
# [syda] Generating chunk 1/6 (rows 1–50 of 300)...
# [syda] Generating chunk 2/6 (rows 51–100 of 300)...
# ...
Code-gen mode is used for more than 500 rows (auto-selected) or when generation_mode="codegen" is set. The LLM makes one analysis call that writes Python generator functions for simple columns (IDs, dates, enums, emails). Only semantic columns (descriptions, narratives) call the LLM at runtime, regardless of row count.
# 10,000 rows, ~3 LLM calls total (1 analysis + 2 semantic columns)
generator = SyntheticDataGenerator(
model_config=ModelConfig(provider="grok", model_name="grok-4.3", max_tokens=16384)
)
generator.generate_for_schemas(schemas=schemas, sample_sizes={"orders": 10_000}, output_dir="output")
Generated Python functions are cached under output_dir/.syda_cache/, so re-runs are instant on cache hits.
Need specific columns to always use LLM generation in code-gen mode? Mark them force_llm: true in the schema:
tagline:
  type: text
  description: Short marketing tagline (one punchy sentence)
  force_llm: true # always LLM-generated, never replaced by a Python function
Every run produces a cost breakdown accessible via generator.last_report and saved as an HTML report in output_dir/:
generator.last_report.print_summary()
# Table         Rows     Mode      Calls   In tok    Out tok   Cost
# products      200      direct    4       2,168     15,291    $0.24
# orders        5,000    codegen   100     28,300    25,201    $0.46
# order_items   10,000   codegen   0       0         0         $0.00
# TOTAL         15,200             104     30,468    40,492    $0.70
# Chunked direct mode
syda generate --schema schemas/product.yml --rows 300 --batch-size 50 --output-dir ./data
# Auto code-gen (>500 rows)
syda generate --schema schemas/product.yml --rows 1000 --output-dir ./data
# Force code-gen
syda generate --schema schemas/ --rows 5000 --large-dataset --output-dir ./data
📖 Full guide: python.syda.ai/deep_dive/large_dataset
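The mode selection and chunking arithmetic described in this section can be sketched in a few lines. This is an illustration of the stated rules (auto mode switches to code-gen above 500 rows; direct mode issues one LLM call per batch_size rows), not Syda's internal code; `choose_mode` and `chunk_ranges` are hypothetical names.

```python
import math


def choose_mode(rows, generation_mode="auto", threshold=500):
    """Resolve "auto" to "direct" or "codegen" using the 500-row cutoff."""
    if generation_mode != "auto":
        return generation_mode
    return "direct" if rows <= threshold else "codegen"


def chunk_ranges(rows, batch_size):
    """Return 1-based inclusive (start, end) row ranges, one per LLM call."""
    chunks = math.ceil(rows / batch_size)
    return [
        (i * batch_size + 1, min((i + 1) * batch_size, rows))
        for i in range(chunks)
    ]


ranges = chunk_ranges(300, 50)
print(len(ranges), ranges[0], ranges[-1])    # 6 (1, 50) (251, 300)
print(choose_mode(300), choose_mode(10_000)) # direct codegen
```

With 300 rows and a batch size of 50 this yields the six chunks shown in the log output of the chunked example; a final partial chunk simply ends at the last row.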
Run Syda against any OpenAI-compatible API, whether a local model served by Ollama or a hosted service such as Groq, Together AI, Fireworks, DeepSeek, or Mistral, using the openai_compatible provider:
# Install and start Ollama
brew install ollama && brew services start ollama
ollama pull llama3
from syda import SyntheticDataGenerator, ModelConfig
generator = SyntheticDataGenerator(
model_config=ModelConfig(
provider="openai_compatible",
model_name="llama3", # any model your server supports
temperature=0.7,
max_tokens=2048,
extra_kwargs={
"base_url": "http://localhost:11434/v1", # Ollama
"api_key": "ollama", # any string for Ollama
# "response_mode": "tools", # for models with native tool-call support
# "response_mode": "json", # for models returning clean JSON
}
)
)
| response_mode | When to use |
|---|---|
| "markdown" | Default: model wraps JSON in ```json ``` fences |
| "tools" | Model supports tool calls natively |
| "json" | Model returns clean JSON without fences |
Works with: Ollama · Groq · Together AI · Fireworks · DeepSeek · Mistral · LM Studio · vLLM · Perplexity — any server that speaks the OpenAI API.
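To make the difference between the "markdown" and "json" response modes concrete, here is a rough sketch of what parsing each kind of reply involves. It is not Syda's parser; `parse_model_reply` is a hypothetical function, and the "tools" mode is left out because tool-call arguments arrive as structured data rather than text.

```python
import json
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal fence
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_model_reply(text, response_mode="markdown"):
    """Parse a model reply into Python data under an assumed response mode."""
    if response_mode == "json":
        return json.loads(text)
    if response_mode == "markdown":
        match = FENCED_JSON.search(text)
        if match is None:
            raise ValueError("no fenced JSON block found in reply")
        return json.loads(match.group(1))
    raise ValueError(f"unsupported response_mode: {response_mode!r}")


fenced_reply = (
    "Here you go:\n" + FENCE + 'json\n[{"id": 1, "name": "Electronics"}]\n' + FENCE
)
clean_reply = '[{"id": 1, "name": "Electronics"}]'

print(parse_model_reply(fenced_reply, "markdown"))
print(parse_model_reply(clean_reply, "json"))
```

If a model returns bare JSON while the mode is "markdown", this sketch raises an error, which mirrors why picking the mode that matches your model's output style matters.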
We would love your contributions! Syda is an open-source project that thrives on community involvement.
- Report bugs - Help us identify and fix issues
- Suggest features - Share your ideas for new capabilities
- Improve docs - Help make our documentation even better
- Submit code - Fix bugs, add features, optimize performance
- Add examples - Show how Syda works in your domain
- ⭐ Star the repo - Help others discover Syda
- Check our Contributing Guide for detailed instructions
- Browse open issues to find something to work on
- Join discussions in our GitHub Issues and Discussions
- Fork the repo and submit your first pull request!
Looking for ways to contribute? Check out issues labeled:
- good first issue - Perfect for newcomers
- help wanted - We'd especially appreciate help here
- documentation - Help improve our docs
- examples - Add new use cases and examples
Every contribution matters - from fixing typos to adding major features! 🙏
⭐ Star this repo if Syda helps your workflow • 📖 Read the docs for detailed guides • 🐛 Report issues to help us improve
If you use SYDA in your research, publications, or products, please cite it as follows:
APA:
Lingamgunta, R. K. K. (2025). Syda - AI-Powered Synthetic Data Generation (v0.2.0). Zenodo. https://doi.org/10.5281/zenodo.17345575
IEEE:
[1] R. K. K. Lingamgunta, “Syda - AI-Powered Synthetic Data Generation”. Zenodo, 2025. doi: 10.5281/zenodo.17345575.
BibTeX:
@software{Lingamgunta_Syda_-_AI-Powered_2025,
author = {Lingamgunta, Rama Krishna Kumar},
license = {MIT},
title = {{Syda - AI-Powered Synthetic Data Generation}},
url = {https://github.com/syda-ai/syda},
version = {0.2.0},
year = {2025}
}