diff --git a/kits/automation/time-series-preprocessor/.env.example b/kits/automation/time-series-preprocessor/.env.example new file mode 100644 index 00000000..aeca4c07 --- /dev/null +++ b/kits/automation/time-series-preprocessor/.env.example @@ -0,0 +1,4 @@ +TIME_SERIES_PREPROCESSOR="Your Flow ID from Lamatic Studio" +LAMATIC_API_URL="Your API Endpoint URL" +LAMATIC_PROJECT_ID="Your Project ID" +LAMATIC_API_KEY="Your API Key" diff --git a/kits/automation/time-series-preprocessor/.gitignore b/kits/automation/time-series-preprocessor/.gitignore new file mode 100644 index 00000000..0d1e747b --- /dev/null +++ b/kits/automation/time-series-preprocessor/.gitignore @@ -0,0 +1,29 @@ +# See https://help.github.com/articles/ignoring-files/ for more about ignoring files. + +# dependencies +/node_modules + +# next.js +/.next/ +/out/ + +# production +/build + +# debug +npm-debug.log* +yarn-debug.log* +yarn-error.log* +.pnpm-debug.log* + +# env files +.env + +# vercel +.vercel + +# typescript +*.tsbuildinfo +next-env.d.ts + +.env.local \ No newline at end of file diff --git a/kits/automation/time-series-preprocessor/README.md b/kits/automation/time-series-preprocessor/README.md new file mode 100644 index 00000000..3962d626 --- /dev/null +++ b/kits/automation/time-series-preprocessor/README.md @@ -0,0 +1,162 @@ +# Time-Series Preprocessor — Lamatic AgentKit + +An automation kit that analyzes time-series dataset schemas and generates production-ready Python preprocessing pipelines using `pandas` and `scikit-learn`. Paste a JSON summary of your dataset and receive executable code in seconds. 
+ +--- + +## What It Does + +By providing a JSON summary of your dataset, the agent generates a complete Python script that handles: + +- **Missing value imputation** — forward-fill, mean, and median strategies selected based on column type +- **Feature scaling** — MinMaxScaler or StandardScaler applied appropriately +- **Datetime parsing and index management** — automatic timestamp detection and alignment +- **Categorical encoding** — label or one-hot encoding based on cardinality +- **Standardized implementation** — clean, readable `pandas` + `scikit-learn` code ready to run + +--- + +## Project Background + +This kit was developed to solve the repetitive nature of data cleaning in time-series projects. + +The concept originated during the development of a **water demand forecasting model** for a college located in a rural area near Bhopal. Due to aging sensor hardware and inconsistent data streams, significant time was spent manually handling missing values and aligning disparate data sources — rainfall readings, local water levels, and consumption logs from different systems with mismatched timestamps. + +The goal was to automate the boilerplate preprocessing code, allowing engineers to focus on model performance rather than manual cleanup. This kit is the result of that experience. + +--- + +## Who Is It For + +Data engineers and machine learning engineers who frequently work with time-series data and need a fast way to generate reliable, repeatable preprocessing pipelines without writing boilerplate code from scratch. + +--- + +## Tech Stack + +| Tool | Role | +|---|---| +| [Lamatic.ai](https://lamatic.ai) | Flow orchestration and Edge deployment | +| Gemini 2.5 Pro | AI-driven Python code generation | +| Next.js 14 | Interactive frontend sandbox | +| pandas + scikit-learn | Target libraries for generated pipelines | + +--- + +## Setup + +### 1. Build and Deploy Flow in Lamatic Studio + +1. Sign in at [lamatic.ai](https://lamatic.ai) +2. 
Create a new project (if you don't have one) +3. Click **+ New Flow** and select **API Request** as the trigger +4. Add a **Generate Text** node and select **Gemini 2.5 Pro** +5. Set the input variable to `dataset_summary` +6. Configure the system prompt to act as an expert data engineer +7. Deploy the flow and copy your credentials from the Studio dashboard + +### 2. Environment Variables + +Create a `.env.local` file in the kit root directory: + +```env +TIME_SERIES_PREPROCESSOR="Your Flow ID from Lamatic Studio" +LAMATIC_API_URL="Your API Endpoint URL" +LAMATIC_PROJECT_ID="Your Project ID" +LAMATIC_API_KEY="Your API Key" +``` + +| Variable | Where to Find It | +|---|---| +| `TIME_SERIES_PREPROCESSOR` | Studio → Your Flow → Settings | +| `LAMATIC_API_URL` | Studio → Your Flow → API Endpoint | +| `LAMATIC_PROJECT_ID` | Studio → Project Settings | +| `LAMATIC_API_KEY` | Studio → Project Settings → API Keys | + +### 3. Install and Run + +```bash +npm install +npm run dev +``` + +Open [http://localhost:3000](http://localhost:3000) to access the frontend interface. 
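If you prefer not to hand-write the `dataset_summary` JSON, it can be derived from a CSV with a few lines of pandas. The sketch below is illustrative and not part of the kit: the `summarize_dataset` helper, its type-mapping heuristic, and the omission of the `frequency` field (left to the caller) are all assumptions.

```python
import json

import pandas as pd


def summarize_dataset(df: pd.DataFrame, target_column: str, name: str = "dataset") -> str:
    """Build a JSON summary of a DataFrame in the shape the flow expects.

    The dtype-to-type mapping below is a simple heuristic; adjust it to
    match your data. The "frequency" field is intentionally left out and
    can be added by the caller (e.g. via pd.infer_freq on a datetime index).
    """
    columns = []
    for col in df.columns:
        if pd.api.types.is_datetime64_any_dtype(df[col]):
            col_type = "datetime"
        elif pd.api.types.is_float_dtype(df[col]):
            col_type = "float"
        elif pd.api.types.is_integer_dtype(df[col]):
            col_type = "int"
        else:
            col_type = "categorical"
        columns.append({
            "name": col,
            "type": col_type,
            # Percentage of missing values, rounded to the nearest integer
            "missing_pct": round(100 * df[col].isna().mean()),
        })
    return json.dumps({
        "dataset_name": name,
        "columns": columns,
        "rows": len(df),
        "target_column": target_column,
    }, indent=2)
```

Paste the returned string directly into the input form, or pass it to the server action programmatically.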
+ +--- + +## Example Input + +Provide a JSON object describing your dataset structure: + +```json +{ + "dataset_name": "sensor_readings", + "frequency": "1min", + "columns": [ + {"name": "timestamp", "type": "datetime"}, + {"name": "temperature", "type": "float", "missing_pct": 5}, + {"name": "pressure", "type": "float", "missing_pct": 2}, + {"name": "status", "type": "categorical", "missing_pct": 0} + ], + "rows": 50000, + "target_column": "temperature" +} +``` + +## Example Output + +The agent returns a fully executable Python script: + +```python +import pandas as pd +from sklearn.preprocessing import MinMaxScaler + +# Load and index +df = pd.read_csv("sensor_readings.csv", parse_dates=["timestamp"]) +df.set_index("timestamp", inplace=True) + +# Impute missing values +df["temperature"].fillna(method="ffill", inplace=True) +df["pressure"].fillna(df["pressure"].mean(), inplace=True) + +# Scale numerical features +scaler = MinMaxScaler() +df[["temperature", "pressure"]] = scaler.fit_transform(df[["temperature", "pressure"]]) + +print("Preprocessing complete.") +print(df.head()) +``` + +--- + +## Project Structure + +````text +time-series-preprocessor/ +├── actions/ +│ └── orchestrate.ts # Server action calling the Lamatic flow +├── app/ +│ └── page.tsx # Main UI — input form and output display +├── components/ +│ └── ui/ # shadcn/ui components +├── flows/ +│ └── time-series-preprocessor/ +│ ├── config.json # Exported Lamatic flow graph +│ ├── inputs.json # Input schema definition +│ └── meta.json # Flow metadata +├── lib/ +│ └── lamatic-client.ts # Lamatic SDK client +├── .env.example # Environment variable template +├── config.json # Kit metadata +└── README.md +```` + +--- + +## Contributing + +Contributions are welcome. Open an issue or pull request in the [AgentKit repository](https://github.com/Lamatic/AgentKit). + +## License + +MIT License — see [LICENSE](../../../LICENSE). 
\ No newline at end of file diff --git a/kits/automation/time-series-preprocessor/actions/orchestrate.ts b/kits/automation/time-series-preprocessor/actions/orchestrate.ts new file mode 100644 index 00000000..bafc66bf --- /dev/null +++ b/kits/automation/time-series-preprocessor/actions/orchestrate.ts @@ -0,0 +1,31 @@ +"use server"; + +import { createLamaticClient } from "@/lib/lamatic-client"; + +const client = createLamaticClient(); + +export async function preprocessTimeSeries(datasetSummary: string) { + if (!process.env.TIME_SERIES_PREPROCESSOR) { + throw new Error("TIME_SERIES_PREPROCESSOR environment variable is not set"); + } + + try { + const response = await client.executeFlow({ + flowId: process.env.TIME_SERIES_PREPROCESSOR, + inputs: { + dataset_summary: datasetSummary, + }, + }); + + return { + success: true, + result: response?.data?.generatedText || "", + }; + } catch (error) { + console.error("Error calling Lamatic flow:", error); + return { + success: false, + result: "Failed to generate preprocessing pipeline. 
Please try again.", + }; + } +} \ No newline at end of file diff --git a/kits/automation/time-series-preprocessor/app/globals.css b/kits/automation/time-series-preprocessor/app/globals.css new file mode 100644 index 00000000..90357922 --- /dev/null +++ b/kits/automation/time-series-preprocessor/app/globals.css @@ -0,0 +1,15 @@ +@tailwind base; +@tailwind components; +@tailwind utilities; + +* { + box-sizing: border-box; + margin: 0; + padding: 0; +} + +body { + font-family: 'Inter', sans-serif; + background-color: #f5f5f5; + color: #111; +} \ No newline at end of file diff --git a/kits/automation/time-series-preprocessor/app/layout.tsx b/kits/automation/time-series-preprocessor/app/layout.tsx new file mode 100644 index 00000000..094e778a --- /dev/null +++ b/kits/automation/time-series-preprocessor/app/layout.tsx @@ -0,0 +1,29 @@ +import type { Metadata } from "next"; +import { ThemeProvider } from "@/components/ThemeProvider" +import "./globals.css"; + +export const metadata: Metadata = { + title: "Time-Series Preprocessor — Lamatic AgentKit", + description: "AI-powered time-series preprocessing pipeline generator", +}; + +export default function RootLayout({ + children, +}: { + children: React.ReactNode; +}) { + return ( + + + + + + + + + {children} + + + + ); +} \ No newline at end of file diff --git a/kits/automation/time-series-preprocessor/app/page.tsx b/kits/automation/time-series-preprocessor/app/page.tsx new file mode 100644 index 00000000..6749ed88 --- /dev/null +++ b/kits/automation/time-series-preprocessor/app/page.tsx @@ -0,0 +1,225 @@ +"use client"; + +import { useState } from "react"; +import { preprocessTimeSeries } from "@/actions/orchestrate"; + +const EXAMPLE_INPUT = `{ + "dataset_name": "sensor_readings", + "frequency": "1min", + "columns": [ + {"name": "timestamp", "type": "datetime"}, + {"name": "temperature", "type": "float", "missing_pct": 5}, + {"name": "pressure", "type": "float", "missing_pct": 2}, + {"name": "status", "type": 
"categorical", "missing_pct": 0} + ], + "rows": 50000, + "target_column": "temperature" +}`; + +const FEATURES = [ + { title: "Missing value imputation", desc: "Forward-fill, mean, and median strategies selected based on column type" }, + { title: "Feature scaling", desc: "MinMaxScaler or StandardScaler applied appropriately" }, + { title: "Datetime parsing and index management", desc: "Automatic timestamp detection and alignment" }, + { title: "Categorical encoding", desc: "Label or one-hot encoding based on cardinality" }, + { title: "Standardized implementation", desc: "Clean, readable pandas + scikit-learn code ready to run" }, +]; + +export default function Home() { + const [input, setInput] = useState(""); + const [output, setOutput] = useState(""); + const [loading, setLoading] = useState(false); + const [error, setError] = useState(""); + const [copied, setCopied] = useState(false); + + async function handleSubmit() { + if (!input.trim()) { setError("Please enter a dataset summary."); return; } + setLoading(true); setError(""); setOutput(""); + const result = await preprocessTimeSeries(input); + if (result.success) { + setOutput(result.result); + setTimeout(() => { + document.getElementById("output")?.scrollIntoView({ behavior: "smooth" }); + }, 100); + } else { + setError(result.result); + } + setLoading(false); + } + + function handleCopy() { + navigator.clipboard.writeText(output); + setCopied(true); + setTimeout(() => setCopied(false), 2000); + } + + return ( +
+ + {/* Navbar */} + + + {/* Hero */} +
+
+ AgentKit — Automation Kit +
+

+ Turn Dataset Schemas into
+ Python Pipelines Instantly +

+

+ Paste a JSON summary of your time-series dataset and receive a production-ready preprocessing script using pandas and scikit-learn. +

+
+ {["Saves Hours of Boilerplate", "Preprocessing Pipelines", "Ready-to-Run Python Scripts"].map(tag => ( + {tag} + ))} +
+ + Try It Now ↓ + +
+ + {/* Main Content */} +
+ + {/* What It Does */} +
+

+ What It Does +

+

+ By providing a JSON summary of your dataset, the agent generates a complete Python script that handles: +

+
+ {FEATURES.map(item => ( +
+ +
+ {item.title} + — {item.desc} +
+
+ ))} +
+
+ + {/* Input Card */} +
+
+
+
+ Dataset Summary + JSON +
+ +
+