Add templates for csv mc int #825

Open
klei22 wants to merge 4 commits into ReaLLMASIC:master from klei22:add-templates-for-csv-mc-int
klei22 (Collaborator) commented May 30, 2026

[image]

Copilot AI left a comment

Pull request overview

Adds CSV integer multicontext dataset preparation, sampling, demos, and roomba visualization utilities for training and generating tabular integer continuations.

Changes:

  • Adds CSV integer dataset builder scripts, manifest/sample input, and documentation.
  • Extends sample.py to accept CSV multicontext prompts and emit generated CSV outputs.
  • Adds demo/monitoring scripts plus a roomba grayscale/pose HTML viewer.
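The README excerpt quoted in this review passes per-column bounds as `--range name:lo:hi` flags. As a rough illustration of how such a spec could map integer cells to token ids, the sketch below parses one spec and offsets each value by its lower bound. The names `ColumnRange`, `parse_range_spec`, and `encode_value`, along with the inclusive-bounds and offset-by-`lo` scheme, are assumptions made for illustration and are not taken from the PR's code.

```python
from typing import NamedTuple


class ColumnRange(NamedTuple):
    name: str
    lo: int
    hi: int

    @property
    def vocab_size(self) -> int:
        # Assumption: bounds are inclusive, so 0..100 gives 101 tokens.
        return self.hi - self.lo + 1


def parse_range_spec(spec: str) -> ColumnRange:
    """Parse a 'name:lo:hi' string (hypothetical helper)."""
    parts = spec.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected name:lo:hi, got {spec!r}")
    name, lo, hi = parts[0].strip(), int(parts[1]), int(parts[2])
    if not name:
        raise ValueError("range name must be non-empty")
    if lo > hi:
        raise ValueError(f"lower bound {lo} exceeds upper bound {hi}")
    return ColumnRange(name, lo, hi)


def encode_value(value: int, column: ColumnRange) -> int:
    """Map an in-range integer to a zero-based token id (assumed scheme)."""
    if not column.lo <= value <= column.hi:
        raise ValueError(f"{column.name}={value} outside [{column.lo}, {column.hi}]")
    return value - column.lo
```

Under these assumptions, `--range pressure:900:1100` would need 201 token ids, and a cell value of 900 would encode to id 0.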

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| `sample.py` | Adds CSV prompt parsing, integer-range tokenizer support, and CSV sample output. |
| `demos/csv_mc_int_demo.sh` | Adds end-to-end CSV multicontext training/sampling demo. |
| `demos/csv_mc_cone_monitor.py` | Adds polling monitor that samples multiple futures and plots them. |
| `data/roomba/roomba_grayscale_viewer.html` | Adds browser viewer for roomba pose and grayscale pixel columns. |
| `data/csv_mc_int/run_roomba_dataset.sh` | Adds roomba conditioning + dataset generation wrapper. |
| `data/csv_mc_int/roomba_data_conditioning.py` | Adds roomba CSV integer conditioning script. |
| `data/csv_mc_int/README.md` | Documents CSV integer multicontext workflow. |
| `data/csv_mc_int/prepare_csv_integer_multicontext.py` | Adds generic CSV-to-multicontext dataset preparation tool. |
| `data/csv_mc_int/manifest.json` | Adds generated roomba multicontext manifest. |
| `data/csv_mc_int/input.csv` | Adds small roomba-style input CSV sample. |
| `data/csv_mc_int/get_dataset.sh` | Adds shell wrapper for dataset preparation. |

Comment on lines +1 to +2

#!/usr/bin/env bash
Comment on lines +235 to +238
manifest = {
    "tokenizer": "csv_integer_range_multicontext_manifest",
    "source_csv": str(input_csv),
    "has_header": bool(args.has_header),
Comment thread demos/csv_mc_int_demo.sh
Comment on lines +20 to +24
mapfile -t DATASETS < <(python3 - <<PY
import json
from pathlib import Path
manifest = json.loads(Path('data/$OUTPUT_ROOT/manifest.json').read_text())
for dataset in manifest['multicontext_datasets']:
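The heredoc in this hunk interpolates `$OUTPUT_ROOT` directly into Python source and would fail with a bare traceback if the manifest were missing. A minimal sketch of the same lookup as a standalone function, which takes the output root as an argument and reports clearer errors, might look like the following. The function name `load_multicontext_datasets` and the `data_dir` parameter are hypothetical; only the `data/<output_root>/manifest.json` layout and the `multicontext_datasets` key come from the hunk, and the shape of each list entry is not assumed.

```python
import json
from pathlib import Path


def load_multicontext_datasets(output_root: str, data_dir: str = "data") -> list:
    """Return manifest['multicontext_datasets'], failing with readable errors."""
    manifest_path = Path(data_dir) / output_root / "manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(f"manifest not found: {manifest_path}")

    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    datasets = manifest.get("multicontext_datasets")
    # Entry structure is left opaque here because the hunk does not show it.
    if not isinstance(datasets, list) or not datasets:
        raise ValueError(
            f"{manifest_path} has no non-empty 'multicontext_datasets' list"
        )
    return datasets
```

A shell script could call such a helper through `python3 -c` or a small script file with the root passed as `sys.argv[1]`, which keeps shell variables out of the Python source.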
Comment on lines +175 to +178
"--top_k",
"1",
"--num_samples",
str(args.cone_width),
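One point worth checking in this hunk: assuming `--top_k 1` has its usual meaning of keeping only the single most likely token, and that no other source of randomness is involved, every one of the `--num_samples` continuations would be identical, so the plotted cone would collapse to a single line. The toy sampler below illustrates that behaviour. It is a generic top-k sketch written for this note, with a hypothetical `top_k_sample` function, and is not `sample.py`'s implementation.

```python
import math
import random


def top_k_sample(logits, k, rng):
    """Sample an index from the k highest logits, weighted by softmax."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


logits = [2.0, 1.5, 1.0, -3.0]
rng = random.Random(0)

# With k=1 only the argmax survives, so every draw is the same index.
greedy_draws = {top_k_sample(logits, 1, rng) for _ in range(200)}

# With k=3 the draws can differ, which is what a cone of futures needs.
wider_draws = {top_k_sample(logits, 3, rng) for _ in range(200)}
```

If distinct futures are the goal, a larger `top_k` (or temperature-based sampling) would be the usual way to get spread between samples.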
Comment on lines +178 to +180
<label>
Timestamp modulo
<input id="timestampModulo" type="number" min="0" step="1" value="1000" />
Comment on lines +181 to +187
if not args.has_header:
    cmd.append("--no-multicontext_csv_has_header")

print("Running:", " ".join(cmd))
subprocess.run(cmd, check=True)
sample_csvs = sorted(run_samples_dir.glob("*.csv"))
write_cone_plot(snapshot, sample_csvs, plot_path)
Comment thread data/csv_mc_int/manifest.json Outdated
@@ -0,0 +1,373 @@
{
"tokenizer": "csv_integer_range_multicontext_manifest",
"source_csv": "/home/kauna/nanogpt_csv_generic_mc_forecast/data/csv_mc_int/roomba_integer.csv",
Comment thread sample.py
Comment on lines +1189 to +1192
if has_header:
    headers = [cell.strip() for cell in first]
    if any(not header for header in headers):
        raise ValueError("CSV prompt header cells must be non-empty")
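The header check in this hunk can be sketched as a small standalone function, which makes it easy to unit test. The function name `parse_prompt_header`, the duplicate-name check, and the `col0`, `col1`, ... fallback for headerless prompts are assumptions added for illustration; only the strip-and-reject-empty logic comes from the hunk.

```python
def parse_prompt_header(first_row, has_header):
    """Return column names for a CSV prompt (illustrative sketch)."""
    if has_header:
        headers = [cell.strip() for cell in first_row]
        if any(not header for header in headers):
            raise ValueError("CSV prompt header cells must be non-empty")
        # Assumption: duplicate names are rejected so later lookups stay unambiguous.
        duplicates = sorted({h for h in headers if headers.count(h) > 1})
        if duplicates:
            raise ValueError(f"duplicate CSV prompt headers: {duplicates}")
        return headers
    # Assumption: headerless prompts get positional placeholder names.
    return [f"col{idx}" for idx in range(len(first_row))]
```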
Comment on lines +130 to +132
for idx, (header, context_name) in enumerate(zip(headers, context_names)):
    for alias in column_aliases(header, context_name, idx):
        alias_to_idx[alias] = idx
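As written, this loop lets a later column silently overwrite an earlier one when two columns share an alias, and `zip` silently truncates if the two lists differ in length. A minimal sketch that rejects both cases is shown below. The `column_aliases` here is a hypothetical stand-in (header, context name, and index as a string), since the PR's real helper is not visible in this hunk, and `build_alias_index` is an illustrative name.

```python
def column_aliases(header, context_name, idx):
    # Hypothetical stand-in for the PR's helper, which this hunk does not show.
    return [header, context_name, str(idx)]


def build_alias_index(headers, context_names):
    """Map every alias to its column index, rejecting ambiguous aliases."""
    if len(headers) != len(context_names):
        raise ValueError(
            f"{len(headers)} headers but {len(context_names)} context names"
        )
    alias_to_idx = {}
    for idx, (header, context_name) in enumerate(zip(headers, context_names)):
        for alias in column_aliases(header, context_name, idx):
            # setdefault keeps the first owner so a clash can be detected.
            previous = alias_to_idx.setdefault(alias, idx)
            if previous != idx:
                raise ValueError(
                    f"alias {alias!r} is ambiguous: columns {previous} and {idx}"
                )
    return alias_to_idx
```

Whether a collision should raise, warn, or prefer the first match is a design choice for the PR author; raising is simply the strictest option.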
Comment thread data/csv_mc_int/README.md
Comment on lines +21 to +26
```bash
data/csv_mc_int/get_dataset.sh data/csv_mc_int/input.csv \
  --output_root csv_mc_int \
  --range time:0:100 \
  --range temp:0:100 \
  --range pressure:900:1100
```