scripting

Operational scripting toolkit built from hands-on academic IT and instructional operations.

Why this repository exists

This repository captures the scripting patterns I used and refined across two core roles:

Systems & Data Analyst / System Programmer (FUTA, 2017-2023) supporting admissions, examinations, results processing, and reporting workflows at scale
Computer Technology Instructor (Pures College, 2024-2025) teaching practical systems administration, networking, and data workflow implementation in lab environments

The goal is simple: turn recurring operations into reliable, auditable, and repeatable workflows across Linux and Windows, with outputs that can stand up to operational review.

Professional profile

Information Systems Analyst and Computer Science professional (M.Tech. in Computer Science)
Experience supporting large academic IT operations handling 4,000+ student records/users
Delivered data processing automation with Python, MySQL, Excel, and Google Sheets
Built workflows for reconciliation, result integrity checks, and audit-ready reporting
Managed practical IT instruction and assessment workflows for 250+ students

Professional context behind the scripts

These scripts reflect day-to-day operational needs from environments with high-volume records, strict reporting timelines, and compliance-sensitive processes:

reconciliation of admissions and results records between operational extracts and approved reporting outputs
data quality checks before grade publication windows and committee sign-off
repeatable ETL runs for admissions and academic performance datasets
checkpoint/SLA monitoring for exam readiness, compliance, and support tasks
audit packet preparation for leadership, registry, and quality-assurance review teams
lightweight system and database smoke checks before reporting cycles

What this toolkit is used for now

building reusable data-quality routines for CSV/Excel pipelines
validating key records and catching duplicates, nulls, and status anomalies early
documenting reconciliation outcomes for transparent decision-making and approvals
handling incomplete or inconsistent identifiers with fuzzy reconciliation suggestions
resolving source-vs-target field conflicts with explicit survivorship priority rules
supporting analytics preparation for Power BI/Excel reporting layers
creating cross-platform workflows that can run as scheduled jobs

How these scripts were applied in practice

Systems & Data Analyst / System Programmer

supported recurring admissions and examination data cycles with Python, MySQL, Excel, and Google Sheets
reduced manual record handling risk through repeatable validation, reconciliation, and ETL routines
used discrepancy reports and integrity checks to prepare result data before publication workflows
supported operational systems handling 4,000+ records/users, where repeatability and auditability were essential

Computer Technology Instructor

used script-driven data validation examples to teach practical SQL/Python workflows
adapted the same reconciliation and quality-check patterns for student lab assignments and project assessments
reinforced operational thinking: validate inputs, detect anomalies early, and produce traceable outputs

Business and enterprise applicability

Although this repository uses academic-style sample data, the workflow pattern is directly transferable to enterprise operations:

customer/master-data quality validation
CRM/ERP reconciliation and variance reporting
SLA monitoring for operations and support queues
compliance evidence packaging and audit preparation
ETL pre-flight checks and post-run summaries for BI teams

Repository layout

scripts/
  sql/
    mysql/
    postgres/
    sqlite/
    templates/
  python/
    data_quality/
    reconciliation/
    reporting/
    etl/
    systems/
  workflow/
data/
  sample/
  seed/
powerbi/
  powerquery/
google-sheets/
  apps-script/

Expanded sample data included

The sample datasets are intentionally richer to mirror realistic operational review scenarios.

data/sample/student_records_source.csv
- 140 student records across admissions cohorts and multiple departments
- includes intentional quality issues: duplicate IDs, missing email/department, out-of-range scores, and status exceptions
data/sample/student_records_target.csv
- 140 records with overlapping and non-overlapping IDs for reconciliation behavior
- includes realistic score/status/department differences for mismatch reporting
data/sample/exam_tasks.csv
- 140 operational checkpoints tied to admissions, exam integrity, reporting, and compliance
- includes overdue, at-risk, completed-on-time, and late-completed patterns for SLA testing
data/sample/validation_rules.json
- JSON rule pack for config-driven CSV validation
- includes required-field, score-range, status, and conditional validation examples

Equivalent SQL seed data is available in: data/seed/sql_seed_demo_academic_dataset/{mysql.sql,postgres.sql,sqlite.sql}

Python setup

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install pandas openpyxl psutil python-dotenv

Quick start

Seed a demo database using files in data/seed/sql_seed_demo_academic_dataset/
Run data profiling, normalization, and rules-based validation scripts in scripts/python/data_quality/
Run reconciliation and SLA scripts in scripts/python/reconciliation/ and scripts/python/reporting/
Use workflow wrappers in scripts/workflow/ for scheduled execution

Example commands

# Profile source dataset
python scripts/python/data_quality/csv_profile.py \
  --input data/sample/student_records_source.csv \
  --output reports/student_profile.json

# Clean and normalize source dataset
python scripts/python/data_quality/csv_clean_normalize.py \
  --input data/sample/student_records_source.csv \
  --output output/student_records_clean.csv \
  --date-columns due_date completed_at \
  --drop-duplicates

# Run config-driven validation rules
python scripts/python/data_quality/config_rules_validator.py \
  --input data/sample/student_records_source.csv \
  --rules data/sample/validation_rules.json \
  --output reports/rules_violations.csv \
  --summary reports/rules_validation_summary.json

# Reconcile source vs target
python scripts/python/reconciliation/reconcile_students.py \
  --source data/sample/student_records_source.csv \
  --target data/sample/student_records_target.csv \
  --output reports/reconciliation.csv \
  --summary reports/reconciliation_summary.json

# Fuzzy reconciliation when IDs are missing or inconsistent
python scripts/python/reconciliation/fuzzy_match_students.py \
  --source data/sample/student_records_source.csv \
  --target data/sample/student_records_target.csv \
  --output reports/reconciliation_fuzzy.csv \
  --summary reports/reconciliation_fuzzy_summary.json \
  --threshold 0.86

# Survivorship merge for source-vs-target conflicts
python scripts/python/reconciliation/survivorship_merge_students.py \
  --source data/sample/student_records_source.csv \
  --target data/sample/student_records_target.csv \
  --output reports/reconciliation_survivorship.csv \
  --summary reports/reconciliation_survivorship_summary.json \
  --priority target source

# SLA at-risk report
python scripts/python/reporting/sla_at_risk_report.py \
  --input data/sample/exam_tasks.csv \
  --output reports/sla_at_risk.csv \
  --summary reports/sla_summary.json \
  --threshold-days 2

# Config-driven ETL apply run
python scripts/python/etl/etl_runner.py \
  --config data/sample/etl_config.json \
  --apply

# Build audit packet (CSV outputs + optional Excel workbook when openpyxl is available)
python scripts/python/reporting/excel_export_audit_packet.py \
  --input data/sample/student_records_source.csv \
  --output-dir reports/audit_packet

# System health snapshot
python scripts/python/systems/system_health_snapshot.py \
  --output reports/system_snapshot.json

# Database smoke tests (SQLite demo mode)
python scripts/python/systems/db_smoke_test.py \
  --output reports/db_smoke.json

# Excel workbook validation (replace with your workbook path)
python scripts/python/data_quality/excel_validate_workbook.py \
  --input path/to/workbook.xlsx \
  --output reports/excel_validation_summary.json

# Daily workflow wrappers
bash scripts/workflow/schedule_daily_run.sh
# Optional: capped retries per step (example: 2 retries after initial attempt)
MAX_RETRIES=2 bash scripts/workflow/schedule_daily_run.sh
# Optional: keep only the newest N timestamped run artifacts
RETAIN_RUN_ARTIFACTS=10 bash scripts/workflow/schedule_daily_run.sh
# powershell -ExecutionPolicy Bypass -File scripts/workflow/schedule_daily_run.ps1
# powershell -ExecutionPolicy Bypass -File scripts/workflow/schedule_daily_run.ps1 -MaxRetries 2
# powershell -ExecutionPolicy Bypass -File scripts/workflow/schedule_daily_run.ps1 -RetainRunArtifacts 10

# Build run manifest directly (useful after ad-hoc script execution)
python scripts/python/reporting/run_manifest.py \
  --run-id 20260218-153323 \
  --status success \
  --report-dir reports \
  --output-dir output \
  --log-file logs/daily-run-20260218-153323.log \
  --manifest reports/run_manifest-20260218-153323.json

# Update cross-run history index from a manifest
python scripts/python/reporting/run_history_index.py \
  --manifest reports/run_manifest-20260218-153323.json \
  --history reports/run_history_index.json \
  --max-entries 50

Milestone B close-out run matrix (first-wave coverage)

This matrix keeps each script family tied to clear run commands, expected outputs, and quick verification points.

Family	Primary assets	Typical input	Expected artifact(s)
SQL integrity checks	`scripts/sql/mysql/mysql_validate_results_integrity.sql`, `scripts/sql/postgres/postgres_validate_results_integrity.sql`, `scripts/sql/templates/sql_reconciliation_diff_template.sql`	Seeded `student_records` tables	Query result sets for missing/duplicate/invalid records and source-vs-target diffs
SQL seed demos	`data/seed/sql_seed_demo_academic_dataset/{mysql.sql,postgres.sql,sqlite.sql}`	Local MySQL/Postgres/SQLite demo DB	Reproducible seeded dataset for script demos
Data quality (CSV/Excel)	`csv_profile.py`, `csv_clean_normalize.py`, `config_rules_validator.py`, `excel_validate_workbook.py`	`data/sample/student_records_source.csv` (+ local `.xlsx` for workbook checks)	`reports/student_profile.json`, `output/student_records_clean.csv`, `reports/rules_violations.csv`, `reports/rules_validation_summary.json`, `reports/excel_validation_summary.json`
Reconciliation	`reconcile_students.py`, `fuzzy_match_students.py`, `survivorship_merge_students.py`	`data/sample/student_records_source.csv` + `data/sample/student_records_target.csv`	`reports/reconciliation.csv` + matching `_summary.json` files
Reporting + SLA	`sla_at_risk_report.py`, `excel_export_audit_packet.py`	`data/sample/exam_tasks.csv` and student records CSV	`reports/sla_at_risk.csv`, `reports/sla_summary.json`, `reports/audit_packet/*`
ETL	`etl_runner.py` with `data/sample/etl_config.json`	Config + source CSV path	`output` CSV and ETL summary JSON configured in ETL config
Systems	`system_health_snapshot.py`, `db_smoke_test.py`	Host metrics + optional DB env vars	`reports/system_snapshot.json`, `reports/db_smoke.json`
Workflow wrappers	`scripts/workflow/schedule_daily_run.sh`, `scripts/workflow/schedule_daily_run.ps1`	Built-in sample dataset paths (+ optional retry cap: `MAX_RETRIES` / `-MaxRetries` and retention cap: `RETAIN_RUN_ARTIFACTS` / `-RetainRunArtifacts`)	Timestamped `logs/daily-run-.log`, `reports/step_status-.csv` (attempt-aware rows), `reports/run_manifest-*.json` (including `steps_retried`), `reports/run_history_index.json` (cross-run summary), and automatic pruning of older run logs/manifests/step-status files beyond retention cap
Google Sheets	`google-sheets/apps-script/apps_script_validation_rules.gs`	Google Sheet tab (`SHEET_NAME`)	In-sheet validation rules + exported sanitized CSV to Drive folder
Power Query	`powerbi/powerquery/powerquery_data_quality_template.pq`	Power Query source connector/table	DQ columns (`dq_issue`, `dq_status`) for downstream BI filters

Quick verification checklist

Run the Python commands above and confirm fresh timestamps in reports/ and output/.
Execute bash scripts/workflow/schedule_daily_run.sh and confirm new logs/daily-run-*.log, reports/step_status-*.csv, and reports/run_manifest-*.json files.
Re-run with retries enabled (MAX_RETRIES=2 bash scripts/workflow/schedule_daily_run.sh) and confirm step rows include attempt/max_attempts values; when retries occur, confirm steps_retried increases in run_manifest-*.json.
Re-run with a low retention value (RETAIN_RUN_ARTIFACTS=1 bash scripts/workflow/schedule_daily_run.sh) and confirm older daily-run-*.log, run_manifest-*.json, and step_status-*.csv files are pruned.
Confirm reports/run_history_index.json exists and includes a fresh entry with run_id, status, and counts from the latest manifest.
Run at least one SQL integrity script against seeded demo data to confirm expected mismatch/duplicate outputs.
For Sheets/Power Query assets, confirm template import executes and outputs validation status columns/exports as expected.

Portfolio alignment

This toolkit maps directly to the types of projects I have delivered:

Academic Results Analytics Dashboard
- feeds cleaner, validated datasets into BI layers
- supports anomaly tracking and KPI reliability
Admissions Data Quality Audit
- applies duplicate detection, field validation, and reconciliation workflows
Examination Integrity Monitoring
- structures checkpoint/SLA monitoring and evidence-oriented reporting

Dependencies

Python: pandas, openpyxl, psutil, python-dotenv
MySQL client: optional (for DB smoke tests)
Postgres client: optional (for DB smoke tests)

Notes

Scripts default to safe/read-only behavior where possible.
Keep generated outputs in output/, reports/, or artifacts/ (already gitignored).
Use scripts/workflow/schedule_daily_run.sh (Linux) or scripts/workflow/schedule_daily_run.ps1 (Windows) for full routine execution.

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
data		data
google-sheets/apps-script		google-sheets/apps-script
powerbi/powerquery		powerbi/powerquery
scripts		scripts
.gitignore		.gitignore
README.md		README.md
autoRepo-xx.sh		autoRepo-xx.sh
autoRepo.sh		autoRepo.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

scripting

Why this repository exists

Professional profile

Professional context behind the scripts

What this toolkit is used for now

How these scripts were applied in practice

Systems & Data Analyst / System Programmer

Computer Technology Instructor

Business and enterprise applicability

Repository layout

Expanded sample data included

Python setup

Quick start

Example commands

Milestone B close-out run matrix (first-wave coverage)

Quick verification checklist

Portfolio alignment

Dependencies

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

scripting

Why this repository exists

Professional profile

Professional context behind the scripts

What this toolkit is used for now

How these scripts were applied in practice

Systems & Data Analyst / System Programmer

Computer Technology Instructor

Business and enterprise applicability

Repository layout

Expanded sample data included

Python setup

Quick start

Example commands

Milestone B close-out run matrix (first-wave coverage)

Quick verification checklist

Portfolio alignment

Dependencies

Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages