Skip to content

toughdave/scripting

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

scripting

Operational scripting toolkit built from hands-on academic IT and instructional operations.

Why this repository exists

This repository captures the scripting patterns I used and refined across two core roles:

  • Systems & Data Analyst / System Programmer (FUTA, 2017-2023) supporting admissions, examinations, results processing, and reporting workflows at scale
  • Computer Technology Instructor (Pures College, 2024-2025) teaching practical systems administration, networking, and data workflow implementation in lab environments

The goal is simple: turn recurring operations into reliable, auditable, and repeatable workflows across Linux and Windows, with outputs that can stand up to operational review.

Professional profile

  • Information Systems Analyst and Computer Science professional (M.Tech. in Computer Science)
  • Experience supporting large academic IT operations handling 4,000+ student records/users
  • Delivered data processing automation with Python, MySQL, Excel, and Google Sheets
  • Built workflows for reconciliation, result integrity checks, and audit-ready reporting
  • Managed practical IT instruction and assessment workflows for 250+ students

Professional context behind the scripts

These scripts reflect day-to-day operational needs from environments with high-volume records, strict reporting timelines, and compliance-sensitive processes:

  • reconciliation of admissions and results records between operational extracts and approved reporting outputs
  • data quality checks before grade publication windows and committee sign-off
  • repeatable ETL runs for admissions and academic performance datasets
  • checkpoint/SLA monitoring for exam readiness, compliance, and support tasks
  • audit packet preparation for leadership, registry, and quality-assurance review teams
  • lightweight system and database smoke checks before reporting cycles

What this toolkit is used for now

  • building reusable data-quality routines for CSV/Excel pipelines
  • validating key records and catching duplicates, nulls, and status anomalies early
  • documenting reconciliation outcomes for transparent decision-making and approvals
  • handling incomplete or inconsistent identifiers with fuzzy reconciliation suggestions
  • resolving source-vs-target field conflicts with explicit survivorship priority rules
  • supporting analytics preparation for Power BI/Excel reporting layers
  • creating cross-platform workflows that can run as scheduled jobs

How these scripts were applied in practice

Systems & Data Analyst / System Programmer

  • supported recurring admissions and examination data cycles with Python, MySQL, Excel, and Google Sheets
  • reduced manual record handling risk through repeatable validation, reconciliation, and ETL routines
  • used discrepancy reports and integrity checks to prepare result data before publication workflows
  • supported operational systems handling 4,000+ records/users, where repeatability and auditability were essential

Computer Technology Instructor

  • used script-driven data validation examples to teach practical SQL/Python workflows
  • adapted the same reconciliation and quality-check patterns for student lab assignments and project assessments
  • reinforced operational thinking: validate inputs, detect anomalies early, and produce traceable outputs

Business and enterprise applicability

Although this repository uses academic-style sample data, the workflow pattern is directly transferable to enterprise operations:

  • customer/master-data quality validation
  • CRM/ERP reconciliation and variance reporting
  • SLA monitoring for operations and support queues
  • compliance evidence packaging and audit preparation
  • ETL pre-flight checks and post-run summaries for BI teams

Repository layout

scripts/
  sql/
    mysql/
    postgres/
    sqlite/
    templates/
  python/
    data_quality/
    reconciliation/
    reporting/
    etl/
    systems/
  workflow/
data/
  sample/
  seed/
powerbi/
  powerquery/
google-sheets/
  apps-script/

Expanded sample data included

The sample datasets are intentionally richer to mirror realistic operational review scenarios.

  • data/sample/student_records_source.csv
    • 140 student records across admissions cohorts and multiple departments
    • includes intentional quality issues: duplicate IDs, missing email/department, out-of-range scores, and status exceptions
  • data/sample/student_records_target.csv
    • 140 records with overlapping and non-overlapping IDs for reconciliation behavior
    • includes realistic score/status/department differences for mismatch reporting
  • data/sample/exam_tasks.csv
    • 140 operational checkpoints tied to admissions, exam integrity, reporting, and compliance
    • includes overdue, at-risk, completed-on-time, and late-completed patterns for SLA testing
  • data/sample/validation_rules.json
    • JSON rule pack for config-driven CSV validation
    • includes required-field, score-range, status, and conditional validation examples

Equivalent SQL seed data is available in: data/seed/sql_seed_demo_academic_dataset/{mysql.sql,postgres.sql,sqlite.sql}

Python setup

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install pandas openpyxl psutil python-dotenv

Quick start

  1. Seed a demo database using files in data/seed/sql_seed_demo_academic_dataset/
  2. Run data profiling, normalization, and rules-based validation scripts in scripts/python/data_quality/
  3. Run reconciliation and SLA scripts in scripts/python/reconciliation/ and scripts/python/reporting/
  4. Use workflow wrappers in scripts/workflow/ for scheduled execution

Example commands

# Profile source dataset
python scripts/python/data_quality/csv_profile.py \
  --input data/sample/student_records_source.csv \
  --output reports/student_profile.json

# Clean and normalize source dataset
python scripts/python/data_quality/csv_clean_normalize.py \
  --input data/sample/student_records_source.csv \
  --output output/student_records_clean.csv \
  --date-columns due_date completed_at \
  --drop-duplicates

# Run config-driven validation rules
python scripts/python/data_quality/config_rules_validator.py \
  --input data/sample/student_records_source.csv \
  --rules data/sample/validation_rules.json \
  --output reports/rules_violations.csv \
  --summary reports/rules_validation_summary.json

# Reconcile source vs target
python scripts/python/reconciliation/reconcile_students.py \
  --source data/sample/student_records_source.csv \
  --target data/sample/student_records_target.csv \
  --output reports/reconciliation.csv \
  --summary reports/reconciliation_summary.json

# Fuzzy reconciliation when IDs are missing or inconsistent
python scripts/python/reconciliation/fuzzy_match_students.py \
  --source data/sample/student_records_source.csv \
  --target data/sample/student_records_target.csv \
  --output reports/reconciliation_fuzzy.csv \
  --summary reports/reconciliation_fuzzy_summary.json \
  --threshold 0.86

# Survivorship merge for source-vs-target conflicts
python scripts/python/reconciliation/survivorship_merge_students.py \
  --source data/sample/student_records_source.csv \
  --target data/sample/student_records_target.csv \
  --output reports/reconciliation_survivorship.csv \
  --summary reports/reconciliation_survivorship_summary.json \
  --priority target source

# SLA at-risk report
python scripts/python/reporting/sla_at_risk_report.py \
  --input data/sample/exam_tasks.csv \
  --output reports/sla_at_risk.csv \
  --summary reports/sla_summary.json \
  --threshold-days 2

# Config-driven ETL apply run
python scripts/python/etl/etl_runner.py \
  --config data/sample/etl_config.json \
  --apply

# Build audit packet (CSV outputs + optional Excel workbook when openpyxl is available)
python scripts/python/reporting/excel_export_audit_packet.py \
  --input data/sample/student_records_source.csv \
  --output-dir reports/audit_packet

# System health snapshot
python scripts/python/systems/system_health_snapshot.py \
  --output reports/system_snapshot.json

# Database smoke tests (SQLite demo mode)
python scripts/python/systems/db_smoke_test.py \
  --output reports/db_smoke.json

# Excel workbook validation (replace with your workbook path)
python scripts/python/data_quality/excel_validate_workbook.py \
  --input path/to/workbook.xlsx \
  --output reports/excel_validation_summary.json

# Daily workflow wrappers
bash scripts/workflow/schedule_daily_run.sh
# Optional: capped retries per step (example: 2 retries after initial attempt)
MAX_RETRIES=2 bash scripts/workflow/schedule_daily_run.sh
# Optional: keep only the newest N timestamped run artifacts
RETAIN_RUN_ARTIFACTS=10 bash scripts/workflow/schedule_daily_run.sh
# powershell -ExecutionPolicy Bypass -File scripts/workflow/schedule_daily_run.ps1
# powershell -ExecutionPolicy Bypass -File scripts/workflow/schedule_daily_run.ps1 -MaxRetries 2
# powershell -ExecutionPolicy Bypass -File scripts/workflow/schedule_daily_run.ps1 -RetainRunArtifacts 10

# Build run manifest directly (useful after ad-hoc script execution)
python scripts/python/reporting/run_manifest.py \
  --run-id 20260218-153323 \
  --status success \
  --report-dir reports \
  --output-dir output \
  --log-file logs/daily-run-20260218-153323.log \
  --manifest reports/run_manifest-20260218-153323.json

# Update cross-run history index from a manifest
python scripts/python/reporting/run_history_index.py \
  --manifest reports/run_manifest-20260218-153323.json \
  --history reports/run_history_index.json \
  --max-entries 50

Milestone B close-out run matrix (first-wave coverage)

This matrix keeps each script family tied to clear run commands, expected outputs, and quick verification points.

Family Primary assets Typical input Expected artifact(s)
SQL integrity checks scripts/sql/mysql/mysql_validate_results_integrity.sql, scripts/sql/postgres/postgres_validate_results_integrity.sql, scripts/sql/templates/sql_reconciliation_diff_template.sql Seeded student_records tables Query result sets for missing/duplicate/invalid records and source-vs-target diffs
SQL seed demos data/seed/sql_seed_demo_academic_dataset/{mysql.sql,postgres.sql,sqlite.sql} Local MySQL/Postgres/SQLite demo DB Reproducible seeded dataset for script demos
Data quality (CSV/Excel) csv_profile.py, csv_clean_normalize.py, config_rules_validator.py, excel_validate_workbook.py data/sample/student_records_source.csv (+ local .xlsx for workbook checks) reports/student_profile.json, output/student_records_clean.csv, reports/rules_violations.csv, reports/rules_validation_summary.json, reports/excel_validation_summary.json
Reconciliation reconcile_students.py, fuzzy_match_students.py, survivorship_merge_students.py data/sample/student_records_source.csv + data/sample/student_records_target.csv reports/reconciliation*.csv + matching *_summary.json files
Reporting + SLA sla_at_risk_report.py, excel_export_audit_packet.py data/sample/exam_tasks.csv and student records CSV reports/sla_at_risk.csv, reports/sla_summary.json, reports/audit_packet/*
ETL etl_runner.py with data/sample/etl_config.json Config + source CSV path output CSV and ETL summary JSON configured in ETL config
Systems system_health_snapshot.py, db_smoke_test.py Host metrics + optional DB env vars reports/system_snapshot.json, reports/db_smoke.json
Workflow wrappers scripts/workflow/schedule_daily_run.sh, scripts/workflow/schedule_daily_run.ps1 Built-in sample dataset paths (+ optional retry cap: MAX_RETRIES / -MaxRetries and retention cap: RETAIN_RUN_ARTIFACTS / -RetainRunArtifacts) Timestamped logs/daily-run-*.log, reports/step_status-*.csv (attempt-aware rows), reports/run_manifest-*.json (including steps_retried), reports/run_history_index.json (cross-run summary), and automatic pruning of older run logs/manifests/step-status files beyond retention cap
Google Sheets google-sheets/apps-script/apps_script_validation_rules.gs Google Sheet tab (SHEET_NAME) In-sheet validation rules + exported sanitized CSV to Drive folder
Power Query powerbi/powerquery/powerquery_data_quality_template.pq Power Query source connector/table DQ columns (dq_issue, dq_status) for downstream BI filters

Quick verification checklist

  1. Run the Python commands above and confirm fresh timestamps in reports/ and output/.
  2. Execute bash scripts/workflow/schedule_daily_run.sh and confirm new logs/daily-run-*.log, reports/step_status-*.csv, and reports/run_manifest-*.json files.
  3. Re-run with retries enabled (MAX_RETRIES=2 bash scripts/workflow/schedule_daily_run.sh) and confirm step rows include attempt/max_attempts values; when retries occur, confirm steps_retried increases in run_manifest-*.json.
  4. Re-run with a low retention value (RETAIN_RUN_ARTIFACTS=1 bash scripts/workflow/schedule_daily_run.sh) and confirm older daily-run-*.log, run_manifest-*.json, and step_status-*.csv files are pruned.
  5. Confirm reports/run_history_index.json exists and includes a fresh entry with run_id, status, and counts from the latest manifest.
  6. Run at least one SQL integrity script against seeded demo data to confirm expected mismatch/duplicate outputs.
  7. For Sheets/Power Query assets, confirm template import executes and outputs validation status columns/exports as expected.

Portfolio alignment

This toolkit maps directly to the types of projects I have delivered:

  1. Academic Results Analytics Dashboard
    • feeds cleaner, validated datasets into BI layers
    • supports anomaly tracking and KPI reliability
  2. Admissions Data Quality Audit
    • applies duplicate detection, field validation, and reconciliation workflows
  3. Examination Integrity Monitoring
    • structures checkpoint/SLA monitoring and evidence-oriented reporting

Dependencies

  • Python: pandas, openpyxl, psutil, python-dotenv
  • MySQL client: optional (for DB smoke tests)
  • Postgres client: optional (for DB smoke tests)

Notes

  • Scripts default to safe/read-only behavior where possible.
  • Keep generated outputs in output/, reports/, or artifacts/ (already gitignored).
  • Use scripts/workflow/schedule_daily_run.sh (Linux) or scripts/workflow/schedule_daily_run.ps1 (Windows) for full routine execution.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors