Operational scripting toolkit built from hands-on academic IT and instructional operations.
This repository captures the scripting patterns I used and refined across two core roles:
- Systems & Data Analyst / System Programmer (FUTA, 2017-2023) supporting admissions, examinations, results processing, and reporting workflows at scale
- Computer Technology Instructor (Pures College, 2024-2025) teaching practical systems administration, networking, and data workflow implementation in lab environments
The goal is simple: turn recurring operations into reliable, auditable, and repeatable workflows across Linux and Windows, with outputs that can stand up to operational review.
- Information Systems Analyst and Computer Science professional (M.Tech. in Computer Science)
- Experience supporting large academic IT operations handling 4,000+ student records/users
- Delivered data processing automation with Python, MySQL, Excel, and Google Sheets
- Built workflows for reconciliation, result integrity checks, and audit-ready reporting
- Managed practical IT instruction and assessment workflows for 250+ students
These scripts reflect day-to-day operational needs from environments with high-volume records, strict reporting timelines, and compliance-sensitive processes:
- reconciliation of admissions and results records between operational extracts and approved reporting outputs
- data quality checks before grade publication windows and committee sign-off
- repeatable ETL runs for admissions and academic performance datasets
- checkpoint/SLA monitoring for exam readiness, compliance, and support tasks
- audit packet preparation for leadership, registry, and quality-assurance review teams
- lightweight system and database smoke checks before reporting cycles
- building reusable data-quality routines for CSV/Excel pipelines
- validating key records and catching duplicates, nulls, and status anomalies early
- documenting reconciliation outcomes for transparent decision-making and approvals
- handling incomplete or inconsistent identifiers with fuzzy reconciliation suggestions
- resolving source-vs-target field conflicts with explicit survivorship priority rules
- supporting analytics preparation for Power BI/Excel reporting layers
- creating cross-platform workflows that can run as scheduled jobs
- supported recurring admissions and examination data cycles with Python, MySQL, Excel, and Google Sheets
- reduced manual record handling risk through repeatable validation, reconciliation, and ETL routines
- used discrepancy reports and integrity checks to prepare result data before publication workflows
- supported operational systems handling 4,000+ records/users, where repeatability and auditability were essential
- used script-driven data validation examples to teach practical SQL/Python workflows
- adapted the same reconciliation and quality-check patterns for student lab assignments and project assessments
- reinforced operational thinking: validate inputs, detect anomalies early, and produce traceable outputs
Although this repository uses academic-style sample data, the workflow pattern is directly transferable to enterprise operations:
- customer/master-data quality validation
- CRM/ERP reconciliation and variance reporting
- SLA monitoring for operations and support queues
- compliance evidence packaging and audit preparation
- ETL pre-flight checks and post-run summaries for BI teams
scripts/
sql/
mysql/
postgres/
sqlite/
templates/
python/
data_quality/
reconciliation/
reporting/
etl/
systems/
workflow/
data/
sample/
seed/
powerbi/
powerquery/
google-sheets/
apps-script/
The sample datasets are intentionally richer to mirror realistic operational review scenarios.
data/sample/student_records_source.csv- 140 student records across admissions cohorts and multiple departments
- includes intentional quality issues: duplicate IDs, missing email/department, out-of-range scores, and status exceptions
data/sample/student_records_target.csv- 140 records with overlapping and non-overlapping IDs for reconciliation behavior
- includes realistic score/status/department differences for mismatch reporting
data/sample/exam_tasks.csv- 140 operational checkpoints tied to admissions, exam integrity, reporting, and compliance
- includes overdue, at-risk, completed-on-time, and late-completed patterns for SLA testing
data/sample/validation_rules.json- JSON rule pack for config-driven CSV validation
- includes required-field, score-range, status, and conditional validation examples
Equivalent SQL seed data is available in:
data/seed/sql_seed_demo_academic_dataset/{mysql.sql,postgres.sql,sqlite.sql}
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install pandas openpyxl psutil python-dotenv- Seed a demo database using files in
data/seed/sql_seed_demo_academic_dataset/ - Run data profiling, normalization, and rules-based validation scripts in
scripts/python/data_quality/ - Run reconciliation and SLA scripts in
scripts/python/reconciliation/andscripts/python/reporting/ - Use workflow wrappers in
scripts/workflow/for scheduled execution
# Profile source dataset
python scripts/python/data_quality/csv_profile.py \
--input data/sample/student_records_source.csv \
--output reports/student_profile.json
# Clean and normalize source dataset
python scripts/python/data_quality/csv_clean_normalize.py \
--input data/sample/student_records_source.csv \
--output output/student_records_clean.csv \
--date-columns due_date completed_at \
--drop-duplicates
# Run config-driven validation rules
python scripts/python/data_quality/config_rules_validator.py \
--input data/sample/student_records_source.csv \
--rules data/sample/validation_rules.json \
--output reports/rules_violations.csv \
--summary reports/rules_validation_summary.json
# Reconcile source vs target
python scripts/python/reconciliation/reconcile_students.py \
--source data/sample/student_records_source.csv \
--target data/sample/student_records_target.csv \
--output reports/reconciliation.csv \
--summary reports/reconciliation_summary.json
# Fuzzy reconciliation when IDs are missing or inconsistent
python scripts/python/reconciliation/fuzzy_match_students.py \
--source data/sample/student_records_source.csv \
--target data/sample/student_records_target.csv \
--output reports/reconciliation_fuzzy.csv \
--summary reports/reconciliation_fuzzy_summary.json \
--threshold 0.86
# Survivorship merge for source-vs-target conflicts
python scripts/python/reconciliation/survivorship_merge_students.py \
--source data/sample/student_records_source.csv \
--target data/sample/student_records_target.csv \
--output reports/reconciliation_survivorship.csv \
--summary reports/reconciliation_survivorship_summary.json \
--priority target source
# SLA at-risk report
python scripts/python/reporting/sla_at_risk_report.py \
--input data/sample/exam_tasks.csv \
--output reports/sla_at_risk.csv \
--summary reports/sla_summary.json \
--threshold-days 2
# Config-driven ETL apply run
python scripts/python/etl/etl_runner.py \
--config data/sample/etl_config.json \
--apply
# Build audit packet (CSV outputs + optional Excel workbook when openpyxl is available)
python scripts/python/reporting/excel_export_audit_packet.py \
--input data/sample/student_records_source.csv \
--output-dir reports/audit_packet
# System health snapshot
python scripts/python/systems/system_health_snapshot.py \
--output reports/system_snapshot.json
# Database smoke tests (SQLite demo mode)
python scripts/python/systems/db_smoke_test.py \
--output reports/db_smoke.json
# Excel workbook validation (replace with your workbook path)
python scripts/python/data_quality/excel_validate_workbook.py \
--input path/to/workbook.xlsx \
--output reports/excel_validation_summary.json
# Daily workflow wrappers
bash scripts/workflow/schedule_daily_run.sh
# Optional: capped retries per step (example: 2 retries after initial attempt)
MAX_RETRIES=2 bash scripts/workflow/schedule_daily_run.sh
# Optional: keep only the newest N timestamped run artifacts
RETAIN_RUN_ARTIFACTS=10 bash scripts/workflow/schedule_daily_run.sh
# powershell -ExecutionPolicy Bypass -File scripts/workflow/schedule_daily_run.ps1
# powershell -ExecutionPolicy Bypass -File scripts/workflow/schedule_daily_run.ps1 -MaxRetries 2
# powershell -ExecutionPolicy Bypass -File scripts/workflow/schedule_daily_run.ps1 -RetainRunArtifacts 10
# Build run manifest directly (useful after ad-hoc script execution)
python scripts/python/reporting/run_manifest.py \
--run-id 20260218-153323 \
--status success \
--report-dir reports \
--output-dir output \
--log-file logs/daily-run-20260218-153323.log \
--manifest reports/run_manifest-20260218-153323.json
# Update cross-run history index from a manifest
python scripts/python/reporting/run_history_index.py \
--manifest reports/run_manifest-20260218-153323.json \
--history reports/run_history_index.json \
--max-entries 50This matrix keeps each script family tied to clear run commands, expected outputs, and quick verification points.
| Family | Primary assets | Typical input | Expected artifact(s) |
|---|---|---|---|
| SQL integrity checks | scripts/sql/mysql/mysql_validate_results_integrity.sql, scripts/sql/postgres/postgres_validate_results_integrity.sql, scripts/sql/templates/sql_reconciliation_diff_template.sql |
Seeded student_records tables |
Query result sets for missing/duplicate/invalid records and source-vs-target diffs |
| SQL seed demos | data/seed/sql_seed_demo_academic_dataset/{mysql.sql,postgres.sql,sqlite.sql} |
Local MySQL/Postgres/SQLite demo DB | Reproducible seeded dataset for script demos |
| Data quality (CSV/Excel) | csv_profile.py, csv_clean_normalize.py, config_rules_validator.py, excel_validate_workbook.py |
data/sample/student_records_source.csv (+ local .xlsx for workbook checks) |
reports/student_profile.json, output/student_records_clean.csv, reports/rules_violations.csv, reports/rules_validation_summary.json, reports/excel_validation_summary.json |
| Reconciliation | reconcile_students.py, fuzzy_match_students.py, survivorship_merge_students.py |
data/sample/student_records_source.csv + data/sample/student_records_target.csv |
reports/reconciliation*.csv + matching *_summary.json files |
| Reporting + SLA | sla_at_risk_report.py, excel_export_audit_packet.py |
data/sample/exam_tasks.csv and student records CSV |
reports/sla_at_risk.csv, reports/sla_summary.json, reports/audit_packet/* |
| ETL | etl_runner.py with data/sample/etl_config.json |
Config + source CSV path | output CSV and ETL summary JSON configured in ETL config |
| Systems | system_health_snapshot.py, db_smoke_test.py |
Host metrics + optional DB env vars | reports/system_snapshot.json, reports/db_smoke.json |
| Workflow wrappers | scripts/workflow/schedule_daily_run.sh, scripts/workflow/schedule_daily_run.ps1 |
Built-in sample dataset paths (+ optional retry cap: MAX_RETRIES / -MaxRetries and retention cap: RETAIN_RUN_ARTIFACTS / -RetainRunArtifacts) |
Timestamped logs/daily-run-*.log, reports/step_status-*.csv (attempt-aware rows), reports/run_manifest-*.json (including steps_retried), reports/run_history_index.json (cross-run summary), and automatic pruning of older run logs/manifests/step-status files beyond retention cap |
| Google Sheets | google-sheets/apps-script/apps_script_validation_rules.gs |
Google Sheet tab (SHEET_NAME) |
In-sheet validation rules + exported sanitized CSV to Drive folder |
| Power Query | powerbi/powerquery/powerquery_data_quality_template.pq |
Power Query source connector/table | DQ columns (dq_issue, dq_status) for downstream BI filters |
- Run the Python commands above and confirm fresh timestamps in
reports/andoutput/. - Execute
bash scripts/workflow/schedule_daily_run.shand confirm newlogs/daily-run-*.log,reports/step_status-*.csv, andreports/run_manifest-*.jsonfiles. - Re-run with retries enabled (
MAX_RETRIES=2 bash scripts/workflow/schedule_daily_run.sh) and confirm step rows includeattempt/max_attemptsvalues; when retries occur, confirmsteps_retriedincreases inrun_manifest-*.json. - Re-run with a low retention value (
RETAIN_RUN_ARTIFACTS=1 bash scripts/workflow/schedule_daily_run.sh) and confirm olderdaily-run-*.log,run_manifest-*.json, andstep_status-*.csvfiles are pruned. - Confirm
reports/run_history_index.jsonexists and includes a fresh entry withrun_id,status, andcountsfrom the latest manifest. - Run at least one SQL integrity script against seeded demo data to confirm expected mismatch/duplicate outputs.
- For Sheets/Power Query assets, confirm template import executes and outputs validation status columns/exports as expected.
This toolkit maps directly to the types of projects I have delivered:
- Academic Results Analytics Dashboard
- feeds cleaner, validated datasets into BI layers
- supports anomaly tracking and KPI reliability
- Admissions Data Quality Audit
- applies duplicate detection, field validation, and reconciliation workflows
- Examination Integrity Monitoring
- structures checkpoint/SLA monitoring and evidence-oriented reporting
- Python:
pandas,openpyxl,psutil,python-dotenv - MySQL client: optional (for DB smoke tests)
- Postgres client: optional (for DB smoke tests)
- Scripts default to safe/read-only behavior where possible.
- Keep generated outputs in
output/,reports/, orartifacts/(already gitignored). - Use
scripts/workflow/schedule_daily_run.sh(Linux) orscripts/workflow/schedule_daily_run.ps1(Windows) for full routine execution.