thecoderr13/Clock-It

Clock It — Parking Disruption Intelligence

Bengaluru Traffic Police · Hackathon Submission

Clock It converts 298,450 historical parking violations into a bias-corrected enforcement schedule — telling each police station where to deploy, at what time, and with what expected impact on disruption.


The Core Insight

60.1% of violations in the dataset fall between midnight and 6 AM. Not because parking is worst then, but because that is when patrol teams go out. A raw heatmap of this data shows where police already patrol, not where the problem is worst.

Clock It corrects for this by dividing violation density by estimated patrol-hour exposure at each junction, surfacing locations that are genuinely underenforced relative to their actual violation demand. No other analysis of this dataset accounts for this artifact. Every PDI score, every deployment recommendation, and every ML prediction in this system is computed on the corrected figure.
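A minimal sketch of the correction described above, assuming exposure is available as patrol hours per junction. The function name, the input shape, and the sample figures are hypothetical and are not taken from pipeline.py.

```python
def bias_corrected_rate(violations: dict[str, int],
                        patrol_hours: dict[str, float]) -> dict[str, float]:
    """Violations per patrol hour for each junction.

    Junctions with no recorded patrol exposure are skipped, because a
    rate cannot be estimated without a denominator.
    """
    rates = {}
    for junction, count in violations.items():
        hours = patrol_hours.get(junction, 0.0)
        if hours > 0:
            rates[junction] = count / hours
    return rates


# Illustrative numbers only: junction B has fewer raw violations than A,
# but was patrolled far less, so its corrected rate is higher.
raw = {"A": 900, "B": 300}
exposure = {"A": 300.0, "B": 50.0}
rates = bias_corrected_rate(raw, exposure)
print(rates)  # {'A': 3.0, 'B': 6.0}
```

On raw counts A looks three times worse than B; per patrol hour, B is twice as bad. That reversal is the effect the correction is meant to surface.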


Run Commands

Prerequisites

  • Python 3.10 or later
  • The raw dataset placed at data/raw/violations.csv

Step 1 — Install dependencies

cd clockit
pip install -r requirements.txt

If you are using a system-managed Python installation (Debian, Ubuntu, newer macOS):

pip install -r requirements.txt --break-system-packages

Or use a virtual environment:

python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Note: pulp is listed in requirements but is optional. If it is unavailable in your environment, the optimizer falls back to a greedy algorithm automatically. All other packages are required.
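One common way to implement an optional dependency like this is to probe for the package before choosing a strategy. The sketch below is an assumption about how such a fallback could look, not the code in pipeline.py; HAVE_PULP and choose_solver are hypothetical names.

```python
import importlib.util

# True when the pulp package can be found in this environment.
HAVE_PULP = importlib.util.find_spec("pulp") is not None


def choose_solver() -> str:
    """Return the name of the scheduling strategy to use."""
    return "pulp" if HAVE_PULP else "greedy"


print(choose_solver())
```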

Step 2 — Run the data pipeline

python pipeline.py

Expected runtime: 30–90 seconds depending on hardware.

What it does:

  • Loads and cleans all 298,450 violation records
  • Parses datetime fields and computes hour, day-of-week, week, and month columns
  • Computes enforcement bias correction per junction (violations divided by patrol-window exposure)
  • Calculates the Parking Disruption Index (PDI) for all 168 BTP-coded named junctions
  • Runs the greedy enforcement schedule optimizer (70 deployment slots across 7 days x 2 shifts x 5 teams)
  • Writes GeoJSON for the Leaflet map (top 80 junctions)
  • Writes hourly breakdowns, station totals, vehicle distribution, and monthly trend files
  • Writes summary.json used by the Flask dashboard

Output: 10 files written to data/processed/
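The greedy scheduling step can be pictured roughly as follows. This is a sketch under stated assumptions, not the optimizer in pipeline.py: the shift name "Evening", the decay factor, and the rule that a junction receives at most one team per shift are all hypothetical choices made for illustration.

```python
import heapq
from itertools import product

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
SHIFTS = ["Morning", "Evening"]   # "Evening" is an assumed name
TEAMS = 5


def greedy_schedule(pdi: dict[str, float],
                    decay: float = 0.5) -> list[tuple[str, str, int, str]]:
    """Assign each team in each shift to the junction with the highest
    remaining score.

    After a visit, the junction's remaining score is multiplied by `decay`
    (a hypothetical stand-in for "PDI reduction"), so later slots spread out.
    """
    # heapq is a min-heap, so scores are negated to pop the largest first.
    heap = [(-score, name) for name, score in pdi.items()]
    heapq.heapify(heap)
    schedule = []
    for day, shift in product(DAYS, SHIFTS):
        visited = []
        for team in range(1, TEAMS + 1):
            if not heap:
                break
            neg_score, name = heapq.heappop(heap)
            schedule.append((day, shift, team, name))
            visited.append((neg_score * decay, name))
        # Visited junctions rejoin the pool only after the shift is filled,
        # so no junction gets two teams in the same shift.
        for item in visited:
            heapq.heappush(heap, item)
    return schedule


scores = {f"J{i}": float(10 - i) for i in range(8)}  # J0=10.0 ... J7=3.0
plan = greedy_schedule(scores)
print(len(plan))  # 70
print(plan[0])    # ('Monday', 'Morning', 1, 'J0')
```

With at least five junctions in the pool, 7 days x 2 shifts x 5 teams always fills 70 slots.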

Step 3 — Train the ML model

python ml/train.py

Expected runtime: 60–120 seconds.

What it does:

  • Engineers 22 junction-level features from raw violation records (temporal, spatial, violation-type, vehicle severity)
  • Labels junctions: those with PDI in the top 40% are high-impact hotspots (label 1); all others are label 0
  • Splits 75/25 train/test with stratification
  • Trains Logistic Regression and Random Forest with balanced class weights
  • Selects the best model by AUC on the held-out test set
  • Runs 5-fold cross-validation and reports mean AUC and standard deviation
  • Saves the trained pipeline to ml/model.pkl
  • Scores all 168 junctions with hotspot probability and merges into junction_pdi.csv

Reported metrics on this dataset: AUC 0.981, F1 0.839, CV AUC 0.963 +/- 0.034
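To make the labelling rule and the AUC metric concrete, here is a dependency-free sketch. It assumes "top 40%" means the highest-ranked 40% of junctions by PDI and computes AUC by direct pairwise comparison; ml/train.py uses scikit-learn, and both function names below are hypothetical.

```python
def label_top_fraction(pdi: list[float], fraction: float = 0.40) -> list[int]:
    """Label 1 for junctions whose PDI is in the top `fraction`, else 0."""
    n_positive = round(len(pdi) * fraction)
    # Indices sorted by PDI, highest first; ties keep their original order.
    ranked = sorted(range(len(pdi)), key=lambda i: -pdi[i])
    positives = set(ranked[:n_positive])
    return [1 if i in positives else 0 for i in range(len(pdi))]


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Illustrative values only, not taken from the dataset.
pdi_values = [0.9, 0.2, 0.7, 0.4, 0.1]
labels = label_top_fraction(pdi_values)
model_scores = [0.8, 0.3, 0.4, 0.6, 0.1]
print(labels)                                   # [1, 0, 1, 0, 0]
print(round(roc_auc(labels, model_scores), 3))  # 0.833
```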

Step 4 — Start the web application

python app.py

Open: http://localhost:5000

For production deployment:

gunicorn app:app --bind 0.0.0.0:5000 --workers 2

Re-running from the UI

Once the app is running, the ML Model page (/model) provides a training panel with step-by-step controls. "Re-run pipeline" and "Train model" buttons POST to /api/run-pipeline and /api/run-training respectively, execute the Python scripts server-side, stream a log console to the browser, and update the displayed metrics on completion. No terminal access is required for re-runs.
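The same re-runs can be triggered from a script instead of the browser. The sketch below only builds the request; it assumes the app is reachable at http://localhost:5000 and that the endpoints accept an empty POST body, which the README does not document. build_trigger_request is a hypothetical helper.

```python
import urllib.request

BASE_URL = "http://localhost:5000"
ALLOWED = ("/api/run-pipeline", "/api/run-training")


def build_trigger_request(endpoint: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for a server-side run."""
    if endpoint not in ALLOWED:
        raise ValueError(f"unexpected endpoint: {endpoint}")
    # An empty body is assumed; no payload format is documented.
    return urllib.request.Request(base_url + endpoint, data=b"", method="POST")


req = build_trigger_request("/api/run-training")
print(req.get_method(), req.full_url)
# POST http://localhost:5000/api/run-training

# To send it while the app is running:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```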


Project Structure

clockit/
|
+-- data/
|   +-- raw/
|   |   \-- violations.csv              <- Place your dataset here
|   \-- processed/                      <- Auto-generated by pipeline.py
|       +-- junction_pdi.csv            <- PDI scores for all 168 junctions
|       +-- junctions.geojson           <- Top 80 junctions for Leaflet map
|       +-- enforcement_schedule.csv    <- Weekly deployment calendar
|       +-- junction_hourly.csv         <- Per-junction hour breakdown
|       +-- city_hourly.csv             <- City-wide hourly distribution
|       +-- city_dow.csv                <- Day-of-week breakdown
|       +-- police_stations.csv         <- Station totals and junction counts
|       +-- vehicle_types.csv           <- Vehicle type breakdown
|       +-- monthly_trend.csv           <- Month-by-month trend
|       \-- summary.json                <- Dashboard KPIs
|
+-- ml/
|   +-- train.py                        <- ML training script (run after pipeline)
|   +-- model.pkl                       <- Trained classifier (auto-generated)
|   +-- metrics.json                    <- AUC, precision, recall, F1 (auto-generated)
|   +-- feature_importance.json         <- Feature rankings (auto-generated)
|   \-- feature_names.json              <- Feature list for inference (auto-generated)
|
+-- static/
|   +-- css/main.css                    <- Light theme stylesheet
|   \-- js/main.js                      <- Chart.js defaults + shared utilities
|
+-- templates/
|   +-- base.html                       <- Top rail layout + shared navigation
|   +-- index.html                      <- Overview dashboard
|   +-- map.html                        <- Leaflet hotspot map
|   +-- schedule.html                   <- Enforcement schedule (calendar/table/map)
|   +-- analytics.html                  <- Analytics and bias analysis
|   +-- model.html                      <- ML model metrics + in-app training panel
|   \-- brief.html                      <- Per-junction enforcement brief
|
+-- utils/
|   +-- __init__.py
|   \-- pdf_generator.py                <- ReportLab PDF brief generator
|
+-- pipeline.py                         <- Step 2: Data processing and PDI engine
+-- app.py                              <- Step 4: Flask web application
+-- requirements.txt
+-- .gitignore
\-- README.md

What This System Does (and Does Not)

Claim                            | Supported    | Method
---------------------------------+--------------+--------------------------------------------------------
Identifies parking hotspots      | Yes          | Spatial clustering at BTP-coded junction level
Scores disruption potential      | Yes          | PDI formula: transparent, every coefficient auditable
Corrects for enforcement bias    | Yes          | Violations divided by patrol-window exposure
Optimises enforcement deployment | Yes          | Greedy schedule maximising PDI reduction per team
Predicts high-impact junctions   | Yes          | Logistic Regression on 22 raw observation features
Reduces traffic congestion       | Cannot claim | No traffic sensor data to validate against
Quantifies congestion impact     | Proxy only   | PDI approximates disruption potential, not measured flow

The distinction between the last two rows and the rest is deliberate. Stating it explicitly is stronger than a claim that cannot be defended under questioning.


The PDI Formula

PDI(junction) =
    0.35 x bias_corrected_frequency     <- violations divided by patrol-window exposure
  + 0.20 x main_road_rate               <- share involving carriageway blockage
  + 0.20 x repeat_consistency           <- share of weeks the junction was active
  + 0.15 x multi_violation_rate         <- share of records with 2 or more offences
  + 0.10 x peak_hour_rate               <- share during 07-10h and 17-20h

  x avg_vehicle_severity_weight         <- Private Bus 1.5x, Car 1.4x, Scooter 0.8x

All five components are computed directly from the dataset. No external data. No external calibration. Every junction score is reproducible from the raw CSV in a single pipeline run.
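As a worked example of the formula, the sketch below assumes each component is already scaled to the range 0 to 1 and that the vehicle severity weight multiplies the whole weighted sum, which is how the layout above reads. The component values are made up for illustration, and the function name is hypothetical.

```python
WEIGHTS = {
    "bias_corrected_frequency": 0.35,
    "main_road_rate": 0.20,
    "repeat_consistency": 0.20,
    "multi_violation_rate": 0.15,
    "peak_hour_rate": 0.10,
}


def compute_pdi(components: dict[str, float], severity_weight: float) -> float:
    """Weighted sum of the five components, scaled by vehicle severity."""
    weighted_sum = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return weighted_sum * severity_weight


example = {
    "bias_corrected_frequency": 0.8,
    "main_road_rate": 0.5,
    "repeat_consistency": 1.0,
    "multi_violation_rate": 0.2,
    "peak_hour_rate": 0.4,
}
# 0.28 + 0.10 + 0.20 + 0.03 + 0.04 = 0.65, then 0.65 x 1.4 = 0.91
score = compute_pdi(example, severity_weight=1.4)
print(round(score, 2))  # 0.91
```

The five coefficients sum to 1.0, so before the severity multiplier the score stays within the range of its inputs.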


API Reference

Method | Endpoint                        | Description
-------+---------------------------------+--------------------------------------------------------
GET    | /api/summary                    | Dashboard KPIs
GET    | /api/junctions                  | GeoJSON for Leaflet map. Optional: ?tier=Critical
GET    | /api/pdi-leaderboard            | Top N junctions by PDI. Optional: ?n=20
GET    | /api/schedule                   | Enforcement schedule. Optional: ?day=Monday&shift=Morning
GET    | /api/hourly                     | City-wide hourly violation distribution
GET    | /api/junction-hourly/<btp_code> | Per-junction hourly breakdown
GET    | /api/stations                   | Police station totals and junction counts
GET    | /api/vehicle-types              | Vehicle type distribution
GET    | /api/monthly                    | Month-by-month violation trend
GET    | /api/dow                        | Day-of-week distribution
GET    | /api/model-metrics              | ML classifier metrics
GET    | /api/feature-importance         | Feature importance ranking
GET    | /api/brief-pdf/<btp_code>       | Download PDF enforcement brief
POST   | /api/run-pipeline               | Trigger pipeline.py server-side
POST   | /api/run-training               | Trigger ml/train.py server-side, returns updated metrics
POST   | /api/predict                    | Score a junction given its feature values
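For the GET endpoints with optional query parameters, a small helper keeps URL building consistent. This is a sketch that assumes the app is served at http://localhost:5000; api_url is a hypothetical helper, and the response formats are not documented here, so the example stops at constructing the URL.

```python
from urllib.parse import urlencode


def api_url(path: str, base_url: str = "http://localhost:5000", **params) -> str:
    """Join a base URL, an endpoint path, and any non-None query parameters."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}{path}" + (f"?{query}" if query else "")


print(api_url("/api/schedule", day="Monday", shift="Morning"))
# http://localhost:5000/api/schedule?day=Monday&shift=Morning
print(api_url("/api/pdi-leaderboard", n=20))
# http://localhost:5000/api/pdi-leaderboard?n=20
print(api_url("/api/summary"))
# http://localhost:5000/api/summary
```

The resulting string can be passed to urllib.request.urlopen or any HTTP client while the app is running.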

Stack

  • Backend: Python 3.10+, Flask 3.0
  • Data processing: pandas, numpy
  • Machine learning: scikit-learn (LogisticRegression, RandomForestClassifier)
  • Schedule optimizer: PuLP with greedy fallback
  • PDF generation: ReportLab
  • Frontend: Jinja2, Chart.js 4, Leaflet.js 1.9, vanilla JS
  • Map tiles: CartoDB Light (no API key required)
  • Fonts: DM Serif Display (brand mark only), Inter (all UI)

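No external data sources. No API keys. Data processing and model training run entirely offline; only the basemap tiles on the map page are fetched over the network.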


Data

  • Source: Bengaluru Traffic Police violation records
  • Period: November 2023 to April 2024
  • Records: 298,450
  • Named junctions: 168 BTP-coded intersections
  • Police stations: 34 across Bengaluru


Troubleshooting

ModuleNotFoundError: No module named 'pulp'
PuLP is optional. The optimizer falls back to greedy automatically. Remove it from requirements.txt if your environment cannot install it.

ERROR: Run pipeline.py first!
The app checks for data/processed/junction_pdi.csv on startup. Run python pipeline.py before python app.py.

WARNING: ML metrics not found
The app runs without the ML model but the Model page will show empty metrics. Run python ml/train.py to generate them.

Map shows no pins
Confirm data/processed/junctions.geojson exists and has features. Re-run python pipeline.py if not.

PDF brief download fails
ReportLab must be installed: pip install reportlab. Check the /api/brief-pdf/<btp_code> endpoint for the error message in the JSON response.
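Several of the checks above can be run in one go before starting the app. The script below is a hypothetical helper, not part of the repository; it assumes the file locations shown in the project structure and that junctions.geojson is a GeoJSON object with a top-level "features" list.

```python
import json
from pathlib import Path


def preflight(root: Path) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if not (root / "data/processed/junction_pdi.csv").is_file():
        problems.append("junction_pdi.csv missing: run python pipeline.py")

    geojson = root / "data/processed/junctions.geojson"
    if not geojson.is_file():
        problems.append("junctions.geojson missing: run python pipeline.py")
    else:
        try:
            data = json.loads(geojson.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            data = None
        # Treat unreadable or non-object JSON the same as an empty map.
        features = data.get("features", []) if isinstance(data, dict) else []
        if not features:
            problems.append("junctions.geojson has no features: re-run python pipeline.py")

    if not (root / "ml/metrics.json").is_file():
        problems.append("ml/metrics.json missing: run python ml/train.py")
    return problems


if __name__ == "__main__":
    for line in preflight(Path(".")) or ["all checks passed"]:
        print(line)
```

Run it from the project root (the directory containing pipeline.py) so the relative paths resolve.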