Clock It — Parking Disruption Intelligence

Bengaluru Traffic Police · Hackathon Submission

Clock It converts 298,450 historical parking violations into a bias-corrected enforcement schedule — telling each police station where to deploy, at what time, and with what expected impact on disruption.

The Core Insight

60.1% of violations in the dataset fall between midnight and 6 AM. Not because parking is worst then, but because that is when patrol teams go out. A raw heatmap of this data shows where police already patrol, not where the problem is worst.

Clock It corrects for this by dividing violation density by estimated patrol-hour exposure at each junction, which surfaces locations that are underenforced relative to their violation demand. To our knowledge, no other analysis of this dataset accounts for this artifact. Every PDI score, every deployment recommendation, and every ML prediction in this system is computed on the corrected figure.
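As a rough illustration of the correction, the sketch below divides raw violation counts by patrol-hour exposure per junction. The function name, the junction IDs, and the figures are hypothetical, and how the pipeline actually estimates exposure is not described here, so exposure is simply passed in.

```python
def bias_corrected_rates(violations, exposure_hours):
    """Return violations per patrol-hour for each junction.

    Junctions with zero or missing exposure are skipped, because no
    rate can be estimated for them (an assumption of this sketch).
    """
    rates = {}
    for junction, count in violations.items():
        hours = exposure_hours.get(junction, 0)
        if hours > 0:
            rates[junction] = count / hours
    return rates


# Hypothetical figures, for illustration only.
violations = {"J1": 900, "J2": 300}
exposure_hours = {"J1": 300, "J2": 50}

rates = bias_corrected_rates(violations, exposure_hours)
ranked = sorted(rates, key=rates.get, reverse=True)
```

In this made-up example J1 has three times the raw tickets of J2, but J2 ranks first once exposure is taken into account, which is the kind of reordering the correction is meant to surface.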

Run Commands

Prerequisites

Python 3.10 or later
The raw dataset placed at data/raw/violations.csv

Step 1 — Install dependencies

cd clockit
pip install -r requirements.txt

If you are using a system-managed Python installation (Debian, Ubuntu, newer macOS):

pip install -r requirements.txt --break-system-packages

Or use a virtual environment:

python -m venv .venv
source .venv/bin/activate        # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Note: pulp is listed in requirements but is optional. If it is unavailable in your environment, the optimizer falls back to a greedy algorithm automatically. All other packages are required.
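One common way to implement this kind of optional dependency is to check whether the package can be imported before choosing a solver. The sketch below is an assumption about how such a fallback could look, not the code in pipeline.py; choose_solver is a hypothetical name.

```python
import importlib.util


def choose_solver():
    """Return "pulp" when PuLP is importable, otherwise "greedy"."""
    if importlib.util.find_spec("pulp") is not None:
        return "pulp"
    return "greedy"


solver = choose_solver()
print(f"Schedule optimizer: {solver}")
```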

Step 2 — Run the data pipeline

python pipeline.py

Expected runtime: 30–90 seconds depending on hardware.

What it does:

Loads and cleans all 298,450 violation records
Parses datetime fields and computes hour, day-of-week, week, and month columns
Computes enforcement bias correction per junction (violations divided by patrol-window exposure)
Calculates the Parking Disruption Index (PDI) for all 168 BTP-coded named junctions
Runs the greedy enforcement schedule optimizer (70 deployment slots across 7 days x 2 shifts x 5 teams)
Writes GeoJSON for the Leaflet map (top 80 junctions)
Writes hourly breakdowns, station totals, vehicle distribution, and monthly trend files
Writes summary.json used by the Flask dashboard

Output: 10 files written to data/processed/
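The greedy scheduling step listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's implementation: the decay factor applied after each visit, the "Evening" shift name, and the junction IDs and scores are all hypothetical.

```python
from itertools import product

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
SHIFTS = ["Morning", "Evening"]  # "Evening" is an assumed shift name
TEAMS = 5


def greedy_schedule(pdi, decay=0.5):
    """Fill each (day, shift, team) slot with the best remaining junction.

    After a junction is assigned, its remaining score is multiplied by
    `decay` so that later slots spread out (an assumption of this sketch).
    No junction is assigned twice within the same day and shift.
    """
    remaining = dict(pdi)
    schedule = []
    for day, shift in product(DAYS, SHIFTS):
        used = set()
        for team in range(1, TEAMS + 1):
            candidates = [j for j in remaining if j not in used]
            if not candidates:
                break
            best = max(candidates, key=remaining.get)
            schedule.append((day, shift, team, best))
            used.add(best)
            remaining[best] *= decay
    return schedule


# Hypothetical PDI scores for six junctions.
pdi = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.6, "E": 0.5, "F": 0.4}
schedule = greedy_schedule(pdi)
```

With at least five junctions available, this fills all 7 x 2 x 5 = 70 slots.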

Step 3 — Train the ML model

python ml/train.py

Expected runtime: 60–120 seconds.

What it does:

Engineers 22 junction-level features from raw violation records (temporal, spatial, violation-type, vehicle severity)
Labels junctions: PDI in the top 40% = high-impact hotspot (label 1)
Splits 75/25 train/test with stratification
Trains Logistic Regression and Random Forest with balanced class weights
Selects the best model by AUC on the held-out test set
Runs 5-fold cross-validation and reports mean AUC and standard deviation
Saves the trained pipeline to ml/model.pkl
Scores all 168 junctions with hotspot probability and merges into junction_pdi.csv

Reported metrics on this dataset: AUC 0.981, F1 0.839, CV AUC 0.963 +/- 0.034
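The labelling rule in the steps above (PDI in the top 40% = label 1) can be illustrated with the standard library alone. The sketch assumes the cut-off count is rounded to the nearest integer and that ties are broken by sort order; ml/train.py may handle both differently. The junction IDs and scores are hypothetical.

```python
def label_top_fraction(pdi, fraction=0.40):
    """Label the top `fraction` of junctions by PDI as 1, the rest as 0."""
    n_positive = round(len(pdi) * fraction)
    ranked = sorted(pdi, key=pdi.get, reverse=True)
    top = set(ranked[:n_positive])
    return {junction: int(junction in top) for junction in pdi}


# Ten hypothetical junctions with PDI scores 0.1 to 1.0.
pdi_scores = {f"J{i}": i / 10 for i in range(1, 11)}
labels = label_top_fraction(pdi_scores)
```

Because the labels are derived from a 40/60 split, the classes are imbalanced, which is why the stratified split and balanced class weights in the training steps matter.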

Step 4 — Start the web application

python app.py

Open: http://localhost:5000

For production deployment:

gunicorn app:app --bind 0.0.0.0:5000 --workers 2

Re-running from the UI

Once the app is running, the ML Model page (/model) provides a training panel with step-by-step controls. "Re-run pipeline" and "Train model" buttons POST to /api/run-pipeline and /api/run-training respectively, execute the Python scripts server-side, stream a log console to the browser, and update the displayed metrics on completion. No terminal access is required for re-runs.
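The same endpoints can in principle be called from a script. The sketch below only builds the POST requests and leaves the actual call commented out; the empty request body is an assumption, since the expected payload is not documented here, and build_rerun_request is a hypothetical helper.

```python
import urllib.request

BASE_URL = "http://localhost:5000"


def build_rerun_request(endpoint):
    """Build (but do not send) a POST request for a re-run endpoint."""
    return urllib.request.Request(BASE_URL + endpoint, data=b"", method="POST")


pipeline_req = build_rerun_request("/api/run-pipeline")
training_req = build_rerun_request("/api/run-training")

# To trigger a run against a running app, uncomment:
# with urllib.request.urlopen(pipeline_req) as resp:
#     print(resp.status)
```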

Project Structure

clockit/
|
+-- data/
|   +-- raw/
|   |   \-- violations.csv              <- Place your dataset here
|   \-- processed/                      <- Auto-generated by pipeline.py
|       +-- junction_pdi.csv            <- PDI scores for all 168 junctions
|       +-- junctions.geojson           <- Top 80 junctions for Leaflet map
|       +-- enforcement_schedule.csv    <- Weekly deployment calendar
|       +-- junction_hourly.csv         <- Per-junction hour breakdown
|       +-- city_hourly.csv             <- City-wide hourly distribution
|       +-- city_dow.csv                <- Day-of-week breakdown
|       +-- police_stations.csv         <- Station totals and junction counts
|       +-- vehicle_types.csv           <- Vehicle type breakdown
|       +-- monthly_trend.csv           <- Month-by-month trend
|       \-- summary.json                <- Dashboard KPIs
|
+-- ml/
|   +-- train.py                        <- ML training script (run after pipeline)
|   +-- model.pkl                       <- Trained classifier (auto-generated)
|   +-- metrics.json                    <- AUC, precision, recall, F1 (auto-generated)
|   +-- feature_importance.json         <- Feature rankings (auto-generated)
|   \-- feature_names.json              <- Feature list for inference (auto-generated)
|
+-- static/
|   +-- css/main.css                    <- Light theme stylesheet
|   \-- js/main.js                      <- Chart.js defaults + shared utilities
|
+-- templates/
|   +-- base.html                       <- Top rail layout + shared navigation
|   +-- index.html                      <- Overview dashboard
|   +-- map.html                        <- Leaflet hotspot map
|   +-- schedule.html                   <- Enforcement schedule (calendar/table/map)
|   +-- analytics.html                  <- Analytics and bias analysis
|   +-- model.html                      <- ML model metrics + in-app training panel
|   \-- brief.html                      <- Per-junction enforcement brief
|
+-- utils/
|   +-- __init__.py
|   \-- pdf_generator.py                <- ReportLab PDF brief generator
|
+-- pipeline.py                         <- Step 2: Data processing and PDI engine
+-- app.py                              <- Step 4: Flask web application
+-- requirements.txt
+-- .gitignore
\-- README.md

What This System Does (and Does Not)

Claim	Supported	Method
Identifies parking hotspots	Yes	Spatial clustering at BTP-coded junction level
Scores disruption potential	Yes	PDI formula — transparent, every coefficient auditable
Corrects for enforcement bias	Yes	Violations divided by patrol-window exposure
Optimises enforcement deployment	Yes	Greedy schedule maximising PDI reduction per team
Predicts high-impact junctions	Yes	Logistic Regression on 22 raw observation features
Reduces traffic congestion	Cannot claim	No traffic sensor data to validate against
Quantifies congestion impact	Proxy only	PDI approximates disruption potential, not measured flow

The distinction between the last two rows and the rest is deliberate. Stating it explicitly is stronger than a claim that cannot be defended under questioning.

The PDI Formula

PDI(junction) = (
    0.35 x bias_corrected_frequency     <- violations divided by patrol-window exposure
  + 0.20 x main_road_rate               <- share involving carriageway blockage
  + 0.20 x repeat_consistency           <- share of weeks the junction was active
  + 0.15 x multi_violation_rate         <- share of records with 2 or more offences
  + 0.10 x peak_hour_rate               <- share during 07-10h and 17-20h
)
  x avg_vehicle_severity_weight         <- Private Bus 1.5x, Car 1.4x, Scooter 0.8x

All five components are computed directly from the dataset. No external data. No external calibration. Every junction score is reproducible from the raw CSV in a single pipeline run.
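The formula can be written as a small function. This sketch assumes that the severity weight multiplies the whole weighted sum and that each component has already been scaled to the 0 to 1 range; the function name and the example values are hypothetical.

```python
# Weights copied from the formula above; they sum to 1.0.
WEIGHTS = {
    "bias_corrected_frequency": 0.35,
    "main_road_rate": 0.20,
    "repeat_consistency": 0.20,
    "multi_violation_rate": 0.15,
    "peak_hour_rate": 0.10,
}


def pdi_score(components, avg_vehicle_severity_weight):
    """Weighted sum of the five components, scaled by vehicle severity."""
    weighted_sum = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return weighted_sum * avg_vehicle_severity_weight


# Hypothetical junction: every component at 0.5, average severity 1.4.
example_components = {name: 0.5 for name in WEIGHTS}
score = pdi_score(example_components, avg_vehicle_severity_weight=1.4)
```

Keeping the weights in one dictionary makes each coefficient easy to audit and change in a single place.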

API Reference

Method	Endpoint	Description
GET	`/api/summary`	Dashboard KPIs
GET	`/api/junctions`	GeoJSON for Leaflet map. Optional: `?tier=Critical`
GET	`/api/pdi-leaderboard`	Top N junctions by PDI. Optional: `?n=20`
GET	`/api/schedule`	Enforcement schedule. Optional: `?day=Monday&shift=Morning`
GET	`/api/hourly`	City-wide hourly violation distribution
GET	`/api/junction-hourly/<btp_code>`	Per-junction hourly breakdown
GET	`/api/stations`	Police station totals and junction counts
GET	`/api/vehicle-types`	Vehicle type distribution
GET	`/api/monthly`	Month-by-month violation trend
GET	`/api/dow`	Day-of-week distribution
GET	`/api/model-metrics`	ML classifier metrics
GET	`/api/feature-importance`	Feature importance ranking
GET	`/api/brief-pdf/<btp_code>`	Download PDF enforcement brief
POST	`/api/run-pipeline`	Trigger pipeline.py server-side
POST	`/api/run-training`	Trigger ml/train.py server-side, returns updated metrics
POST	`/api/predict`	Score a junction given its feature values
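For the GET endpoints with optional query parameters, the URLs can be assembled with the standard library. The helper below is a hypothetical convenience, not part of the project; the base URL assumes the default local server from Step 4.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:5000"


def api_url(path, **params):
    """Join the base URL, an endpoint path, and optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


summary_url = api_url("/api/summary")
schedule_url = api_url("/api/schedule", day="Monday", shift="Morning")
leaderboard_url = api_url("/api/pdi-leaderboard", n=20)
```

The resulting strings can be passed to urllib.request.urlopen or any HTTP client once the app is running.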

Stack

Backend: Python 3.10+, Flask 3.0
Data processing: pandas, numpy
Machine learning: scikit-learn (LogisticRegression, RandomForestClassifier)
Schedule optimizer: PuLP with greedy fallback
PDF generation: ReportLab
Frontend: Jinja2, Chart.js 4, Leaflet.js 1.9, vanilla JS
Map tiles: CartoDB Light (no API key required)
Fonts: DM Serif Display (brand mark only), Inter (all UI)

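No external data sources. No API keys. The pipeline, model training, and dashboard data run offline; the CartoDB map tiles are the exception, since they are fetched over the network.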

Data

Source: Bengaluru Traffic Police violation records
Period: November 2023 to April 2024
Records: 298,450
Named junctions: 168 BTP-coded intersections
Police stations: 34 across Bengaluru

Troubleshooting

ModuleNotFoundError: No module named 'pulp'
PuLP is optional. The optimizer falls back to greedy automatically. Remove it from requirements.txt if your environment cannot install it.

ERROR: Run pipeline.py first!
The app checks for data/processed/junction_pdi.csv on startup. Run python pipeline.py before python app.py.

WARNING: ML metrics not found
The app runs without the ML model, but the Model page will show empty metrics. Run python ml/train.py to generate them.

Map shows no pins
Confirm data/processed/junctions.geojson exists and has features. Re-run python pipeline.py if not.

PDF brief download fails
ReportLab must be installed: pip install reportlab. Check the /api/brief-pdf/<btp_code> endpoint for the error message in the JSON response.
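Several of the checks above can be run in one pass before starting the app. The sketch below is a hypothetical helper, not the check the app performs on startup; it only looks at the file paths named in this README.

```python
import json
import tempfile
from pathlib import Path


def preflight(root):
    """Return a list of problems found under the project root."""
    root = Path(root)
    problems = []
    if not (root / "data/processed/junction_pdi.csv").is_file():
        problems.append("junction_pdi.csv missing: run python pipeline.py")
    if not (root / "ml/metrics.json").is_file():
        problems.append("ml/metrics.json missing: run python ml/train.py")
    geojson = root / "data/processed/junctions.geojson"
    if not geojson.is_file():
        problems.append("junctions.geojson missing: run python pipeline.py")
    else:
        data = json.loads(geojson.read_text(encoding="utf-8"))
        if not data.get("features"):
            problems.append("junctions.geojson has no features")
    return problems


# Demonstration on an empty temporary directory: all three checks fail.
with tempfile.TemporaryDirectory() as tmp:
    empty_project_problems = preflight(tmp)
```

Running preflight(".") from the clockit directory would report which of the three files still need to be generated.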
