Clock It converts 298,450 historical parking violations into a bias-corrected enforcement schedule — telling each police station where to deploy, at what time, and with what expected impact on disruption.
60.1% of violations in the dataset fall between midnight and 6 AM, not because parking is worst then, but because that is when patrol teams go out. A raw heatmap of this data therefore shows where police already patrol, not where the problem is worst.
Clock It corrects for this by dividing violation density by estimated patrol-hour exposure at each junction, surfacing locations that are genuinely underenforced relative to their actual violation demand. No other analysis of this dataset accounts for this artifact. Every PDI score, every deployment recommendation, and every ML prediction in this system is computed on the corrected figure.
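A minimal sketch of the correction described above, assuming per-junction violation counts and an estimate of patrol hours are already available. The function name `exposure_corrected_rate`, the `min_hours` floor, the junction codes, and the numbers are illustrative and are not taken from pipeline.py.

```python
def exposure_corrected_rate(violations, patrol_hours, min_hours=1.0):
    """Violations per estimated patrol hour for each junction.

    violations   -- dict: junction code -> recorded violation count
    patrol_hours -- dict: junction code -> estimated patrol-hour exposure
    min_hours    -- floor on exposure (an assumption of this sketch) so
                    rarely patrolled junctions do not divide by near zero
    """
    rates = {}
    for code, count in violations.items():
        hours = max(patrol_hours.get(code, 0.0), min_hours)
        rates[code] = count / hours
    return rates


# Hypothetical figures: junction B has fewer raw violations than A,
# but far less patrol time, so its corrected rate is higher.
raw = {"A": 900, "B": 300}
hours = {"A": 300.0, "B": 50.0}
corrected = exposure_corrected_rate(raw, hours)
ranked = sorted(corrected, key=corrected.get, reverse=True)
```

With these made-up inputs a raw count ranks A first, while the corrected rate ranks B first, which is the kind of reordering the bias correction is meant to surface.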
- Python 3.10 or later
- The raw dataset placed at data/raw/violations.csv
cd clockit
pip install -r requirements.txt
If you are using a system-managed Python installation (Debian, Ubuntu, newer macOS):
pip install -r requirements.txt --break-system-packages
Or use a virtual environment:
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
Note: pulp is listed in requirements but is optional. If it is unavailable in your environment, the optimizer falls back to a greedy algorithm automatically. All other packages are required.
python pipeline.py
Expected runtime: 30–90 seconds depending on hardware.
What it does:
- Loads and cleans all 298,450 violation records
- Parses datetime fields and computes hour, day-of-week, week, and month columns
- Computes enforcement bias correction per junction (violations divided by patrol-window exposure)
- Calculates the Parking Disruption Index (PDI) for all 168 BTP-coded named junctions
- Runs the greedy enforcement schedule optimizer (70 deployment slots across 7 days x 2 shifts x 5 teams)
- Writes GeoJSON for the Leaflet map (top 80 junctions)
- Writes hourly breakdowns, station totals, vehicle distribution, and monthly trend files
- Writes summary.json used by the Flask dashboard
Output: 10 files written to data/processed/
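The scheduling step above can be pictured with a short greedy sketch. It fills the 70 slots (7 days x 2 shifts x 5 teams) by always sending the next team to the junction with the highest remaining score. The `decay` discount, the second shift name, and the function name `greedy_schedule` are assumptions made for illustration; the actual rules in pipeline.py may differ.

```python
import heapq
from itertools import product

DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
SHIFTS = ["Morning", "Evening"]   # second shift name is an assumption
TEAMS = 5


def greedy_schedule(pdi, decay=0.5):
    """Fill every (day, shift, team) slot with the junction whose
    remaining score is currently highest.

    pdi   -- dict: junction code -> PDI score
    decay -- assumed discount applied to a junction's score each time
             it is visited, so coverage spreads across junctions
    """
    # heapq is a min-heap, so scores are stored negated.
    heap = [(-score, code) for code, score in pdi.items()]
    heapq.heapify(heap)
    schedule = []
    for day, shift in product(DAYS, SHIFTS):
        picked = []
        for team in range(1, TEAMS + 1):
            if not heap:
                break
            neg_score, code = heapq.heappop(heap)
            schedule.append((day, shift, team, code))
            picked.append((neg_score * decay, code))
        # Re-insert after the shift so no junction gets two teams at once.
        for item in picked:
            heapq.heappush(heap, item)
    return schedule


# Hypothetical scores for eight junctions.
demo = {f"J{i}": 100 - 10 * i for i in range(8)}
slots = greedy_schedule(demo)
```

A greedy pass like this is fast and easy to audit, at the cost of not guaranteeing a globally optimal schedule.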
python ml/train.py
Expected runtime: 60–120 seconds.
What it does:
- Engineers 22 junction-level features from raw violation records (temporal, spatial, violation-type, vehicle severity)
- Labels junctions: PDI in the top 40% = high-impact hotspot (label 1)
- Splits 75/25 train/test with stratification
- Trains Logistic Regression and Random Forest with balanced class weights
- Selects the best model by AUC on the held-out test set
- Runs 5-fold cross-validation and reports mean AUC and standard deviation
- Saves the trained pipeline to ml/model.pkl
- Scores all 168 junctions with hotspot probability and merges into junction_pdi.csv
Reported metrics on this dataset: AUC 0.981, F1 0.839, CV AUC 0.963 +/- 0.034
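The labelling rule in the list above (top 40% of junctions by PDI become label 1) can be sketched as follows. The helper name `label_top_fraction`, the rounding with `math.ceil`, and the tie handling are assumptions of this sketch rather than confirmed details of ml/train.py.

```python
import math


def label_top_fraction(pdi, fraction=0.40):
    """Return {junction: 1 or 0}, with 1 for the top `fraction` by PDI.

    Ties at the cut-off are broken by sort order; that is an assumption
    of this sketch.
    """
    ranked = sorted(pdi, key=pdi.get, reverse=True)
    n_positive = math.ceil(len(ranked) * fraction)
    positives = set(ranked[:n_positive])
    return {code: int(code in positives) for code in pdi}


# Ten hypothetical junctions with scores 1..10.
scores = {f"J{i}": float(i) for i in range(1, 11)}
labels = label_top_fraction(scores)
```

Because the positive class is a fixed 40% share, the classes are moderately imbalanced, which is consistent with the stratified split and balanced class weights listed above.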
python app.py
Open: http://localhost:5000
For production deployment:
gunicorn app:app --bind 0.0.0.0:5000 --workers 2
Once the app is running, the ML Model page (/model) provides a training panel with step-by-step controls. "Re-run pipeline" and "Train model" buttons POST to /api/run-pipeline and /api/run-training respectively, execute the Python scripts server-side, stream a log console to the browser, and update the displayed metrics on completion. No terminal access is required for re-runs.
clockit/
|
+-- data/
| +-- raw/
| | \-- violations.csv <- Place your dataset here
| \-- processed/ <- Auto-generated by pipeline.py
| +-- junction_pdi.csv <- PDI scores for all 168 junctions
| +-- junctions.geojson <- Top 80 junctions for Leaflet map
| +-- enforcement_schedule.csv <- Weekly deployment calendar
| +-- junction_hourly.csv <- Per-junction hour breakdown
| +-- city_hourly.csv <- City-wide hourly distribution
| +-- city_dow.csv <- Day-of-week breakdown
| +-- police_stations.csv <- Station totals and junction counts
| +-- vehicle_types.csv <- Vehicle type breakdown
| +-- monthly_trend.csv <- Month-by-month trend
| \-- summary.json <- Dashboard KPIs
|
+-- ml/
| +-- train.py <- ML training script (run after pipeline)
| +-- model.pkl <- Trained classifier (auto-generated)
| +-- metrics.json <- AUC, precision, recall, F1 (auto-generated)
| +-- feature_importance.json <- Feature rankings (auto-generated)
| \-- feature_names.json <- Feature list for inference (auto-generated)
|
+-- static/
| +-- css/main.css <- Light theme stylesheet
| \-- js/main.js <- Chart.js defaults + shared utilities
|
+-- templates/
| +-- base.html <- Top rail layout + shared navigation
| +-- index.html <- Overview dashboard
| +-- map.html <- Leaflet hotspot map
| +-- schedule.html <- Enforcement schedule (calendar/table/map)
| +-- analytics.html <- Analytics and bias analysis
| +-- model.html <- ML model metrics + in-app training panel
| \-- brief.html <- Per-junction enforcement brief
|
+-- utils/
| +-- __init__.py
| \-- pdf_generator.py <- ReportLab PDF brief generator
|
+-- pipeline.py <- Step 1: Data processing and PDI engine
+-- app.py <- Step 2: Flask web application
+-- requirements.txt
+-- .gitignore
\-- README.md
| Claim | Supported | Method |
|---|---|---|
| Identifies parking hotspots | Yes | Spatial clustering at BTP-coded junction level |
| Scores disruption potential | Yes | PDI formula — transparent, every coefficient auditable |
| Corrects for enforcement bias | Yes | Violations divided by patrol-window exposure |
| Optimises enforcement deployment | Yes | Greedy schedule maximising PDI reduction per team |
| Predicts high-impact junctions | Yes | Logistic Regression on 22 raw observation features |
| Reduces traffic congestion | Cannot claim | No traffic sensor data to validate against |
| Quantifies congestion impact | Proxy only | PDI approximates disruption potential, not measured flow |
The distinction between the last two rows and the rest is deliberate. Stating it explicitly is stronger than a claim that cannot be defended under questioning.
PDI(junction) =
0.35 x bias_corrected_frequency <- violations divided by patrol-window exposure
+ 0.20 x main_road_rate <- share involving carriageway blockage
+ 0.20 x repeat_consistency <- share of weeks the junction was active
+ 0.15 x multi_violation_rate <- share of records with 2 or more offences
+ 0.10 x peak_hour_rate <- share during 07-10h and 17-20h
x avg_vehicle_severity_weight <- Private Bus 1.5x, Car 1.4x, Scooter 0.8x
All five components are computed directly from the dataset. No external data. No external calibration. Every junction score is reproducible from the raw CSV in a single pipeline run.
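As a worked illustration of the formula, the sketch below computes a score for one made-up junction. It assumes each component is already normalised to the range 0 to 1 and that the vehicle severity multiplier applies to the whole weighted sum; the layout of the formula above does not state either point explicitly, so treat both as assumptions. The input values are invented.

```python
WEIGHTS = {
    "bias_corrected_frequency": 0.35,
    "main_road_rate": 0.20,
    "repeat_consistency": 0.20,
    "multi_violation_rate": 0.15,
    "peak_hour_rate": 0.10,
}


def pdi(components, avg_vehicle_severity_weight):
    """Weighted sum of the five components, scaled by vehicle severity.

    Assumes every component is already normalised to [0, 1] and that
    the severity multiplier applies to the whole sum.
    """
    base = sum(WEIGHTS[name] * components[name] for name in WEIGHTS)
    return base * avg_vehicle_severity_weight


# Invented component values for a single junction.
example = {
    "bias_corrected_frequency": 0.8,
    "main_road_rate": 0.5,
    "repeat_consistency": 1.0,
    "multi_violation_rate": 0.2,
    "peak_hour_rate": 0.4,
}
score = pdi(example, avg_vehicle_severity_weight=1.2)
```

Here the weighted sum is 0.28 + 0.10 + 0.20 + 0.03 + 0.04 = 0.65, and multiplying by the severity weight of 1.2 gives a score of about 0.78.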
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/summary | Dashboard KPIs |
| GET | /api/junctions | GeoJSON for Leaflet map. Optional: ?tier=Critical |
| GET | /api/pdi-leaderboard | Top N junctions by PDI. Optional: ?n=20 |
| GET | /api/schedule | Enforcement schedule. Optional: ?day=Monday&shift=Morning |
| GET | /api/hourly | City-wide hourly violation distribution |
| GET | /api/junction-hourly/<btp_code> | Per-junction hourly breakdown |
| GET | /api/stations | Police station totals and junction counts |
| GET | /api/vehicle-types | Vehicle type distribution |
| GET | /api/monthly | Month-by-month violation trend |
| GET | /api/dow | Day-of-week distribution |
| GET | /api/model-metrics | ML classifier metrics |
| GET | /api/feature-importance | Feature importance ranking |
| GET | /api/brief-pdf/<btp_code> | Download PDF enforcement brief |
| POST | /api/run-pipeline | Trigger pipeline.py server-side |
| POST | /api/run-training | Trigger ml/train.py server-side, returns updated metrics |
| POST | /api/predict | Score a junction given its feature values |
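For scripted access to the GET endpoints, a small standard-library client might look like the sketch below. It assumes the app is running at the default address and that these endpoints return JSON; the response shapes are not documented here, so the helper only decodes the body. The names `api_url` and `get_json` are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"   # default address from `python app.py`


def api_url(path, **params):
    """Build a URL for one of the GET endpoints listed above."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


def get_json(path, **params):
    """Fetch an endpoint and decode the JSON body (server must be running)."""
    with urlopen(api_url(path, **params), timeout=10) as resp:
        return json.load(resp)


# URLs for two of the documented optional filters.
leaderboard_url = api_url("/api/pdi-leaderboard", n=20)
schedule_url = api_url("/api/schedule", day="Monday", shift="Morning")
```

With the server started, `get_json("/api/pdi-leaderboard", n=20)` would fetch the same leaderboard data the dashboard uses.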
- Backend: Python 3.10+, Flask 3.0
- Data processing: pandas, numpy
- Machine learning: scikit-learn (LogisticRegression, RandomForestClassifier)
- Schedule optimizer: PuLP with greedy fallback
- PDF generation: ReportLab
- Frontend: Jinja2, Chart.js 4, Leaflet.js 1.9, vanilla JS
- Map tiles: CartoDB Light (no API key required)
- Fonts: DM Serif Display (brand mark only), Inter (all UI)
No external data sources. No API keys. Runs entirely offline.
- Source: Bengaluru Traffic Police violation records
- Period: November 2023 to April 2024
- Records: 298,450
- Named junctions: 168 BTP-coded intersections
- Police stations: 34 across Bengaluru
ModuleNotFoundError: No module named 'pulp'
PuLP is optional. The optimizer falls back to greedy automatically. Remove it from requirements.txt if your environment cannot install it.
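An optional dependency of this kind is commonly handled with a guarded import. The sketch below shows that general pattern only; the flag `HAVE_PULP` and the function `choose_optimizer` are hypothetical names, and the actual fallback logic in pipeline.py may be organised differently.

```python
try:
    import pulp  # optional dependency
    HAVE_PULP = True
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    pulp = None
    HAVE_PULP = False


def choose_optimizer():
    """Name of the strategy to use; the names are illustrative."""
    return "pulp" if HAVE_PULP else "greedy"


strategy = choose_optimizer()
```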
ERROR: Run pipeline.py first!
The app checks for data/processed/junction_pdi.csv on startup. Run python pipeline.py before python app.py.
WARNING: ML metrics not found
The app runs without the ML model but the Model page will show empty metrics. Run python ml/train.py to generate them.
Map shows no pins
Confirm data/processed/junctions.geojson exists and has features. Re-run python pipeline.py if not.
PDF brief download fails
ReportLab must be installed: pip install reportlab. Check the /api/brief-pdf/<btp_code> endpoint for the error message in the JSON response.