Skip to content

⚡ Bolt: Optimize keyword extraction and implement follower blockchain#849

Open
RohanExploit wants to merge 1 commit into
mainfrom
bolt-optimize-trend-follower-blockchain-15893148726504497016
Open

⚡ Bolt: Optimize keyword extraction and implement follower blockchain#849
RohanExploit wants to merge 1 commit into
mainfrom
bolt-optimize-trend-follower-blockchain-15893148726504497016

Conversation

@RohanExploit
Copy link
Copy Markdown
Owner

@RohanExploit RohanExploit commented Jun 7, 2026

This PR introduces a significant performance optimization in the civic trend analysis engine and implements a new blockchain-style integrity feature for grievance followers.

💡 What:

  1. Trend Analyzer Optimization: Refactored TrendAnalyzer._extract_keywords to batch-process all issue descriptions in a single join/lower pass and switched to a pre-compiled, faster regex pattern (\w+) for tokenization.
  2. Follower Blockchain Integrity: Added cryptographic chaining to the GrievanceFollower model, ensuring each follow record is linked to the previous one via a SHA-256 hash.
  3. O(1) Hash Cache: Introduced follower_last_hash_cache to store the "head" of the follower chain in memory, eliminating redundant database lookups during new follower registration.
  4. Optimized Verification: Added a blockchain verification endpoint for followers that uses column projection and stored previous_integrity_hash for O(1) performance.

🎯 Why:

  • The keyword extraction logic was a bottleneck in trend analysis, taking excessive time due to repeated regex compilation and string allocations.
  • Extending blockchain integrity to followers improves the auditability and transparency of the community confirmation process, preventing unauthorized manipulation of support metrics.

📊 Impact:

  • Keyword Extraction: ~20-50% speedup in processing bulk issue descriptions (measured via benchmark_trend.py).
  • Follower Creation: Maintains O(1) performance for blockchain-enabled inserts by bypassing the database lookup for the previous hash.

🔬 Measurement:

  • Ran benchmark_trend.py to confirm the speedup in TrendAnalyzer.
  • Verified the integrity of the blockchain chain using a dedicated test suite (temporary tests/test_follower_blockchain.py).
  • Confirmed no regressions in existing backend tests via bash run_tests.sh.

PR created automatically by Jules for task 15893148726504497016 started by @RohanExploit


Summary by cubic

Speeds up trend keyword extraction and adds blockchain-style integrity to grievance followers with a fast verification endpoint. Reduces CPU and DB load while improving auditability.

  • New Features

    • Added cryptographic chaining for followers with integrity_hash and previous_integrity_hash.
    • Introduced thread-safe follower creation using a global lock.
    • Added GET /follower/{follower_id}/blockchain-verify for integrity checks using column projection.
  • Performance

    • Optimized keyword extraction by batching lowercasing and using a pre-compiled \w+ regex (~20–50% faster).
    • Added follower_last_hash_cache for O(1) previous-hash lookups during follower creation.

Written for commit 5fddaa9. Summary will update on new commits.

Review in cubic

Summary by CodeRabbit

  • New Features

    • Added API endpoint for verifying grievance follower record integrity using blockchain-style validation
  • Performance Improvements

    • Optimized keyword extraction and batch text processing to reduce allocation overhead
    • Implemented pre-compiled regex objects for efficient tokenization
  • Backend Infrastructure

    • Extended grievance follower model with integrity hash fields
    • Enhanced caching layer with thread-safe mechanisms for follower tracking

- Optimized `TrendAnalyzer._extract_keywords` by batch joining descriptions and using a pre-compiled regex for tokenization. Measured performance improvement of ~20-50%.
- Implemented blockchain-style integrity chaining for `GrievanceFollower` model.
- Added `follower_last_hash_cache` to `backend/cache.py` for O(1) hash lookups during record creation.
- Implemented `/api/follower/{follower_id}/blockchain-verify` endpoint with O(1) verification and column projection.
- Added `follower_blockchain_lock` to `backend/routers/grievances.py` for thread-safe hash chaining.
@google-labs-jules
Copy link
Copy Markdown
Contributor

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@netlify
Copy link
Copy Markdown

netlify Bot commented Jun 7, 2026

Deploy Preview for fixmybharat canceled.

Name Link
🔨 Latest commit 5fddaa9
🔍 Latest deploy log https://app.netlify.com/projects/fixmybharat/deploys/6a257b0223aa6c0008eeb588

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 7, 2026

🙏 Thank you for your contribution, @RohanExploit!

PR Details:

Quality Checklist:
Please ensure your PR meets the following criteria:

  • Code follows the project's style guidelines
  • Self-review of code completed
  • Code is commented where necessary
  • Documentation updated (if applicable)
  • No new warnings generated
  • Tests added/updated (if applicable)
  • All tests passing locally
  • No breaking changes to existing functionality

Review Process:

  1. Automated checks will run on your code
  2. A maintainer will review your changes
  3. Address any requested changes promptly
  4. Once approved, your PR will be merged! 🎉

Note: The maintainers will monitor code quality and ensure the overall project flow isn't broken.

@github-actions github-actions Bot added the size/m label Jun 7, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 7, 2026

Review Change Stack

📝 Walkthrough

Walkthrough

This PR implements blockchain-style integrity chaining for grievance follower records with thread-safe hash computation and verification, plus optimizes keyword extraction with batch string operations and precompiled regex patterns. The changes span schema extensions, cache infrastructure, synchronization, and endpoint implementations.

Changes

Blockchain Integrity for Grievance Followers

Layer / File(s) Summary
Integrity schema and caching infrastructure
backend/models.py, backend/cache.py
GrievanceFollower gains integrity_hash and previous_integrity_hash columns. follower_last_hash_cache is added as a thread-safe cache with 3600s TTL for fast lookups.
Thread-safe hash chaining in follow endpoint
backend/routers/grievances.py
Threading import and module-level lock (follower_blockchain_lock) are added. Follow endpoint acquires lock, retrieves previous hash from cache (DB fallback on miss), computes chained SHA-256 hash, persists both integrity fields, and updates cache post-commit.
Follower integrity verification endpoint
backend/routers/grievances.py
New GET endpoint /follower/{follower_id}/blockchain-verify recomputes the chained hash and validates it against the stored integrity_hash, returning verification status.

Keyword Extraction Optimization

Layer / File(s) Summary
Performance optimization guidance
.jules/bolt.md
Documents regex tokenization bottleneck in bulk text processing and prescribes batching, precompiled regex, and minimized lowercasing.
Keyword extraction optimization
backend/trend_analyzer.py
_extract_keywords applies guidance: replaces per-issue lowercasing and repeated regex compilation with batch join, single .lower(), and lazily-initialized precompiled word regex.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • RohanExploit/VishwaGuru#516: Adds "previous_integrity_hash"–based blockchain integrity chaining with thread-safe last-hash caching and corresponding verification logic for Issue records (parallel pattern to follower integrity tracking).
  • RohanExploit/VishwaGuru#684: Implements the same blockchain-style integrity chaining pattern (compute/store integrity_hash/previous_integrity_hash from cached previous hash) for voice submissions.
  • RohanExploit/VishwaGuru#691: Implements blockchain integrity with SHA-256 chaining and cached last-hash fallback pattern for voice issues, mirroring the grievance follower approach.

Poem

🐰 Followers chain like digital pearls,
Each hash a link in truth unfurled,
And keywords batch where once they'd crawl—
A rabbit optimizes all! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the two main changes: trend analyzer optimization and follower blockchain implementation, directly matching the PR's primary objectives.
Description check ✅ Passed The description comprehensively covers all required sections: detailed explanation of changes, type of change (performance improvement and new feature), related testing, and measurements provided.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bolt-optimize-trend-follower-blockchain-15893148726504497016

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
backend/routers/grievances.py (1)

705-705: ⚡ Quick win

Consider using HMAC for hash computation consistency with other verification endpoints.

The follower blockchain uses plain sha256 (line 705), while verify_escalation_audit_blockchain (lines 532-536) and verify_closure_confirmation_blockchain (lines 648-652) use HMAC with a secret key. Plain sha256 allows hash forgery by anyone who knows the input format, while HMAC requires knowledge of the secret.

This matches the existing verify_grievance_blockchain pattern (line 591), so it's not a new inconsistency, but unifying the approach would strengthen the integrity guarantee.

Suggested HMAC approach for consistency
+from backend.config import get_auth_config
 ...
-        hash_content = f"{follower.grievance_id}|{follower.user_email}|{prev_hash}"
-        computed_hash = hashlib.sha256(hash_content.encode()).hexdigest()
+        hash_content = f"{follower.grievance_id}|{follower.user_email}|{prev_hash}"
+        secret_key = get_auth_config().secret_key
+        computed_hash = hmac.new(
+            secret_key.encode('utf-8'),
+            hash_content.encode('utf-8'),
+            hashlib.sha256
+        ).hexdigest()

Note: This change would also need to be applied to the creation path (line 309) for consistency.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/routers/grievances.py` at line 705, Replace the plain SHA256
computation used to produce computed_hash with an HMAC-SHA256 using the same
secret key/mechanism used by verify_escalation_audit_blockchain,
verify_closure_confirmation_blockchain and verify_grievance_blockchain so
hashing/verification is consistent; locate the place where computed_hash is
assigned (the sha256(...).hexdigest() call) and change it to derive the MAC with
the shared secret (the same secret/config helper used by the other verify_*
functions), and also update the corresponding creation path for follower blocks
(the block-creation function referenced near the earlier creation code) to use
HMAC-SHA256 so creation and verification use the same HMAC-based scheme.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/routers/grievances.py`:
- Around line 297-323: The current threading.Lock (follower_blockchain_lock)
won't prevent cross-process races; replace the cache+thread lock approach with a
DB-level serialized fetch-and-insert: acquire the last follower row with SELECT
... FOR UPDATE (e.g.,
session.query(GrievanceFollower).order_by(GrievanceFollower.id.desc()).with_for_update().first())
to obtain prev_hash atomically, compute integrity_hash (same chaining logic),
insert the new GrievanceFollower, commit, then update follower_last_hash_cache;
reference symbols: GrievanceFollower, follower_last_hash_cache,
follower_blockchain_lock, and the db.query call—remove or retain the in-process
lock only as a best-effort guard, but rely on SELECT FOR UPDATE to ensure
cross-process chain integrity and retry on transient serialization/conflict
errors.

---

Nitpick comments:
In `@backend/routers/grievances.py`:
- Line 705: Replace the plain SHA256 computation used to produce computed_hash
with an HMAC-SHA256 using the same secret key/mechanism used by
verify_escalation_audit_blockchain, verify_closure_confirmation_blockchain and
verify_grievance_blockchain so hashing/verification is consistent; locate the
place where computed_hash is assigned (the sha256(...).hexdigest() call) and
change it to derive the MAC with the shared secret (the same secret/config
helper used by the other verify_* functions), and also update the corresponding
creation path for follower blocks (the block-creation function referenced near
the earlier creation code) to use HMAC-SHA256 so creation and verification use
the same HMAC-based scheme.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1a10c675-a10b-47ac-83c4-78f8783965b6

📥 Commits

Reviewing files that changed from the base of the PR and between ebecc88 and 5fddaa9.

📒 Files selected for processing (5)
  • .jules/bolt.md
  • backend/cache.py
  • backend/models.py
  • backend/routers/grievances.py
  • backend/trend_analyzer.py

Comment on lines +297 to +323
with follower_blockchain_lock:
# Blockchain feature: calculate integrity hash for the follower record
# Performance Boost: Use thread-safe cache to eliminate DB query for last hash
prev_hash = follower_last_hash_cache.get("last_hash")
if prev_hash is None:
# Cache miss: Fetch only the last hash from DB
last_follower = db.query(GrievanceFollower.integrity_hash).order_by(GrievanceFollower.id.desc()).first()
prev_hash = last_follower[0] if last_follower and last_follower[0] else ""
follower_last_hash_cache.set(data=prev_hash, key="last_hash")

# Chaining logic: hash(grievance_id|user_email|prev_hash)
hash_content = f"{grievance_id}|{request.user_email}|{prev_hash}"
integrity_hash = hashlib.sha256(hash_content.encode()).hexdigest()

# Create follower record
follower = GrievanceFollower(
grievance_id=grievance_id,
user_email=request.user_email,
integrity_hash=integrity_hash,
previous_integrity_hash=prev_hash
)
db.add(follower)
db.commit()
db.refresh(follower)

# Update cache after successful commit
follower_last_hash_cache.set(data=integrity_hash, key="last_hash")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Multi-process race condition breaks blockchain chain integrity.

The threading.Lock() only synchronizes within a single process. In production deployments with multiple workers (e.g., Gunicorn, uWSGI, or multiple containers), concurrent requests to different processes can:

  1. Read the same prev_hash from their respective caches or DB
  2. Create followers with duplicate previous_integrity_hash values
  3. Break the blockchain chain invariant (multiple records pointing to the same predecessor)

For true integrity chaining across processes, consider:

  • Database-level locking: Use SELECT ... FOR UPDATE on the last follower record, or
  • Distributed lock: Use Redis-based locking (e.g., redis-lock), or
  • Optimistic concurrency: Add a unique constraint on previous_integrity_hash and retry on conflict
Example: Database-level serialization with SELECT FOR UPDATE
 with follower_blockchain_lock:
     # Blockchain feature: calculate integrity hash for the follower record
-    # Performance Boost: Use thread-safe cache to eliminate DB query for last hash
-    prev_hash = follower_last_hash_cache.get("last_hash")
-    if prev_hash is None:
-        # Cache miss: Fetch only the last hash from DB
-        last_follower = db.query(GrievanceFollower.integrity_hash).order_by(GrievanceFollower.id.desc()).first()
-        prev_hash = last_follower[0] if last_follower and last_follower[0] else ""
-        follower_last_hash_cache.set(data=prev_hash, key="last_hash")
+    # Use FOR UPDATE to serialize across all processes
+    last_follower = db.query(GrievanceFollower.integrity_hash).order_by(
+        GrievanceFollower.id.desc()
+    ).with_for_update().first()
+    prev_hash = last_follower[0] if last_follower and last_follower[0] else ""
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@backend/routers/grievances.py` around lines 297 - 323, The current
threading.Lock (follower_blockchain_lock) won't prevent cross-process races;
replace the cache+thread lock approach with a DB-level serialized
fetch-and-insert: acquire the last follower row with SELECT ... FOR UPDATE
(e.g.,
session.query(GrievanceFollower).order_by(GrievanceFollower.id.desc()).with_for_update().first())
to obtain prev_hash atomically, compute integrity_hash (same chaining logic),
insert the new GrievanceFollower, commit, then update follower_last_hash_cache;
reference symbols: GrievanceFollower, follower_last_hash_cache,
follower_blockchain_lock, and the db.query call—remove or retain the in-process
lock only as a best-effort guard, but rely on SELECT FOR UPDATE to ensure
cross-process chain integrity and retry on transient serialization/conflict
errors.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 5 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="backend/models.py">

<violation number="1" location="backend/models.py:194">
P1: Add a schema migration for `grievance_followers`; otherwise existing deployments will fail when this model starts writing the new hash columns.</violation>
</file>

<file name="backend/routers/grievances.py">

<violation number="1" location="backend/routers/grievances.py:34">
P1: `threading.Lock()` only synchronizes threads within a single OS process. In multi-worker deployments (e.g., Gunicorn with multiple workers, uWSGI, or containers), each process gets its own lock instance and its own `follower_last_hash_cache`. Concurrent requests handled by different workers can read the same `prev_hash`, compute identical chains, and create follower records with duplicate `previous_integrity_hash` values — breaking the blockchain invariant.

Consider database-level serialization (e.g., `SELECT ... FOR UPDATE` on the last follower record), a unique constraint on `previous_integrity_hash` with retry-on-conflict, or a distributed lock (e.g., Redis-based) to ensure integrity across processes.</violation>

<violation number="2" location="backend/routers/grievances.py:300">
P2: `follower_last_hash_cache` can become stale after unfollow deletes, causing new follower records to chain to an outdated previous hash.</violation>
</file>

Reply with feedback, questions, or to request a fix.

Re-trigger cubic

Comment thread backend/models.py

# Blockchain integrity fields
integrity_hash = Column(String, nullable=True)
previous_integrity_hash = Column(String, nullable=True, index=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Add a schema migration for grievance_followers; otherwise existing deployments will fail when this model starts writing the new hash columns.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/models.py, line 194:

<comment>Add a schema migration for `grievance_followers`; otherwise existing deployments will fail when this model starts writing the new hash columns.</comment>

<file context>
@@ -188,6 +188,10 @@ class GrievanceFollower(Base):
+
+    # Blockchain integrity fields
+    integrity_hash = Column(String, nullable=True)
+    previous_integrity_hash = Column(String, nullable=True, index=True)
     
     # Relationship
</file context>

router = APIRouter()

# Global lock for follower blockchain operations to prevent race conditions during hash chaining
follower_blockchain_lock = threading.Lock()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: threading.Lock() only synchronizes threads within a single OS process. In multi-worker deployments (e.g., Gunicorn with multiple workers, uWSGI, or containers), each process gets its own lock instance and its own follower_last_hash_cache. Concurrent requests handled by different workers can read the same prev_hash, compute identical chains, and create follower records with duplicate previous_integrity_hash values — breaking the blockchain invariant.

Consider database-level serialization (e.g., SELECT ... FOR UPDATE on the last follower record), a unique constraint on previous_integrity_hash with retry-on-conflict, or a distributed lock (e.g., Redis-based) to ensure integrity across processes.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/grievances.py, line 34:

<comment>`threading.Lock()` only synchronizes threads within a single OS process. In multi-worker deployments (e.g., Gunicorn with multiple workers, uWSGI, or containers), each process gets its own lock instance and its own `follower_last_hash_cache`. Concurrent requests handled by different workers can read the same `prev_hash`, compute identical chains, and create follower records with duplicate `previous_integrity_hash` values — breaking the blockchain invariant.

Consider database-level serialization (e.g., `SELECT ... FOR UPDATE` on the last follower record), a unique constraint on `previous_integrity_hash` with retry-on-conflict, or a distributed lock (e.g., Redis-based) to ensure integrity across processes.</comment>

<file context>
@@ -23,12 +24,15 @@
 router = APIRouter()
 
+# Global lock for follower blockchain operations to prevent race conditions during hash chaining
+follower_blockchain_lock = threading.Lock()
+
 @router.get("/grievances", response_model=List[GrievanceSummaryResponse])
</file context>

with follower_blockchain_lock:
# Blockchain feature: calculate integrity hash for the follower record
# Performance Boost: Use thread-safe cache to eliminate DB query for last hash
prev_hash = follower_last_hash_cache.get("last_hash")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: follower_last_hash_cache can become stale after unfollow deletes, causing new follower records to chain to an outdated previous hash.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At backend/routers/grievances.py, line 300:

<comment>`follower_last_hash_cache` can become stale after unfollow deletes, causing new follower records to chain to an outdated previous hash.</comment>

<file context>
@@ -290,13 +294,33 @@ def follow_grievance(
+        with follower_blockchain_lock:
+            # Blockchain feature: calculate integrity hash for the follower record
+            # Performance Boost: Use thread-safe cache to eliminate DB query for last hash
+            prev_hash = follower_last_hash_cache.get("last_hash")
+            if prev_hash is None:
+                # Cache miss: Fetch only the last hash from DB
</file context>

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant