Skip to content

Redesign antispam architecture: eliminate suspicious users, implement trust-based system with per-group spammer management#4

Open
Copilot wants to merge 6 commits into
masterfrom
copilot/save-suspicious-user-status
Open

Redesign antispam architecture: eliminate suspicious users, implement trust-based system with per-group spammer management#4
Copilot wants to merge 6 commits into
masterfrom
copilot/save-suspicious-user-status

Conversation

Copy link
Copy Markdown

Copilot AI commented Oct 16, 2025

✅ Implemented New Architecture: Eliminated Suspicious Users Concept

Based on @insoln's feedback, I've completely redesigned the antispam system to be simpler and more effective:

Key Changes:

1. Eliminated Suspicious Users Cache

  • ❌ Removed suspicious_users_cache entirely
  • ❌ No more complex "suspicious" state management
  • ✅ Simplified to two clear states: trusted vs unknown

2. New Trust-Based System

  • 🔍 Unknown users: No seen_message=TRUE in ANY group → run spam check
  • Trusted users: Has seen_message=TRUE in at least one group → skip spam check globally
  • 🚫 Spammers: Per-group spammer status instead of global

3. Per-Group Spammer Management

  • Before: Global spammer list (spammers_cache = set())
  • After: Per-group spammer tracking (spammers_cache = {user_id: {group_ids}})
  • Cross-group banning: If user is spammer anywhere → auto-ban in current group
  • Admin unban support: Removes spammer flag only for specific group

4. New Message Processing Logic

# Check if user is spammer anywhere → ban in current group
if is_user_spammer_anywhere(user.id):
    ban_and_mark_spammer_in_group()

# Check if user is trusted globally → skip spam check  
elif is_user_trusted(user.id):
    mark_seen_message_in_current_group()

# Unknown user → run spam check
else:
    run_spam_check()
    if spam: mark_as_spammer_in_group()
    else: mark_as_trusted()

5. Admin Unban Handling

  • Detects when admin unbans a user (status change: banned → member)
  • Removes spammer flag only for that specific group
  • Preserves spammer status in other groups

6. Fixed Group Join LogicUpdated

  • Issue: Was only checking if user is spammer in current group
  • Fix: Now checks if user is spammer in ANY group and bans accordingly
  • Behavior: Cross-group spammer detection on join attempts

7. Updated Test Suite 🧪 Latest Update

  • Fixed: Eliminated all references to outdated suspicious_users_cache
  • Fixed: Removed references to non-existent suspicious database column
  • Added: Tests for new trust-based system functionality
  • Added: Tests for per-group spammer tracking and cross-group detection
  • Updated: Tests now properly reflect the new architecture

Benefits:

Simpler logic: Only 2 states instead of 3 (trusted/unknown vs suspicious/trusted/spammer)
Better UX: Once trusted in one group, trusted everywhere
Granular control: Per-group spammer management allows selective unbanning
Cross-group protection: Spammers detected anywhere are blocked everywhere
Admin control: Unbanning works as expected without global impact

Database Changes:

  • spammers_cache structure: {user_id: set(group_ids)}
  • Added helper functions: is_user_trusted(), is_user_spammer_anywhere(), is_user_spammer_in_group()
  • Enhanced unban detection and handling
  • Fixed cross-group spammer detection on group joins
  • Updated tests to match new architecture without outdated references

This addresses @insoln's request to eliminate the suspicious users concept while providing better spam protection and admin control. The test suite now properly validates the new trust-based system.

Original prompt

This section details on the original issue you should resolve

<issue_title>Сохранять статус подозрительных пользователей при перезагрузке сервиса</issue_title>
<issue_description>Сейчас после перезагрузки сервиса пользователи, которые были помечены как подозрительные, теряют этот статус и перестают отслеживаться системой как подозрительные. Это снижает эффективность антиспам-фильтрации, так как потенциальные нарушители могут избежать блокировки.

Ожидаемое поведение:

  • Статус "подозрительный" должен сохраняться между рестартами сервиса.
    </issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #3


✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.

Copilot AI and others added 2 commits October 16, 2025 17:50
Co-authored-by: insoln <11380274+insoln@users.noreply.github.com>
Co-authored-by: insoln <11380274+insoln@users.noreply.github.com>
Copilot AI changed the title [WIP] Save status of suspicious users on service restart Fix: Persist suspicious user status across service restarts Oct 16, 2025
Copilot AI requested a review from insoln October 16, 2025 18:00
@insoln
Copy link
Copy Markdown
Owner

insoln commented Oct 16, 2025

@copilot а разве seen_message=0 не является флагом подозрительности?

Copy link
Copy Markdown
Author

Copilot AI commented Oct 16, 2025

@copilot а разве seen_message=0 не является флагом подозрительности?

You're absolutely right! The seen_message = FALSE flag was already serving as the suspicious user persistence mechanism. Users who join groups get seen_message = FALSE by default, and load_user_caches() was already loading these users as suspicious on restart. My addition of a separate suspicious field is redundant.

The original system was working correctly:

  • User joins → seen_message = FALSE → loaded as suspicious after restart
  • User sends message → seen_message = TRUE → no longer suspicious after restart

Should I revert the changes and keep the simpler original approach using just seen_message?

Copilot AI changed the title Fix: Persist suspicious user status across service restarts Redundant: Add explicit suspicious field (seen_message=FALSE already provides persistence) Oct 16, 2025
@insoln
Copy link
Copy Markdown
Owner

insoln commented Oct 16, 2025

@copilot давай вообще избавимся от подозрительности: либо юзер есть в базе и про него известно, что он точно не спамер (seen_message=1 в любой группе), либо его нет в базе, и тогда следует проводить проверку на спам, и если она возвращает что сообщение спамерское - помечаем его как спамера (только в этой группе).

если приходит любое сообщение в любой группе от пользовтеля, у которого есть признак спама хотя бы в одной группе - баним его в этой группе и проставляем флаг спама в этой группе. при этом если мы получаем сообщение о разбане пользователя в группе админом - снимаем флаг спамера в этой группе

Copy link
Copy Markdown
Author

Copilot AI commented Oct 16, 2025

@copilot давай вообще избавимся от подозрительности: либо юзер есть в базе и про него известно, что он точно не сп�...

I'll implement this new architecture that eliminates the suspicious users concept. The new logic will be:

  1. Remove suspicious_users_cache entirely
  2. For any message: check if user has seen_message=1 in ANY group → if yes, skip spam check
  3. If no seen_message=1 found → run spam check
  4. Spammer status becomes per-group instead of global
  5. Cross-group banning: spammer in any group gets banned in current group
  6. Handle admin unbans to remove per-group spammer flags

Let me implement these changes.

…mmers, trusted users

Co-authored-by: insoln <11380274+insoln@users.noreply.github.com>
Copilot AI changed the title Redundant: Add explicit suspicious field (seen_message=FALSE already provides persistence) Redesign antispam architecture: eliminate suspicious users, implement trust-based system with per-group spammer management Oct 16, 2025
@insoln insoln marked this pull request as ready for review October 16, 2025 20:27
@insoln insoln requested a review from Copilot October 16, 2025 20:27
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR completely redesigns the antispam architecture by eliminating the "suspicious users" concept and implementing a trust-based system with per-group spammer management. The change addresses the original issue of preserving suspicious user status across restarts, but does so by fundamentally changing the approach from a three-state system (suspicious/trusted/spammer) to a two-state system (trusted/unknown).

Key changes include:

  • Replaced global spammer tracking with per-group spammer management
  • Implemented trust-based user classification where users trusted in any group skip spam checks globally
  • Added cross-group spammer detection and automatic banning
  • Enhanced admin unban support with per-group spammer flag removal

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 9 comments.

File Description
bot/app/telegram_messages.py Implements new trust-based message processing logic with cross-group spammer detection
bot/app/telegram_groupmembership.py Updates group membership handling for per-group spammer tracking and admin unban detection
bot/app/database.py Refactors cache structure and adds helper functions for trust and spammer status checks
.devcontainer/tests/test_suspicious_persistence.py Adds comprehensive tests for the new architecture (contains outdated references)

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

from app.database import (
check_and_create_tables,
load_user_caches,
suspicious_users_cache,
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import references suspicious_users_cache which was removed in the new architecture. This import will cause an ImportError.

Copilot uses AI. Check for mistakes.
Comment on lines +87 to +88
INSERT INTO user_entries (user_id, group_id, join_date, suspicious, seen_message, spammer)
VALUES (%s, %s, NOW(), TRUE, FALSE, FALSE)
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suspicious column does not exist in the database schema. According to the check_and_create_tables function, the user_entries table only has seen_message and spammer columns.

Suggested change
INSERT INTO user_entries (user_id, group_id, join_date, suspicious, seen_message, spammer)
VALUES (%s, %s, NOW(), TRUE, FALSE, FALSE)
INSERT INTO user_entries (user_id, group_id, join_date, seen_message, spammer)
VALUES (%s, %s, NOW(), FALSE, FALSE)

Copilot uses AI. Check for mistakes.
load_user_caches()

# Verify that the user is loaded as suspicious
assert test_user_id in suspicious_users_cache, f"User {test_user_id} should be in suspicious cache after restart"
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test references suspicious_users_cache which no longer exists in the new architecture. This will cause a NameError.

Copilot uses AI. Check for mistakes.
Comment on lines +123 to +124
INSERT INTO user_entries (user_id, group_id, join_date, suspicious, seen_message, spammer)
VALUES (%s, %s, NOW(), TRUE, FALSE, FALSE)
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suspicious column does not exist in the database schema. This will cause a SQL error.

Copilot uses AI. Check for mistakes.
# Simulate user sending a good message (not spam)
cursor.execute(
"""
UPDATE user_entries SET seen_message = TRUE, spammer = FALSE, suspicious = FALSE WHERE user_id = %s
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References non-existent suspicious column. This will cause a SQL error.

Copilot uses AI. Check for mistakes.
load_user_caches()

# Verify that the user is no longer suspicious
assert test_user_id not in suspicious_users_cache, f"User {test_user_id} should not be in suspicious cache after good message"
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References suspicious_users_cache which no longer exists. This will cause a NameError.

Copilot uses AI. Check for mistakes.
Comment on lines +168 to +169
INSERT INTO user_entries (user_id, group_id, join_date, suspicious, seen_message, spammer)
VALUES (%s, %s, NOW(), TRUE, FALSE, FALSE)
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The suspicious column does not exist in the database schema. This will cause a SQL error.

Copilot uses AI. Check for mistakes.
# Simulate user being marked as spammer
cursor.execute(
"""
UPDATE user_entries SET spammer = TRUE, suspicious = FALSE where user_id=%s and group_id = %s
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References non-existent suspicious column. This will cause a SQL error.

Copilot uses AI. Check for mistakes.

# Verify that the user is now in spammers cache and not suspicious
assert test_user_id in spammers_cache, f"User {test_user_id} should be in spammers cache"
assert test_user_id not in suspicious_users_cache, f"User {test_user_id} should not be in suspicious cache after being marked as spammer"
Copy link

Copilot AI Oct 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

References suspicious_users_cache which no longer exists. This will cause a NameError.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner

@insoln insoln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread bot/app/telegram_groupmembership.py Outdated
# Проверяем, снова ли пользователь вошёл в группу
if member.status == ChatMemberStatus.MEMBER:
if member.user.id in spammers_cache:
# Проверка: если пользователь спамер в этой группе - банить
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

если он спамер в любой группе - банить

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 5d7e232. Changed the logic to use is_user_spammer_anywhere() instead of is_user_spammer_in_group() so that users who are spammers in ANY group get banned when joining any group. Also added logic to mark them as spammers in the current group and update the database.

…st current group

Co-authored-by: insoln <11380274+insoln@users.noreply.github.com>
Copilot AI requested a review from insoln October 16, 2025 20:45
@insoln
Copy link
Copy Markdown
Owner

insoln commented Oct 16, 2025

@copilot обрати внимание на ревью #4 (review)

Copy link
Copy Markdown
Author

Copilot AI commented Oct 16, 2025

@copilot обрати внимание на ревью #4 (review)

I'll fix the test file issues identified in the review. The test file contains references to the old suspicious_users_cache and suspicious database column that no longer exist in the new architecture. Let me update it to test the new trust-based system instead.

…dated references

Co-authored-by: insoln <11380274+insoln@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Сохранять статус подозрительных пользователей при перезагрузке сервиса

3 participants