Implement extended spam detection research for premium users, bio invite links, and channel messages by Copilot · Pull Request #12 · insoln/buzz_buster

Copilot · 2025-10-22T15:36:54Z

This PR implements the research into extended criteria for automatic spam detection requested in issue #XX. It adds three new detection mechanisms to the existing antispam system.

New Detection Criteria

1. Bio Invite Link Detection 🟡 Partially Implemented

Automatically detects users with Telegram invite links in their profile descriptions:

Supports patterns: t.me/joinchat/*, t.me/+*, telegram.me/joinchat/*, telegram.me/+*
Works only for users with a public @username (Bot API limitation)
Configurable via EXTENDED_BIO_INVITE_CHECK_ENABLED
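The pattern list above can be sketched as a single regular expression. This is a minimal illustration, not the code from this PR: the names INVITE_LINK_RE and bio_has_invite_link are hypothetical, and the character class allowed after the prefix is an assumption about what invite hashes look like.

```python
import re
from typing import Optional

# Hypothetical pattern covering the four shapes listed above:
# t.me/joinchat/*, t.me/+*, telegram.me/joinchat/*, telegram.me/+*
# The trailing character class is an assumption, not taken from the PR.
INVITE_LINK_RE = re.compile(
    r"\b(?:t|telegram)\.me/(?:joinchat/|\+)[A-Za-z0-9_-]+",
    re.IGNORECASE,
)


def bio_has_invite_link(bio: Optional[str]) -> bool:
    """Return True if the bio text contains a Telegram invite link."""
    if not bio:
        # Missing or empty bio: nothing to match against.
        return False
    return INVITE_LINK_RE.search(bio) is not None
```

Plain public links such as t.me/some_channel do not match, since only the joinchat/ and + forms are treated as invite links here.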

2. Channel Message Detection 🟢 Fully Implemented

Identifies messages sent on behalf of channels while respecting existing auto-forward logic:

Detects messages with sender_chat (channel posting)
Safely excludes auto-forwards from linked discussion channels (user.id == 777000)
Maintains compatibility with existing channel handling
Configurable via EXTENDED_CHANNEL_MESSAGE_CHECK_ENABLED
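A rough sketch of the rule described above, using small stand-in dataclasses instead of the bot library's real message types so that it runs on its own. The Chat and Message classes and the function name is_channel_post are hypothetical. Only the sender_chat field and the 777000 sender id come from the description.

```python
from dataclasses import dataclass
from typing import Optional

# Sender id that the description above associates with auto-forwards
# from a linked discussion channel.
TELEGRAM_SERVICE_USER_ID = 777000


@dataclass
class Chat:
    """Stand-in for a Telegram chat object (hypothetical, simplified)."""
    id: int
    type: str


@dataclass
class Message:
    """Stand-in for a Telegram message (hypothetical, simplified)."""
    from_user_id: Optional[int]
    sender_chat: Optional[Chat] = None


def is_channel_post(message: Message) -> bool:
    """Return True for messages sent on behalf of a channel,
    excluding auto-forwards from the linked channel."""
    if message.sender_chat is None:
        # Ordinary user message: not sent on behalf of a chat.
        return False
    if message.from_user_id == TELEGRAM_SERVICE_USER_ID:
        # Auto-forward from the linked discussion channel: allowed.
        return False
    return True
```

This sketch does not distinguish channels from other sender_chat cases, such as anonymous group admins. Whether the PR does is not stated in the description.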

3. Premium User Channel Analysis 🔴 Research Framework

Provides foundation for detecting premium users with suspicious linked channels:

Framework ready for implementation once MTProto API access is available
Currently returns false due to Bot API limitations (no premium status or linked channel access)
Disabled by default: EXTENDED_PREMIUM_CHANNEL_CHECK_ENABLED=false

Technical Implementation

The new ExtendedSpamDetector class plugs into the existing spam detection workflow in telegram_messages.py. Extended checks run before the OpenAI analysis for performance, and each check handles its own errors so a failure does not interrupt message processing.

# Example usage in existing workflow
if not is_spam and EXTENDED_SPAM_DETECTION_ENABLED:
    is_spam = await check_extended_spam_criteria(user, message, context)
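One way such a wrapper could combine several checks with error handling is sketched below. It is illustrative only: run_extended_checks and the Check alias are hypothetical names, and treating a failed check as "not spam" is an assumption about what graceful handling means here, not the behaviour confirmed by this PR.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Iterable

logger = logging.getLogger(__name__)

# A check is any zero-argument coroutine function returning True for spam.
Check = Callable[[], Awaitable[bool]]


async def run_extended_checks(checks: Iterable[Check]) -> bool:
    """Run checks in order and stop at the first positive result.

    A check that raises is logged and skipped (assumed policy), so one
    broken criterion cannot block message handling.
    """
    for check in checks:
        try:
            if await check():
                return True
        except Exception:
            logger.exception("extended spam check failed; skipping it")
    return False
```

Stopping at the first positive result keeps the cheap checks in front of anything slower that follows them in the list.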

API Research Findings

Available through Telegram Bot API:

✅ message.sender_chat for channel detection
✅ User bio via getChat() for public users only
❌ User premium status (not provided by Bot API)
❌ User's linked channels (not accessible)
❌ Channel message history (requires admin privileges)

Risk Assessment & Safety

Low Risk: Channel message detection uses clear technical indicators
Medium Risk: Bio invite detection may affect legitimate community administrators
High Risk: Premium analysis (when fully implemented) could impact new legitimate users

All detection methods are individually configurable to allow gradual rollout and risk management.

Configuration Options

EXTENDED_SPAM_DETECTION_ENABLED=true                # Master toggle
EXTENDED_BIO_INVITE_CHECK_ENABLED=true              # Bio invite links
EXTENDED_CHANNEL_MESSAGE_CHECK_ENABLED=true         # Channel messages  
EXTENDED_PREMIUM_CHANNEL_CHECK_ENABLED=false        # Premium analysis (disabled)
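For illustration, these variables could be read roughly as follows. The helper names env_flag and load_extended_flags, the accepted truthy spellings, and the defaults used when a variable is unset are all assumptions. The description only states that the premium check is disabled by default.

```python
import os
from typing import Dict, Mapping

# Spellings treated as "enabled" (assumption, not from the PR).
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool, env: Mapping[str, str]) -> bool:
    """Parse a boolean environment variable, falling back to default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUE_VALUES


def load_extended_flags(env: Mapping[str, str] = os.environ) -> Dict[str, bool]:
    """Resolve per-check flags; the master toggle gates all of them."""
    master = env_flag("EXTENDED_SPAM_DETECTION_ENABLED", False, env)
    return {
        "bio_invite": master
        and env_flag("EXTENDED_BIO_INVITE_CHECK_ENABLED", True, env),
        "channel_message": master
        and env_flag("EXTENDED_CHANNEL_MESSAGE_CHECK_ENABLED", True, env),
        "premium_channel": master
        and env_flag("EXTENDED_PREMIUM_CHANNEL_CHECK_ENABLED", False, env),
    }
```

Gating every check on the master toggle means one variable can turn the whole feature off during a rollout.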

Security & Testing

✅ CodeQL analysis: 0 vulnerabilities found
✅ Regex patterns validated against injection attacks
✅ Comprehensive test coverage for implemented features
✅ Graceful error handling prevents system disruption

Documentation

Complete technical documentation in EXTENDED_SPAM_DETECTION.md
Updated README with feature descriptions and usage examples
Configuration examples in .env.example
Demonstration script showing all functionality

This implementation provides enhanced spam detection capabilities while maintaining system stability and respecting current API limitations. The modular design enables future enhancements as additional API access becomes available.

Original prompt

This section details the original issue you should resolve

<issue_title>Research extended criteria for automatically identifying spammers on join and on message send</issue_title>
<issue_description>Research is needed into whether the following categories of users can be automatically classified as spammers:

Premium users with a linked channel that contains only a single message with a link or button pointing to another channel or chatbot.

Users whose bio (profile description) contains an invite link.

Messages sent on behalf of channels (except forwards from the channel to which the group is linked as its discussion group).

Investigate:

The technical feasibility of filtering on these signals.

What data is available through the Telegram API for automatically detecting such cases.

The risk of false positives.

Options for integrating with the existing antispam bot.

The goal is to improve the accuracy and automation of spammer detection when a user joins a chat or sends a message.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #11


Co-authored-by: insoln <11380274+insoln@users.noreply.github.com>

Initial plan

0880ef9

Copilot AI assigned Copilot and insoln Oct 22, 2025

Copilot started work on behalf of insoln October 22, 2025 15:37 View session

Implement extended spam detection research features

572a506

Co-authored-by: insoln <11380274+insoln@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Investigate advanced criteria for automatic spam detection~~ Implement extended spam detection research for premium users, bio invite links, and channel messages Oct 22, 2025

Copilot AI requested a review from insoln October 22, 2025 15:50

Copilot finished work on behalf of insoln October 22, 2025 15:50


Copilot wants to merge 2 commits into master from copilot/research-anti-spam-criteria


2 participants
