Implement extended spam detection research for premium users, bio invite links, and channel messages#12

Draft
Copilot wants to merge 2 commits into master from copilot/research-anti-spam-criteria

Copilot AI commented Oct 22, 2025

This PR implements comprehensive research into extended criteria for automatic spam detection as requested in issue #XX. The implementation provides three new detection mechanisms to enhance the existing antispam system.

New Detection Criteria

1. Bio Invite Link Detection 🟡 Partially Implemented

Automatically detects users with Telegram invite links in their profile descriptions:

  • Supports patterns: t.me/joinchat/*, t.me/+*, telegram.me/joinchat/*, telegram.me/+*
  • Works only for users with a public @username (Bot API limitation)
  • Configurable via EXTENDED_BIO_INVITE_CHECK_ENABLED
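
As a rough illustration of the pattern matching described above, the four link shapes could be covered by a single regular expression. This is a minimal sketch, not the PR's code: the names INVITE_LINK_RE and bio_has_invite_link are hypothetical, and the character class allowed after joinchat/ or + is an assumption about what invite hashes look like.

```python
import re

# Hypothetical pattern: matches t.me/joinchat/*, t.me/+*, telegram.me/joinchat/*
# and telegram.me/+*, with an optional http(s):// prefix. The lookbehind keeps
# the pattern from matching inside longer host names such as "chat.me".
INVITE_LINK_RE = re.compile(
    r"(?<![A-Za-z0-9_.-])(?:https?://)?(?:t|telegram)\.me/(?:joinchat/|\+)[A-Za-z0-9_-]+",
    re.IGNORECASE,
)


def bio_has_invite_link(bio):
    """Return True if the bio text contains a Telegram invite link."""
    if not bio:
        # A missing or empty bio cannot contain a link.
        return False
    return INVITE_LINK_RE.search(bio) is not None
```

Plain public links such as t.me/somechannel are deliberately not matched here, since only invite links are listed as a spam signal.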

2. Channel Message Detection 🟢 Fully Implemented

Identifies messages sent on behalf of channels while respecting existing auto-forward logic:

  • Detects messages with sender_chat (channel posting)
  • Safely excludes auto-forwards from linked discussion channels (user.id == 777000)
  • Maintains compatibility with existing channel handling
  • Configurable via EXTENDED_CHANNEL_MESSAGE_CHECK_ENABLED
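
The rule above might look roughly like the following. This is a sketch under assumptions rather than the PR's implementation: the function name is hypothetical, the message is treated as any object with sender_chat and from_user attributes, and the extra is_automatic_forward check (a Bot API Message field) is an addition not mentioned in the description.

```python
from types import SimpleNamespace

# Sender ID that the description above associates with auto-forwarded channel posts.
TELEGRAM_SERVICE_USER_ID = 777000


def is_channel_message(message):
    """Return True for a message posted on behalf of a channel,
    excluding auto-forwards from the linked discussion channel."""
    if getattr(message, "sender_chat", None) is None:
        # Ordinary user message: no channel identity attached.
        return False
    sender = getattr(message, "from_user", None)
    if sender is not None and sender.id == TELEGRAM_SERVICE_USER_ID:
        # Auto-forward from the linked channel: not treated as spam.
        return False
    if getattr(message, "is_automatic_forward", False):
        # Assumption: also honour the explicit auto-forward flag when present.
        return False
    return True


# Usage with stand-in objects (a real handler would pass the library's Message).
channel_post = SimpleNamespace(
    sender_chat=SimpleNamespace(id=-1001234567890),
    from_user=SimpleNamespace(id=136817688),
    is_automatic_forward=False,
)
auto_forward = SimpleNamespace(
    sender_chat=SimpleNamespace(id=-1001234567890),
    from_user=SimpleNamespace(id=TELEGRAM_SERVICE_USER_ID),
    is_automatic_forward=True,
)
```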

3. Premium User Channel Analysis 🔴 Research Framework

Provides foundation for detecting premium users with suspicious linked channels:

  • Framework ready for implementation once MTProto API access is available
  • Currently always returns false: per this PR's findings, the Bot API exposes neither premium status nor linked channels
  • Disabled by default: EXTENDED_PREMIUM_CHANNEL_CHECK_ENABLED=false

Technical Implementation

The new ExtendedSpamDetector class plugs into the existing spam detection workflow in telegram_messages.py. Extended checks run before the OpenAI analysis for performance reasons, and each check is wrapped in error handling so that a failure does not interrupt message processing.

# Example usage in existing workflow
if not is_spam and EXTENDED_SPAM_DETECTION_ENABLED:
    is_spam = await check_extended_spam_criteria(user, message, context)
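
One way the dispatch and error handling behind such a call could be organised is sketched below. The function name run_extended_checks and the (name, enabled, check) tuple shape are hypothetical and are not taken from ExtendedSpamDetector; the sketch only illustrates running toggled checks in order and swallowing failures.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def run_extended_checks(checks, user, message):
    """Run the enabled checks in order and return True on the first match.

    `checks` is a list of (name, enabled, async_callable) tuples; this
    shape is an assumption made for the example.
    """
    for name, enabled, check in checks:
        if not enabled:
            continue
        try:
            if await check(user, message):
                logger.info("Extended spam check %r matched", name)
                return True
        except Exception:
            # A failing check must not break message handling: log it and
            # fall through to the next check.
            logger.exception("Extended spam check %r failed", name)
    return False
```

Returning on the first match keeps the cheaper checks from doing redundant work, and a check that raises is treated as "no match" rather than as spam.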

API Research Findings

Available through Telegram Bot API:

  • ✅ message.sender_chat for channel detection
  • ✅ User bio via getChat() for public users only
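  • ❌ User premium status (reported here as not provided; the Bot API User object does carry an optional is_premium field, so this finding should be re-verified)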
  • ❌ User's linked channels (not accessible)
  • ❌ Channel message history (requires admin privileges)

Risk Assessment & Safety

  • Low Risk: Channel message detection uses clear technical indicators
  • Medium Risk: Bio invite detection may affect legitimate community administrators
  • High Risk: Premium analysis (when fully implemented) could impact new legitimate users

All detection methods are individually configurable to allow gradual rollout and risk management.

Configuration Options

EXTENDED_SPAM_DETECTION_ENABLED=true                # Master toggle
EXTENDED_BIO_INVITE_CHECK_ENABLED=true              # Bio invite links
EXTENDED_CHANNEL_MESSAGE_CHECK_ENABLED=true         # Channel messages  
EXTENDED_PREMIUM_CHANNEL_CHECK_ENABLED=false        # Premium analysis (disabled)
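
For reference, boolean toggles like these are commonly read from the environment along the following lines. This is a sketch only: the helper name env_flag and the set of accepted truthy spellings are assumptions, and the PR may parse its settings differently.

```python
import os

# Assumed set of spellings that count as "enabled".
_TRUE_VALUES = {"1", "true", "yes", "on"}


def env_flag(name, default=False, environ=None):
    """Read a boolean toggle from the environment (or a supplied mapping)."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None:
        # Variable not set: fall back to the caller's default.
        return default
    return raw.strip().lower() in _TRUE_VALUES


# Example: the premium check stays off unless explicitly enabled.
PREMIUM_CHECK_ENABLED = env_flag("EXTENDED_PREMIUM_CHANNEL_CHECK_ENABLED", default=False)
```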

Security & Testing

  • ✅ CodeQL analysis: 0 vulnerabilities found
  • ✅ Regex patterns validated against injection attacks
  • ✅ Comprehensive test coverage for implemented features
  • ✅ Graceful error handling prevents system disruption

Documentation

  • Complete technical documentation in EXTENDED_SPAM_DETECTION.md
  • Updated README with feature descriptions and usage examples
  • Configuration examples in .env.example
  • Demonstration script showing all functionality

This implementation provides enhanced spam detection capabilities while maintaining system stability and respecting current API limitations. The modular design enables future enhancements as additional API access becomes available.

Original prompt

This section details the original issue you should resolve

<issue_title>Investigate extended criteria for automatically identifying spammers on join and on message send</issue_title>
<issue_description>We need to investigate whether the following categories of users can be automatically classified as spammers:

  1. Premium users with a linked channel that contains only a single message with a link or button pointing to another channel or chatbot.
  2. Users whose bio (profile description) contains an invite link.
  3. Messages sent on behalf of channels (except forwards from the channel to which the group is linked as its discussion group).

Investigate:

  • The technical feasibility of filtering on these signals.
  • Which data is available through the Telegram API for automatically detecting such cases.
  • The risk of false positives.
  • Options for integrating with the existing antispam bot.

The goal is to improve the accuracy and automation of spammer detection when a user joins a chat or sends a message.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes #11

Co-authored-by: insoln <11380274+insoln@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Investigate advanced criteria for automatic spam detection" to "Implement extended spam detection research for premium users, bio invite links, and channel messages" Oct 22, 2025
Copilot AI requested a review from insoln October 22, 2025 15:50