CoRetweetsNumpyApproach produces incorrect results compared to CoRetweetsApproach

Labels: bug, priority:high, correctness

---

## Description

The NumPy-optimized version of the co-retweets detection approach (`CoRetweetsNumpyApproach`) does not produce results equivalent to those of the standard implementation (`CoRetweetsApproach`), even though it passes the synthetic correctness tests.

## Location

- Implementation: `src/approaches/coretweets_numpy.py`
- Standard version: `src/approaches/coretweets.py`
- Correctness tests: `examples/test_numpy_correctness.py`

## Problem

The NumPy implementation passes all synthetic tests in `test_numpy_correctness.py`, yet its results on real datasets differ from those of the standard approach. The root cause has not been identified. Candidate explanations:

1. The synthetic tests may not cover all edge cases present in real data
2. There may be subtle differences in how the vectorized operations handle certain scenarios
3. Potential issues with timestamp handling or floating-point precision
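
To make the third hypothesis concrete, here is a minimal sketch of how a narrow float dtype could flip a window decision. It does not claim that `coretweets_numpy.py` stores timestamps as `float32`; the timestamps and variable names are made up for illustration, and only `window_sec=60` is taken from this issue.

```python
import numpy as np

window_sec = 60  # same window as in the Workaround example

# Two hypothetical retweet times (Unix epoch seconds), 63 s apart.
t_first = 1_700_000_000
t_second = 1_700_000_063

# int64 keeps every second, so the gap is exact and falls outside the window.
exact = np.array([t_first, t_second], dtype=np.int64)
gap_exact = int(exact[1] - exact[0])

# float32 has a 24-bit significand; near 1.7e9 its representable values
# are 128 s apart, so both timestamps round to the same number.
lossy = np.array([t_first, t_second], dtype=np.float32)
gap_lossy = float(lossy[1] - lossy[0])

print(np.spacing(np.float32(t_first)))  # distance to the next float32 value
print(gap_exact, gap_exact <= window_sec)
print(gap_lossy, gap_lossy <= window_sec)
```

If something like this turns out to be the cause, keeping timestamps in `int64` (or subtracting a base timestamp before any float conversion) would avoid the rounding.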

## Expected Behavior

`CoRetweetsNumpyApproach` should produce **identical results** to `CoRetweetsApproach` for all datasets and parameters, with the only difference being improved performance.

## Current Status

- ✅ Synthetic tests pass (10/10 test cases)
- ❌ Real dataset results differ from standard approach
- ⚠️ Performance optimization is effective but correctness is not guaranteed

## Impact

- Users should **not use** `coretweets_numpy` in production until this is resolved
- The standard `coretweets` approach remains reliable and should be used for all experiments
- This issue affects reproducibility and comparability of results

## Reproduction

```bash
# Run correctness tests (these pass)
python examples/test_numpy_correctness.py

# Benchmark on real data (results will differ)
python examples/benchmark_coretweets.py /path/to/dataset/Processed/

# Run both approaches on same dataset and compare
python bin/run_experiments.py --approaches coretweets coretweets_numpy --datasets Armenia
```

## Investigation Needed

1. **Identify discrepancies**: Run both approaches on real datasets and compare pair-by-pair results
2. **Debug vectorization logic**: Review lines 68-86 in `coretweets_numpy.py` for issues with:
   - Time difference calculations (relative vs absolute timestamps)
   - Boolean masking and early termination logic
   - Handling of edge cases (boundary times, same user pairs)
3. **Enhance test coverage**: Add tests with real data characteristics:
   - Timestamp precision edge cases
   - Large time gaps between retweets
   - Viral tweets with many retweets
   - Users retweeting same tweet multiple times
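
The pair-by-pair comparison in step 1 could be sketched as below. This assumes each approach's output can be reduced to a mapping from a user pair to a co-action count; that shape is an assumption, since the real return type is not shown in this issue. `normalize` and `diff_pair_results` are hypothetical helper names.

```python
def normalize(results):
    """Key each pair in canonical order so (a, b) and (b, a) compare equal."""
    out = {}
    for (u, v), count in results.items():
        key = tuple(sorted((u, v)))
        if key in out:
            # Same pair reported in both orders: surface it rather than guess.
            raise ValueError(f"pair {key} reported more than once")
        out[key] = count
    return out


def diff_pair_results(reference, candidate):
    """Return the pairs that are missing, extra, or counted differently."""
    ref, cand = normalize(reference), normalize(candidate)
    return {
        "missing": {p: ref[p] for p in ref.keys() - cand.keys()},
        "extra": {p: cand[p] for p in cand.keys() - ref.keys()},
        "mismatched": {
            p: (ref[p], cand[p])
            for p in ref.keys() & cand.keys()
            if ref[p] != cand[p]
        },
    }


# Made-up outputs standing in for the two approaches.
reference = {("alice", "bob"): 3, ("bob", "carol"): 1, ("alice", "dave"): 2}
candidate = {("bob", "alice"): 3, ("bob", "carol"): 2, ("carol", "dave"): 1}

report = diff_pair_results(reference, candidate)
for kind, pairs in report.items():
    print(kind, pairs)
```

Splitting the report into missing, extra, and mismatched pairs should help tell a windowing bug (pairs appear or disappear) from a counting bug (the same pairs with different counts).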

## Possible Fixes

1. Revert to non-vectorized approach for correctness
2. Fix the vectorization logic to handle all edge cases
3. Add comprehensive integration tests with real data
4. Consider alternative vectorization strategies (e.g., using pandas or polars)
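
For fixes 2 and 3, one option is a differential test that checks a vectorized routine against a brute-force loop on randomized inputs containing repeated users, tied timestamps, and gaps exactly equal to the window. The sketch below is not the code in `coretweets_numpy.py`. It assumes an inclusive window (`<=`), integer timestamps, and that same-user pairs are excluded; the real rules in `CoRetweetsApproach` may differ. Both function names are hypothetical.

```python
import numpy as np


def pairs_loop(users, times, window_sec):
    """Brute-force reference: compare every two retweets of one tweet."""
    found = set()
    for i in range(len(users)):
        for j in range(i + 1, len(users)):
            close = abs(times[j] - times[i]) <= window_sec
            if close and users[i] != users[j]:
                found.add(tuple(sorted((users[i], users[j]))))
    return found


def pairs_vectorized(users, times, window_sec):
    """Same rule, using a sort plus np.searchsorted to bound each window."""
    users = np.asarray(users, dtype=np.int64)
    times = np.asarray(times, dtype=np.int64)
    order = np.argsort(times, kind="stable")
    users, times = users[order], times[order]
    # hi[i] is one past the last retweet with time <= times[i] + window_sec.
    hi = np.searchsorted(times, times + window_sec, side="right")
    found = set()
    for i, end in enumerate(hi):
        partners = users[i + 1:end]
        u = int(users[i])
        # Drop repeat retweets by the same user before forming pairs.
        for v in partners[partners != u].tolist():
            found.add(tuple(sorted((u, v))))
    return found


# Randomized differential check with a fixed seed for repeatability.
rng = np.random.default_rng(0)
for _ in range(200):
    n = int(rng.integers(1, 31))
    users = rng.integers(0, 6, size=n).tolist()   # few users, so repeats occur
    times = rng.integers(0, 300, size=n).tolist() # small range, so ties occur
    assert pairs_loop(users, times, 60) == pairs_vectorized(users, times, 60)
print("loop and vectorized results agree on 200 random cases")
```

A small user pool and a narrow time range make collisions frequent, which is where hand-written synthetic cases tend to be thin.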

## Workaround

Until fixed, use the standard `coretweets` approach:

```python
# DO use this
approach = ApproachFactory.create('coretweets', window_sec=60, min_coactions=1)

# DO NOT use this (yet)
# approach = ApproachFactory.create('coretweets_numpy', window_sec=60, min_coactions=1)
```

## Priority

**High** - This affects correctness of detection results and could lead to invalid research conclusions if used in production.

## Related Files

- `src/approaches/coretweets_numpy.py` - NumPy implementation
- `src/approaches/coretweets.py` - Standard implementation (reference)
- `examples/test_numpy_correctness.py` - Current test suite
- `examples/benchmark_coretweets.py` - Performance comparison tool

## Next Steps

1. [ ] Create integration test with real dataset
2. [ ] Compare output pair-by-pair between implementations
3. [ ] Identify specific scenarios where results differ
4. [ ] Fix vectorization logic or revert to standard approach
5. [ ] Verify fix with comprehensive test suite
6. [ ] Document any performance trade-offs

