Skip to content

scst_user: Fix infinite cleanup loop caused by stale SGV pool reference#378

Open
smileusd wants to merge 1 commit into
SCST-project:masterfrom
smileusd:upstream_scst_D_state_issue
Open

scst_user: Fix infinite cleanup loop caused by stale SGV pool reference#378
smileusd wants to merge 1 commit into
SCST-project:masterfrom
smileusd:upstream_scst_D_state_issue

Conversation

@smileusd
Copy link
Copy Markdown

@smileusd smileusd commented Jun 5, 2026

fix the issue #279

The full analize doc at https://smileusd.github.io/2026/06/01/2026-06-01-scst-infinite-loop-deadlock-analysis/

The device cleanup loop in dev_user_process_cleanup() spins at ~2 million iterations per second and never exits, ultimately triggering a kernel soft lockup. The previous workaround panicked the system after 10,000 iterations.

Root cause (confirmed by instrumentation):

A ucmd gets permanently stuck in ucmd_hash with:
state = UCMD_STATE_ON_FREE_SKIPPED (7)
cmd = NULL
ref = 1
sent_to_user = 0

The stuck ref=1 is the reference taken by dev_user_alloc_pages() via
ucmd_get() for the first scatter-gather page. It is released only by
dev_user_free_sg_entries() → ucmd_put(), which fires when the SGV pool
evicts a cached object. The sequence that prevents this eviction:

  1. dev_user_unjam_dev() finds an EXECING command (sent_to_user=1, ref=2: alloc + alloc_pages), bumps ref to 3 via ucmd_get_check(), then calls dev_user_unjam_cmd().

  2. dev_user_unjam_cmd() releases cmd_list_lock and calls scst_cmd_done(SCST_CONTEXT_THREAD), which synchronously runs the full SCST completion pipeline:

    dev_user_on_free_cmd()
    ucmd->cmd = NULL
    ucmd->state = UCMD_STATE_ON_FREE_SKIPPED (type == IGNORE)
    dev_user_process_reply_on_free()
    dev_user_free_sgv()
    sgv_pool_free(ucmd->sgv) ← returns SGV to pool LRU cache;
    dev_user_free_sg_entries() NOT called;
    alloc_pages ucmd_get() NOT balanced
    ucmd->sgv = NULL
    ucmd_put() ← ref: 3→2

  3. Back in dev_user_unjam_dev(): ucmd_put() ← ref: 2→1. ref != 0, so dev_user_free_ucmd() / cmd_remove_hash() are NOT called. ucmd remains in ucmd_hash.

  4. unjam_cmd also reset sent_to_user=0, so on every subsequent pass through dev_user_unjam_dev() the ucmd is counted (res++) but skipped (!sent_to_user → continue). dev_user_get_next_cmd() returns -EAGAIN (ucmd is not in ready_cmd_list). With cleanup_done=1 the while(1) loop has no exit condition.

The sgv_pool_flush() calls at the TOP of dev_user_unjam_dev() run
BEFORE any commands are unjammed. SGV objects cached during unjamming
are therefore never flushed; dev_user_free_sg_entries() never fires.

Fix:

Add sgv_pool_flush() for both pools at the BOTTOM of dev_user_unjam_dev(),
after the unjam loop exits and after the spinlock is released. This evicts
all SGV objects cached during unjamming, triggering:
dev_user_free_sg_entries() → ucmd_put() → dev_user_free_ucmd()
→ cmd_remove_hash()
removing the stuck ucmd from the hash. On the next cleanup-loop iteration
dev_user_unjam_dev() returns res=0 and dev_user_process_cleanup() breaks.

Also replace the panic() workaround with schedule() so the CPU yields
to let SGV free callbacks complete between iterations.

The device cleanup loop in dev_user_process_cleanup() spins at ~2 million
iterations per second and never exits, ultimately triggering a kernel soft
lockup. The previous workaround panicked the system after 10,000
iterations.

Root cause (confirmed by instrumentation):

  A ucmd gets permanently stuck in ucmd_hash with:
    state  = UCMD_STATE_ON_FREE_SKIPPED (7)
    cmd    = NULL
    ref    = 1
    sent_to_user = 0

  The stuck ref=1 is the reference taken by dev_user_alloc_pages() via
  ucmd_get() for the first scatter-gather page. It is released only by
  dev_user_free_sg_entries() → ucmd_put(), which fires when the SGV pool
  *evicts* a cached object. The sequence that prevents this eviction:

  1. dev_user_unjam_dev() finds an EXECING command (sent_to_user=1,
     ref=2: alloc + alloc_pages), bumps ref to 3 via ucmd_get_check(),
     then calls dev_user_unjam_cmd().

  2. dev_user_unjam_cmd() releases cmd_list_lock and calls
     scst_cmd_done(SCST_CONTEXT_THREAD), which synchronously runs the
     full SCST completion pipeline:

       dev_user_on_free_cmd()
         ucmd->cmd = NULL
         ucmd->state = UCMD_STATE_ON_FREE_SKIPPED  (type == IGNORE)
         dev_user_process_reply_on_free()
           dev_user_free_sgv()
             sgv_pool_free(ucmd->sgv)  ← returns SGV to pool LRU cache;
                                          dev_user_free_sg_entries() NOT called;
                                          alloc_pages ucmd_get() NOT balanced
             ucmd->sgv = NULL
           ucmd_put()  ← ref: 3→2

  3. Back in dev_user_unjam_dev(): ucmd_put() ← ref: 2→1.
     ref != 0, so dev_user_free_ucmd() / cmd_remove_hash() are NOT called.
     ucmd remains in ucmd_hash.

  4. unjam_cmd also reset sent_to_user=0, so on every subsequent pass
     through dev_user_unjam_dev() the ucmd is counted (res++) but skipped
     (!sent_to_user → continue). dev_user_get_next_cmd() returns -EAGAIN
     (ucmd is not in ready_cmd_list). With cleanup_done=1 the while(1)
     loop has no exit condition.

  The sgv_pool_flush() calls at the TOP of dev_user_unjam_dev() run
  BEFORE any commands are unjammed. SGV objects cached during unjamming
  are therefore never flushed; dev_user_free_sg_entries() never fires.

Fix:

  Add sgv_pool_flush() for both pools at the BOTTOM of dev_user_unjam_dev(),
  after the unjam loop exits and after the spinlock is released. This evicts
  all SGV objects cached during unjamming, triggering:
    dev_user_free_sg_entries() → ucmd_put() → dev_user_free_ucmd()
      → cmd_remove_hash()
  removing the stuck ucmd from the hash. On the next cleanup-loop iteration
  dev_user_unjam_dev() returns res=0 and dev_user_process_cleanup() breaks.

  sgv_pool_flush() is fully synchronous (calls sgv_dtor_and_free() inline);
  by the time it returns the callbacks have already fired and the ucmd has
  already been removed from the hash. No schedule() or sleep is needed.
@smileusd smileusd force-pushed the upstream_scst_D_state_issue branch from 1b7c0d1 to 17a53bf Compare June 5, 2026 10:05
spin_unlock_irq(&dev->udev_cmd_threads.cmd_list_lock);

/*
* Post-unjam SGV pool flush — fixes the infinite-loop root cause.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reduce the in-code comment significantly. The full lifecycle explanation is useful in the commit message, but the code comment should only describe why a second flush is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants