Skip to content

Bug in run_cmd_with_timeout means you always get an error about ls -d timing out #225

@Timmmm

Description

@Timmmm

If the SCRATCH_ROOT env var is set you hit this code:

if scratch_root is None:
arg_scratch_root = default_scratch_root
else:
# Scratch space could be mounted in a filesystem (such as NFS) on a network drive.
# If the network is down, it could cause the access access check to hang. So run a
# simple ls command with a timeout to prevent the hang.
(out, status) = run_cmd_with_timeout(
cmd="ls -d " + scratch_root,
timeout=1,
exit_on_failure=0,
)

That calls this function:

def run_cmd_with_timeout(
cmd: str,
*,
timeout: float | None = None,
exit_on_failure: bool = True,
) -> tuple[str, int]:
"""Run a command with a specified timeout.
If the command does not finish before the timeout, then it returns -1. Else
it returns the command output. If the command fails, it throws an exception
and returns the stderr.
"""

(With the wrong type for exit_on_failure btw, but who needs correct types?)

The problem is that function will basically always say that the command has timed out. The bug is due to the classic truthiness footgun that Python shares with Javascript. Unlike in Typescript where you can (with considerable effort) set up ESLint to ban truthiness bugs, I'm not aware of a way to do that in Python so you just have to be vigilant.

Here is the bug (and the other poll further down):

if p.poll():
break

It needs to be if p.poll() is not None:.

poll() returns the exit code or None if the process has not exited. If the command succeeds then p.poll() returns 0, and if 0: will never succeed so it thinks the command times out.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions