Skip to content

Improve lua safety, improve lua path resolution, fix various bugs, and add automated tests#318

Open
jbms wants to merge 27 commits into
dawsers:masterfrom
jbms:catch-lua-errors
Open

Improve lua safety, improve lua path resolution, fix various bugs, and add automated tests#318
jbms wants to merge 27 commits into
dawsers:masterfrom
jbms:catch-lua-errors

Conversation

@jbms

@jbms jbms commented Jun 17, 2026

Copy link
Copy Markdown

No description provided.

If a Lua script registers a callback (e.g. for workspace focus or window
mapping) and that callback raises an error, the compositor previously
crashed with "unprotected lua error" because it used the unprotected
`lua_call` API.

This change introduces a `safe_pcall` helper that wraps `lua_pcall`.
It is now used for all callback executions. If an error occurs, it is
logged as an error, but the compositor continues running.
@jbms jbms force-pushed the catch-lua-errors branch from fe1acd7 to 413393e Compare June 17, 2026 23:55
@jbms

jbms commented Jun 17, 2026

Copy link
Copy Markdown
Author

I also added a nix flake and a basic python-based automated test suite to test the fix. However, I can exclude those changes from this PR if you prefer.

@dawsers

dawsers commented Jun 18, 2026

Copy link
Copy Markdown
Owner

First of all, thank you very much for taking the time to do all this. I will be happy to merge the Lua part of this PR. These are my comments:

Crashes

I understand this PR deals with crashes due to an invalid Lua stack while executing Lua code. I would appreciate if you could provide some example (as a comment in this thread), so I can see how frequent this is. Your python test uses error(), which of course works, but it would be nice to see a more common error you have encountered, so I can see how this happens in practice. The reason is, currently, the main source of crashes is invalid pointers. There is no check for the validity of raw structures like views, containers or workspaces. So a user can store a raw C pointer and then use it when it is not valid. For example:

local args, state = ...

local scroll = require("scroll")

local views_table = {}

local function print_views()
  for _, view in ipairs(views_table) do
    print(scroll.view_get_app_id(view))
  end
end

local function on_view_map(view, _)
  table.insert(views_table, view)
  print_views()
end

scroll.add_callback("view_map", on_view_map, nil)

If I close an application, the next time view_map is called, scroll will crash. This PR doesn't deal with this.

I haven't fixed this problem because even though it is easy to do, it would add quite some runtime overhead to the system, as I would need to verify the validity of pointers for each operation. I may do this in the future if it becomes a problem, but as it is now, scroll can be easily crashed using Lua code, even with this PR's changes.

External Dependencies and Maintenance

I thank you for the test suite and Nix flake, but I would rather not include things that I will not be able to maintain or add more external dependencies. Unfortunately, scroll is basically me. The project is not popular, and contributions like this one are rare. People come and go, and there is not a consistent structure of maintainers with assigned areas.

There used to be a NixOS flake at some point here or in hyprscroller, I don't remember right now, and at some point the maintainer moved on, so people were filing issue reports I couldn't deal with. Currently, the flake is maintained externally by @Diax170, who has been a "faithful" scroll user for quite some time, and participates a lot in bug reporting and feature requests. I would rather keep it that way, and suggest that if you have improvements to what he is doing, submit PRs to his repository. That way, we will have strong Nix support from people who use Nix, otherwise the responsibility lies on me, and I am an Arch user.

Again, thanks a lot for your help!

@jbms jbms force-pushed the catch-lua-errors branch 2 times, most recently from fe2647f to 35fb02f Compare June 18, 2026 21:55
Containers that exactly touched the right or bottom edge of the output
were being incorrectly considered invisible due to using `>=` instead of
`>` in the bounds check.

For example, on a 1280x720 screen, a single tiling window filling the
screen has geometry x=0, y=0, width=1280, height=720. Its bottom edge
is at y + height = 720, which is equal to maxy. The check:
`y + height >= maxy` evaluated to true, marking the container as
invisible.

This caused commands like `jump` (which filter for visible containers)
to fail to detect any windows when a single window was fullscreen or
tiling and filling the screen.

Reproduction:
1. Start scroll/sway with a single tiling window open.
2. Run the `jump` command.
3. Observe that jump mode is not entered (no labels appear, input is
   not captured).

Fix this by using strict inequality `>` for the upper bounds check, so
containers that touch the edge but do not exceed it are considered
visible.
@jbms jbms force-pushed the catch-lua-errors branch 2 times, most recently from dfeafc6 to 14559da Compare June 18, 2026 22:35
@jbms

jbms commented Jun 18, 2026

Copy link
Copy Markdown
Author

Thanks for your quick response to my PR. I previously used Niri but started using scroll very recently because of the flexibility provided by the base sway functionality and the lua scripting support that you added.

As a disclosure, I have used AI assistance to create these changes but I am a very experienced programmer and have thoroughly reviewed all of the changes myself.

Re crashes: The specific ones that I ran into were missing/incorrect function names and invalid format strings. Regarding use-after-free crashes for the container pointers exposed to Lua code, I have now added an additional commit that fixes that problem by providing node ids rather than pointers to the lua code, and adds a hash table to make the lookup efficient (this also makes lookup by id for commands more efficient).

Re tests: The tests only add python and pytest (available as python-pytest on arch) as a dependency --- using Python seemed like the most expedient choice, but I could also rework it to a different language or test framework if that would be preferable. I would say that I think for a project like this a test suite is particularly valuable, because there is a very stable API surface provided by the command interface, IPC interface, lua interface, plus normal wayland and X11 interface that can be used for testing in a way that can catch regressions without imposing much maintenance overhead. Given that there is a large amount of complicated behavior, and that you have to deal with merging in changes from sway and wlroots, I think having automated tests would be quite valuable. As an example, in the course of adding tests an additional bug with the visibility criteria used for jump was found, which I also fixed as a separate commit.

Of course I am happy to exclude them from this PR if that is what you prefer though.

Re nix: I am aware of the existing nix flake, in fact I am using it. It includes some extra logic to properly set up scroll as a wayland session, while I added just the subset needed to define the package and development environment. That makes it easy for nix users to build and test it. I removed it from this PR per your request. However, you might find it to be useful: nix terminology is kind of confusing but there is:

  • nix the language
  • nix the package manager and the large nixpkgs repository of package definitions defined using nix the language
  • nixos the linux distribution that uses nix the package manager.

Nix the package manager can be used as an additional package manager on any linux distribution, such as arch linux, and indeed that is how I have used it also. (The only caveat is that a few things, like GL drivers, require some special treatment when using nix packages on non-nixos, in particular for GL you need to add an additional nixGL wrapper). It can be used to create isolated dev environments, or to set up per-user environments independent of the system package manager. It can also be convenient for testing on github actions since then you don't have to worry about separately maintaining github actions scripts for setting up dependencies.

@jbms jbms force-pushed the catch-lua-errors branch 4 times, most recently from 8128de3 to d86148b Compare June 19, 2026 04:20
@jbms

jbms commented Jun 19, 2026

Copy link
Copy Markdown
Author

I added a number of additional bug fixes.

@dawsers

dawsers commented Jun 19, 2026

Copy link
Copy Markdown
Owner

I have nothing against well reviewed AI proposals, because I understand access to Goose, Agent Smith, Kythe or any other more up to date tools can increase productivity quickly, I just fear other people who only have access to Open Source models without support for a good integration of semi-large code bases due to limited context, will start spamming the repo with low quality PRs. I only had one AI generated PR before, but it was well reasoned (like this one), and I merged it.

I am trying to review everything, but please, don't make this a hotchpotch. I would prefer to have one PR for the Lua stuff, and leave all those bug fixes for later.

The issue with bug fixes is some are scroll problems, but most of the leaks are indirect leaks at exit (I also use libasan). This is inherited from sway, which doesn't do a proper cleanup at exit, but the leaks, as far I have investigated, are not adding memory overhead at runtime, just unfreed memory at exit. I usually try to change as few things as possible (from sway) if they are not really needed, to make merging easier. Leaks when the compositor is exiting are not a real problem for users. Of course, it is more elegant to remove them and have a completely clean exit -and it saves me time when looking at the libasan report-, but on the other hand, I don't really want to diverge the code base too much from sway because then merging their changes becomes much harder. There are two ways to go about this, one is to submit PRs to sway, so they also fix them and then I merge their changes, or, if the bug is really pressing because it affects runtime, and sway maintainers don't respond, we fix the bug here. If the project grows and we decide "we don't need sway any more", this may change, but I don't think the time has arrived yet, as code contributions like this one had never happened.

I am going to keep on reading all this, I just wanted to send you a quick heads up. Thank you again for taking the time to do all this!

@Diax170

Diax170 commented Jun 19, 2026

Copy link
Copy Markdown

@jbms I think you kind of messed up the branches lol

FYI you usually keep your changes on a separate branch (catch-lua-errors in this case) and keep master/main/whatever-the-default-branch-name-is unchanged. The default branch is fetched directly from the upstream and the changes are then merged into the other branch.

Let me know if you need me to clarify :)

This commit fixes a bug in layout move commands (`layout_move_container_nomode`
and `layout_move_container_to_workspace`) where they were calling
`container_reap_empty` on the parent container.

However, these parent containers were already being reaped/destroyed as part of
the normal tree rearrangement (e.g., when the last child is moved out of a
split container, or during simplification). The duplicate call resulted in trying
to destroy the same container twice.

When safety checks (node hash map) are enabled, this double reap triggers a
"Node not present in table" abort in `node_map_remove` because the node is
removed from the map during the first reap and is not present during the second.

Without the node hash map safety checks, the double reap does not cause a crash
or trigger ASan. The duplicate call to `container_begin_destroy` is safe because
it returns early if `con->node.destroying` is already true, and the duplicate
`node_set_dirty` also returns early. The container is only added to the
transaction once and thus freed once.

We removed the redundant `container_reap_empty` calls.

Reproduction instructions:
1. Enable safety checks (node hash map) in `node.c`.
2. Start scroll.
3. Open two windows (they will be tiled vertically by default).
4. Focus the second window.
5. Run the command: `move left nomode`.
6. The compositor will abort/crash with "Node not present in table" in `node_map_remove`.
@dawsers

dawsers commented Jun 19, 2026

Copy link
Copy Markdown
Owner

@jbms I think you kind of messed up the branches lol

FYI you usually keep your changes on a separate branch (catch-lua-errors in this case) and keep master/main/whatever-the-default-branch-name-is unchanged. The default branch is fetched directly from the upstream and the changes are then merged into the other branch.

Let me know if you need me to clarify :)

It doesn't matter what he has in his master branch, because the PR is pointing to the right branch. He may want to have his master contain all the changes, and then create different branches for several different PRs. My comment was suggesting there are too many things in a single PR, so that could be a way to split it.

@Diax170

Diax170 commented Jun 19, 2026

Copy link
Copy Markdown

Also, since it's related, I've just added the dev shell to my flake. If anyone wants to report problems regarding it, open a PR or email me I guess (issues and discussions are disabled for some reason)

@Diax170

Diax170 commented Jun 19, 2026

Copy link
Copy Markdown

It doesn't matter what he has in his master branch, because the PR is pointing to the right branch. He may want to have his master contain all the changes, and then create different branches for several different PRs. My comment was suggesting there are too many things in a single PR, so that could be a way to split it.

Yes, that's possible but I wanted to inform them just in case they actually got confused... if you know what I mean (sorry if I just created even more confusion >.<)

jbms added 3 commits June 19, 2026 14:55
This commit introduces a safe lookup mechanism for container/view IDs.
Instead of casting arbitrary integers to pointers (which could lead to
crashes if invalid or stale IDs are used), we now maintain a hash map
(node_table) mapping unique IDs to actual sway_node pointers.

The node map is initialized during server startup (`server_init`) and
finalized during shutdown in `main` after the root node has been destroyed.
All nodes are registered in the map upon creation (`node_init`) and
removed upon destruction (`node_fini`).

This enables:
- O(1) node lookup for `swap container with con_id`.
- O(1) node lookup for criteria matching by `con_id`.
This commit fixes a use-after-free and double-reap crash when sending a
tiled container to the scratchpad.
When the container is tiled, `root_scratchpad_add_container` calls
`container_set_floating`, which detaches the container from its tiled parent
and reaps the parent if it becomes empty.
However, `root_scratchpad_add_container` was then:
1. Accessing the `parent` pointer (which might have been freed during reap),
   causing a use-after-free.
2. Calling `container_reap_empty` on the parent again, causing a double-reap
   (abort in `node_map_remove`).
3. Calling `arrange_container(parent)` on the potentially freed parent.

We fix this by:
1. Removing the redundant `container_reap_empty(parent)` call, as the parent
   is already reaped in `container_set_floating` (if it was tiled) or it is
   NULL (if it was already floating).
2. Using the safe node lookup (`node_by_id`) to verify if the parent still
   exists before calling `arrange_container(parent)`. If it was destroyed,
   we fallback to arranging the workspace.

Reproduction instructions:
1. Enable safety checks (node hash map) in `node.c`.
2. Start sway.
3. Open two windows in a split container (e.g. open one window, split it, open a second window).
4. Focus the first window and run `move scratchpad`.
5. Focus the second window and run `move scratchpad` (this makes the parent split container empty, triggering the reap).
6. The compositor will crash due to UAF in `arrange_container`, or abort with "Node not present in table" if safety checks are enabled.
When a container is detached from its parent, we must clear it from the
parent's pending focused_inactive_child pointer if it was pointing to it.
Otherwise, the parent's pending focused_inactive_child remains dangling
and can lead to use-after-free when container_get_active_view is called.

The transaction model keeps current.focused_inactive_child safe because
it is rebuilt from seat focus stack during transaction commits. However,
pending.focused_inactive_child is maintained manually and was not being
cleared on detach.
@jbms jbms force-pushed the catch-lua-errors branch from d86148b to 4d7c941 Compare June 19, 2026 22:35
@jbms jbms changed the title Catch lua errors to avoid the compositor crashing Improve lua safety, improve lua path resolution, fix various bugs, and add automated tests Jun 19, 2026
@jbms

jbms commented Jun 19, 2026

Copy link
Copy Markdown
Author

So this still contains a bunch of changes, but I've reorganized them as follows:

The changes relevant to sway (node hash map plus a few bug fixes and the shutdown leak fixes) have been submitted as a separate PR to sway:
swaywm/sway#9190

This PR includes those exact commits as parents, in addition to the scroll-specific changes.

I have tried to organize this PR into a set of commits where each commit contains one logical change. Therefore it may be easiest to review the commits individually rather than the overall changes.

I am also happy to split this PR into multiple PRs however you like, exclude the tests, etc.

The main reason to keep many of the bug fixes together with the node map changes is that the node map change adds stricter validation for node destruction which then causes the compositor to abort more readily in various cases without the bug fixes.

As far as my master branch, yes I intentionally had that include all of the changes for my own use.

@jbms

jbms commented Jun 19, 2026

Copy link
Copy Markdown
Author

It would be nicer to put the test suite into sway itself as well, and then just add scroll-specific changes here, but I understand that sway has previously rejected adding automated tests.

jbms added 2 commits June 19, 2026 17:11
This commit fixes several memory leaks in swaybar and sway compositor on exit.

Swaybar fixes:
- Call `pango_cairo_font_map_set_default(NULL)`, `cairo_debug_reset_static_data()`, and `FcFini()` on exit to release Pango/Cairo/Fontconfig static caches.
- Free the DBus signal match slots and object vtable slots in watcher and host instead of leaving them floating, which pinned the bus and leaked memory.
- Free the async property call slots when they complete.
- Free the parsed JSON header in `i3bar_handle_readable` when it is invalid or after parsing it.

Compositor exit fixes:
- Free the pango font description in `free_config`.
- Call `cairo_debug_reset_static_data()` and `FcFini()` in `sway/main.c` on exit.
- Added fontconfig dependency to sway target.

Reproduction (swaybar):
1. Run `swaybar` under LSan/ASan.
2. Feed it status lines with invalid/unsupported i3bar protocol JSON headers, or exit swaybar.
3. LSan will report leaks of parsed JSON headers in `status_line.c` and systemd/dbus match slots in `tray/`. Pango and Cairo static caches also leak.

Reproduction (compositor):
1. Run sway under LSan.
2. Exit sway.
3. LSan will report leaks in pango (font description) and fontconfig (static data).
This commit resolves a use-after-free (UAF) crash that can occur during
compositor shutdown when there are still active client windows.

- Added check in `transaction_destroy` to verify if the node is still
  valid before accessing it, specifically handling cases where nodes
  are destroyed during shutdown before pending transactions are finished.
- Ensure that nodes with active transaction references (`node->ntxnrefs > 0`)
  are not immediately freed if they are accessed by pending transactions.
- Introduced `transaction_shut_down` to clean up pending/queued transactions
  and clear dirty flags on shutdown.
- Implemented recursive node destruction in `root_destroy` (`container_destroy_recursive`,
  `workspace_destroy_recursive`, `output_destroy_recursive`) to ensure all nodes
  are properly marked as destroying and their transaction references are cleared.
- Properly destroy input manager and seats on compositor teardown.

Reproduction:
1. Build sway with ASan enabled (`-Db_sanitize=address`).
2. Open multiple clients (e.g., two terminals).
3. Exit/kill the compositor.
4. ASan will report a use-after-free (UAF) error. This occurs because during shutdown, nodes (containers/workspaces) are destroyed while they still have pending transaction references (`node->ntxnrefs > 0`). The transaction destruction attempts to access these already-freed nodes.
Alternatively, run `pytest tests/test_leak_two_clients.py` which specifically reproduces this UAF on shutdown.
jbms added 19 commits June 19, 2026 17:11
This commit resolves memory leaks of workspaces, containers, and outputs
state lists during transaction application and destruction.

- Instead of deep copying, we transfer ownership of the lists from the
  transaction instructions to the node's current state during
  `transaction_apply` (setting the instruction's list pointers to NULL).
- Updated `transaction_destroy` to safely free the private lists
  associated with transaction instructions (which will only be non-NULL
  if the transaction was not applied).

Reproduction:
1. Build sway with ASan/LSan enabled (`-Db_sanitize=address`).
2. Run sway with a status bar (like swaybar) and some clients.
3. Perform actions that trigger transactions (e.g., open and close windows, change focus).
4. Exit sway.
5. LSan will report memory leaks of `list_t` structures (and their item arrays) allocated in `copy_list` or transaction state initialization, specifically for `workspaces` (in `apply_output_state`), `floating`/`tiling` (in `apply_workspace_state`), and `children` (in `apply_container_state`).
Alternatively, run the integration test `pytest tests/test_leak_two_clients.py` under ASan/LSan.
This merges the clean branch containing the following generic sway improvements and bug fixes:
1. tree: introduce node hash map for O(1) lookups and safety
2. Fix scratchpad UAF and double reap
3. Fix use-after-free of focused_inactive_child in container
4. Fix memory leaks and use-after-free at exit (generic)
Passing raw C pointers (light userdata) to Lua scripts is unsafe because
if the underlying C object (workspace, container, view) is destroyed, the
Lua script will hold a dangling pointer. Using this pointer in subsequent
API calls will crash the compositor.

We fix this by using node IDs (size_t) to reference nodes in Lua, and
looking them up safely using node_by_id().

Reproduction:
1. Register a Lua callback that receives a container pointer.
2. Store this pointer in a global Lua variable.
3. Destroy the container.
4. Call a Lua API function passing the stored pointer.
5. The compositor will crash due to invalid pointer dereference.
Alternatively, run the integration test 'pytest tests/test_lua_safety.py'.
This introduces the pytest-based test harness infrastructure, including
conftest.py and IPC helpers, along with the C-based Wayland and X11 test
clients used to simulate windows in integration tests.
Adds Python integration tests to verify Lua APIs (geometry, outputs,
workspaces), safe pointer lookup (use-after-free prevention), and client
management (mapping/unmapping clients).
- Use wordexp to expand the script path in the lua command, allowing tilde and environment variable expansion.
- Enforce that the path must expand to exactly one file, returning specific errors otherwise.
- Resolve relative paths against the directory of the loaded configuration file during config loading (and use initial working directory otherwise).
- Update documentation in scroll.5.scd to mention wordexp and relative path support.
- Mock HOME in tests to test tilde expansion safely.
This commit introduces a new Lua API function, `scroll.node_get_type`,
which allows Lua scripts to retrieve the type of a given node ID.
The function maps the node ID to a `sway_node` and returns its type
as a string (e.g., "root", "output", "workspace", "container", etc.).
If the node ID is invalid or not found, it returns `nil`.

This commit also updates:
- `scroll.lua` with LSP annotations for the new function.
- `sway/scroll.5.scd` man page with documentation for the new function.
This commit adds unit tests for the newly introduced `scroll.node_get_type`
Lua API function. It verifies that the function correctly returns the type
for workspace and output nodes, and returns `nil` for invalid node IDs.
This commit adds a test that reproduces a use-after-free and double-reap
crash when sending a tiled container to the scratchpad when it is the only
child of its parent container.
This commit addresses potential use-after-free (UAF) risks in layout and space code:
1. `extract_view` in `sway/tree/layout.c` is refactored to use `container_detach`
   instead of manual list manipulation. This ensures that parent/workspace pointers
   are properly cleared on detach, avoiding dangling pointers.
2. `find_and_detach_container` in `sway/tree/space.c` is also refactored to use
   `container_detach` for tiling containers.
3. `layout_space_container_restore_tiling` and `layout_space_container_restore_floating`
   in `sway/tree/space.c` are updated to set the container's parent/workspace
   pointers BEFORE calling `arrange_container`, ensuring that `arrange_container`
   does not access old, potentially freed parent/workspace pointers.

Reproduction:
1. Run scroll with space/layout features.
2. Trigger layout changes that use 'extract_view' (e.g. moving containers).
3. If parent pointers are not cleared, subsequent container arrangements can trigger UAF crashes.
Alternatively, run the integration test 'pytest tests/test_space_crash.py'.
This commit adds:
1. `tests/test_space_crash.py` which verifies that space restore does not crash
   due to UAF during `arrange_container` if the old workspace was destroyed.
2. `tests/test_move_cleanup_crash.py` which verifies that move container cleanup
   properly handles workspace destruction and does not crash when arranging
   the old workspace.
This commit adds `tests/test_workspace_split_uaf.py` which reproduces
a Use-After-Free (UAF) crash. The crash occurs when a workspace is split,
one of the split workspaces is destroyed (because it is empty and inactive
during output evacuation), and then the remaining sibling workspace is
rearranged (e.g. when it also becomes empty or during workspace switch).
The remaining workspace still has a dangling pointer to the destroyed sibling,
leading to UAF in `arrange_output` when it tries to check sibling's fullscreen state.
When a split workspace is destroyed, its sibling workspace's split state was not being properly cleaned up. Specifically, the sibling's split.split was remaining set to WORKSPACE_SPLIT_HORIZONTAL or WORKSPACE_SPLIT_VERTICAL, and it still pointed to the destroyed workspace as its sibling.

This led to use-after-free when the sibling workspace was later accessed or rearranged, as it tried to access the destroyed sibling workspace.

We fix this by resetting the sibling's split state to WORKSPACE_SPLIT_NONE, clearing the sibling pointer, clearing the output box, marking the sibling node as dirty, and arranging the sibling workspace so it can claim the full output area.

Reproduction:
1. Split a workspace into two sibling workspaces.
2. Close all windows on one of the sibling workspaces so it gets destroyed.
3. Interact with the remaining sibling workspace (e.g. move focus or windows).
4. The compositor will crash with a use-after-free (UAF) when attempting to access the destroyed sibling workspace.
Alternatively, run the integration test 'pytest tests/test_workspace_split_uaf.py'.
Calling transaction_commit_dirty() directly inside workspace_switch_callback_end()
can cause use-after-free (UAF) crashes if nodes (like workspaces) are destroyed
during the animation cancel/cleanup process, and their destruction is processed
while we are still in the middle of handling the workspace switch command.

We fix this by deferring the transaction commit to the Wayland event loop idle handler.

Reproduction:
1. Enable workspace switch animations.
2. Switch from Workspace 1 (non-empty) to Workspace 2.
3. While the animation is running, kill the client window on Workspace 1 (so Workspace 1 becomes empty and is marked for destruction).
4. Switch back to Workspace 1 (cancelling the animation).
5. The compositor will crash with a use-after-free (UAF) in transaction execution.
Alternatively, run the integration test 'pytest tests/test_workspace_animation_uaf.py'.
Factor out common test helpers into conftest.py fixtures

- Add wayland_client and wait_for_client_map as pytest fixtures in tests/conftest.py.
- Remove duplicate definitions of these helpers from all 10 integration test files.
- Update test functions to accept these fixtures as arguments with proper type annotations.
- Add tests/test_space_aba.py to test for potential ABA issues during
  space restore when a view is destroyed and a new one is created with
  the same address before loading the space.
- Add tests/test_workspace_animation_uaf.py to test for potential
  use-after-free when a workspace is switched and the old workspace is
  destroyed during the animation.
- Modify struct sway_space_view to store container_id (size_t)
  instead of struct sway_view *view.
- This avoids potential use-after-free and ABA problems when a view is
  destroyed and a new view is allocated at the same address before the
  space is restored.
- Update space_save to record container->node.id.
- Update space_load and helper functions to look up containers by ID
  using node_by_id(). If the container (and thus the view) was
  destroyed, it will safely return NULL and skip restoring it.

Reproduction:
1. Run scroll.
2. Open a window and save the workspace/space layout.
3. Close the window.
4. Open a new window that happens to be allocated at the same memory address as the closed window.
5. Restore the space.
6. The compositor will attempt to move the new window into the restored space layout, incorrectly identifying it as the old window (ABA problem), or crash with UAF if the address is not reallocated.
Alternatively, run the integration test 'pytest tests/test_space_aba.py'.
This commit contains the scroll-specific parts of the memory leak fixes at exit (spaces cleanup).

Reproduction:
1. Run scroll.
2. Create one or more spaces (e.g. using scroll Lua API 'space_create').
3. Exit scroll.
4. LSan will report memory leaks of 'struct sway_space' objects and the 'root->spaces' list.
This commit adds a test 'tests/test_leak_two_clients.py' that runs the
compositor, spawns two dummy wayland clients, and terminates the compositor
while clients are still active, verifying that the compositor exits cleanly
(exit code 0) under ASan/LSan.
@jbms jbms force-pushed the catch-lua-errors branch from 4d7c941 to a1dc9c1 Compare June 21, 2026 04:50
@dawsers

dawsers commented Jun 21, 2026

Copy link
Copy Markdown
Owner

Sorry, I had to merge something I had been working on for the last week. Now I will have more time to look at this PR. Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants