Remove interrupt tasks entirely#1222
Conversation
We no longer need this. The potential issue was that we'd call some R code while on the LSP thread, and behind our back while on that LSP thread R would run finalizers that were only meant to be run on the main R thread. This can no longer happen because `r_task()`s are always run on the main R thread. There is no longer any R code that runs on the LSP thread. We could run `R_RunPendingFinalizers()` ourselves more aggressively after each top level command finishes, but RStudio does not do this, so I don't currently see a need for us to either.
We now only run interrupt tasks at idle time, so in a follow up commit we can remove interrupt tasks as a concept. We still run activity handlers regularly, particularly important for the later R package We still call `R_ProcessEvents()` somewhat regularly while idling, mostly out of good faith to whatever R's default methods do, but this is also where our `self.debug_filter` draining occurs now - and this is newly now also being called on Windows, so that's good. I think if we do use `R_ProcessEvents()` for more going forward, it should only be for "non critical" things that can happen at any arbitrary time.
So LSP didChange notifications can invalidate breakpoints while paused in the debugger
Just to be extra safe
We have some indirect tests that get at this too, but a direct test feels quite useful and explanatory
| r_task::spawn(RTask::interrupt({ | ||
| // | ||
| // Use `idle_any_prompt` so DAP breakpoint invalidation still fires when | ||
| // the LSP signals a document change while R is paused at `browser()`. | ||
| r_task::spawn(RTask::idle_any_prompt({ | ||
| let dap_clone = console.debug_dap.clone(); | ||
| async move || { | ||
| async move |_| { | ||
| Console::process_console_notifications(console_notification_rx, dap_clone).await |
There was a problem hiding this comment.
Added a test for this as well in dap_breakpoints.rs
| // For Sync tasks (i.e. only `r_task()`s), we want to log excessive waiting, | ||
| // because we are blocking the calling thread | ||
| if matches!(task, QueuedRTask::Sync(_)) { |
There was a problem hiding this comment.
We used to log "excessive waiting" for interrupt tasks
We don't have interrupt tasks anymore, but we do have Sync tasks, which are only used by r_task().
I thought it was appropriate to change this to log excessive waiting for r_task()s, since those are blocking, and since they previously were interrupt tasks. So it felt like the spirit was the same.
| @@ -2475,33 +2503,6 @@ impl Console { | |||
| if let Some(text) = self.debug_filter.check_timeout() { | |||
| self.emit_stdout(text); | |||
| } | |||
There was a problem hiding this comment.
I have kept the debug_filter check as part of an R_ProcessEvents() hook, which is now also being run on Windows.
This is now the ONLY THING special that we do at interrupt time. And it doesn't talk to the R API in any way, so should be very safe.
It's important we run this regularly-ish because the whole point is to not hold onto user output that might (for whatever reason) look like debug: output too long during long running computation
We talked about moving this to another thread, which would let us remove interrupt time hooks altogether. I still think that idea is kind of nice in theory, but I wasn't quite sure how that thread would work.
- Does it just busy loop with a 25ms sleep between checks?
- Also, it would force
DebugFilterStateto be wrapped in aMutexso it could be shared betweenConsoleand this extra thread, and that felt gross
I think we should keep this as is for this PR and if we still feel like we want to move it to a separate thread, we should do a follow up PR and aim for the simplest strategy possible (or, again, explore removing this check entirely)
There was a problem hiding this comment.
The main thing that gives me pause about this thread is using up 2mb stack space just for that. If a tokio task, that becomes much nicer, with the huge caveat that now Console depends on tokio :P
| // Run pending finalizers. We need to do this eagerly as otherwise finalizers | ||
| // might end up being executed on the LSP thread. | ||
| // https://github.com/rstudio/positron/issues/431 | ||
| unsafe { R_RunPendingFinalizers() }; |
There was a problem hiding this comment.
No longer needed since we don't run R code off the main R thread
| // NOTE: This is a legit use of interrupt-time task. No R access here, and | ||
| // we need to go through Console since it owns the UI comm. | ||
| r_task(|| { | ||
| if let Some(ui) = Console::get().ui_comm() { | ||
| ui.show_message(String::from(user_message)); | ||
| } | ||
| let request = client.send_request::<request::ShowMessageRequest>(ShowMessageRequestParams { | ||
| typ: MessageType::ERROR, | ||
| message: String::from(user_message), | ||
| actions: None, | ||
| }); | ||
| match tokio::time::timeout(Duration::from_secs(5), request).await { | ||
| Ok(result) => { | ||
| result.log_err(); | ||
| }, | ||
| Err(_) => { | ||
| log::warn!("Timed out waiting for frontend to acknowledge LSP crash notification"); | ||
| }, | ||
| } |
There was a problem hiding this comment.
report_crash() moves off needing to use the ui comm!
| /// For rare performance sensitive cases where you'd like to avoid the cost of | ||
| /// poking R options via [Self::pin_with_capture()] and you know you don't need | ||
| /// the safety of capturing output because you aren't running R code | ||
| pub(crate) fn send_idle_without_capture<F, Fut>(fun: F) -> Self |
There was a problem hiding this comment.
I did end up wanting this for Drop in RThreadSafe
There was a problem hiding this comment.
No longer relevant since R_PolledEvents is a no-op if no one sets it, and we don't set this anymore
| // TODO: These days this just means `RLocalInterruptsSuspended`. | ||
| // Should we simplify the name from `r_sandbox()` to `r_interrupts_suspended()`? | ||
| let _scope = crate::raii::RLocalSandbox::new(); |
There was a problem hiding this comment.
I was curious about your thoughts on this, since r_sandbox() does less now
There was a problem hiding this comment.
I think I'd rather keep the name distinct from implementation details?
There was a problem hiding this comment.
Looks like a step in the right direction!
The main thing I'd like us to be careful about is to not break native infrastructure that could potentially be used outside Positron:
- Native graphics devices: GraphApp-based on Windows, Cocoa/Quartz on macOS, X11 on Linux.
- Less important but comes up in practice: tcl/tk GUIs. R prompts user for a CRAN repo with a tcl/tk menu.
For some reason the tcltk package refuses to even load on my machine (R freezes) so I couldn't verify that it still works. But I could verify the Quartz graphics device no longer works, even though it does work on main. That's because Ark now sets ptr_R_ProcessEvents, see inline comment.
I did a deep dive in R's messy ecosystem of events loop (see below). I think we could simplify things and keep everything working with the following setup:
Windows:
pub fn pump_idle_events() {
// R_ProcessEvents on Windows contains peekevent/doevent (GraphApp pump),
// time limits, UserBreak, ptr_ProcessEvents (our hook), and R_Tcl_do.
// No R_runHandlers / input-handler concept on Windows.
unsafe { R_ProcessEvents() };
}
pub fn setup_r(args: &Vec<String>) {
(*params).CallBack = Some(r_flush_debug_filter);
}Unix:
pub fn pump_idle_events() {
// ptr_R_ProcessEvents (Quartz), R_PolledEvents (debug_filter + tcltk),
// R_CheckTimeLimits.
unsafe { R_ProcessEvents() };
// Drain ready input handlers (Quartz pipe, X11, later, help server).
let mut fdset = unsafe { R_checkActivity(0, 1) };
while !fdset.is_null() {
unsafe { R_runHandlers(libr::get(R_InputHandlers), fdset) };
fdset = unsafe { R_checkActivity(0, 1) };
}
// Match R's sys-std.c:1136 idiom: one final call with NULL fires
// Rg_PolledEvents (Cairo X11 buffered display flush) and R_PolledEvents
// (re-fires debug_filter + tcltk; both idempotent).
unsafe { R_runHandlers(libr::get(R_InputHandlers), std::ptr::null_mut()) };
}
pub fn setup_r(args: &Vec<String>) {
libr::set(R_PolledEvents, Some(r_flush_debug_filter));
libr::set(R_wait_usec, 10000);
// ptr_R_ProcessEvents intentionally not set as it's reserved for Quartz.
}The other thing I'm wondering for a follow-up PR is if we should wait on the activity hanlder fds directly, on Unix. R_InputHandlers is a public linked list so it should be possible, I think Kevin had a comment about this (might have been deleted since): https://github.com/r-devel/r-svn/blob/714af4bf22bebd387437e31c397167996b4d5fbc/src/include/R_ext/eventloop.h#L93
The advantage would be that we'd run {later} callbacks and R's help server requests as soon as the events come in, without any timeout delays.
Notes from deep dive:
Interrupt-time: R_CheckUserInterrupt() calls R_ProcessEvents()
Unix R_ProcessEvents()
Ptr_R_ProcessEvents()R_PolledEvents()R_CheckTimeLimits()
Windows R_ProcessEvents()
- GraphApp events
R_CheckTimeLimits()- Interrupt check
Ptr_R_ProcessEvents()R_Tcl_do()
R_runHandlers (Unix only)
- Run activity/input handlers registered by packages (e.g. grDevices on mac, later on linux and mac)
- Runs only when R is idle, never when busy. Called by ReadConsole's Unix implementation.
Ptr_R_ProcessEvents
- macOS: Set by Cocoa, only if pointer still
NULL. We can't set this without breaking Cocoa GUI. - Windows: Mapped to
Rp->Callback
R_PolledEvents (Unix only): Called from ProcessEvents()
-> Global pointer, only set by tcl/tk in stack-like fashion (calls old pointer)
Rg_PolledEvents: Separate hook for X11, called alongside R_PolledEvents when R_runHandlers is passed a NULL readMask (when R_checkActivity() returns NULL).
tcl/tk (Unix):
- Interrupt-time: Sets
R_PolledEventson Unix (in a stack-like pattern, calling the old handler if any is set),R_Tcl_doon Windows - Idle: Does not register an activity input handler
- Bidirectional: Sets a TCL event loop handler to call
R_runHandlerswhen TCL is itself busy (e.g. running a modal dialog).
X11 (Unix): Runs on the R thread
- Interrupt-time: nothing. X11 does not register
Ptr_R_ProcessEvents, so X11
windows freeze during R computation. (Cairo-buffered X11 devices additionally
setRg_PolledEvents = CairoHandlerfor periodic buffer flushes, but
Rg_PolledEventsonly fires from the legacyR_runHandlers(handlers, NULL)
branch, which ark never invokes.) - Idle: Registers an activity input handler on the X11 connection fd
(addInputHandler(R_InputHandlers, ConnectionNumber(display), R_ProcessX11Events, XActivity)). The X
server pushes events to the connection fd, select() sees the fd ready,
R_runHandlersdispatches toR_ProcessX11Events.
Quartz/Cocoa: Runs on the R thread
- Interrupt-time: Registers
Ptr_R_ProcessEventsif NULL - Idle-time: Registers an activity input handler for idle time events. Tickled by a background thread that sets a byte on the write end every 100ms.
- If
Ptr_R_ProcessEventsis NULL, also initialises Cocoa GUI in the current process, which sets it as a full-blown foreground process visible in the Dock.
HTML help server (Unix): Server runs on the R thread
- Interrupt-time: Nothing, Help server is unresponsive when R is busy
- Idle-time: Registers an activity input handler on the listening TCP socket
(srv_input_handler).
When a connection arrives, that handler calls
accept()and registers another input handler on the per-connection
socket (worker_input_handler) to serve the request. So each in-flight
request has its own input handler.
HTML help server (Windows): Server runs in a separate thread.
later (Unix):
- Interrupt-time: Nothing
- Idle-time: Registers an activity input handler (that checks call stack is empty in case of debugger read-prompt)
later (Windows):
- Spawns a message-only window that checks every 10ms
|
|
||
| // Async messages for the Console. Processed at interrupt time. | ||
| // Async messages for the Console. Polled at any idle prompt (top-level or | ||
| // browser) so DAP doc-change handling continues during debug sessions. |
| let polled_events_rx = crossbeam::channel::tick(Duration::from_millis(50)); | ||
| let activity_handlers_rx = crossbeam::channel::tick(Duration::from_millis(50)); | ||
|
|
||
| // Run `R_ProcessEvents()` regularly out of good faith. We don't think this |
There was a problem hiding this comment.
Here is what it does:
- Checks for interrupts on Windows (not needed at idle time)
- Checks time limits (not really supported by Ark)
- Run events callbacks: Cocoa/Quartz (graphics device), Windows GraphApp-based events (graphics device), TCL/TK (CRAN repo selection)
Native events can potentially be used in other contexts than Positron: CRAN repo selection, plot GUI, etc. To try this, set options(device = "quartz") in a Jupyter app and create plots. Note you'll need to try it outside of this branch because Quartz has been broken by setting ptr_R_ProcessEvents, see other comment.
Since it's involved in GUI processing, we should keep polling at idle time every 50ms.
| @@ -2475,33 +2503,6 @@ impl Console { | |||
| if let Some(text) = self.debug_filter.check_timeout() { | |||
| self.emit_stdout(text); | |||
| } | |||
There was a problem hiding this comment.
The main thing that gives me pause about this thread is using up 2mb stack space just for that. If a tokio task, that becomes much nicer, with the huge caveat that now Console depends on tokio :P
| // For Sync tasks (i.e. only `r_task()`s), we want to log excessive waiting, | ||
| // because we are blocking the calling thread | ||
| if matches!(task, QueuedRTask::Sync(_)) { |
| libr::set(ptr_R_ShowMessage, Some(r_show_message)); | ||
| libr::set(ptr_R_Busy, Some(r_busy)); | ||
| libr::set(ptr_R_Suicide, Some(r_suicide)); | ||
| libr::set(ptr_R_ProcessEvents, Some(r_process_events)); |
There was a problem hiding this comment.
This breaks the quartz device on macOS because it checks that this hook is still NULL before initialising: https://github.com/r-devel/r-svn/blob/714af4bf22bebd387437e31c397167996b4d5fbc/src/library/grDevices/src/qdCocoa.m#L712-L713
options(device = "quartz")
plot(1)While none of this is used in Positron, I think we should avoid unnecessary regressions on this front to make it easier to integrate Ark in other contexts, e.g. jupyter-based REPLs.
| if let Err(err) = r_sandbox(|| Console::get_mut().polled_events()) { | ||
| panic!("Unexpected longjump while polling events: {err:?}"); | ||
| pub unsafe extern "C-unwind" fn r_process_events() { | ||
| if let Err(err) = r_sandbox(|| Console::get_mut().process_events()) { |
There was a problem hiding this comment.
Should we remove the sandbox here, and document prominently that no R code should ever run in process_events()?
Or we keep it but mention here that is purely defensive against future changes.
| // TODO: These days this just means `RLocalInterruptsSuspended`. | ||
| // Should we simplify the name from `r_sandbox()` to `r_interrupts_suspended()`? | ||
| let _scope = crate::raii::RLocalSandbox::new(); |
There was a problem hiding this comment.
I think I'd rather keep the name distinct from implementation details?
This seems a bit odd, because the location that `R_wait_usec` is defined at makes me think it would only be used by `R_PolledEvents()`, which we no longer set, but in reality it is also used in `Rsleep()` as the interval to check interrupts on
Closes #1187
In #1187 we realized that our "interrupt time" tasks have never been running on Windows, and...no one has complained.
Practically, this means that LSP features like completions or hover have not been running on Windows when R is busy, i.e. when R is running some long running computation, even if that code is checking for interrupts periodically via
R_CheckUserInterrupt(). Unix machines, on the other hand, do return completions or hover results at interrupt time, but it is fairly unreliable and highly subject to the R code being run, i.e. this video shows how when you run a long runninglm(), you can basically hang the LSP on a Mac because it almost never callsR_CheckUserInterrupt():Screen.Recording.2026-05-15.at.12.21.07.PM.mov
We've long through that doing extensive work at R interrupt time is rather dangerous (you could accidentally load a namespace mid-R execution, which would be wild), so we've wanted to remove these. And with oak, we are going to rely on interrupt time less and less anyways, because more things can be done statically without talking to the main R session.
So this PR removes interrupt tasks entirely. I'll leave more notes inline.