Skip to content

Misc fixes found by Claude Code AI#20

Merged
openwrt-bot merged 27 commits into
openwrt:masterfrom
hauke:fix-ai-bugs
May 23, 2026
Merged

Misc fixes found by Claude Code AI#20
openwrt-bot merged 27 commits into
openwrt:masterfrom
hauke:fix-ai-bugs

Conversation

@hauke

@hauke hauke commented Apr 13, 2026

Copy link
Copy Markdown
Member

This PR fixes multiple bugs in the code. I let Claude code do a review of ubus and it found and fixed these.
I manually checked all of them. I think none of them is really critical.

Output of the AI:

Security and Bug Review — Findings and Fixes
Fixed (11 bugs across 7 files)
CRITICAL

# Severity File Line Bug Fix
1 CRITICAL ubusd_proto.c 585–601 Client ID tree corruption — ubusd_proto_init_retmsg failure skipped ubus_free_id, leaving freed cl in the ID tree (use-after-free); ubusd_acl_free_client also never called on error paths, leaking user/group/extra_gid. Restructured error labels: goto delete frees retmsg + blob buf + ID; goto acl_free frees ACL strings.
2 HIGH libubus-acl.c 107–124 blob_memdup(msg) can return NULL (OOM). Old ACL tree was already cleared, but acl_add() then stored pointers into the transient msg buffer — dangling pointers on return. Allocate new blob first; bail on NULL; only clear old state after new allocation succeeds; parse from the stable copy.
3 HIGH ubusd_proto.c 464 strcmp(target->client->user, cl->user) crashes if either user or group is NULL (possible when strdup fails). Added explicit NULL guards; treat NULL as non-matching (deny).
4 HIGH ubusd_acl.c 259–260 strdup() for cl->user and cl->group was never checked. A NULL result would propagate to any later strcmp call. Added NULL checks with proper cleanup (free the already-allocated string on failure).
5 HIGH ubusd_acl.c 668 ubusd_create_object_internal can return NULL (OOM), but acl_obj->recv_msg was assigned unconditionally — immediate NULL dereference. Guard assignment with if (acl_obj).
6 HIGH ubusd_event.c 131–134 fill_cb(cb_priv, id) can return NULL (OOM in ubus_msg_new), but (*ub)->hdr.type = ... was written on the next line unconditionally. Added if (!*ub) return; after the fill_cb call.
7 HIGH ubusd_main.c 248–251 sighup_handler called ubusd_acl_load() directly — which calls malloc, fopen, glob, syslog, avl_insert, none of which are async-signal-safe. Re-entrant invocation during a malloc in the main loop = crash or deadlock. Replaced with a self-pipe: handler writes one byte (async-signal-safe write); a uloop fd watcher drains the pipe and calls ubusd_acl_load() safely from the event loop.
8 HIGH lua/ubus.c 562 Inverted logic: if (lua_istable(L, 1)) raised an error when arg 1 is a table, but arg 1 is always the ubus_context userdata. Any ubus.add(c, table) call silently skipped the guard; non-table arg 2 was never rejected. Changed to if (!lua_istable(L, 2)).
9 MEDIUM libubus.c 83 calloc_a() in ubus_queue_msg was not checked for NULL; pending->hdr.data = data would immediately dereference NULL on OOM. Added if (!pending) return;.
10 MEDIUM ubusd_acl.c 318 calloc_a() in ubusd_acl_alloc_obj was not checked; o->partial = partial on the next line would crash on OOM. Added if (!o) return NULL;.
11 MEDIUM lua/ubus.c 505–513 calloc(mlen, sizeof(struct ubus_method)) was not checked for NULL; m[midx] would be dereferenced later. calloc(0, …) is also implementation-defined. On type allocation failure, the already-allocated m was leaked. Guard calloc with a NULL check; handle mlen == 0 explicitly; free m on type allocation failure path.

Notable bugs NOT fixed (design-level or require broader refactoring)

  • TOCTOU in /proc//status read (ubusd_acl.c:182) — the process can exec() between SO_PEERCRED and the /proc read, changing its GID. No simple fix without kernel-level credential locking.
  • World-writable socket (ubusd_main.c:296) — umask(0111) creates a 0666 socket. Intentional design for an embedded bus, but amplifies the impact of the bugs above.
  • ubus_shutdown double-close (libubus.c:441) — ctx->sock.fd is not reset to -1 after close(); a second call would close a potentially reused fd. Low-risk in practice.
  • Lua dangling pointers from lua_tostring (lua/ubus.c:416,445,502) — method names and policy names point into Lua-managed strings that can be GC'd. Fixing requires duplicating strings at object registration time.

@hauke hauke marked this pull request as draft April 13, 2026 23:25
Comment thread ubusd_id.c
#ifdef __linux__
return getrandom(buf, len, 0);
#else
if (random_fd < 0) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non Linux platforms tend to use arc4random for this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This application is mostly used on Linux. I think it is fine to use /dev/urandom on others.

Comment thread lua/ubus.c Outdated
return NULL;
}
} else {
m = NULL;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a point of style, why not initialize m up in the declarations like all the rest of the locals?

@efahl

efahl commented Apr 14, 2026

Copy link
Copy Markdown

I gotta say, the work in 3f4353e (fix async-signal-unsafe SIGHUP handler) is pretty impressive.

@hauke

hauke commented Apr 16, 2026

Copy link
Copy Markdown
Member Author

I gotta say, the work in 3f4353e (fix async-signal-unsafe SIGHUP handler) is pretty impressive.

Sometimes LLMs have very good ideas, sometimes stupid. They need handholding. I think it came up with this one by itself, but introduced a bug. In the AI review it found the bug.

@hauke hauke marked this pull request as ready for review May 4, 2026 00:11
hauke and others added 21 commits May 23, 2026 02:46
calloc_a() can return NULL under memory pressure. The returned pointer
was used immediately without a check, causing a NULL dereference crash.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
blob_memdup() can return NULL under memory pressure. The old code cleared
the ACL tree and freed acl_blob first, then called blob_memdup(). On
failure the tree was empty and acl_blob was NULL, but acl_add() still
ran and stored pointers into the transient 'msg' buffer — dangling
pointers once the caller freed that buffer.

Fix by allocating the new blob first. Only clear the old state after the
allocation succeeds. Also parse from the stable acl_blob copy rather
than from the transient 'msg' argument.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
calloc_a() can return NULL under memory pressure. The returned pointer
was dereferenced immediately on the next line without a NULL check.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Three related fixes for unchecked allocations on the client-init path:

  - strdup() for cl->group and cl->user was not checked. A failed
    allocation would leave NULL pointers that later crash any code
    comparing user or group strings (e.g. ubusd_handle_add_watch).

  - ubusd_acl_load_extra_gids() collapsed two unrelated failures into
    a single -1 return: fopen("/proc/<pid>/status") failing (e.g. /proc
    not mounted, or the pid already gone) and malloc() of cl->extra_gid
    failing under memory pressure. The caller ignored the return
    entirely, so the malloc failure silently produced a client with no
    supplementary groups, which would then evaluate ACLs incorrectly.

    Distinguish the two: fopen failure now returns 0 (degraded but
    functional — no extra gids available); malloc failure returns -1
    (real OOM).

  - Have ubusd_acl_init_client() check the extra-gid return and tear
    down the already-allocated user/group strings on failure, matching
    the existing strdup cleanup pattern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
ubusd_create_object_internal() can return NULL if ubus_alloc_id()
fails (e.g. when reading from /dev/urandom fails). The recv_msg
assignment was performed unconditionally, causing a NULL dereference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
fill_cb() calls ubus_msg_new(), which can return NULL under memory
pressure. The returned pointer was dereferenced unconditionally on the
next line to set hdr.type and hdr.peer, causing a NULL dereference
crash.

ubusd_send_event_msg() also returned void, so catching that NULL still
left the function returning silently while the caller in
ubusd_send_event() kept iterating the pattern tree. Every remaining
iteration called fill_cb() again (almost certainly hitting the same
OOM) and ubusd_send_event() unconditionally returned 0 — the
originating client received UBUS_STATUS_OK even though no subscriber
had been delivered the event.

Check the fill_cb() result for NULL, convert ubusd_send_event_msg() to
return int and surface the OOM as UBUS_STATUS_NO_DATA (the status the
rest of this file already uses for allocation failures, e.g.
ubusd_alloc_event_pattern()). ubusd_send_event() now breaks out of the
loop on the first error and propagates the status, which flows back
through ubusd_forward_event() to the originating client so it knows the
send did not complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
The SIGHUP handler called ubusd_acl_load() directly, which in turn calls
malloc, fopen, glob, syslog, and avl_insert — none of which are
async-signal-safe. Invoking them from a signal handler that interrupted
any of those same functions in the main loop can cause heap corruption,
deadlock, or a crash.

Replace with a self-pipe: the signal handler writes a single byte
(write() is async-signal-safe) to a pipe, and a uloop fd watcher drains
the pipe and calls ubusd_acl_load() safely from the event loop context.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
…_new_client

Three bugs in the error cleanup paths:

1. When ubus_alloc_id() failed, 'goto free' skipped ubusd_acl_free_client(),
   leaking the user/group strings and extra_gid array.

2. When ubusd_proto_init_retmsg() failed, 'goto free' skipped both
   ubus_free_id() and ubusd_acl_free_client(). The freed 'cl' struct
   remained in the client ID tree, causing use-after-free on any
   subsequent ID lookup.

3. When ubusd_send_hello() failed, 'goto delete' called ubus_free_id()
   but did not free cl->retmsg or cl->b, leaking those allocations.

Fix by restructuring the error labels so each path frees exactly the
resources that were successfully allocated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
…watch

cl->user, cl->group, target->client->user, and target->client->group
can all be NULL if the earlier strdup() calls in ubusd_acl_init_client()
fail under memory pressure. Passing a NULL pointer to strcmp() is
undefined behaviour and causes a crash.

Guard each comparison with explicit NULL checks and treat NULL as
non-matching (deny access) to preserve the original security intent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
blob_buf_init() can return -ENOMEM if the initial buffer allocation
fails. The return value was ignored, leaving cl->b.head as NULL.
The subsequent blob_raw_len(b->head) then dereferences that NULL
pointer, causing a crash.

Check the return value and propagate the error early. The caller's
goto delete_no_retmsg path already calls blob_buf_free(&cl->b),
which is safe because a failed blob_buf_init leaves cl->b.buf as
NULL (free(NULL) is a no-op).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
The guard 'if (lua_istable(L, 1))' was backwards: it raised an error
when argument 1 IS a table, but argument 1 is always the ubus_context
userdata (never a table). The intent was to reject calls where argument
2 is NOT a table.

As a result, any call to ubus.add(c, non_table) silently proceeded to
lua_next() on a non-table value, causing a crash, while a valid call
with a proper table as argument 2 was never incorrectly rejected only
because arg 1 never matches lua_istable.

Fix by checking !lua_istable(L, 2).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Two bugs in the method array allocation:

1. calloc(mlen, sizeof(struct ubus_method)) was not checked for NULL.
   On OOM the returned NULL pointer was stored in obj->o.methods and
   later dereferenced as m[midx], causing a crash.

2. calloc(0, N) has implementation-defined behaviour (may return NULL
   or a unique pointer). Treat mlen == 0 explicitly by setting m = NULL
   and skipping the allocation.

3. When the subsequent calloc for obj->o.type failed, the already-
   allocated method array 'm' was not freed, causing a memory leak.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
getrandom() avoids opening a file descriptor, works before /dev/urandom
is accessible (e.g. early boot inside a container), and blocks until
the kernel entropy pool is initialised, making it safer than reading
from /dev/urandom directly.

getrandom() is available since glibc 2.25 and musl 1.1.20. On Linux
call it from <sys/random.h> directly. Non-Linux platforms continue to
use the /dev/urandom fd path unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
ubus_msg_new() can return NULL on allocation failure, but the
subsequent assignment to ub->hdr.type was unconditional, causing a
NULL pointer dereference.  Bail out early when allocation fails.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Incoming messages can carry a file descriptor via SCM_RIGHTS.  Several
paths through ubus_process_req_msg() leaked that fd:

  - UBUS_MSG_STATUS with no matching request: break without closing.
  - UBUS_MSG_DATA: fd was never handled at all.

Consolidate cleanup by closing fd at the end of the function unless it
was handed off to req->fd_cb.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
ubus_process_msg() receives a file descriptor along with each message
but leaked it on several paths:

  - UBUS_MSG_NOTIFY / UBUS_MSG_UNSUBSCRIBE on a channel context.
  - UBUS_MSG_INVOKE queued via ubus_queue_msg() when stack_depth > 0
    (ubus_queue_msg() does not propagate the fd).
  - UBUS_MSG_MONITOR in all paths; neither the channel-context check
    nor the monitor_cb invocation take ownership of the fd.
  - Any message type not matched by the switch.

Close the fd at the end of the function for all paths that do not
hand it off to a lower-level handler.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
ubus_shutdown() left ctx->sock.fd and ctx->msgbuf.data dangling after
releasing them. The header documents shutdown as paired one-to-one with
ubus_connect_ctx(), but the function is exposed and a caller that
violates the contract — most realistically by invoking ubus_shutdown()
explicitly and then ubus_free() (which calls shutdown a second time) —
would close an unrelated descriptor the kernel may have recycled in the
interim and double-free the msgbuf.

Guard the close with a validity check and reset both fields after
release so subsequent calls become no-ops. This also brings the tear-
down path in line with ubus_reconnect()'s error handling, which already
sets sock.fd to -1 after closing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
strdup() can fail and return NULL; the subsequent strrchr() call then
dereferences a NULL pointer.  The impact is limited to startup with a
constant path, but handle the allocation failure gracefully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
In ubus_alloc_id() the `continue` statement was intended to discard random
IDs that fall in the reserved system-object range (0..UBUS_SYSTEM_OBJECT_MAX-1)
and retry the read. In a do-while loop, however, `continue` jumps to the
loop-continuation expression, not back to the start of the body. The
expression then calls avl_insert() with the still-reserved ID, and on
success the loop exits — successfully assigning a reserved ID to a client,
object, or object type.

This breaks invariants relied upon elsewhere, e.g. ubusd_alloc_event_pattern()
rejects any object with id < UBUS_SYSTEM_OBJECT_MAX as a permission error,
making event registration fail for any client object that happens to draw a
low random ID.

Wrap the read in an inner do-while so a low ID actually triggers a re-read
before insertion is attempted.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
ubus_msg_ref() can return NULL when the source message uses an external
buffer and the ubus_msg_new() allocation fails. ubus_msg_enqueue() stored
the result without checking and queued the list entry anyway. Later, when
the queue entry was freed via ubus_msg_list_free() -> ubus_msg_free(NULL),
the NULL message pointer was dereferenced.

Drop the queue entry and return early if the reference cannot be taken,
so we never link a list entry with a NULL message.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
fcntl(fd, F_SETFL, ... | O_CLOEXEC) is a no-op for the close-on-exec bit.
F_SETFL only manipulates file *status* flags (O_APPEND, O_ASYNC,
O_NONBLOCK, ...), and silently masks out anything else; close-on-exec is
a file *descriptor* flag that has to be set with F_SETFD/FD_CLOEXEC.

Both ubus_reconnect() and ubus_channel_connect() relied on the broken
form and therefore left the ubus socket inheritable across exec(2). Any
client that fork+exec'd a child (a common pattern in OpenWrt services)
would leak the ubus connection into the new process.

Set FD_CLOEXEC explicitly via F_SETFD in addition to the F_SETFL call
that toggles O_NONBLOCK.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
hauke and others added 6 commits May 23, 2026 02:48
recv_retry() captures any SCM_RIGHTS fd into the caller's *recv_fd as soon
as the first recvmsg succeeds, then continues looping to drain the rest of
the iov. If a subsequent recvmsg in the same call returns 0 (EOF) or
fails fatally, recv_retry returns -1 with the captured fd already written
to the caller's variable. The same hand-off survives errors in the body
read or in the validate/alloc path of get_next_msg().

ubus_handle_data() simply broke out of its loop in those cases, dropping
the fd on the floor. With long-running clients on a flaky link this is a
slow descriptor leak that eventually exhausts the process's fd table.

Reset recv_fd to -1 once it has been handed off to ubus_process_msg(),
and close it at loop exit if anything is still parked there.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Every other receive path converts the wire-format ubus_msghdr fields to
host byte order before use (see get_next_msg() and ubusd_main client_cb).
ubus_reconnect() forgot to do this for the HELLO peer field and stored
the raw big-endian value into ctx->local_id.

On little-endian hosts this means local_id holds a byte-swapped client
id forever:

  - The defensive `if (ctx->local_id <= UBUS_CLIENT_ID_CHANNEL)` check
    no longer rejects an id of 1, since 0x01000000 > 1. The check was
    intended to catch a server handing out the channel id (1) to a
    regular client.
  - ubus_context_is_channel() can never match an actual non-channel
    local_id by accident, but the value is still semantically wrong and
    breaks any future code that treats local_id as a real id.

Apply be32_to_cpu() to match the other receive paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
If ubus_reconnect() fails after the new socket has been opened (HELLO
read failure, header validation, OOM, body read failure, ID validation),
the out_close: handler closes ctx->sock.fd but leaves the field pointing
at the now-closed descriptor.

A subsequent ubus_shutdown() / ubus_free() then calls close() on that
stale value, and a second reconnect attempt will pass the entry guard at
the top of ubus_reconnect() and close it again. In both cases the kernel
may have recycled the slot for an unrelated descriptor in the meantime,
causing us to close someone else's fd — the same class of bug fixed in
ubus_shutdown() previously.

Set ctx->sock.fd = -1 alongside the close so the field is a faithful
indicator of socket ownership.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
len held the result of strlen() in an int. It is later passed as
len + 1 to calloc_a(), whose variadic length arguments are read via
va_arg as size_t in __calloc_a(). Passing an int where size_t is
expected is undefined behaviour and can yield a wrong allocation
size on LP64 platforms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
strlen() and blob_raw_len() return size_t; storing their result in
an int relies on an implicit narrowing conversion. Use size_t for
the affected length variables in ubusd_alloc_event_pattern(),
ubus_create_obj_method() and __ubusd_handle_lookup().

ubusd_alloc_event_pattern() checked the length with "<= 0", which is
only ever true for 0 once the variable is unsigned; replace it with
"!len".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
The message sequence number is a uint16_t on the wire (ubus_msghdr),
and ubusd_acl_seq is emitted as a u32. Declaring the counters with
the matching fixed-width types removes the implicit conversion that
happened on every store or send:

  event_seq               int          -> uint16_t
  ubus_object.invoke_seq  unsigned int -> uint16_t
  ubus_monitor.seq        uint32_t     -> uint16_t
  ubusd_acl_seq           int          -> uint32_t

The values placed on the wire are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Link: openwrt#20
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
@openwrt-bot openwrt-bot merged commit 795b32b into openwrt:master May 23, 2026
10 checks passed
@peterwillcn

Copy link
Copy Markdown

@peterwillcn

Copy link
Copy Markdown

ping @claude

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants