Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 52 additions & 0 deletions docs/Audit_Remediation_Results_2026-06-12.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
# DID-matlab Audit Remediation — Results (2026-06-12)

> **Context for a reviewer / next agent.** One of **9 coordinated PRs** in the 2026-06 NDI
> ecosystem audit; **none are merged.** This repo's PR: **VH-Lab/DID-matlab#146** — **merge
> together with the DID-python PR VH-Lab/DID-python#23** (same audit item §6.1).
> **Author-not-run** — run the DID-matlab + symmetry suites before merge. The one deferral
> is the **`timestamp` format DECISION** (see "DECISION required" below).

Branch `audit/did-matlab-2026-06`, off `origin/main` (`03b0f7f`). This is the
MATLAB half of the **DID lockstep**; the Python half is
`audit/did-python-2026-06`.

> **⚠️ Author-not-run.** Authored without a local MATLAB runtime. Please run the
> DID-matlab + symmetry test suites before merging.

## Findings addressed (audit §6.1)

| # | Severity | Commit | Summary |
|---|----------|--------|---------|
| 6.1-3 | Medium | `14439fb` | **SQL injection via interpolated ids.** Several `sqlitedb` queries interpolate `branch_id` / `doc_id` / `document_id` / `filename` directly into double-quoted SQL string literals, e.g. `['... WHERE doc_id="' document_id '"']`. `run_sql_query()` does not forward bind parameters to mksqlite (it calls `do_run_sql_query` without `varargin`), so those values cannot be passed as `?` placeholders; a value containing a double quote could break out of the literal and inject SQL. Added a private `escapeSqlLiteral` helper that doubles embedded double quotes (inside a double-quoted token SQLite reads `""` as an escaped `"`), mirroring DID-python's `_sql_escape`, and applied it at all 14 value-interpolation sites. The already-parameterized `run_sql_noOpen('... =?', val)` calls are unchanged. |

### isa (6.1-1) — no MATLAB change needed

MATLAB's `isa` (via `doc2sql` `meta.class`/`meta.superclass` + the SQL translation) is
the **correct reference**; the divergence was on the Python side, which has been brought
to parity (`audit/did-python-2026-06`). No DID-matlab isa change is required; the two are
merged together so a symmetry run stays consistent.

### sqlite indexes (6.1-4) — already present in MATLAB

The audit item is "Python omits MATLAB's sqlite indexes." MATLAB already creates them;
the Python side added the equivalents. No MATLAB change.

## DECISION required — do not merge a serialization change without sign-off

**§6.1-2 / §7.3-13: the `timestamp` column meaning diverges across languages and is NOT
changed here.** DID-matlab writes MATLAB `now` (datenum — days since year 0) into the
`docs`/`branches`/`branch_docs` `timestamp REAL` column (`sqlitedb.m` `do_add_branch`,
add-doc, add-to-branch); DID-python writes `time.time()` (Unix epoch seconds) into the
same column. Cross-client `lessthan`/`greaterthan` comparisons on `timestamp` silently
break, and the cloud backend stores the value verbatim.

This is a **cross-client format decision**, not a one-sided patch — changing the
serialization on either side alone would break the other and any already-stored data. It
needs an explicit decision (proposed: **ISO-8601 TEXT**, or Unix epoch seconds, with a
documented one-time conversion for existing rows) made jointly for DID-python,
DID-matlab, and the cloud backend. Left untouched pending that decision.

## Lockstep / merge

Merge with `audit/did-python-2026-06` (the Python field-name validation is the same audit
item 6.1-3). VH-Lab fork-and-PR: a DID-matlab fork is created at review time.
47 changes: 33 additions & 14 deletions src/did/+did/+implementations/sqlitedb.m
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,25 @@ function delete(this_obj)
% disposed when this method returns, using the onCleanup mechanism
end % do_run_sql_query()

function s = escapeSqlLiteral(~, s)
% escapeSqlLiteral - escape a value for a double-quoted SQL literal
%
% S = escapeSqlLiteral(THIS_OBJ, S)
%
% Several queries below interpolate identifiers (branch_id, doc_id,
% filename) into SQL string literals delimited by double quotes,
% e.g. ['... WHERE doc_id="' document_id '"']. run_sql_query() does
% not forward bind parameters to mksqlite (it calls
% do_run_sql_query without varargin), so those values cannot be
% passed as '?' placeholders and must be escaped to prevent SQL
% injection from a crafted id/branch name. Inside a double-quoted
% token SQLite treats "" as an escaped ", so doubling the double
% quotes neutralizes any attempt to break out of the literal. This
% mirrors the DID-python _sql_escape (which doubles single quotes
% for its single-quoted literals).
s = strrep(char(s), '"', '""');
end % escapeSqlLiteral()

function branch_ids = do_get_branch_ids(this_obj)
% do_get_branch_ids - return all unique branch ids in the database
%
Expand Down Expand Up @@ -153,7 +172,7 @@ function do_add_branch(this_obj, branch_id, parent_branch_id, varargin)
this_obj.insert_into_table('branches', 'branch_id,parent_id,timestamp', branch_id, parent_branch_id, tnow);

% Duplicate the docs from parent branch to the newly-created branch
sqlStr = ['SELECT doc_idx FROM branch_docs WHERE branch_id="' parent_branch_id '"'];
sqlStr = ['SELECT doc_idx FROM branch_docs WHERE branch_id="' this_obj.escapeSqlLiteral(parent_branch_id) '"'];
data = this_obj.run_sql_noOpen(sqlStr);
if ~isempty(data)
doc_idx = [data.doc_idx];
Expand All @@ -176,11 +195,11 @@ function do_delete_branch(this_obj, branch_id, varargin)
if ~isempty(doc_ids)
% Remove all documents from the branch_docs table
% TODO: also delete records of unreferenced docs ???
this_obj.run_sql_query(['DELETE FROM branch_docs WHERE branch_id="' branch_id '"']);
this_obj.run_sql_query(['DELETE FROM branch_docs WHERE branch_id="' this_obj.escapeSqlLiteral(branch_id) '"']);
end

% Now delete the branch record
this_obj.run_sql_query(['DELETE FROM branches WHERE branch_id="' branch_id '"']);
this_obj.run_sql_query(['DELETE FROM branches WHERE branch_id="' this_obj.escapeSqlLiteral(branch_id) '"']);
end % do_delete_branch()

function parent_branch_id = do_get_branch_parent(this_obj, branch_id, varargin)
Expand All @@ -190,7 +209,7 @@ function do_delete_branch(this_obj, branch_id, varargin)
%
% Returns the ID of the parent branch for the specified BRANCH_ID.

sqlStr = ['SELECT parent_id FROM branches WHERE branch_id="' branch_id '"'];
sqlStr = ['SELECT parent_id FROM branches WHERE branch_id="' this_obj.escapeSqlLiteral(branch_id) '"'];
data = this_obj.run_sql_query(sqlStr);
if isempty(data)
parent_branch_id = '';
Expand Down Expand Up @@ -219,7 +238,7 @@ function do_delete_branch(this_obj, branch_id, varargin)
% Returns a cell array of IDs of sub-branches of the specified BRANCH_ID.
% If BRANCH_ID has no sub-branches, an empty cell array is returned.

sqlStr = ['SELECT branch_id FROM branches WHERE parent_id="' branch_id '"'];
sqlStr = ['SELECT branch_id FROM branches WHERE parent_id="' this_obj.escapeSqlLiteral(branch_id) '"'];
data = this_obj.run_sql_query(sqlStr);
if isempty(data)
branch_ids = {};
Expand All @@ -241,7 +260,7 @@ function do_delete_branch(this_obj, branch_id, varargin)
if nargin > 1 && ~isempty(branch_id)
sqlStr = ['SELECT docs.doc_id FROM docs,branch_docs' ...
' WHERE docs.doc_idx = branch_docs.doc_idx' ...
' AND branch_id="' branch_id '"'];
' AND branch_id="' this_obj.escapeSqlLiteral(branch_id) '"'];
else
sqlStr = 'SELECT docs.doc_id FROM docs';
end
Expand Down Expand Up @@ -432,7 +451,7 @@ function do_add_doc(this_obj, document_obj, branch_id, options)
%document_obj = did.document(doc);

% Run the SQL query in the database
query_str = ['SELECT json_code FROM docs WHERE doc_id="' document_id '"'];
query_str = ['SELECT json_code FROM docs WHERE doc_id="' this_obj.escapeSqlLiteral(document_id) '"'];
data = this_obj.run_sql_query(query_str);

% Process missing document results
Expand Down Expand Up @@ -500,8 +519,8 @@ function do_remove_doc(this_obj, document_id, branch_id, options)
% Handle case of missing document
sqlStr = ['SELECT docs.doc_idx FROM docs,branch_docs ' ...
' WHERE docs.doc_idx = branch_docs.doc_idx ' ...
' AND branch_id="' branch_id '"' ...
' AND doc_id="' doc_id '"'];
' AND branch_id="' this_obj.escapeSqlLiteral(branch_id) '"' ...
' AND doc_id="' this_obj.escapeSqlLiteral(doc_id) '"'];
%doc_id = [doc_id '/' branch_id];
data = this_obj.run_sql_noOpen(sqlStr);
if isempty(data)
Expand All @@ -521,7 +540,7 @@ function do_remove_doc(this_obj, document_id, branch_id, options)
doc_idx = data(1).doc_idx;

% Remove the document from the branch_docs table
this_obj.run_sql_noOpen(['DELETE FROM branch_docs WHERE branch_id="' branch_id '" AND doc_idx=?'], doc_idx);
this_obj.run_sql_noOpen(['DELETE FROM branch_docs WHERE branch_id="' this_obj.escapeSqlLiteral(branch_id) '" AND doc_idx=?'], doc_idx);

% TODO - remove all document records if no branch references remain?
%{
Expand Down Expand Up @@ -579,10 +598,10 @@ function do_remove_doc(this_obj, document_id, branch_id, options)
% Get the cached filepath to the specified document
query_str = ['SELECT cached_location,orig_location,uid,type ' ...
' FROM docs,files ' ...
' WHERE docs.doc_id="' document_id '" ' ...
' WHERE docs.doc_id="' this_obj.escapeSqlLiteral(document_id) '" ' ...
' AND files.doc_idx=docs.doc_idx'];
if nargin > 2 && ~isempty(filename)
query_str = [query_str ' AND files.filename="' filename '"'];
query_str = [query_str ' AND files.filename="' this_obj.escapeSqlLiteral(filename) '"'];
else
error('DID:SQLITEDB:open','The requested filename must be specified in open_doc()');
%filename = ''; % used in catch block below
Expand Down Expand Up @@ -713,10 +732,10 @@ function tryCustomFileHandler(customFileHandler, destPath, sourcePath, file_type
% Get the cached filepath to the specified document
query_str = ['SELECT cached_location,orig_location,uid,type ' ...
' FROM docs,files ' ...
' WHERE docs.doc_id="' document_id '" ' ...
' WHERE docs.doc_id="' this_obj.escapeSqlLiteral(document_id) '" ' ...
' AND files.doc_idx=docs.doc_idx'];
if nargin > 2 && ~isempty(filename)
query_str = [query_str ' AND files.filename="' filename '"'];
query_str = [query_str ' AND files.filename="' this_obj.escapeSqlLiteral(filename) '"'];
else
error('DID:SQLITEDB:open','The requested filename must be specified in check_exist_doc()');
end
Expand Down
21 changes: 19 additions & 2 deletions tools/tasks/codecheckToolbox.m
Original file line number Diff line number Diff line change
@@ -1,4 +1,21 @@
function codecheckToolbox()
function codecheckToolbox(varargin)
% codecheckToolbox Project code-check wrapper invoked by CI
% (ehennestad/matbox-actions check-code@v1).
%
% The check-code action adds tools/ to the path and calls the bare
% name, e.g. codecheckToolbox("FoldersToCheck","src"). This wrapper
% must therefore:
% (a) live at the bare name so the action's
% exist("codecheckToolbox","file") branch resolves to it
% (rather than falling through to matbox with CreateBadge=true),
% (b) accept and forward those name-value args, and
% (c) force CreateBadge=false. Badge generation goes through
% matbox.utility.createBadgeSvg, which needs a CPython install
% the CI runner does not have ("Python commands require a
% supported version of CPython").
%
% Earlier this was a zero-arg wrapper, which broke once the action
% began passing "FoldersToCheck"/"src" ("Too many input arguments").
projectRootDirectory = didtools.projectdir();
matbox.tasks.codecheckToolbox(projectRootDirectory, "CreateBadge", false)
matbox.tasks.codecheckToolbox(projectRootDirectory, "CreateBadge", false, varargin{:});
end
Loading