Skip to content

feat(analyzer): Add third-party + stdlib taint passThrough propagators#154

Open
misonijnik wants to merge 8 commits into
mainfrom
misonijnik/opentaint-config-passthrough
Open

feat(analyzer): Add third-party + stdlib taint passThrough propagators#154
misonijnik wants to merge 8 commits into
mainfrom
misonijnik/opentaint-config-passthrough

Conversation

@misonijnik
Copy link
Copy Markdown
Member

No description provided.

@misonijnik misonijnik force-pushed the misonijnik/opentaint-config-passthrough branch 2 times, most recently from a6295e0 to 050c869 Compare May 25, 2026 09:54
@misonijnik misonijnik marked this pull request as ready for review May 27, 2026 13:02
Extract the passThrough (propagator) rules from misonijnik/codeql-rules
that
the rule-test suite depends on but the main-built analyzer jar does not
yet
bundle. Adds 13 per-jar propagator files under config/jar-split and
extends
stdlib.yaml, so a future analyzer release ships them by default and the
rules
CI no longer needs to overlay them.
…overage

Refine the propagators added in 309492b using the engine's actual
propagation semantics (no implicit this->result for unmodeled library
calls; whole-object taint subsumes field reads but virtual-field taint
does not satisfy a whole-object sink).

Fix broken fluent chains by modeling every intermediate link, not just
the endpoints:
- okhttp Request.Builder: cover all builder methods, not only url/build.
- spring-web RequestEntity: cover all static factories and the
  HeadersBuilder/BodyBuilder chain, not only get/build.
- java.net.http HttpRequest.Builder: cover header/POST/PUT/method/etc.,
  not only uri/GET/build.

Broaden per-package coverage:
- commons-io IOUtils: toByteArray/toCharArray/readLines.
- commons-codec Base64: URL-safe and chunked encode variants.
- java.util.Base64.Decoder: decode/wrap (decode direction).
- spring-jdbc: substituteNamedParameters/parseSqlStatementIntoString.
- spring-ldap: LdapQueryBuilder config carriers + gte/lte/not
(iface+impl).
- velocity: Context interface put.
- groovy CompilationUnit.addSource: arg(*) to cover File/URL/SourceUnit
  overloads where the source is the first argument.

Wrapper constructors keep whole-object arg(*)->this on purpose: their
sinks consume the whole object, so a virtual-field-only taint would be a
false negative.
… fields

Make the new propagators precise and type-correct. A bare `arg -> this`
(or `arg -> result`) makes the whole object abstract-tainted, so every
later field read of it matches via abstract refinement — over-tainting —
and copies a value of one type onto an object of another. Instead, store
the differently-typed argument into a `<rule-storage>` virtual field of
the target:

  from: arg(i)
  to:
  - this            # or result
  - .java.lang.Object#<rule-storage>#java.lang.Object

This is precise (reading unrelated real fields no longer matches, since
the node is no longer abstract) and type-safe (a field accessor owned by
java.lang.Object is always type-compatible, so the slot survives every
chain step and the Builder -> Product type change of build()/body()
under
a plain this->result carry). A sink that consumes the whole object still
matches the slot by prefix.

Applied to: StringEntity, hudson.FilePath, unboundid SearchRequest,
JMXServiceURL/JMXConnectorFactory, StreamSource, java.net.URL(String),
VelocityContext.put, CompilationUnit.addSource, ant FileSet set*, and
the
okhttp Request.Builder / java.net.http HttpRequest.Builder / spring-web
RequestEntity / spring-ldap query-builder chains.

Builder data methods store into both result (chained `a().b()`) and this
(non-chained `x.a(); x.b()`) and carry this->result; every chain link
keeps this->result so the chain survives. Pure transforms that *produce*
the tainted value (toString/decode/getBytes/URI.create/MVEL.compile/
substituteNamedParameters/iterator.next) stay bare, since the result is
itself the tainted value with no surrounding state to over-taint.
Several new propagators repeated rules already present in other bundled
config files (all .yaml under /config are concatenated at load time, so
a
second copy of the same function is a real runtime duplicate). Removed:

- commons-codec: Base64#encodeBase64 / #decodeBase64 — already in
  config.yaml (identical arg(0)->result). Kept the String / URL-safe /
  chunked variants, which are not.
- stdlib: dropped the catch-all (no-signature) entries that existing
  signature-specific rules already cover better:
  String#getBytes, URI#create, Enumeration#nextElement (existing ()
entry
  already has this->result), URL#<init>(String) (existing same-signature
  entry), Base64$Decoder#decode (existing typed overloads), and
  StreamSource#<init> (jmod.yaml has all typed overloads).
  Also dropped Collection#iterator (covered by the kept
Iterable#iterator
  via override). Kept Iterable#iterator / Iterator#next generics — the
  existing entries only model element-level (.Element) flow, so these
add
  the whole-collection this->result carry.
- spring-web-5.3.39: dropped the RequestEntity
HeadersBuilder/BodyBuilder
  chain methods — spring-web-7.0.2.yaml already provides them. Kept only
  the static factories (get/post/method/...), because v7 models those as
  this->result, a no-op for static methods; the arg-based form here is
  what actually carries the URI.

Note: spring-web-7.0.2.yaml's RequestEntity static factories are dead
(this-based on static methods); they should be fixed or removed there
separately.
…3.39 file

spring-web-7.0.2.yaml modeled the RequestEntity.get/post/head/put/patch/
delete/options/method static factories as `this -> result`, which is a
no-op for static methods, so they never carried the URI argument.
Replace
them with the arg-based form that stores the URI into a
java.lang.Object-
owned <rule-storage> slot of the returned builder; the slot survives the
builder -> RequestEntity type change of .build() under the existing
this->result builder carries, and the RestTemplate sink consuming the
whole RequestEntity matches it by prefix.

With the factories fixed, spring-web-5.3.39.yaml (which only existed to
provide working arg-based factories) is fully redundant — RequestEntity
is the same class in both jars and all config now lives in
spring-web-7.0.2.yaml — so delete it.
…at terminals

Follow-up to the static-factory fix: instead of a java.lang.Object-owned
<rule-storage> slot, store the factory's URI argument into the builder's
own type slot (RequestEntity$HeadersBuilder#<rule-storage>, which also
covers BodyBuilder since it extends HeadersBuilder), and re-key it onto
the RequestEntity#<rule-storage> slot at the terminals .build() and
.body().

This is more type-honest and unifies with the existing <init> / getBody
modeling (all RequestEntity construction paths now land taint in the
same
RequestEntity#<rule-storage> slot), and a builder-typed slot is
type-pruned if taint ever reaches a non-builder object. The cost is that
every terminal returning a RequestEntity must re-key (HeadersBuilder is
not a subtype of RequestEntity, so a plain this->result drops the
builder-owned slot) — both build() and body() do.
…key at transitions

Replace the imprecise `java.lang.Object` <rule-storage> slot owners in
the
new passthrough config with precise domain types, mirroring the
spring-web
RequestEntity reference (precise owner, java.lang.Object kept only as
the
field value type, re-key where the static type changes).

- No type change (wrapper -> this, factory -> result): owner = the
object's
  own type — ant FileSet, jenkins FilePath, httpcore5 StringEntity,
unboundid
  SearchRequest, groovy CompilationUnit, JMX JMXServiceURL/JMXConnector;
  Velocity put() uses the common Context interface for all three
receivers.
- Single transition: builder-typed owner along the chain + one re-key at
  build() — okhttp Request.Builder -> Request, java.net.http
  HttpRequest.Builder -> HttpRequest.
- Multi-transition Spring LDAP chain: owner tracks each stage
  (LdapQueryBuilder -> ConditionCriteria -> ContainerCriteria) with a
re-key
  at every type change
(where/is/like/gte/lte/whitespaceWildcardsLike/and/or/
  filter), mirrored on the Default* impl entries.

Re-key acceptance follows
JIRFactTypeChecker.typeMayHaveSubtypeOf(resultType,
owner); every transition is covered, so the slot reaches the terminal in
exactly the cases the Object-owner baseline did (behavior-equivalent).
…n passthroughs

Rework the new jar-split/stdlib passThrough rules for precision:

- Replace arg(*) with per-overload `signature:` + exact arg indices
  (RequestEntity, okhttp/java.net.http builders, StringEntity, FilePath,
  CompilationUnit.addSource, JMXServiceURL, SearchRequest). Keep arg(*)
only
  for genuine varargs (e.g. UriTemplate.expand). Use `*` return to avoid
  generic-erasure mismatch; exclude structurally non-taint args (e.g.
  RequestEntity.method's HttpMethod enum).

- Rename branch-new <rule-storage> slots to specific names: url /
content /
  path / source / value / filter; SearchRequest split into baseDN /
filter /
  attributes. Pre-existing <rule-storage> slots left untouched. Note:
named
  fields are no longer excluded from .* (AnyAccessor) unrolling — only
the
  literal "<rule-storage>" is special-cased (JIRTaintAnalyzer
UnrollStrategy);
  ContainsMarkOnAnyField sinks still see the named sub-fields.

- HttpHeaders: enumerate all 61 setter overloads + 24 read-back getters
with
  exact signatures and per-header named vfields (Location, ContentType,
...;
  dynamic-name set/add/getFirst/get share DynamicHeader), keeping
whole-object
  taint for header-injection sinks.

- RequestEntity: fix the static factories
(get/post/head/delete/options/put/
  patch/method carry the URI/template args into the builder slot;
this->result
  was a no-op for statics) and add the getUrl read-back accessor.

- Drop explanatory comments from the passThrough YAML.
@misonijnik misonijnik force-pushed the misonijnik/opentaint-config-passthrough branch from 1359bfe to 0b087fd Compare May 29, 2026 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant