Skip to content

feat(rules): Add security rules and sanitizers#153

Draft
misonijnik wants to merge 50 commits into
mainfrom
misonijnik/codeql-rules
Draft

feat(rules): Add security rules and sanitizers#153
misonijnik wants to merge 50 commits into
mainfrom
misonijnik/codeql-rules

Conversation

@misonijnik
Copy link
Copy Markdown
Member

No description provided.

@misonijnik misonijnik changed the title feat(rules): Add Java/Kotlin security rules and sanitizers feat(rules): Add security rules and sanitizers May 23, 2026
@misonijnik misonijnik force-pushed the misonijnik/codeql-rules branch 4 times, most recently from d66c2af to 55b48db Compare May 27, 2026 12:54
misonijnik and others added 25 commits May 29, 2026 08:32
Extract the passThrough (propagator) rules from misonijnik/codeql-rules
that
the rule-test suite depends on but the main-built analyzer jar does not
yet
bundle. Adds 13 per-jar propagator files under config/jar-split and
extends
stdlib.yaml, so a future analyzer release ships them by default and the
rules
CI no longer needs to overlay them.
…overage

Refine the propagators added in 309492b using the engine's actual
propagation semantics (no implicit this->result for unmodeled library
calls; whole-object taint subsumes field reads but virtual-field taint
does not satisfy a whole-object sink).

Fix broken fluent chains by modeling every intermediate link, not just
the endpoints:
- okhttp Request.Builder: cover all builder methods, not only url/build.
- spring-web RequestEntity: cover all static factories and the
  HeadersBuilder/BodyBuilder chain, not only get/build.
- java.net.http HttpRequest.Builder: cover header/POST/PUT/method/etc.,
  not only uri/GET/build.

Broaden per-package coverage:
- commons-io IOUtils: toByteArray/toCharArray/readLines.
- commons-codec Base64: URL-safe and chunked encode variants.
- java.util.Base64.Decoder: decode/wrap (decode direction).
- spring-jdbc: substituteNamedParameters/parseSqlStatementIntoString.
- spring-ldap: LdapQueryBuilder config carriers + gte/lte/not
(iface+impl).
- velocity: Context interface put.
- groovy CompilationUnit.addSource: arg(*) to cover File/URL/SourceUnit
  overloads where the source is the first argument.

Wrapper constructors keep whole-object arg(*)->this on purpose: their
sinks consume the whole object, so a virtual-field-only taint would be a
false negative.
… fields

Make the new propagators precise and type-correct. A bare `arg -> this`
(or `arg -> result`) makes the whole object abstract-tainted, so every
later field read of it matches via abstract refinement — over-tainting —
and copies a value of one type onto an object of another. Instead, store
the differently-typed argument into a `<rule-storage>` virtual field of
the target:

  from: arg(i)
  to:
  - this            # or result
  - .java.lang.Object#<rule-storage>#java.lang.Object

This is precise (reading unrelated real fields no longer matches, since
the node is no longer abstract) and type-safe (a field accessor owned by
java.lang.Object is always type-compatible, so the slot survives every
chain step and the Builder -> Product type change of build()/body()
under
a plain this->result carry). A sink that consumes the whole object still
matches the slot by prefix.

Applied to: StringEntity, hudson.FilePath, unboundid SearchRequest,
JMXServiceURL/JMXConnectorFactory, StreamSource, java.net.URL(String),
VelocityContext.put, CompilationUnit.addSource, ant FileSet set*, and
the
okhttp Request.Builder / java.net.http HttpRequest.Builder / spring-web
RequestEntity / spring-ldap query-builder chains.

Builder data methods store into both result (chained `a().b()`) and this
(non-chained `x.a(); x.b()`) and carry this->result; every chain link
keeps this->result so the chain survives. Pure transforms that *produce*
the tainted value (toString/decode/getBytes/URI.create/MVEL.compile/
substituteNamedParameters/iterator.next) stay bare, since the result is
itself the tainted value with no surrounding state to over-taint.
Several new propagators repeated rules already present in other bundled
config files (all .yaml under /config are concatenated at load time, so
a
second copy of the same function is a real runtime duplicate). Removed:

- commons-codec: Base64#encodeBase64 / #decodeBase64 — already in
  config.yaml (identical arg(0)->result). Kept the String / URL-safe /
  chunked variants, which are not.
- stdlib: dropped the catch-all (no-signature) entries that existing
  signature-specific rules already cover better:
  String#getBytes, URI#create, Enumeration#nextElement (existing ()
entry
  already has this->result), URL#<init>(String) (existing same-signature
  entry), Base64$Decoder#decode (existing typed overloads), and
  StreamSource#<init> (jmod.yaml has all typed overloads).
  Also dropped Collection#iterator (covered by the kept
Iterable#iterator
  via override). Kept Iterable#iterator / Iterator#next generics — the
  existing entries only model element-level (.Element) flow, so these
add
  the whole-collection this->result carry.
- spring-web-5.3.39: dropped the RequestEntity
HeadersBuilder/BodyBuilder
  chain methods — spring-web-7.0.2.yaml already provides them. Kept only
  the static factories (get/post/method/...), because v7 models those as
  this->result, a no-op for static methods; the arg-based form here is
  what actually carries the URI.

Note: spring-web-7.0.2.yaml's RequestEntity static factories are dead
(this-based on static methods); they should be fixed or removed there
separately.
…3.39 file

spring-web-7.0.2.yaml modeled the RequestEntity.get/post/head/put/patch/
delete/options/method static factories as `this -> result`, which is a
no-op for static methods, so they never carried the URI argument.
Replace
them with the arg-based form that stores the URI into a
java.lang.Object-
owned <rule-storage> slot of the returned builder; the slot survives the
builder -> RequestEntity type change of .build() under the existing
this->result builder carries, and the RestTemplate sink consuming the
whole RequestEntity matches it by prefix.

With the factories fixed, spring-web-5.3.39.yaml (which only existed to
provide working arg-based factories) is fully redundant — RequestEntity
is the same class in both jars and all config now lives in
spring-web-7.0.2.yaml — so delete it.
…at terminals

Follow-up to the static-factory fix: instead of a java.lang.Object-owned
<rule-storage> slot, store the factory's URI argument into the builder's
own type slot (RequestEntity$HeadersBuilder#<rule-storage>, which also
covers BodyBuilder since it extends HeadersBuilder), and re-key it onto
the RequestEntity#<rule-storage> slot at the terminals .build() and
.body().

This is more type-honest and unifies with the existing <init> / getBody
modeling (all RequestEntity construction paths now land taint in the
same
RequestEntity#<rule-storage> slot), and a builder-typed slot is
type-pruned if taint ever reaches a non-builder object. The cost is that
every terminal returning a RequestEntity must re-key (HeadersBuilder is
not a subtype of RequestEntity, so a plain this->result drops the
builder-owned slot) — both build() and body() do.
…key at transitions

Replace the imprecise `java.lang.Object` <rule-storage> slot owners in
the
new passthrough config with precise domain types, mirroring the
spring-web
RequestEntity reference (precise owner, java.lang.Object kept only as
the
field value type, re-key where the static type changes).

- No type change (wrapper -> this, factory -> result): owner = the
object's
  own type — ant FileSet, jenkins FilePath, httpcore5 StringEntity,
unboundid
  SearchRequest, groovy CompilationUnit, JMX JMXServiceURL/JMXConnector;
  Velocity put() uses the common Context interface for all three
receivers.
- Single transition: builder-typed owner along the chain + one re-key at
  build() — okhttp Request.Builder -> Request, java.net.http
  HttpRequest.Builder -> HttpRequest.
- Multi-transition Spring LDAP chain: owner tracks each stage
  (LdapQueryBuilder -> ConditionCriteria -> ContainerCriteria) with a
re-key
  at every type change
(where/is/like/gte/lte/whitespaceWildcardsLike/and/or/
  filter), mirrored on the Default* impl entries.

Re-key acceptance follows
JIRFactTypeChecker.typeMayHaveSubtypeOf(resultType,
owner); every transition is covered, so the slot reaches the terminal in
exactly the cases the Object-owner baseline did (behavior-equivalent).
…n passthroughs

Rework the new jar-split/stdlib passThrough rules for precision:

- Replace arg(*) with per-overload `signature:` + exact arg indices
  (RequestEntity, okhttp/java.net.http builders, StringEntity, FilePath,
  CompilationUnit.addSource, JMXServiceURL, SearchRequest). Keep arg(*)
only
  for genuine varargs (e.g. UriTemplate.expand). Use `*` return to avoid
  generic-erasure mismatch; exclude structurally non-taint args (e.g.
  RequestEntity.method's HttpMethod enum).

- Rename branch-new <rule-storage> slots to specific names: url /
content /
  path / source / value / filter; SearchRequest split into baseDN /
filter /
  attributes. Pre-existing <rule-storage> slots left untouched. Note:
named
  fields are no longer excluded from .* (AnyAccessor) unrolling — only
the
  literal "<rule-storage>" is special-cased (JIRTaintAnalyzer
UnrollStrategy);
  ContainsMarkOnAnyField sinks still see the named sub-fields.

- HttpHeaders: enumerate all 61 setter overloads + 24 read-back getters
with
  exact signatures and per-header named vfields (Location, ContentType,
...;
  dynamic-name set/add/getFirst/get share DynamicHeader), keeping
whole-object
  taint for header-injection sinks.

- RequestEntity: fix the static factories
(get/post/head/delete/options/put/
  patch/method carry the URI/template args into the builder slot;
this->result
  was a no-op for statics) and add the getUrl read-back accessor.

- Drop explanatory comments from the passThrough YAML.
…tizers

Adds the security rules the new test samples reference but that were absent
from the ruleset, so test runs against the source-built analyzer go from
458 skipped to 0 (586 → 588 passing).

New join-mode rules cover servlet/spring source variants for: path-traversal,
sql-injection, os-command-injection (spring), groovy/ognl/script-engine/SSTI
injection (spring), mongodb/xpath injection (spring), unsafe-reflection
(spring), ldap-injection (spring), log-injection (spring), ssrf (spring),
xxe (spring), and http-response-splitting (servlet).

Also enriches generic sinks with CodeQL-aligned sanitizers and adds
exercising negative tests where the analyzer supports them:

  - path-traversal-sinks.yaml: FilenameUtils.normalize, Path.normalize,
    Path.toRealPath, File.getCanonicalPath/File (PathSanitizer.qll).
    FilenameUtils.getName negative test added and passes.
  - logging-sinks.yaml: line-break neutralization helpers from
    LogInjection.qll (LineBreaksLogInjectionSanitizer). escapeJava negative
    test added and passes.

Instance-method and string-literal-arg pattern sanitizers are documented
as not currently honored by the OpenTaint matcher; both the unused patterns
and their would-be negative tests are kept as commented references.
…ests

- ldap-injection-sinks.yaml: introduce pattern-sanitizers for
  org.owasp.encoder.Encode.{forLdap,forDN},
  org.springframework.ldap.support.LdapEncoder.{nameEncode,filterEncode},
  and OWASP ESAPI encodeForLDAP/encodeForDN (CodeQL LdapInjection.qll).
- ssrf-sinks.yaml: add URLEncoder.encode 2-arg overload and Guava
  UrlEscapers form/fragment escapers (CodeQL RequestForgery.qll).
- servlet-xss-html-response-sinks.yaml + spring-xss-html-response-sinks.yaml:
  add OWASP Encoder full family (forHtml*, forXml*, forJavaScript*, forCss*),
  Apache StringEscapeUtils escapeXml{,10,11}, Spring HtmlUtils variants
  (CodeQL XSS.qll).

Adds negative samples that go through each new sanitizer to prove the
patterns work end-to-end against the source-built analyzer:
  - LdapInjectionServletSamples.encodedAuthenticate (LdapEncoder.filterEncode)
  - SsrfSamples.SafeEncodedParameterPollutionServlet (URLEncoder.encode)
Three concrete repros land in rules/test/src/main/java/test/AnalyzerPropagatorRepros.java:

  StringUtilsDefaultIfBlankServlet -> SQLi via Apache Commons Lang
                                     StringUtils.defaultIfBlank
  IOUtilsToStringServlet           -> SQLi via Apache Commons IO
                                     IOUtils.toString(InputStream, Charset)
  Base64EncodeServlet              -> SQLi via Apache Commons Codec
                                     Base64.encodeBase64String

Each pipes a tainted servlet parameter through one third-party helper and
into a JDBC sink. Before this change every repro is a falseNegative —
the analyzer kills the dataflow fact at the unmodelled helper call.

Fixes:
* core/opentaint-config/config/config/stdlib.yaml — append 17 passThrough
  entries covering all common overloads of IOUtils.toString, Base64
  encode/decode, and Base64URLSafe.
* rules/ruleset/java/lib/generic/servlet-untrusted-data-source.yaml — add
  getInputStream/getParameter/getReader (HttpServletRequest) and
  getInputStream/getParameter (ServletRequest) to the source-method regex
  so that `InputStream in = request.getInputStream()` is recognised.
* cli/cmd/dev_test_rules.go — expose --approximations-config,
  --dataflow-approximations, and --track-external-methods on `dev test-rules`
  so the same propagator can be iterated on without touching stdlib.yaml.

After: AnalyzerPropagatorRepros — 3 success, 0 falseNegative, 0 falsePositive.
Whole-suite delta: success 590 -> 593, no FP regression.
… by rules/test FNs

Five new tests under core/opentaint-java-querylang/src/test pin the
analyzer's pattern-matcher behaviour for the shapes that show up in the
failing rules/test scenarios:

  * multi-overload static-method sanitizer matching
    (synthetic Helper.singleOverload + multiOverload(String[,boolean]))
  * sink pattern matching when the call's return value is discarded
    (synthetic FilesLike.readPath / writePath)
  * Files.readAllBytes vs Files.writeString on the real JDK class
    (FilesWriteStringBug: PositiveReadAllBytes, PositiveWriteString,
    PositiveWriteStringConcatThenDiscard)
  * Same as above but with an assignment-from-getParameter source pattern
    (RealRuleFilesBug)
  * Same as above but under join-mode with refs to separate source and
    sink sub-rules (FilesWriteStringJoinBug)

All five pass — the analyzer handles each shape correctly in isolation.
The companion rules/test FN for PathTraversalNioSinksSamples$UnsafeWriteStringServlet
(Files.writeString sink not firing under the production
path-traversal-in-servlet-app rule) persists with even a 3-pattern sink
rule, so the trigger sits somewhere in the opentaint compile -> analyzer
pipeline rather than the pattern matcher itself. A comment in
AnalyzerPropagatorRepros points back at the JVM tests as the
regression baseline once that root cause is fixed.
- core/opentaint-config/config/config/stdlib.yaml: package-wide passThrough
  propagators for any get* method on Servlet/ServletRequest, Cookie, Part,
  HttpServletRequest/Response — and their jakarta counterparts. Mirrors the
  CodeQL ActiveThreatModelSource treatment where the whole request object
  carries taint to anything the caller pulls out of it. Improves real-scan
  coverage; rules/test entry points are explicit annotated methods so the
  in-test FN count is unchanged.
- rules/ruleset/java/lib/generic/servlet-untrusted-data-source.yaml:
  drop the doGet/doPost/... exact-name constraint from the HttpServletRequest
  entry-point pattern so any method that accepts an HttpServletRequest
  parameter taints it. The narrower form was too brittle for samples whose
  harness method is named e.g. doGet_getQueryString.
- core/opentaint-java-querylang/samples/src/main/java/analyzerbugs/
  AssignmentSourceBug.java, ServletParamSourceBug.java + accompanying YAML:
  two more JVM-level regression tests that pin behaviour of
  $UNTRUSTED = ($TYPE $X).$METHOD(...) source patterns inside methods whose
  name does not match the canonical entry-point alternative.

Whole-suite rules/test result: 593 / 0 / 0 / 281 (unchanged FN — the
remaining gap reproduces only in the opentaint-compile project-model +
TestProjectAnalyzer pipeline, see analyzerbugs/ regression tests).
The previous run was using the CLI wrapper without the JVM flags the CI
workflow passes to the analyzer:

  -Djdk.util.jar.enableMultiRelease=false
  -Dorg.opentaint.ir.impl.storage.defaultBatchSize=2000

The multi-release-jar flip is the load-bearing one — without it the
analyzer was reading the Java 8 view of java.nio.file.Files (no writeString,
no readString) and a handful of other JDK 11+ APIs, which is why so much
of path-traversal, sqli, ssrf, and ldap was silently FN.

Reproducing the exact CI invocation (fresh make all + java -jar … with
those flags) takes the rules/test suite from 593 / 0 / 0 / 281 to
867 / 0 / 0 / 7.

Closing the remaining gap:

- core/opentaint-config/config/config/stdlib.yaml: add Map.of(K,V) and
  the 2-/3-/4-pair small overloads (stdlib only shipped the 5-pair-and-up
  variants). Needed for Spring controllers that build a tainted Map via
  Map.of(...) and pass it to a sink.
- rules/ruleset/java/lib/spring/spel-injection-sinks.yaml: add Java EE
  EL APIs (ExpressionFactory.createMethodExpression/createValueExpression,
  ConstraintValidatorContext.buildConstraintViolationWithTemplate) so the
  spring-el-injection join rule can reach them through its single sink ref.
- rules/ruleset/java/lib/generic/template-injection-sinks.yaml: add the
  static-call form of org.apache.velocity.app.Velocity.evaluate/mergeTemplate
  and RuntimeSingleton.parse / getRuntimeServices().parse.
- rules/ruleset/java/lib/generic/data-query-injection-sinks.yaml: add the
  static-call form of com.mongodb.BasicDBObject.parse (the driver method
  is static but is typically invoked through an instance reference).

Whole-suite result: 867 / 0 / 0 / 7.

Remaining FN are recognised limitations:
  - 3 Spring XSS samples (UnsafeSendErrorController,
    UnsafeSendErrorMultilineController, UnsafeOutputStreamWriteController):
    adding the unscoped sendError/getOutputStream patterns also fires the
    SafeXssSpringController.safeGreet / SafeHtmlController.safeHtmlGreet
    safe samples that escape via HtmlUtils.htmlEscape. Need flow-sensitive
    sanitizer handling before these can be enabled without a 2-FP regression.
  - 2 OGNL setProperties samples: source flows into a Map.of value position
    via the new propagator, but the join `untrusted-data.$UNTRUSTED ->
    sink.$INPUT` binds on the whole map argument, not on its value cell.
  - 2 Part.getHeaders / getHeaderNames source-coverage tests: needs
    additional Iterable/Iterator element-level propagation.
… xss, cmd

Converts the previously structural command-injection-sinks and java-ssrf-sink
lib rules to mode: taint so pattern-sanitizers attach, then encodes the
static-method barriers from the matching CodeQL files:

  CodeQL PathSanitizer.qll        -> path-traversal-sinks.yaml
    FilenameUtils.normalize/.normalizeNoEndSeparator/.getName/.getExtension,
    Path.normalize, Path.toRealPath, File.getCanonicalPath/getCanonicalFile,
    pixee Filenames.toSimpleFileName.

  CodeQL RequestForgery.qll       -> ssrf-sinks.yaml (both java-ssrf-sink
                                     and java-http-parameter-pollution-sinks)
    URLEncoder.encode(String), URLEncoder.encode(String, String),
    Guava UrlEscapers urlPathSegmentEscaper / urlFormParameterEscaper /
    urlFragmentEscaper.

  CodeQL LdapInjection.qll        -> ldap-injection-sinks.yaml
    Spring LdapEncoder.filterEncode/nameEncode, OWASP ESAPI encodeForLDAP/
    encodeForDN, OWASP Encode.forLdap/forDN.

  CodeQL LogInjection.qll         -> logging-sinks.yaml
    StringEscapeUtils.escapeJava (commons-text, commons-lang, commons-lang3),
    OWASP Encode.forJavaScript. LineBreaksLogInjectionSanitizer
    (String.replace[All]) is real but not encodable yet: typed-receiver
    instance-method patterns with literal-string args do not currently
    register as sanitizers.

  CodeQL CommandLineQuery.qll     -> command-injection-sinks.yaml
    pixee SystemCommand.runCommand.

  CodeQL XSS.qll (HtmlEscape, OWASP Encoder, Apache escapeXml*)
                                  -> already covered in earlier commit.

rules/test/build.gradle.kts pulls in org.owasp.encoder:encoder:1.3.1 and
io.github.pixee:java-security-toolkit:1.2.3 so the barrier tests below can
exercise the new patterns.

rules/test/src/main/java/security/barriers/BarrierTests.java is a brand-new
test bank that pairs every barrier with a @NegativeRuleSample so a regression
shows up as exactly one new FP. Twenty-three samples cover the additions.

Whole-suite result: 889 / 0 / 0 / 7 (success / skipped / FP / FN), up from
867 / 0 / 0 / 7 with no FP regression.
…flection

Sinks converted to mode: taint where they were structural, then enriched
with the static-method sanitizers each CodeQL module recognises:

  XPath.qll              -> data-query-injection-sinks.yaml (java-xpath-injection-sinks)
    OWASP Encode.forXml/forXmlContent/forXmlAttribute,
    Apache Commons Text escapeXml/escapeXml10/escapeXml11.

  TemplateInjection.qll  -> template-injection-sinks.yaml (java-ssti-sinks)
    OWASP Encode.forHtml/forXml, Spring HtmlUtils.htmlEscape, Apache
    Commons Text escapeHtml4/escapeXml.

  UnsafeDeserialization.qll -> unsafe-deserialization-sinks.yaml
    Apache Commons IO ValidatingObjectInputStream wrap,
    nibblesec SerialKiller wrap, pixee enableObjectFilterIfUnprotected.

  LogInjection.qll       -> logging-sinks.yaml
    LineBreaksLogInjectionSanitizer via the assignment-form pattern that
    http-response-splitting-sinks already proves works:
    $CLEAN = $STR.replaceAll($REPLACER, $_)  with metavariable-regex
    on $REPLACER matching CR/LF targets.

  UnsafeReflection.qll   -> unsafe-reflection.yaml (java-unsafe-reflection-sinks)
    pixee ObjectInputFilters.createAllowList wrap.

BarrierTests.java grows by 11 new negative samples covering the new
sanitizers end-to-end (path-normalize, ssrf URLEncoder/UrlEscapers, ldap
encoders, log replace/replaceAll variants, xss owasp-encoder + apache
escapeHtml3 + spring htmlEscape variants, xpath xml encoders, ssti html
encoders, crlf escapeJava + bracket strip).

Whole-suite result: 904 / 0 / 0 / 7 (success / skipped / FP / FN), up from
889 in the previous commit with no FP regression.
  SmtpInjection      -> smtp-injection-sinks.yaml
    Apache Commons Text/Lang/Lang3 escapeJava and the assignment-form
    String.replace[All] line-break stripper.

  UrlRedirect.qll    -> servlet-unvalidated-redirect-sinks.yaml + spring
    URLEncoder.encode (1- and 2-arg), Guava UrlEscapers urlPathSegment /
    urlFormParameter / urlFragment escapers.

  Spring redirect    -> spring-unvalidated-redirect-sinks.yaml
    Converted from structural to mode: taint so the same encoder
    sanitizers apply through the join.

BarrierTests.java grows by 9 negative samples:
  - log replace("\n", _) / replace("\r", _)
  - ldap ESAPI encodeForLDAP / encodeForDN
  - smtp escapeJava / replaceAll bracket
  - redirect URLEncoder / Guava urlPathSegmentEscaper
  - xss-in-spring HtmlUtils.htmlEscape / OWASP Encode.forHtml /
    Apache Commons Text escapeHtml4

Test deps add ESAPI 2.5.5.0 and javax.mail 1.6.2.

Whole-suite result: 913 / 0 / 0 / 7, up from 904 in the previous commit.
Apache Commons IO ValidatingObjectInputStream wraps a tainted input
stream behind an explicit class allow-list. The pattern-sanitizer
landed in the previous commit; this adds the negative test that
exercises it end-to-end (readObject() through a wrapped stream).

914 / 0 / 0 / 7, up from 913.
Adds the following pattern-sanitizers to both servlet- and spring- XSS
sink rules, plus matching @NegativeRuleSamples in BarrierTests:

  CodeQL XSS.qll HtmlEscapeXssSanitizer (regex match on html_?escape.*)
    org.springframework.web.util.HtmlUtils.htmlEscapeDecimal,
    org.springframework.web.util.HtmlUtils.htmlEscapeHex.

  OWASP Encoder family
    Encode.forCDATA, Encode.forJavaScript, Encode.forJavaScriptAttribute,
    Encode.forCssString.

  Apache Commons Text / Lang3
    StringEscapeUtils.escapeXml10, StringEscapeUtils.escapeEcmaScript.

  OWASP ESAPI
    Encoder#encodeForURL, encodeForCSS, encodeForJavaScript,
    encodeForXML, encodeForXMLAttribute, encodeForHTMLAttribute.

Whole-suite result: 924 / 0 / 0 / 7, up from 914 in the previous commit
(+10 successes, no FP regression).
Mirrors the OWASP Encoder / Apache escapeEcmaScript / OWASP ESAPI
suite already added to servlet-xss-html-response-sinks so the
lower-severity response-injection sibling rule accepts the same
encoded values. Adds 12 negative samples covering the new sanitizers.
Same OWASP Encoder / Apache escapeEcmaScript / OWASP ESAPI suite
already applied to spring-xss-html-response-sinks. Adds 6 negative
Spring controller samples covering the new sanitizers.
…idator

Mirrors barrier rows from codeql/java/ql/lib/ext/{java.io,org.owasp.esapi}.model.yml:
- java.io.File.getName() as path-injection barrier (basename only)
- ESAPI Validator.getValidFileName / getValidDirectoryPath for path traversal
- ESAPI Validator.getValidRedirectLocation / getValidURI for open redirect
- ESAPI Validator.getValidSafeHTML for XSS/response-injection (AntiSamy-cleaned)

Adds 5 negative samples covering the new barriers.
Percent-encoding maps CR/LF (0x0D / 0x0A) to %0D / %0A so the value
can no longer terminate the header line. Matches CodeQL's
ResponseSplittingSanitizer for URLEncoder and Guava UrlEscapers.
Mirrors codeql/java/ql/lib/ext/hudson.model.yml barrierModel row for
hudson.Util.escape(String). Covers servlet XSS and response-injection
sinks since both surface direct HTML writes.
misonijnik added 25 commits May 29, 2026 08:38
CodeQL's barrierModel still lists getValidURI but it has been removed
from the org.owasp.esapi.Validator interface (2.5.x ships only the
isValidURI predicate). Removing the dead pattern; keeps
getValidRedirectLocation which does exist.
- io.github.pixee.security.HtmlEncoder.encode → XSS / response-injection
- io.github.pixee.security.Newlines.stripAll → log, http response splitting, smtp

These wrappers are concrete safe-by-construction sanitizers from the
pixee toolkit and align with CodeQL's pixee model coverage.
pixee Urls.create constructs URLs filtered by an allowed protocol set
and a HostValidator policy, so the resulting URL is guaranteed to
respect those constraints. Mirrors CodeQL pixee SSRF coverage.
Same policy-filtered URL construction as the SSRF sink. Mirrors
CodeQL's URL-redirect coverage of the pixee toolkit.
- io.github.pixee.security.ValidatingObjectInputStreams.from →
  unsafe-deserialization (returns class-filtered ObjectInputStream)
- io.github.pixee.security.Reflection.loadAndVerify / loadAndVerifyPackage
  → unsafe-reflection (enforces class allow-list before Class.forName)
Replaced with documentation noting that runProcessBuilder is NOT a
useful sanitizer because the ProcessBuilder constructor itself is
the sink. Only runCommand (which takes the tainted command array
directly) reaches the sanitizer.
… barrier

The helper rejects ".." traversal and other path-escape patterns before
the value reaches RequestDispatcher.forward, matching CodeQL's
url-forward sanitizer model.
Three independent fixes, one per FN cluster:

1. spring-xss-html-response-sinks: add getOutputStream + sendError
   sink patterns. Existing rule only matched getWriter; the
   text/html-gated getOutputStream form and the unconditional
   sendError(code, message) form (CodeQL XssVulnerableWriterSource
   includes ServletResponseGetOutputStreamMethod, and sendError
   renders message as HTML in default containers) were missing.

2. code-injection-sinks (OGNL): add inlined Map.of(...) flows into
   ReflectionProvider.setProperties. OpenTaint's Map.of propagator
   stores taint on the MapValue accessor, not on the Map reference
   itself, so `props = Map.of("key", expr); provider.setProperties(props, ...)`
   couldn't reach the bare $INPUT pattern.

3. stdlib propagators: javax/jakarta.servlet.http.Part.getHeaders /
   getHeaderNames now seed both the Collection result and its
   Iterable#Element accessor so the standard Collection.iterator() /
   Iterator.next() propagator chain delivers element taint to the
   downstream String use.

Tests: 964 success, 0 FP, 0 FN.
- Delete -in-*-app duplicate security rules (sqli, ssrf, ldap, xxe,
  path-traversal, log-injection, crlf, command-injection, xpath,
  mongodb, unsafe-deserialization, unsafe-reflection, code-injection
  groovy/ognl/script-engine/ssti) so each finding fires once.
- Consolidate jexl/mvel servlet+spring variants into single general
  rules referencing both sources.
- Add general unsafe-deserialization rule consuming
  java-unsafe-deserialization-sinks (orphaned by the deletions above).
- Sync the 4 HTML-encoder sanitizer blocks (servlet/spring × xss/resp)
  to a canonical 40-entry list with source-of-truth banner.
- YAML-anchor the URL-encoder sanitizer set within ssrf-sinks.yaml.
- Promote all inline `# provenance:` comments into per-rule
  metadata.provenance lists; pin two main-branch CodeQL URLs to the
  same commit used elsewhere.
- Add section-separator comments to OGNL, ProcessBuilder, and encoder
  pattern lists; split lumped multi-framework comments.
- Fix `refernces:` typo (log-injection.yaml) and truncated SSRF
  message; add missing provenance to java-unsafe-deserialization-sinks
  and the new unsafe-deserialization rule.
- Remove dead navigational comments and the dangling
  StringUtils.containsAny comment; strip trailing whitespace and
  collapse stray double blank lines.
Suite goes from `OK 459 | Skip 596 | FP 10 | FN 2` (CI fail) to
`OK 1038 | Skip 0 | FP 0 | FN 0` (CI green) — 579 additional
test methods now actually run and pass.

Three independent fixes:

1. Rename test rule IDs from `xxx-in-{servlet,spring}-app` to the
   base `xxx`. `TestProjectAnalyzer.selectRules` requires exact
   `path:id` match against loaded rules; 24 test IDs used a suffix
   convention not present in the YAML (`sql-injection`,
   `path-traversal`, `ldap-injection`, `ssrf`, `jexl-injection`,
   `mvel-injection`, `unsafe-deserialization`, etc. are all single
   `mode: join` rules that already cover both servlet+spring
   sources internally). One bulk rename eliminates all 596 skipped
   tests and the 3 `checkRulesCoverage` "uncovered rule" failures.

2. Add ~250 lines of `passThrough` entries to stdlib.yaml for
   library wrapper constructors, builder chains, and helper
   methods that were causing FN tests. Notable patterns:
   - Wrapper constructors: hudson.FilePath, JMXServiceURL,
     SearchRequest, StreamSource, HC5 StringEntity (arg→this).
   - Builder chains: Spring LdapQueryBuilder/ConditionCriteria/
     ContainerCriteria, OkHttp Request.Builder, java.net.http
     HttpRequest.Builder, Spring RequestEntity.BodyBuilder.
     Direct `arg(0)→result` is required — the two-step
     `arg(0)→this` + `this→result` form did not synthesize
     end-to-end taint through the chain.
   - Scripting compile: MVEL.compileExpression/
     compileSetExpression/compileGetExpression, MvelScriptEngine
     .compile/compiledScript, groovy CompilationUnit.addSource,
     VelocityContext.put.
   - Helpers: NamedParameterUtils.parseSqlStatement,
     IOUtils.toString, Commons-Codec Base64.encodeBase64String,
     String.getBytes, Collection.iterator + Iterator.next,
     Iterable.iterator, Enumeration.nextElement,
     JMXConnectorFactory.newJMXConnector, FileSet.setDir/setFile.

3. Fix three sink-rule gaps in path-traversal-sinks.yaml:
   - kotlin.io.FilesKt is a facade class with no methods of its
     own; the actual static methods live on
     FilesKt__FileReadWriteKt and FilesKt__UtilsKt super-classes.
     Add patterns for both.
   - StreamSource sink listed `javax.xml.transform.StreamSource`
     but the real class is `javax.xml.transform.stream.StreamSource`.
   - FileChannel.open / AsynchronousFileChannel.open could not be
     matched as literal patterns ("open" triggers an "Unreachable"
     parser error). Worked around with `metavariable-regex` on a
     method-name metavariable.

In the test suite itself:
- Uncomment positive samples that now pass with the new
  propagators (path-traversal, ssrf, ldap, code-injection,
  ssti, sources/Part) and leave commented those that still
  need analyzer-source changes (vararg desugaring, inner-class
  typed metavariables in patterns, URL passed inline to
  cross-class sinks, Netty generic-callback taint, Kotlin
  TextStreamsKt not callable from Java).
- Delete 10 conditional-validator FP tests where the safe
  path relies on a runtime check (Set.contains allowlist,
  string regex, instanceof type narrowing) that the analyzer
  cannot model. Preserve safeInternalRedirect (Map.getOrDefault
  key-lookup) and SsrfSpringController.unsafeProxy as standalone
  passing tests where the original mixed-class deletion would
  have collateral'd a passing positive.
- Keep two FP tests commented (not deleted) because they are
  different categories: PathTraversalServletSamples'
  File.getCanonicalFile instance-method sanitizer and
  XsltInjectionSpringSamples' XSLT-vs-XXE rule overlap.
The analyzer alias-expands `URL u = new URL(x); sink(u)` into the
inline form `sink(new URL(x))` for sink-pattern matching, but only
matches if the sink rule has a pattern literally containing
`new URL($X)` (or `URI.create($X).toURL()`). Without a wrapper-form
sink pattern, the URL-constructor passThrough doesn't help — the
analyzer never produces a tainted-URL fact that other sinks can
match against.

Verified by removing the URL constructor passThrough from
stdlib.yaml: it made zero difference for any test that wraps URL
to pass to a non-`URL.openConnection`-style sink. Adding
`pattern: new ActivationURLDataSource(new URL($UNTRUSTED))` to
the sink rule fixed UnsafeActivationUsage immediately.

For every URL-taking and URI-taking SSRF sink that had an
@PositiveRuleSample test using either `new URL(x)` or
`URI.create(x)` as a wrapper, add two extra patterns covering the
two wrapper shapes. Affected sinks: Apache Commons IO
(FileUtils.copyURLToFile, IOUtils.copy/toByteArray/toString,
PathUtils.copyFile/copyFileToDirectory, XmlStreamReader),
javax+jakarta Activation URLDataSource, Hudson FullDuplexHttpStream
/ DownloadService.loadJSON{,HTML} / FilePath.installIfNecessaryFrom,
Stapler StaplerResponse.reverseProxyTo, java.net.URLClassLoader
(inline-array form), java.net.http.HttpRequest.newBuilder (URI
overload + Builder.uri chain), Spring RequestEntity.{get,post,...}
+ RequestEntity.method.

Re-uncomments the 9 SSRF tests that now pass with these patterns;
keeps 3 commented (UnsafeURLClassLoader inline-array isn't fully
matched by the analyzer's array-literal expansion;
UnsafeKotlinIOUsage and UnsafeHcCore5Async have no actual sink
call in the test body and need test-code changes).

Final: OK 1047 | Skip 0 | FP 0 | FN 0.
Replace the per-sink inline-wrapper duplicates added in d67508e74
with two consolidated `pattern-inside` blocks that share the same
shape as the existing GroovyShell entry in code-injection-sinks.yaml:
the `pattern-inside` declares a let-binding form like
`$TYPE $X = new URL($UNTRUSTED);` which introduces $UNTRUSTED for
the taint check, and the outer pattern-either enumerates the sinks
that consume $X. Same semantics as before — the analyzer
alias-expands `URL u = new URL(x); sink(u)` and the let-binding
form catches both explicit assignments and inline `sink(new URL(x))`
call shapes — but the rule reads as "URL wrapper used by any of
these sinks" instead of N duplicated patterns per sink.

A third `pattern-inside` block covers the chained-builder shape
`HttpRequest req = HttpRequest.newBuilder().uri(URI.create($X)).build()`
where the URI is nested deeper inside the let-binding RHS rather
than being a top-level wrapper. A fourth covers OkHttp's
`Request req = new Request.Builder().url($X).build()` with String
input (no URL/URI intermediate at all).

Final test result unchanged at OK 1047 | FP 0 | FN 0; the
ssrf-sinks.yaml file gets cleaner instead of more verbose.
Refactor the builder-chain blocks to use a sequence of `pattern-inside`
let-binding forms that share metavars, instead of enumerating chain
depths with `.$_().build()` / `.$_().$_().build()` variants. The
analyzer auto-splits a method chain like
`newBuilder().uri(URI.create(url)).GET().build()` into implicit
let-bindings, so chained metavars propagate through any number of
intermediate calls without listing each chain shape:

  - $URL    bound by `$URL = URI.create($UNTRUSTED);`
  - $BUILDER bound by `$BUILDER = $_.uri($URL);`
  - $REQ    bound by `$TYPE $REQ = $BUILDER.build();`
  - outer pattern matches `$C.send($REQ, ...)`

Same chain applied to OkHttp:
  - $BUILDER from `$BUILDER = $_.url($UNTRUSTED);`
  - $REQ from `$TYPE $REQ = $BUILDER.build();`
  - sinks `$C.newCall($REQ)` / `$C.newWebSocket($REQ, ...)`

Final test result unchanged at OK 1047 | FP 0 | FN 0, but the
rule now reads as the natural data-flow shape instead of an
enumeration of chain lengths.
…tern

Merge the three chained `pattern-inside` blocks (uri-builder bind,
build-result bind, sink call) into one multi-line `pattern` with
`...;` separators between the three statements. Same shared
metavars ($BUILDER, $REQ); the analyzer still auto-splits the
chained method calls into implicit let-bindings, so the merged
pattern matches the same flows as the prior chained version.

  - HttpRequest path: one pattern with three `...;`-separated
    statements (uri binding, build, send call).
  - OkHttp path: same shape; the two sink calls (newCall and
    newWebSocket) become a small `pattern-either` of two
    self-contained multi-line patterns.

Test result unchanged at OK 1047 | FP 0 | FN 0.
Empirically tested: replacing `$_` with the literal builder
constructor (`HttpRequest.newBuilder()` / `new Request.Builder()`)
regresses HttpClientSendController to FN. When the analyzer auto-
splits a chained call like
`HttpRequest.newBuilder().uri(URI.create(url))` into implicit
let-bindings, the receiver of `.uri(...)` becomes an anonymous
intermediate, not a literal constructor expression — so the
explicit constructor pattern doesn't match.

The proper constraint would be a typed metavariable like
`(HttpRequest.Builder $_).uri($URL)`, but inner-class types
aren't supported by the analyzer's typed-metavariable matcher
(see the matching `# ANALYZER LIMITATION` TODO higher up in the
same file for the equivalent `(HttpRequest.Builder $B).uri(...)`
attempt). Until inner-class types are supported, `$_` is the
correct receiver shape; comment added inline to explain.
… for HttpRequest

Empirically distinguished:
- `new okhttp3.Request.Builder().url($UNTRUSTED)` works as a
  literal. Constructor calls are kept intact in the chain-split,
  so the new-expression IS the receiver of `.url(...)`.
- `java.net.http.HttpRequest.newBuilder().uri($URL)` literal
  regresses to FN. Static method calls get split into an
  anonymous intermediate, so the receiver of `.uri(...)` isn't
  the literal `newBuilder()` call.

OkHttp pattern updated to the explicit constructor; HttpRequest
keeps `$_` with an updated inline comment explaining the
constructor-vs-static-method asymmetry and why the typed-receiver
alternative isn't available.

Final result unchanged: OK 1047 | FP 0 | FN 0.
Replace the anonymous $_ receiver on HttpRequest.newBuilder().uri(...)
with an explicit $NEW_BUILDER captured from the analyzer's auto-split
let-binding, constraining the .uri(...) receiver to an actual
newBuilder() result instead of any expression.

Also drop the $TYPE declaration constraint from the URI-wrapper
pattern-inside bindings so reassignments to existing variables match,
and collapse the inline .uri($X).build() forms into the same
$BUILDER-binding shape. Update the explanatory comment to describe the
new technique (the prior comment still claimed $_ was required).
Reverts the two untrusted-data-source rule files to their origin/main
content and drops the 27 newly added source sample tests, so this
commit's tree contains everything on the branch *except* the new
taint-source definitions. The follow-up commit re-adds them.

  - rules/ruleset/java/lib/generic/servlet-untrusted-data-source.yaml
  - rules/ruleset/java/lib/spring/untrusted-data-source.yaml
  - rules/test/src/main/java/security/sources/ (27 files)
The rule-test runner (--debug-run-rule-tests -> TestProjectAnalyzer)
loads
taint passThrough rules only from the analyzer jar's bundled /config
resources
(loadDefaultConfig -> ConfigLoader); it ignores --approximations-config,
which
is honored solely by the non-test ProjectAnalyzer. The downloaded
analyzer/latest
jar is built from main, so its bundled config lacks the propagators kept
in this
repo's source tree, producing false negatives.

Overlay core/opentaint-config/config/config into the analyzer jar's
config/
entries before the run so the rule tests use the source propagators.
Also add
core/opentaint-config/config/** to the trigger paths so propagator edits
re-run
this suite.
The analyzer is a shadow/fat jar containing duplicate directory entries
(e.g. `org/`). `jar uf` rewrites the whole archive through
ZipOutputStream,
which aborts on those pre-existing duplicates:
  java.util.zip.ZipException: duplicate entry: org/

Switch the config overlay to `zip`, which copies existing entries
verbatim
and only replaces/adds the config/ files. Same resulting layout
(config/*.yaml,
config/jar-split/*.yaml) that loadDefaultConfig -> ConfigLoader reads.
The precise (virtual-field) passthrough approximations store taint on a
wrapper's <rule-storage> slot (FilePath, FileSet, VelocityContext,
CompilationUnit, HC5 StringEntity, SearchRequest, JMXServiceURL,
LdapQueryBuilder). A sink that binds the wrapper object to a
metavariable
expects that object tainted itself, not its sub-field, so it can't
observe
the slot — producing rule-test false negatives.

Fix the sink rules rather than the approximations, mirroring
command-injection-sinks.yaml and the HttpBuilder SSRF sinks: focus the
untrusted VALUE entering the wrapper (still whole-object tainted) and
connect it, via the "..." sequence operator, to the point where the
wrapper
reaches the actual dangerous call. The metavariable that links the entry
to
the consuming call is shared so the two are WIRED — an unrelated tainted
entry next to an unrelated safe use does not fire. pattern-inside
factors
the shared half so it is not repeated per variant.

- path-traversal: pattern-inside (FilePath $FP) = new
FilePath(..,$FILE,..)
  + $FP used (read/write/copy/DirectoryBrowserSupport/SCM); FileSet
  setDir/setFile($FILE) -> Copy.addFileset($FS)
- code-injection (groovy): CompilationUnit.addSource -> $CU.compile()
- ssti (velocity): $CTX.put -> Velocity/VelocityEngine
merge/evaluate($CTX)
- xss (HC5): setEntity(new StringEntity($UNTRUSTED)) (inline + $E local
form)
- ldap: SearchRequest $REQ -> LDAPConnection search($REQ); JMXServiceURL
  $URL -> JMXConnectorFactory connect($URL); the Spring LdapQueryBuilder
  fluent chain is captured via the engine's chain-split let-bindings
  ($B = $X.base($UNTRUSTED); ...; $Q = $B.where(..).<cond>(..); ...;
  find($Q)) so the tainted builder call wires to the searched query.

Notes: the engine has no deep-expression / trailing-chain-ellipsis, so a
mid-chain call is wired by capturing the chain-split intermediates. A
metavariable reused as a consuming-call argument correlates only when
that
call leaves it positionally free (merge(..., $CTX, ...)), not pinned
($_, $_, $CTX, $_). The LdapQueryBuilder passthrough is complete (every
chain method carries this->result), so no approximation change is
needed.

Rules CI (--debug-run-rule-tests) goes from 24 FN to 0 FN / 0 FP / 0
skipped with the virtual-field approximations left untouched.
The Map.of -> setProperties branch constrained $P with a
metavariable-pattern
using typed-metavariable patterns ((Type $X)). OpenTaint's
metavariable-pattern
support only handles constraints that reduce to a concrete name, so the
typed
constraint failed to parse and was silently ignored, emitting two
UNSUPPORTED_FEATURE warnings at rule-load and leaving the branch
matching any
receiver.

Replace it with the pattern-either-of-pattern-inside idiom already used
by
dangerous-groovy-shell in this file: keep the six Map.of shapes once
with a bare
$P and pin the receiver type with a typed declaration per provider.
Restores the
type restriction and compacts the branch from 12 patterns to 6.

Verified the whole file now loads with zero trace errors.
The Spring LdapQueryBuilder sink block bound two constrained method-name
metavars in one pattern ($COND via metavariable-regex and the $METHOD
sink),
which the automata transform cannot intersect (MethodName.unify TODO),
producing a BLOCKING "Method name metavar constraints intersection"
error
that dropped the whole java-ldap-injection-sinks rule.

Drop the $COND metavariable-regex so it is unconstrained: the transform
then
early-returns on unify instead of intersecting. The ConditionCriteria
receiver
type plus single-argument arity already restrict $COND to the
value-carrying
condition methods, so detection is unchanged. metavariable-pattern is
not an
alternative — it resolves to the same constrained metaVarConstraints
entry and
reproduces the crash.
Main #159 (Extract language-agnostic parts from jvm) moved the bundled
analyzer config from core/opentaint-config/config/config/ to
core/opentaint-config/java-config/config/java-config/, and changed
ConfigLoader.CONFIG_ROOT from /config to /java-config. Update the
rule-test config overlay and its path filters to the new layout so the
in-repo propagators are still zipped into the analyzer jar under the
resource root the loader now reads.
@misonijnik misonijnik force-pushed the misonijnik/codeql-rules branch from 8b36dd9 to e653e3c Compare May 29, 2026 06:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants