Skip to content

feat: add WDL module specification and symbolic imports#765

Open
claymcleod wants to merge 26 commits into
wdl-1.4from
feat/modules
Open

feat: add WDL module specification and symbolic imports#765
claymcleod wants to merge 26 commits into
wdl-1.4from
feat/modules

Conversation

@claymcleod

@claymcleod claymcleod commented Apr 23, 2026

Copy link
Copy Markdown
Collaborator

This introduces a module system to WDL in two parts: a new peer specification at modules/SPEC.md covering the ecosystem (manifest, resolution, lockfile, hashing, signing, credentials, registry), and language-level additions in SPEC.md (the from keyword and two new import forms).

The module specification defines:

  • module.json, the manifest format, including core fields, upstream tools provenance, and a dependencies object supporting version, tag, branch, commit, and local path selectors.
  • module-lock.json, a lockfile that pins the fully resolved tree with commit SHAs and content checksums.
  • A resolution algorithm describing how symbolic paths (<dep-name>[/<sub-path>]) map to modules, including tag-to-manifest version consistency and Go-style path-prefixed tags for multi-module repositories.
  • A content hashing scheme: a deterministic SHA-256 algorithm that serves both lockfile integrity and signature verification.
  • Module signing via optional Ed25519 signatures in module.sig, with trust-on-first-use recorded in the lockfile.
  • Credential management that delegates to Git credential helpers so private repositories work without a new credential store.
  • A registry: a community index at openwdl.github.io/registry that aggregates metadata without becoming a resolution dependency.

The language changes in SPEC.md:

  • Reserve from as a keyword in WDL 1.4 documents. Earlier version declarations continue to parse from as an ordinary identifier.
  • Add two new import forms alongside the existing one, for three forms total:
    1. import <source> [as <alias>] (alias <Old> as <New>)*, the existing namespaced form.
    2. import * from <source>, which brings every task, workflow, and user-defined type from <source> into the importing document's scope, with no namespace.
    3. import { <member> [as <Name>], ... } from <source>, which brings only the listed items, with an optional per-member as <Name> rename, also with no namespace.
  • Unify the two source styles: a <source> is either a quoted URI (resolved per Import URIs) or an unquoted symbolic module path of the form <dep>[/<sub-path>] (resolved through the consuming module's module.json). Once resolved, the two styles produce identical scoping in every form.

JSON Schemas for both module.json and module-lock.json ship under modules/schemas/ and validate the in-spec examples.

Full context: RFC discussion #700.

Before submitting this PR, please make sure:

  • You have added a few sentences describing the PR here.
  • You have considered whether the README.md or other documentation needs updating to account for these changes.
  • You have updated the CHANGELOG.md describing the change and linking back to your pull request.
  • You have read and agree to the CONTRIBUTING.md document.
  • You have considered adding or updating relevant example WDL tests to the specification.
    • See the guide for more details.

For OpenWDL team members:

  • Assign the appropriate individual to this PR.
  • Triage the PR and add appropriate labels.

Closes #226.
Closes #758.

@claymcleod claymcleod changed the base branch from wdl-1.3 to wdl-1.4 April 23, 2026 21:57
github-actions Bot and others added 2 commits April 23, 2026 17:02
Introduces a peer specification at `modules/SPEC.md` covering the
`module.json` manifest, `module-lock.json`, dependency resolution from
Git tags, content hashing, Ed25519 signing, and credential management.
Adds the `from` keyword and a new symbolic import form to `SPEC.md`
with symmetric namespacing rules for tasks, workflows, structs, and
enums. Remote URL imports are soft-deprecated. Ships JSON Schemas for
both the manifest and lockfile.

Closes #758. Closes #226.
Comment thread modules/SPEC.md Outdated
Comment thread modules/SPEC.md Outdated
Comment thread modules/SPEC.md Outdated
Comment thread modules/SPEC.md Outdated
Comment thread modules/SPEC.md Outdated
Comment thread modules/SPEC.md Outdated
Comment thread modules/SPEC.md Outdated
Comment thread modules/SPEC.md Outdated
Comment thread modules/SPEC.md Outdated
Comment thread modules/SPEC.md
A `module-lock.json` file is no longer required for every module. It is
required for modules whose consumers need reproducible builds, but may
be omitted by modules intended as libraries where version resolution is
deliberately left to the consumer. The spec now also states that
lockfiles apply only to the module they sit in: upstream lockfiles are
not consulted during downstream resolution, so consumers remain in
control of their transitive version choices.

Clarifies engine responsibility for verifying cached module content
against the lockfile and for keeping the lockfile consistent with the
resolved tree. Drops the now-stale "Lockfile as a specification
requirement" rationale bullet, and the "Duplicate dependencies over
conflict resolution" bullet whose premise doesn't carry its weight.
@claymcleod claymcleod changed the title Add WDL module specification and symbolic imports feat: add WDL module specification and symbolic imports Apr 23, 2026
@claymcleod claymcleod self-assigned this Apr 23, 2026
@claymcleod claymcleod requested a review from a team April 23, 2026 22:44
@claymcleod claymcleod added Z-specification-change (Metadata) An issue or PR related to a specification change. S05-in-progress (State) A task that is in progress. K-feature (Kind) A new feature request (for issues) or implementation (for PRs).. T-lang (Topic) Issues related to the syntax and semantic of the language itself. labels Apr 23, 2026
@claymcleod claymcleod marked this pull request as ready for review April 23, 2026 22:44
@claymcleod

claymcleod commented Apr 24, 2026

Copy link
Copy Markdown
Collaborator Author

@peterhuene and I have agreed to land a simpler model than the original eight-form proposal. Symbolic imports are interchangeable with quoted imports — the build system resolves a symbolic path to the equivalent of a quoted import. All three forms accept either source type with identical semantics.

The three forms:

  1. import <source> [as <alias>] (alias <Old> as <New>)* — the existing import. UDTs (structs, enums) enter the importing document's scope; tasks and workflows are accessible through a pseudo-namespace but are not exported. as <alias> renames the pseudo-namespace; alias <Old> as <New> renames a UDT. alias cannot rename tasks or workflows.
  2. import * from <source> — every task, workflow, and UDT enters scope.
  3. import { <member> [as <Name>], ... } from <source> — only the listed items enter scope.

<source> is either a quoted URI or a symbolic module path. Forms 2 and 3 don't accept a trailing as <alias> or alias clause.

What changed from the prior eight-form model:

  • No exported namespaces. The pseudo-namespace in form 1 still exists for accessing tasks and workflows, but it isn't part of the importing document's exported scope.
  • from works with quoted URIs too. import { foo } from "bar.wdl" parses, since symbolic and quoted sources have identical semantics.
  • Two imports both bringing baz into scope must be renamed explicitly:
    import "foo.wdl" alias baz as foo_baz
    import { baz as bar_baz } from "bar.wdl"

This model fits the existing WDL mental model (no new namespace concept to teach), is easier to implement, and parses with no lookahead beyond peek. We'll try implementing some packages in this mode and see whether it feels constraining; if it does, we'll come back and formalize exported namespaces.

@claymcleod

claymcleod commented Apr 27, 2026

Copy link
Copy Markdown
Collaborator Author

After much discussion, @peterhuene and I have agreed that this is the approach we're going to take.

  • We're going to implement a simpler version of modules that is closer to what WDL is today (see the ruleset below) in Sprocket and see how it goes. This should be easier for WDL users to understand (follows the existing rules without introducing concepts like namespaces) but may introduce strain on more complicated packages.
  • We'll try implementing some of our packages in this mode and see if it feels constraining. We'll probably also discuss with some other future package maintainers to see what their thought is. If necessary, we'll extend this PR to include formalizing the concept of namespaces (which exist today in WDL but only within a document—i.e., namespaces are not exported as part of the document's scope).

If we decide later on that namespaces are needed, then the directives to encapsulate members within a namespace (as/within on forms 2 and 3) will be introduced, and namespaces will be more fully built out in the proposal.


The rules for the simpler version we're going to try first are below.

  1. import "foo.wdl" follows the behavior that exists today.
    a. User-defined types (UDTs) like structs and enums are brought into the importing document's scope.
    b. Tasks and workflows are accessible in the document via a pseudo-namespace but that namespace is not actually brought in to the document's scope/exported.
  2. import * from "foo.wdl" imports all tasks, workflows, and UDTs into the document's scope (no namespace) and makes them exportable.
  3. import { bar, baz } from "foo.wdl" only imports the items bar and baz into the document's scope and also makes them exportable.

Other notables:

  • You can use as in the first form to rename the pseudo-namespace that items are available under (exists today).
  • You can use alias in the first form to rename members that get imported into the top-level document scope (the UDTs). Alias cannot be used to rename tasks or workflows (exists today).
  • You can use import { bar as baz } from "foo.wdl" to rename individual items that are pulled in using the new syntax. The is semantically equivalent to the as in the first form, as it renames the thing that immediately precedes the keyword (even though the subject is a pseudo-namespace in form 1 and an item in form 3).
  • This means that, if there is a conflict between two modules foo and bar that both import an item baz, a package maintainer's only choice is to alias imports in form 1/use as in form 3 to rename the items, e.g.
    • import "foo.wdl" alias baz as foo_baz, or
    • `import { baz as bar_baz } from "bar.wdl"
  • Symbolic imports are largely going to be interchangable with the existing quoted imports, with the symbolic part simply resolving to the equivalent of a quoted import through the build system. So all forms accept both quoted and symbolic imports.

@claymcleod claymcleod force-pushed the feat/modules branch 2 times, most recently from 0cf7f31 to c5db66c Compare April 27, 2026 18:29

@DavyCats DavyCats left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for writing this up!

Comment thread SPEC.md
An import statement takes one of three forms. Every form accepts the same two source styles, and the source style affects only how the build system locates the imported document.

1. `import <source> [as <alias>] (alias <Old> as <New>)*`. User-defined types (structs and enums) from `<source>` are copied into the importing document's scope. Tasks and workflows from `<source>` are accessible only through the import's namespace, which defaults to the filename minus the `.wdl` extension for a quoted URI or to the last component of the path for a symbolic module path. `as <alias>` overrides the default namespace; `alias <Old> as <New>` renames a struct or enum as it is copied. `alias` cannot rename tasks or workflows. See [Fully Qualified Names & Namespaced Identifiers](#fully-qualified-names--namespaced-identifiers) for how the namespace is used.
2. `import * from <source>`. Every task, workflow, and user-defined type from `<source>` enters the importing document's scope. No namespace is introduced.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I'm not a huge fan of these kinds of imports. They obfuscate where the imported task/workflow is actually defined. I'm wondering how others feel about this.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to feel this way when I was resolving them manually / by eye. But LSPs have mostly solved this issue for me, now I just click 'jump to definition' (also in e.g. python where they're also argued against for the same reason.)

@claymcleod claymcleod Apr 28, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah and it's a little tricky too because of the design decisions that were made prior in WDL (i.e., #160). It's actually a split situation today where import "foo.wdl" means:

  • Import all of the user-defined types (structs and enums) into the current document's scope, effectively losing the tether to where they came from, and
  • Make all of the tasks and workflows available at foo.<task>/foo.<workflow> but don't export those as part of the scope.

This asymmetry poses real challenges to making any sort of module system usable under the current syntax.

Additionally, WDL, as it stands today, has no real concept of a formal namespace (there is something referred to as a namespace in the spec, but it's not allowed to be passed between document's using their scope).

Short of introducing namespaces more formally (which has a whole other set of language implications), we had to find a new syntax that made sense to actually pull the tasks and workflows into the document's scope alongside the UDTs while still feeling WDL-like. This was the best we could come up with, but I'm open to other ways to express this.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this can obfuscate what actually gets into the document, but It is a common pattern in most languages so I think a lot of people will expect this to exist.

Comment thread SPEC.md

1. `import <source> [as <alias>] (alias <Old> as <New>)*`. User-defined types (structs and enums) from `<source>` are copied into the importing document's scope. Tasks and workflows from `<source>` are accessible only through the import's namespace, which defaults to the filename minus the `.wdl` extension for a quoted URI or to the last component of the path for a symbolic module path. `as <alias>` overrides the default namespace; `alias <Old> as <New>` renames a struct or enum as it is copied. `alias` cannot rename tasks or workflows. See [Fully Qualified Names & Namespaced Identifiers](#fully-qualified-names--namespaced-identifiers) for how the namespace is used.
2. `import * from <source>`. Every task, workflow, and user-defined type from `<source>` enters the importing document's scope. No namespace is introduced.
3. `import { <member> [as <Name>], ... } from <source>`. Only the listed items enter the importing document's scope. A per-member `as <Name>` renames the selected item locally. A trailing comma after the last member is permitted. No namespace is introduced.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with how form 1 is explained, I guess ... should be *?

Suggested change
3. `import { <member> [as <Name>], ... } from <source>`. Only the listed items enter the importing document's scope. A per-member `as <Name>` renames the selected item locally. A trailing comma after the last member is permitted. No namespace is introduced.
3. `import { (<member> [as <Name>],)* } from <source>`. Only the listed items enter the importing document's scope. A per-member `as <Name>` renames the selected item locally. A trailing comma after the last member is permitted. No namespace is introduced.

Or maybe + if at least one needs to be present.

Or is ... supposed to have a separate functionality, like "also import everything else but don't rename it".

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sounds good, I'll clarify this.

Comment thread SPEC.md
Comment on lines +3945 to +3953
version 1.4

import "csvkit.wdl" # tasks via `csvkit` namespace; structs/enums in scope
import "csvkit.wdl" as csv # tasks via `csv` namespace; structs/enums in scope
import * from "csvkit.wdl" # tasks, workflows, structs, enums all in scope
import { CsvSort } from "csvkit.wdl" # only `CsvSort` in scope
import { CsvSort as MySort, CsvSortStable } from "csvkit.wdl" # `MySort` and `CsvSortStable` in scope
import openwdl/csvkit # tasks via `csvkit` namespace; structs/enums in scope
import { CsvSort } from openwdl/csvkit # only `CsvSort` in scope

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is missing as example of form 1's alias old as new syntax.

Symbolic sub-paths now resolve directly to `<module-folder>/<sub-path>.wdl` rather than naming a nested module's `module.json`. Each dependency points to exactly one module folder; consumers reach additional modules in the same repository by declaring them as separate dependencies with distinct `path` values. The recursive scan that registered every nested manifest in a source is removed.

The manifest's `index` field is renamed to `entrypoint`. A root module import resolves to the entrypoint file; a missing entrypoint surfaces an engine-specific error at import time.

A new `exclude` field accepts gitignore-style globs identifying files unreachable via symbolic import. `exclude` governs the public import surface only and does not affect content hashing, signing, or quoted within-module imports.

Sub-path components remain WDL identifiers, so `..` is rejected at parse time. The spec now states this as a normative no-escape guarantee that engines may rely on when performing sparse Git checkouts of dependency module folders.
claymcleod added a commit to stjude-rust-labs/sprocket that referenced this pull request May 1, 2026
Prepend the spec's domain-separation prefix `wdl-module-content\0v1\0`
to the SHA-256 input in `Hasher::finalize` per openwdl/wdl#765 commit
505e954, and rewrite `detects_path_content_boundary_collision` to
exercise the actual collision the length prefixes prevent (a file name
absorbing what would otherwise be its content's bytes), rather than
asserting determinism on two trivially-different file lists.

@patmagee patmagee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking amazing! Round 1 of comments done.. I still have a lot to go :D

"required": ["git", "commit"],
"properties": {
"git": { "type": "string", "format": "uri" },
"commit": { "type": "string", "pattern": "^[0-9a-f]{40}$" },

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be sha to make the it explicit what the pattern is enforcing?

Comment thread SPEC.md
An import statement takes one of three forms. Every form accepts the same two source styles, and the source style affects only how the build system locates the imported document.

1. `import <source> [as <alias>] (alias <Old> as <New>)*`. User-defined types (structs and enums) from `<source>` are copied into the importing document's scope. Tasks and workflows from `<source>` are accessible only through the import's namespace, which defaults to the filename minus the `.wdl` extension for a quoted URI or to the last component of the path for a symbolic module path. `as <alias>` overrides the default namespace; `alias <Old> as <New>` renames a struct or enum as it is copied. `alias` cannot rename tasks or workflows. See [Fully Qualified Names & Namespaced Identifiers](#fully-qualified-names--namespaced-identifiers) for how the namespace is used.
2. `import * from <source>`. Every task, workflow, and user-defined type from `<source>` enters the importing document's scope. No namespace is introduced.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that this can obfuscate what actually gets into the document, but It is a common pattern in most languages so I think a lot of people will expect this to exist.

Comment thread modules/SPEC.md

## Module Directory Layout

A module is a directory containing a `module.json` manifest at its root and one or more `.wdl` files. There is no additional organizational construct—no workspace type, no grouping file, no hierarchy requirement.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the time being this format will work, but I am sure at some point the question of module distribution and packaging is going to come up. This will be needed to solve for issues like offline dependency resolution, caching and maybe the future central dependency managment layer

  • Python has wheels (.whl)
  • Java has jars (.jar)
  • Rust has crates (.crate)

these are all either zipped or gzipped folders that contain the package manifest and all other files.

I would propose a similar format and build a wdl module: .wdlm that is essentially just a zip file of the Module directory

Comment thread modules/SPEC.md

```
csvcut-wdl/
module.json

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does a module ship with it's own module-lock.json? If there are conflicts how are these resolved?

Comment thread modules/SPEC.md
1. Each file's relative path (computed per [Content Hashing](#content-hashing)) is treated as a sequence of `/`-separated components. The path must contain no `.` or `..` components, must not be absolute (no leading `/`, no Windows-style drive letter), and must not contain a null byte.
2. Symbolic links inside a module are permitted only if their targets, after full resolution, fall inside the module root. A symlink whose target escapes the module root makes the module invalid.
3. The names `module.json`, `module-lock.json`, and `module.sig` are reserved for files at the module root. A file with any of these names appearing at any other path within the module tree is a validation error.
4. A module that, when materialized on disk, would produce any path failing these rules is invalid; engines must refuse to load it and surface a clear error identifying the offending entry.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a file in a module references a relative path outside the module root directly (not in the dependencies) would the module be considered invalid?

More generally what do import statements within a module look like?

Comment thread modules/SPEC.md

- **`tag`** — a specific Git tag name (e.g., `"v1.2.0-rc1"`). Not subject to semver resolution.
- **`branch`** — a Git branch name. The resolved commit varies over time; the lockfile pins the exact commit at resolution time.
- **`commit`** — a Git commit SHA. Any prefix that uniquely identifies a commit in the source repository is accepted; the resolver expands the prefix to the full 40-character SHA at lock time and records the full SHA in `module-lock.json`. The most precise and immutable selector.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe change to sha?

Comment thread modules/SPEC.md
```json
{
"dependencies": {
"local_utils": { "path": "../../shared/utils" }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the implication of this is that it breaks out of the module root. To that extent it feels like the module spec is doing double duty:

  1. defining a boundary for a reusable workflow
  2. defining the dependencies for a workflow that may or may not be a module

If we allow relative paths pointing outside of the current module root we will satisfy 2, but not 1. If we restrict to not allow that then we will only satisfy 1 and not 2

Comment thread modules/SPEC.md
"homepage": "https://csvkit.readthedocs.io/"
}
],
"dependencies": {}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we expand the dependencies section to include an example?

Comment thread modules/SPEC.md
import * from csvcut # tasks, workflows, and UDTs all in scope unqualified, no namespace
```

A module with multiple files curates its default surface by importing them from the entrypoint:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can an entrypoint wdl consist of ONLY imports and no workflow/task/enum etc?

Comment thread modules/SPEC.md

Sub-path resolution is a direct file lookup. Intermediate directories along the sub-path do not need to contain `module.json` files, and any nested `module.json` files that happen to be present along the way are ignored.

If the manifest's `exclude` field matches the path that resolution would otherwise read—either the entrypoint file for a root module import or `<sub-path>.wdl` for a sub-path import—the engine must treat the import as unresolvable and surface a missing-file error that names the path. The file's presence on disk is irrelevant; an excluded path is not part of the module's public import surface. Excluding the entrypoint therefore makes root module imports of that dependency fail while leaving non-excluded files reachable by sub-path. Quoted imports inside the module (e.g., the entrypoint's own `import "helper.wdl"`) are file-relative and are unaffected by `exclude`.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels like a weakly enforceable security feature. if the file exists they can just address it directly instead of resolving via the module

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

K-feature (Kind) A new feature request (for issues) or implementation (for PRs).. S05-in-progress (State) A task that is in progress. T-lang (Topic) Issues related to the syntax and semantic of the language itself. Z-specification-change (Metadata) An issue or PR related to a specification change.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement package specification and ecosystem Versioned import statements

4 participants