Add cloudformation_<verb>_stack functions#150

Open
carlopi wants to merge 14 commits into
duckdb:mainfrom
carlopi:cloudformation_xxx_stack

carlopi commented May 27, 2026

Member

These functions allow creating, listing, checking the readiness of, and deleting AWS CloudFormation stacks.

Example, end to end:

SET VARIABLE h = (SELECT handle FROM cloudformation_create_stack(
               'https://test-wasm-carlo.s3.us-east-1.amazonaws.com/deploy-quack.yaml',
               NULL, MAP { 'region': 'us-east-1' }));
FROM cloudformation_describe_stack(getvariable('h'));
-- wait for CREATE_COMPLETE
-- do arbitrary operations
FROM cloudformation_delete_stack(getvariable('h'));

or listing capabilities (for a given region) like:

FROM cloudformation_list_stacks('us-east-1');

Note: at the moment there is no implicit state management of created resources. Handles can be inserted into a table or similar, but this is not (for now) automated or facilitated by DuckDB. The source of truth remains listing the actual stacks in AWS.

Note also that these functions are not yet stable and are expected to change over time.

carlopi commented May 28, 2026

Member Author

One open question: what is the correct default set of tags?

@guillesd suggests reducing it to:

    set_tag("created-by", "duckdb-aws");
    set_tag("created-by-version", DUCKDB_AWS_GIT_SHA);
    set_tag("duckdb-version", DuckDB::LibraryVersion());

and making sure you can bring your own tags (this should already work, but it is not really tested).

Another open option is using DuckDBUserAgent (instead of LibraryVersion).

I think this area can also be iterated later, but having an OK baseline would make this smoother.

Tmonster left a comment

Member

Looks good, just a couple of comments

Can we also move this to src/functions/cloud_formation/cloudformation_functions.cpp?

I can see a world where we continue to add aws-sdk dependencies and people want to take them out, so we should set up an easy way to do that with folders etc.

Comment thread src/cloudformation_functions.cpp Outdated
Comment thread src/cloudformation_functions.cpp Outdated
Comment thread src/cloudformation_functions.cpp
Comment thread src/cloudformation_functions.cpp Outdated
set_tag("stack-name", metadata_stack_name);
}
for (auto &kv : data.tags_override) {
set_tag(kv.key, kv.value);

Member

Should we disallow overriding some of the tags? version, created-by, and duckdb-session-id seem like they should keep the values DuckDB gives them.
Maybe created-by could be overridden with an extension config?

Member Author

The idea was that explicit intent is always respected.

values.emplace_back(stack_id);
keys.emplace_back("region");
values.emplace_back(region);
auto handle = Value::MAP(LogicalType::VARCHAR, LogicalType::VARCHAR, std::move(keys), std::move(values));

Member

Can you enlighten me as to why a single column MAP was chosen for the return type?

carlopi Jun 22, 2026

Member Author

I'll rename the functions to the utility ones. The idea of a MAP is that it is a pass-through that is not opinionated about what is passed around. I'll need to elaborate on this.

return;
}

auto provider = BuildAwsCredentialsProvider("", /*require_credentials=*/true);

Member

nit: I've seen the /*requires_credentials*/ now a couple of times, why is the block comment here?

Member Author

/*argument_name*/ is a convention, especially around boolean values, since it is hard to remember what true / false means when looking at the call site on its own.
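
As a small illustration of that convention, here is a sketch with a hypothetical stand-in function (`DescribeProvider` is not part of the extension; it only mimics the shape of a call that takes a string and a boolean). Both wrappers make the same call; only the second tells the reader what `true` means. As far as I know, clang-tidy's `bugprone-argument-comment` check understands the `/*name=*/` form and can flag a comment that does not match the parameter name.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a factory taking a profile name and a boolean flag.
std::string DescribeProvider(const std::string &profile, bool require_credentials) {
    const std::string name = profile.empty() ? "default" : profile;
    return name + (require_credentials ? ":required" : ":optional");
}

// Without the comment, the reader has to look up what `true` controls.
std::string Unannotated() {
    return DescribeProvider("", true);
}

// With the comment, the call site documents itself; behavior is identical.
std::string Annotated() {
    return DescribeProvider("", /*require_credentials=*/true);
}
```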

}

idx_t remaining = data.rows.size() - data.cursor;
idx_t to_emit = std::min(remaining, (idx_t)STANDARD_VECTOR_SIZE);

Member

Looks like in this function we first send all the API calls to describe the stacks, and only then start returning rows. Would it be difficult to return just the rows from the first request, then send the next request when the next chunk is asked for and return those?

Member Author

No, but I would prefer to do that as follow up.
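
For the follow-up, a rough sketch of what the lazy variant could look like, independent of DuckDB and the AWS SDK. Everything here is hypothetical: `Page`, `LazyScanState` and `NextChunk` are illustrative names, and `fetch` stands in for a single paginated API request. The cursor arithmetic mirrors the `remaining` / `to_emit` lines quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result page; an empty next_token means there are no more pages.
struct Page {
    std::vector<std::string> rows;
    std::string next_token;
};

// Hypothetical scan state: holds only the current page, not the full result.
struct LazyScanState {
    std::function<Page(const std::string &)> fetch; // stand-in for one API request
    std::vector<std::string> rows;
    std::size_t cursor = 0;
    std::string next_token;
    bool exhausted = false;
};

// Returns up to vector_size rows. A request is sent only once the current page
// has been fully emitted. An empty result signals the end of the scan.
std::vector<std::string> NextChunk(LazyScanState &state, std::size_t vector_size) {
    while (state.cursor == state.rows.size() && !state.exhausted) {
        Page page = state.fetch(state.next_token);
        state.rows = std::move(page.rows);
        state.cursor = 0;
        state.next_token = std::move(page.next_token);
        state.exhausted = state.next_token.empty();
    }
    const std::size_t remaining = state.rows.size() - state.cursor;
    const std::size_t to_emit = std::min(remaining, vector_size);
    const auto first = state.rows.begin() + static_cast<std::ptrdiff_t>(state.cursor);
    std::vector<std::string> out(first, first + static_cast<std::ptrdiff_t>(to_emit));
    state.cursor += to_emit;
    assert(state.cursor <= state.rows.size());
    return out;
}
```

The trade-off is that errors from later requests surface mid-scan instead of up front, which may be one reason to keep the eager version for now.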

carlopi added 14 commits June 22, 2026 11:10
Initial commit lifting the cfn_* functions from the earlier prototype
branch onto a clean main. Function names and shape are unchanged from
the prototype; rename + outputs-fold + auto-tags + session-id follow as
separate commits in this PR.

- vcpkg.json: add cloudformation to aws-sdk-cpp features
- CMakeLists.txt: add cloudformation to find_package COMPONENTS;
  add src/cfn_functions.cpp to EXTENSION_SOURCES
- src/include/aws_client.hpp: new shared header with
  BuildClientConfigWithCa + BuildAwsCredentialsProvider declarations
- src/aws_secret.cpp: extract BuildClientConfigWithCa to external
  linkage; add BuildAwsCredentialsProvider factory wrapping the
  existing DuckDBCustomAWSCredentialsProviderChain
- src/aws_extension.cpp: register CfnFunctions in LoadInternal
- src/cfn_functions.cpp + src/include/cfn_functions.hpp:
  cfn_create_stack, cfn_describe_stack, cfn_outputs, cfn_delete_stack
- test/sql/cfn_functions.test: bind-time error smoke tests

Function names now mirror the AWS service name (cloudformation), matching
how boto3 (boto3.client('cloudformation')) and the AWS CLI
(aws cloudformation ...) refer to it. No behavior change, pure rename.

- src/cfn_functions.{cpp,hpp} -> src/cloudformation_functions.{cpp,hpp}
- test/sql/cfn_functions.test -> test/sql/cloudformation_functions.test
- cfn_create_stack    -> cloudformation_create_stack
- cfn_describe_stack  -> cloudformation_describe_stack
- cfn_outputs         -> cloudformation_outputs
- cfn_delete_stack    -> cloudformation_delete_stack
- CfnFunctions        -> CloudFormationFunctions (and all internal Cfn* types)
- Fallback autogen stack-name prefix:
  "cfn-stack" -> "duckdb-aws" (consistent with the planned
  created-by tag value).

cloudformation_describe_stack now returns an extra `outputs` column
(MAP(VARCHAR, VARCHAR)) populated from stack.GetOutputs(). The standalone
cloudformation_outputs function is removed - it was a second
DescribeStacks call returning a subset of the same information.

Callers that previously did
    SELECT outputs FROM cloudformation_outputs(handle);
now write
    SELECT outputs FROM cloudformation_describe_stack(handle);
- same shape, one fewer function, one fewer round-trip if you want
status + outputs in one query.

Every stack cloudformation_create_stack produces now carries provenance
tags identifying the duckdb-aws extension that created it, the DuckDB
host that ran the call, the extension's git short SHA, and a stable
per-process session id. Caller-supplied tags via the `tags := ...` named
parameter merge on top and override on key collision.

Auto-tags applied:
  created-by         = "duckdb-aws"
  created-by-version = <DUCKDB_AWS_GIT_SHA, baked at build time>
  duckdb-version     = DuckDB::LibraryVersion()
  managed-by         = "duckdb-aws"
  duckdb-session-id  = <per-process random hex from ShortRandHex()>
  stack-name         = <Metadata.StackName>   (only when template has it)
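
The merge rule above ("caller-supplied tags merge on top and override on key collision") can be sketched as follows. This is a minimal illustration with a hypothetical `MergeTags` helper over `std::map`, not the extension's actual code, which uses `set_tag` and `data.tags_override`.

```cpp
#include <cassert>
#include <map>
#include <string>

using TagMap = std::map<std::string, std::string>;

// Hypothetical helper: start from the automatic tags, then let every
// caller-supplied tag overwrite an automatic tag that uses the same key.
TagMap MergeTags(const TagMap &auto_tags, const TagMap &caller_tags) {
    TagMap merged = auto_tags;
    for (const auto &kv : caller_tags) {
        merged[kv.first] = kv.second; // the caller's value wins on collision
    }
    // Overriding never drops a key, so the result cannot shrink.
    assert(merged.size() >= auto_tags.size());
    return merged;
}
```

With this rule a caller can replace `created-by`, which is exactly the behavior questioned in review; protecting selected keys would mean skipping them inside the loop.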

Also expose duckdb_aws_session_id() as a scalar SQL function so callers
can scope queries / cleanups by the running session, e.g.:
    cloudformation_destroy_all(
        tag_filter := MAP {'duckdb-session-id': duckdb_aws_session_id()})

The session id is currently process-scoped and extension-specific.
A future DuckDB-core session primitive could replace the local
implementation without changing the SQL surface.

cloudformation_describe_stack gains three columns:
  - region                 (from the input handle, mirrors list's shape)
  - last_updated_time      (from Stack.LastUpdatedTime, nullable)
  - description            (from Stack.Description, nullable)

The 8-column common prefix (region, stack_name, stack_id, status,
status_reason, creation_time, last_updated_time, description) now matches
what cloudformation_list_stacks will return. Describe's 9th column,
outputs MAP, remains describe-only because ListStacks doesn't surface
outputs.

cloudformation_delete_stack now echoes the input handle MAP byte-for-byte
instead of returning just stack_id - any extra keys the caller added
(annotations, timestamps, custom metadata) survive the call. Composes
naturally with create's handle output for audit/log patterns:
  INSERT INTO deletions (handle) SELECT handle FROM
    cloudformation_delete_stack(getvariable('h'));
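
The pass-through behavior can be sketched like this, with the handle modeled as a plain string map. `DeleteStackEcho` is a hypothetical name, the `stack_id` key name is an assumption (only the `region` key is visible in the reviewed snippet), and the `deleted` vector stands in for the real DeleteStack API call.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using Handle = std::map<std::string, std::string>;

// Hypothetical sketch: read only the keys the operation needs and return the
// input handle unchanged, so extra caller-added keys survive the call.
Handle DeleteStackEcho(const Handle &handle, std::vector<std::string> &deleted) {
    const auto id = handle.find("stack_id"); // key name is an assumption
    const auto region = handle.find("region");
    if (id == handle.end() || region == handle.end()) {
        throw std::invalid_argument("handle must contain 'stack_id' and 'region'");
    }
    deleted.push_back(region->second + "/" + id->second); // stand-in for the API call
    return handle;
}
```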

Adds `tags MAP(VARCHAR, VARCHAR)` as a new column between `description`
and `outputs`. AWS's DescribeStacks already returns Tags on the Stack
object - no extra API call.

Useful for verifying that the auto-tags applied by cloudformation_create_stack
(created-by=duckdb-aws, created-by-version, duckdb-version, managed-by,
duckdb-session-id, stack-name) actually landed:

    SELECT tags FROM cloudformation_describe_stack(getvariable('h'));

And for ad-hoc tag-based queries until cloudformation_destroy_all and
friends arrive:

    SELECT stack_name FROM cloudformation_describe_stack(getvariable('h'))
    WHERE tags['created-by'] = 'duckdb-aws';

ListStacks's StackSummary doesn't include tags, so this column stays
describe-only - same pattern as `outputs`.

Lists every CFN stack in a region. Pagination handled internally
(follows NextToken until the result is exhausted). Same 8-column
identity-and-state prefix as cloudformation_describe_stack, minus
the describe-only `outputs` and `tags` columns (ListStacks's
StackSummary doesn't return them).

  cloudformation_list_stacks(
      [region        := VARCHAR]              -- defaults to AWS_REGION
                                              --  / AWS_DEFAULT_REGION
      [status_filter := LIST<VARCHAR>])       -- passes through to AWS's
                                              --  native StackStatusFilter
   -> (region, stack_name, stack_id, status, status_reason,
       creation_time, last_updated_time, description)

Region resolution: explicit `region :=` wins; otherwise AWS_REGION,
then AWS_DEFAULT_REGION. If none resolves, errors out clearly.
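
A minimal sketch of that resolution order. The names `ResolveRegion` and `EnvLookup` are hypothetical, the environment lookup is injected so the logic can be exercised without touching the real environment, and treating an empty variable as unset is an assumption rather than something the commit states.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <initializer_list>
#include <stdexcept>
#include <string>

using EnvLookup = std::function<const char *(const char *)>;

// Hypothetical sketch: explicit region first, then AWS_REGION, then
// AWS_DEFAULT_REGION; throws if nothing resolves.
std::string ResolveRegion(const std::string &explicit_region, const EnvLookup &lookup) {
    if (!explicit_region.empty()) {
        return explicit_region;
    }
    for (const char *name : {"AWS_REGION", "AWS_DEFAULT_REGION"}) {
        const char *value = lookup(name);
        if (value != nullptr && *value != '\0') { // empty counts as unset (assumption)
            return value;
        }
    }
    throw std::invalid_argument(
        "no region: pass region := '...' or set AWS_REGION / AWS_DEFAULT_REGION");
}

// Convenience wrapper that consults the real process environment.
std::string ResolveRegionFromEnvironment(const std::string &explicit_region) {
    return ResolveRegion(explicit_region,
                         [](const char *name) -> const char * { return std::getenv(name); });
}
```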

The `region` output column is the same value across every row in a
single call - convenient when UNIONing across regions:

    SELECT * FROM cloudformation_list_stacks(region := 'us-east-1')
    UNION ALL
    SELECT * FROM cloudformation_list_stacks(region := 'eu-west-1');

status_filter values are validated against the SDK's StackStatus enum
at bind time; unknown values surface a clear error rather than reaching
AWS.
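
A rough sketch of that kind of up-front validation. The status list below is a partial, hand-written subset of CloudFormation's stack statuses for illustration only; per the commit message the real check goes through the SDK's StackStatus enum. `KnownStatuses` and `ValidateStatusFilter` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Partial, illustrative subset of CloudFormation stack statuses.
const std::set<std::string> &KnownStatuses() {
    static const std::set<std::string> statuses = {
        "CREATE_IN_PROGRESS",   "CREATE_FAILED",   "CREATE_COMPLETE",
        "ROLLBACK_IN_PROGRESS", "ROLLBACK_FAILED", "ROLLBACK_COMPLETE",
        "DELETE_IN_PROGRESS",   "DELETE_FAILED",   "DELETE_COMPLETE",
        "UPDATE_IN_PROGRESS",   "UPDATE_COMPLETE", "UPDATE_ROLLBACK_COMPLETE"};
    return statuses;
}

// Hypothetical bind-time check: reject unknown values before any request is sent.
void ValidateStatusFilter(const std::vector<std::string> &status_filter) {
    for (const auto &status : status_filter) {
        if (KnownStatuses().count(status) == 0) {
            throw std::invalid_argument("unknown stack status in status_filter: '" +
                                        status + "'");
        }
    }
}
```

A near-miss such as `CREATE_COMPLETED` is then rejected locally instead of producing an AWS-side error or an empty result.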
@carlopi carlopi force-pushed the cloudformation_xxx_stack branch from bd972fd to 50d4425 Compare June 22, 2026 11:19
@carlopi carlopi changed the base branch from v1.5-variegata to main June 22, 2026 11:34

2 participants