Add cloudformation_<verb>_stack functions #150
One open question is: what's the correct default set of tags? @guillesd points to reducing to: and making sure you can bring your own (it should be the case, but that's not really tested). Another open option is using DuckDBUserAgent (instead of LibraryVersion). I think this area can also be iterated on later, but having an OK baseline would make this smoother.
Tmonster
left a comment
Looks good, just a couple of comments
Can we also put this in the folder src/functions/cloud_formation/cloudformation_functions.cpp?
I can see a world where we continue to add aws-sdk dependencies, and people want to take them out, and we should set up an easy way to do that with folders etc
    set_tag("stack-name", metadata_stack_name);
}
for (auto &kv : data.tags_override) {
    set_tag(kv.key, kv.value);
should we disallow overriding some of the tags? version, created-by, duckdb-session-id seem like they should keep the values DuckDB gives them.
Maybe created-by could be overridden with an extension config?
There was a problem hiding this comment.
Idea was: explicit intent is always respected
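To make the "explicit intent is always respected" rule concrete, here is a minimal sketch of the merge order, assuming plain `std::map` containers. `TagMap` and `MergeTags` are illustrative names, not the ones used in this PR.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical alias; the PR's own tag container may look different.
using TagMap = std::map<std::string, std::string>;

// Start from the automatic tags, then apply caller-supplied tags on top.
// On a key collision the caller's value replaces the automatic one.
TagMap MergeTags(const TagMap &auto_tags, const TagMap &tags_override) {
    TagMap merged = auto_tags;
    for (const auto &kv : tags_override) {
        merged[kv.first] = kv.second; // insert, or overwrite an existing key
    }
    return merged;
}
```

Protecting some keys, as suggested above, would amount to skipping those keys inside the loop; this sketch deliberately does not do that.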
values.emplace_back(stack_id);
keys.emplace_back("region");
values.emplace_back(region);
auto handle = Value::MAP(LogicalType::VARCHAR, LogicalType::VARCHAR, std::move(keys), std::move(values));
Can you enlighten me as to why a single column MAP was chosen for the return type?
There was a problem hiding this comment.
I'll rename the functions to the utility ones. The idea of a MAP is that it's a pass-through that is not opinionated about what's passed around. I'll need to elaborate on this.
    return;
}

auto provider = BuildAwsCredentialsProvider("", /*require_credentials=*/true);
nit: I've seen the /*requires_credentials*/ now a couple of times, why is the block comment here?
There was a problem hiding this comment.
/*argument_name*/ is a convention, especially around boolean values, since on its own a bare true / false at the call site makes it hard to remember what the value means.
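A small sketch of the difference at the call site. `ProviderOptions` and `MakeProviderOptions` are hypothetical stand-ins, not the real `BuildAwsCredentialsProvider` signature.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a factory that takes a boolean flag.
struct ProviderOptions {
    std::string profile;
    bool require_credentials;
};

ProviderOptions MakeProviderOptions(const std::string &profile, bool require_credentials) {
    return ProviderOptions{profile, require_credentials};
}

// Without the comment, the reader has to look up the declaration to learn what `true` means.
ProviderOptions Unlabelled() {
    return MakeProviderOptions("", true);
}

// With the comment, the call site documents itself; behaviour is identical.
ProviderOptions Labelled() {
    return MakeProviderOptions("", /*require_credentials=*/true);
}
```

The `/*name=*/` spelling with a trailing `=` is the form that clang-tidy's `bugprone-argument-comment` check can compare against the declared parameter name, which is one reason to prefer it over a free-form comment.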
}

idx_t remaining = data.rows.size() - data.cursor;
idx_t to_emit = std::min(remaining, (idx_t)STANDARD_VECTOR_SIZE);
Looks like in this function we just send all the API calls to describe the stacks, then we start returning them. Will it be difficult to just return the ones from the request, then on the next chunk send any request and return those?
There was a problem hiding this comment.
No, but I would prefer to do that as follow up.
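For the follow-up, one possible shape of the lazy variant is sketched below: each call describes only as many stacks as fit in the next chunk, instead of describing everything up front. All names here (`LazyDescribeState`, `NextChunk`, `describe_one`) are hypothetical, and `kChunkSize` merely stands in for `STANDARD_VECTOR_SIZE`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Stand-in for STANDARD_VECTOR_SIZE (2048 in default DuckDB builds).
constexpr std::size_t kChunkSize = 2048;

struct LazyDescribeState {
    std::vector<std::string> handles; // stacks still waiting to be described
    std::size_t cursor = 0;           // index of the next handle to describe
};

// Describe at most one chunk worth of stacks per call.
// `describe_one` stands in for the DescribeStacks request; it is only invoked
// for handles that are emitted in this chunk. Returns an empty vector when done.
std::vector<std::string> NextChunk(LazyDescribeState &state,
                                   const std::function<std::string(const std::string &)> &describe_one,
                                   std::size_t chunk_size = kChunkSize) {
    std::size_t remaining = state.handles.size() - state.cursor;
    std::size_t to_emit = std::min(remaining, chunk_size);
    std::vector<std::string> rows;
    rows.reserve(to_emit);
    for (std::size_t i = 0; i < to_emit; i++) {
        rows.push_back(describe_one(state.handles[state.cursor + i]));
    }
    state.cursor += to_emit;
    return rows;
}
```

The trade-off is that an API error can now surface in the middle of a scan rather than before the first row is returned.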
Initial commit lifting the cfn_* functions from the earlier prototype branch onto a clean main. Function names and shape are unchanged from the prototype; rename + outputs-fold + auto-tags + session-id follow as separate commits in this PR.
- vcpkg.json: add cloudformation to aws-sdk-cpp features
- CMakeLists.txt: add cloudformation to find_package COMPONENTS; add src/cfn_functions.cpp to EXTENSION_SOURCES
- src/include/aws_client.hpp: new shared header with BuildClientConfigWithCa + BuildAwsCredentialsProvider declarations
- src/aws_secret.cpp: extract BuildClientConfigWithCa to external linkage; add BuildAwsCredentialsProvider factory wrapping the existing DuckDBCustomAWSCredentialsProviderChain
- src/aws_extension.cpp: register CfnFunctions in LoadInternal
- src/cfn_functions.cpp + src/include/cfn_functions.hpp: cfn_create_stack, cfn_describe_stack, cfn_outputs, cfn_delete_stack
- test/sql/cfn_functions.test: bind-time error smoke tests
Function names now mirror the AWS service name (cloudformation), matching
how boto3 (boto3.client('cloudformation')) and the AWS CLI
(aws cloudformation ...) refer to it. No behavior change, pure rename.
- src/cfn_functions.{cpp,hpp} -> src/cloudformation_functions.{cpp,hpp}
- test/sql/cfn_functions.test -> test/sql/cloudformation_functions.test
- cfn_create_stack -> cloudformation_create_stack
- cfn_describe_stack -> cloudformation_describe_stack
- cfn_outputs -> cloudformation_outputs
- cfn_delete_stack -> cloudformation_delete_stack
- CfnFunctions -> CloudFormationFunctions (and all internal Cfn* types)
- Fallback autogen stack-name prefix:
"cfn-stack" -> "duckdb-aws" (consistent with the planned
created-by tag value).
cloudformation_describe_stack now returns an extra `outputs` column
(MAP(VARCHAR, VARCHAR)) populated from stack.GetOutputs(). The standalone
cloudformation_outputs function is removed - it was a second
DescribeStacks call returning a subset of the same information.
Callers that previously did
    SELECT outputs FROM cloudformation_outputs(handle);
now write
    SELECT outputs FROM cloudformation_describe_stack(handle);
- same shape, one fewer function, one fewer round-trip if you want
status + outputs in one query.
Every stack cloudformation_create_stack produces now carries provenance
tags identifying the duckdb-aws extension that created it, the DuckDB
host that ran the call, the extension's git short SHA, and a stable
per-process session id. Caller-supplied tags via the `tags := ...` named
parameter merge on top and override on key collision.
Auto-tags applied:
    created-by         = "duckdb-aws"
    created-by-version = <DUCKDB_AWS_GIT_SHA, baked at build time>
    duckdb-version     = DuckDB::LibraryVersion()
    managed-by         = "duckdb-aws"
    duckdb-session-id  = <per-process random hex from ShortRandHex()>
    stack-name         = <Metadata.StackName> (only when template has it)
Also expose duckdb_aws_session_id() as a scalar SQL function so callers
can scope queries / cleanups by the running session, e.g.:
    cloudformation_destroy_all(
        tag_filter := MAP {'duckdb-session-id': duckdb_aws_session_id()})
The session id is currently process-scoped and extension-specific.
A future DuckDB-core session primitive could replace the local
implementation without changing the SQL surface.
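As an illustration of what "process-scoped" means here, a minimal sketch follows. It is not the extension's `ShortRandHex()`; the function names and the 8-character length are assumptions made for the example.

```cpp
#include <cassert>
#include <iomanip>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: 32 random bits rendered as 8 lowercase hex characters.
std::string ShortRandHexSketch() {
    std::random_device rd;
    std::uniform_int_distribution<unsigned int> dist(0, 0xFFFFFFFFu);
    std::ostringstream out;
    out << std::hex << std::setw(8) << std::setfill('0') << dist(rd);
    return out.str();
}

// The id is generated on first use and then reused for the life of the process,
// so every stack created by this process carries the same value.
const std::string &SessionIdSketch() {
    static const std::string id = ShortRandHexSketch();
    return id;
}
```

Because the value lives in a function-local static, two DuckDB connections inside the same process would share it, which matches the "process-scoped" wording above.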
cloudformation_describe_stack gains three columns:
- region (from the input handle, mirrors list's shape)
- last_updated_time (from Stack.LastUpdatedTime, nullable)
- description (from Stack.Description, nullable)
The 8-column common prefix (region, stack_name, stack_id, status,
status_reason, creation_time, last_updated_time, description) now matches
what cloudformation_list_stacks will return. Describe's 9th column,
outputs MAP, remains describe-only because ListStacks doesn't surface
outputs.
cloudformation_delete_stack now echoes the input handle MAP byte-for-byte
instead of returning just stack_id - any extra keys the caller added
(annotations, timestamps, custom metadata) survive the call. Composes
naturally with create's handle output for audit/log patterns:
    INSERT INTO deletions (handle) SELECT handle FROM
        cloudformation_delete_stack(getvariable('h'));
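The echo behaviour can be sketched as follows, modelling the handle as a `std::map`. The key names `stack_id` and `region`, and the `delete_call` callback standing in for the AWS request, are assumptions for illustration.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical model of the handle MAP(VARCHAR, VARCHAR).
// Note: std::map sorts its keys, so this models the content, not the byte layout.
using Handle = std::map<std::string, std::string>;

// Reads only the keys it needs, issues the delete, and returns the handle untouched,
// so any extra keys the caller added are still present in the result.
Handle DeleteStackEcho(const Handle &handle,
                       const std::function<void(const std::string &, const std::string &)> &delete_call) {
    auto stack_id = handle.find("stack_id");
    auto region = handle.find("region");
    if (stack_id == handle.end() || region == handle.end()) {
        throw std::invalid_argument("handle needs 'stack_id' and 'region'");
    }
    delete_call(region->second, stack_id->second);
    return handle;
}
```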
Adds `tags MAP(VARCHAR, VARCHAR)` to cloudformation_describe_stack as a new column between `description`
and `outputs`. AWS's DescribeStacks already returns Tags on the Stack
object - no extra API call.
Useful for verifying that the auto-tags applied by cloudformation_create_stack
(created-by=duckdb-aws, created-by-version, duckdb-version, managed-by,
duckdb-session-id, stack-name) actually landed:
    SELECT tags FROM cloudformation_describe_stack(getvariable('h'));
And for ad-hoc tag-based queries until cloudformation_destroy_all and
friends arrive:
    SELECT stack_name FROM cloudformation_describe_stack(getvariable('h'))
    WHERE tags['created-by'] = 'duckdb-aws';
ListStacks's StackSummary doesn't include tags, so this column stays
describe-only - same pattern as `outputs`.
cloudformation_list_stacks lists every CFN stack in a region. Pagination handled internally
(follows NextToken until the result is exhausted). Same 8-column
identity-and-state prefix as cloudformation_describe_stack, minus
the describe-only `outputs` and `tags` columns (ListStacks's
StackSummary doesn't return them).
    cloudformation_list_stacks(
        [region := VARCHAR]                 -- defaults to AWS_REGION
                                            --   / AWS_DEFAULT_REGION
        [status_filter := LIST<VARCHAR>])   -- passes through to AWS's
                                            --   native StackStatusFilter
      -> (region, stack_name, stack_id, status, status_reason,
          creation_time, last_updated_time, description)
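The NextToken loop mentioned above can be sketched like this. `Page` and the `fetch_page` callback are hypothetical stand-ins for the SDK's ListStacks request and result types.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical page shape: some rows plus a continuation token.
struct Page {
    std::vector<std::string> stack_names;
    std::string next_token; // empty means this was the last page
};

// Keep requesting pages, passing the previous token along, until no token comes back.
// The first request is made with an empty token.
std::vector<std::string> ListAllStacks(const std::function<Page(const std::string &)> &fetch_page) {
    std::vector<std::string> all;
    std::string token;
    do {
        Page page = fetch_page(token);
        all.insert(all.end(), page.stack_names.begin(), page.stack_names.end());
        token = page.next_token;
    } while (!token.empty());
    return all;
}
```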
Region resolution: explicit `region :=` wins; otherwise AWS_REGION,
then AWS_DEFAULT_REGION. If none resolves, errors out clearly.
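A sketch of that precedence order is below. The environment lookup is injected so the logic can be exercised without touching the real environment; treating an empty variable as unset is an assumption of this sketch, and `ResolveRegion` is an illustrative name.

```cpp
#include <cassert>
#include <cstdlib>
#include <functional>
#include <stdexcept>
#include <string>

using EnvLookup = std::function<const char *(const char *)>;

// Precedence: explicit argument, then AWS_REGION, then AWS_DEFAULT_REGION.
// Throws if none of them yields a non-empty value.
std::string ResolveRegion(const std::string &explicit_region, const EnvLookup &lookup) {
    if (!explicit_region.empty()) {
        return explicit_region;
    }
    const char *names[] = {"AWS_REGION", "AWS_DEFAULT_REGION"};
    for (const char *name : names) {
        const char *value = lookup(name);
        if (value != nullptr && *value != '\0') {
            return value;
        }
    }
    throw std::invalid_argument("no region: pass region := ... or set AWS_REGION / AWS_DEFAULT_REGION");
}

// Convenience overload that reads the real process environment.
std::string ResolveRegion(const std::string &explicit_region) {
    return ResolveRegion(explicit_region, [](const char *name) -> const char * { return std::getenv(name); });
}
```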
The `region` output column is the same value across every row in a
single call - convenient when UNIONing across regions:
    SELECT * FROM cloudformation_list_stacks(region := 'us-east-1')
    UNION ALL
    SELECT * FROM cloudformation_list_stacks(region := 'eu-west-1');
status_filter values are validated against the SDK's StackStatus enum
at bind time; unknown values surface a clear error rather than reaching
AWS.
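The bind-time check could look roughly like the sketch below. It uses a hand-written, deliberately partial list of status names rather than the SDK's StackStatus enum, and `ValidateStatusFilter` is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Partial list for illustration only; the real check is against the SDK enum.
const std::set<std::string> &KnownStatuses() {
    static const std::set<std::string> statuses = {
        "CREATE_IN_PROGRESS", "CREATE_FAILED", "CREATE_COMPLETE",
        "ROLLBACK_IN_PROGRESS", "ROLLBACK_FAILED", "ROLLBACK_COMPLETE",
        "DELETE_IN_PROGRESS", "DELETE_FAILED", "DELETE_COMPLETE",
        "UPDATE_IN_PROGRESS", "UPDATE_COMPLETE", "UPDATE_FAILED"};
    return statuses;
}

// Reject unknown values before any request is sent, naming the offending value.
void ValidateStatusFilter(const std::vector<std::string> &status_filter) {
    for (const auto &status : status_filter) {
        if (KnownStatuses().count(status) == 0) {
            throw std::invalid_argument("unknown stack status: " + status);
        }
    }
}
```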
bd972fd to
50d4425
These functions allow creating, listing, checking the readiness of, and deleting AWS CloudFormation stacks.
Example, end to end:
or listing capabilities (for a given region) like:
Note: at the moment there is no implicit state management of created resources. They can be inserted into a table or similar, but this is not (for the moment) automated or facilitated by DuckDB. The source of truth remains listing the actual catalog.
Note that these functions are not stable at the moment and are expected to change over time.