Skip to content

fix: repair AWS private manager handoff permissions#67

Open
alongubkin wants to merge 1 commit into
mainfrom
alon/alien-153-fix-aws-private-manager-setup-owned-security-group
Open

fix: repair AWS private manager handoff permissions#67
alongubkin wants to merge 1 commit into
mainfrom
alon/alien-153-fix-aws-private-manager-setup-owned-security-group

Conversation

@alongubkin
Copy link
Copy Markdown
Member

No description provided.

@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Jun 2, 2026

Greptile Summary

This PR fixes a broken permission handoff for AWS private manager scenarios by splitting the ec2:RunInstances security-group grant in compute-cluster/management (so it correctly conditions on managed-by: setup rather than the previous managed-by: runtime which never matched the setup-owned boundary group), adding the missing ec2:RunInstances and ec2:CreateTags blocks to compute-cluster/provision that are required for Auto Scaling's dry-run validation, and granting ec2:AuthorizeSecurityGroupIngress/ec2:RevokeSecurityGroupIngress on the setup-owned compute security group in container/provision so Horizon-backed containers can punch ALB ingress holes.

  • compute-cluster/management: Separates the combined launch-template + security-group RunInstances statement into two tagged statements — one for runtime-owned launch templates (managed-by: runtime) and a new one for setup-owned boundary groups (managed-by: setup, resource: compute).
  • compute-cluster/provision: Adds the previously absent ec2:RunInstances permission blocks (runtime launch templates & security groups, instance creation, companion resources) and the ec2:CreateTags block needed by launch-template tag specifications during ASG creation.
  • container/provision: Adds a narrowly scoped ec2:AuthorizeSecurityGroupIngress/ec2:RevokeSecurityGroupIngress statement for the setup-owned compute security group; explicitly excludes ec2:DeleteSecurityGroup, validated by new tests.

Confidence Score: 4/5

The changes are safe to merge. The core permission fixes are correctly structured and well-tested; the only open question is whether the stack-wide ingress grant on container/provision is intentional for all container types or should be limited to Horizon-backed ones.

All three permission changes correctly address the stated root causes: the management security-group statement now matches the actual tag value on the setup-owned boundary group, provision gets the RunInstances grants that Auto Scaling dry-run requires, and container provision gets the narrowly scoped ingress permission with no delete. The test suite covers the new statements directly. The one architectural note — every container provision role receives the compute security-group ingress permission regardless of Horizon dependency — may be intentional but is not explicitly confirmed in the PR description or code comments.

container/provision.jsonc warrants a second look on the question of whether the setup compute security group ingress permission should be conditional on container type or left as a stack-wide grant.

Important Files Changed

Filename Overview
crates/alien-permissions/permission-sets/compute-cluster/management.jsonc Correctly splits the combined launch-template + security-group RunInstances statement into two separate, properly conditioned statements — fixing the root bug where the setup-owned boundary security group was being checked against a managed-by:runtime condition it could never satisfy.
crates/alien-permissions/permission-sets/compute-cluster/provision.jsonc Adds the previously missing ec2:RunInstances blocks (runtime launch templates + security groups, instance creation, companion resources) and ec2:CreateTags for ASG dry-run validation — all properly conditioned on managed-by:runtime.
crates/alien-permissions/permission-sets/container/provision.jsonc Adds AuthorizeSecurityGroupIngress/RevokeSecurityGroupIngress on the setup-owned compute security group for ALB-to-host connectivity. The resource binding does not further narrow by ${resourceName}, so every container resource in the stack receives this permission regardless of whether it is Horizon-backed.
crates/alien-permissions/tests/aws_runtime.rs Adds well-targeted tests verifying the new provision RunInstances permissions and container ingress grants, plus the split management security-group statement; condition_equals helper is a clean addition used throughout the file.
crates/alien-permissions/tests/aws_cloudformation.rs Adds CloudFormation-generator counterparts for the same new permissions; condition access is done inline rather than using a shared helper (unlike aws_runtime.rs), a minor style inconsistency but not a bug.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    subgraph Setup["Setup Phase (setup-owned resources)"]
        SG["ComputeCluster Security Group\nmanaged-by: setup, resource: compute"]
    end

    subgraph Provision["Provision Phase (provision role)"]
        LT["Runtime Launch Template\nmanaged-by: runtime"]
        RSG["Runtime Security Group\nmanaged-by: runtime"]
        ASG_P["Create ASG\nautoscaling:CreateAutoScalingGroup"]
        DRY_P["Dry-run RunInstances\nlaunch-template/* + security-group/*\n(managed-by: runtime)"]
    end

    subgraph Management["Management Phase (management role)"]
        ASG_M["Update/Refresh ASG\nautoscaling:UpdateAutoScalingGroup"]
        DRY_M_LT["RunInstances on launch-template/*\n(managed-by: runtime)"]
        DRY_M_SG["RunInstances on security-group/*\n(managed-by: setup, resource: compute) NEW"]
    end

    subgraph Container["Container Provision (container role)"]
        ALB["ALB Security Group\nmanaged-by: runtime"]
        INGRESS["Authorize/Revoke Ingress\non compute SG NEW"]
    end

    LT --> DRY_P
    RSG --> DRY_P
    DRY_P --> ASG_P

    LT --> DRY_M_LT
    SG --> DRY_M_SG
    DRY_M_LT --> ASG_M
    DRY_M_SG --> ASG_M

    ALB --> INGRESS
    INGRESS --> SG
Loading
Prompt To Fix All With AI
Fix the following 2 code review issues. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 2
crates/alien-permissions/permission-sets/container/provision.jsonc:245-278
**Ingress permission applies to all containers, not just Horizon-backed ones**

The `resource` binding for this permission does not add an `aws:ResourceTag/${resourceTag}": "${resourceName}"` condition (unlike other resource-level grants in this file), so every container provision role in the stack receives `ec2:AuthorizeSecurityGroupIngress` and `ec2:RevokeSecurityGroupIngress` on the setup-owned compute security group — regardless of whether the container is actually Horizon-backed. A non-Horizon container's provision role could therefore add or remove ingress rules on the shared boundary security group, and there is no per-rule ownership condition available at the IAM level to prevent one container from revoking another container's ingress rules. This appears intentional given the comment, but worth an explicit acknowledgment that the permission is stack-wide rather than per-container.

### Issue 2 of 2
crates/alien-permissions/tests/aws_cloudformation.rs:157-271
**Condition lookup duplicated inline instead of using a shared helper**

The two new CloudFormation tests access `StringEquals` condition values via inline `and_then` chains. `aws_runtime.rs` introduces a `condition_equals` helper for exactly this purpose at the bottom of that file, but `aws_cloudformation.rs` does not share it. The duplication is minor now, but any future change to how conditions are represented would require updates in multiple places. Consider either moving `condition_equals` to `common` or duplicating it in this test module.

Reviews (1): Last reviewed commit: "fix: repair AWS private manager handoff ..." | Re-trigger Greptile

Comment on lines +245 to +278
// Containers own their ALB security group, but public Horizon-backed
// containers also need to add/remove one ingress rule on the setup-owned
// ComputeCluster security group so the ALB can reach host ports.
{
"label": "Manage setup compute security group ingress",
"grant": {
"actions": [
"ec2:AuthorizeSecurityGroupIngress",
"ec2:RevokeSecurityGroupIngress"
]
},
"binding": {
"stack": {
"resources": ["arn:aws:ec2:${awsRegion}:${awsAccountId}:security-group/*"],
"condition": {
"StringEquals": {
"aws:ResourceTag/${stackTag}": "${stackPrefix}",
"aws:ResourceTag/${resourceTag}": "compute",
"aws:ResourceTag/${managedByTag}": "setup"
}
}
},
"resource": {
"resources": ["arn:aws:ec2:${awsRegion}:${awsAccountId}:security-group/*"],
"condition": {
"StringEquals": {
"aws:ResourceTag/${stackTag}": "${stackPrefix}",
"aws:ResourceTag/${resourceTag}": "compute",
"aws:ResourceTag/${managedByTag}": "setup"
}
}
}
}
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Ingress permission applies to all containers, not just Horizon-backed ones

The resource binding for this permission does not add an aws:ResourceTag/${resourceTag}": "${resourceName}" condition (unlike other resource-level grants in this file), so every container provision role in the stack receives ec2:AuthorizeSecurityGroupIngress and ec2:RevokeSecurityGroupIngress on the setup-owned compute security group — regardless of whether the container is actually Horizon-backed. A non-Horizon container's provision role could therefore add or remove ingress rules on the shared boundary security group, and there is no per-rule ownership condition available at the IAM level to prevent one container from revoking another container's ingress rules. This appears intentional given the comment, but worth an explicit acknowledgment that the permission is stack-wide rather than per-container.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/alien-permissions/permission-sets/container/provision.jsonc
Line: 245-278

Comment:
**Ingress permission applies to all containers, not just Horizon-backed ones**

The `resource` binding for this permission does not add an `aws:ResourceTag/${resourceTag}": "${resourceName}"` condition (unlike other resource-level grants in this file), so every container provision role in the stack receives `ec2:AuthorizeSecurityGroupIngress` and `ec2:RevokeSecurityGroupIngress` on the setup-owned compute security group — regardless of whether the container is actually Horizon-backed. A non-Horizon container's provision role could therefore add or remove ingress rules on the shared boundary security group, and there is no per-rule ownership condition available at the IAM level to prevent one container from revoking another container's ingress rules. This appears intentional given the comment, but worth an explicit acknowledgment that the permission is stack-wide rather than per-container.

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 157 to 271
);
}

#[test]
fn test_aws_cloudformation_compute_management_can_use_setup_security_group() {
let generator = AwsCloudFormationPermissionsGenerator::new();
let permission_set =
get_permission_set("compute-cluster/management").expect("permission set exists");
let context = PermissionContext::new()
.with_stack_prefix("")
.with_aws_region("${AWS::Region}")
.with_aws_account_id("${AWS::AccountId}");

let result = generator
.generate_policy(permission_set, BindingTarget::Stack, &context)
.expect("Should generate AWS CloudFormation policy successfully");

let setup_security_group_statement = result
.statement
.iter()
.find(|statement| {
statement
.resource
.iter()
.any(|resource| resource == &json!({"Fn::Sub": "arn:${AWS::Partition}:ec2:${AWS::Region}:${AWS::AccountId}:security-group/*"}))
&& statement.action.contains(&json!("ec2:RunInstances"))
})
.expect("compute-cluster management should allow setup compute security group use");

let string_equals = setup_security_group_statement
.condition
.as_ref()
.and_then(|condition| condition.get("StringEquals"))
.expect("setup security group permission should be tag-conditioned");

assert_eq!(
string_equals.get("aws:ResourceTag/deployment"),
Some(&json!({"Fn::Sub": "${AWS::StackName}"}))
);
assert_eq!(
string_equals.get("aws:ResourceTag/managed-by"),
Some(&json!("setup"))
);
assert_eq!(
string_equals.get("aws:ResourceTag/resource"),
Some(&json!("compute"))
);
}

#[test]
fn test_aws_cloudformation_container_provision_can_manage_setup_compute_security_group_ingress() {
let generator = AwsCloudFormationPermissionsGenerator::new();
let permission_set = get_permission_set("container/provision").expect("permission set exists");
let context = PermissionContext::new()
.with_stack_prefix("")
.with_resource_name("alien-manager")
.with_aws_region("${AWS::Region}")
.with_aws_account_id("${AWS::AccountId}");

let result = generator
.generate_policy(permission_set, BindingTarget::Resource, &context)
.expect("Should generate AWS CloudFormation policy successfully");

let setup_compute_ingress_statement = result
.statement
.iter()
.find(|statement| {
if !statement
.action
.contains(&json!("ec2:AuthorizeSecurityGroupIngress"))
|| !statement
.action
.contains(&json!("ec2:RevokeSecurityGroupIngress"))
{
return false;
}

statement
.condition
.as_ref()
.and_then(|condition| condition.get("StringEquals"))
.is_some_and(|string_equals| {
string_equals.get("aws:ResourceTag/managed-by") == Some(&json!("setup"))
&& string_equals.get("aws:ResourceTag/resource") == Some(&json!("compute"))
})
})
.expect("container provision should manage ingress on setup compute security groups");

assert!(!setup_compute_ingress_statement
.action
.contains(&json!("ec2:DeleteSecurityGroup")));

let string_equals = setup_compute_ingress_statement
.condition
.as_ref()
.and_then(|condition| condition.get("StringEquals"))
.expect("setup compute security group ingress permission should be tag-conditioned");

assert_eq!(
string_equals.get("aws:ResourceTag/deployment"),
Some(&json!({"Fn::Sub": "${AWS::StackName}"}))
);
assert_eq!(
string_equals.get("aws:ResourceTag/managed-by"),
Some(&json!("setup"))
);
assert_eq!(
string_equals.get("aws:ResourceTag/resource"),
Some(&json!("compute"))
);
}

#[test]
fn test_aws_cloudformation_missing_actions_error() {
let generator = AwsCloudFormationPermissionsGenerator::new();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Condition lookup duplicated inline instead of using a shared helper

The two new CloudFormation tests access StringEquals condition values via inline and_then chains. aws_runtime.rs introduces a condition_equals helper for exactly this purpose at the bottom of that file, but aws_cloudformation.rs does not share it. The duplication is minor now, but any future change to how conditions are represented would require updates in multiple places. Consider either moving condition_equals to common or duplicating it in this test module.

Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/alien-permissions/tests/aws_cloudformation.rs
Line: 157-271

Comment:
**Condition lookup duplicated inline instead of using a shared helper**

The two new CloudFormation tests access `StringEquals` condition values via inline `and_then` chains. `aws_runtime.rs` introduces a `condition_equals` helper for exactly this purpose at the bottom of that file, but `aws_cloudformation.rs` does not share it. The duplication is minor now, but any future change to how conditions are represented would require updates in multiple places. Consider either moving `condition_equals` to `common` or duplicating it in this test module.

How can I resolve this? If you propose a fix, please make it concise.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant