Fix NodeNames aggregation in AWS jobs #3843

@ronaldngounou

Description

Context

In the GCE jobs, nodes are given well-defined names (like “us-east-1-a”), so the per-node metrics have well-defined labels and can easily be compared across days. In the AWS jobs, by contrast, the nodes appear to have randomly assigned names, so the metric labels end up different every day and the metrics can't be compared as easily.

Example:

  • NodeNames in the GCE tests look like control-plane-us-east1-b-jlvv and control-plane-us-east1-b-smv6; the GCE tests truncate the last 4 characters (the random suffix) when aggregating.
  • NodeNames in the AWS tests follow a different naming scheme: they contain the instance ID in the name, which changes on every run.
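The difference between the two naming schemes can be sketched with a small normalization helper. This is only an illustration, not the perf-tests implementation: `normalizeNodeName` and the `i-<hex>` instance-ID pattern are assumptions, with the GCE branch mirroring the 4-character suffix truncation described above.

```go
package main

import (
	"fmt"
	"regexp"
)

// awsInstanceID matches EC2-style instance IDs such as "i-094de2c600c60d0bb"
// embedded in AWS node names. The exact pattern is an assumption made for
// this sketch.
var awsInstanceID = regexp.MustCompile(`i-[0-9a-f]{8,17}`)

// normalizeNodeName is a hypothetical helper: it strips the per-run random
// part of a node name so metrics from different days share the same label.
func normalizeNodeName(name string) string {
	if awsInstanceID.MatchString(name) {
		// AWS: replace the instance ID with a stable placeholder.
		return awsInstanceID.ReplaceAllString(name, "node")
	}
	// GCE: drop the 4-character random suffix (and the "-" before it),
	// mirroring how the GCE tests truncate names when aggregating.
	if len(name) > 5 {
		return name[:len(name)-5]
	}
	return name
}

func main() {
	fmt.Println(normalizeNodeName("control-plane-us-east1-b-jlvv"))  // control-plane-us-east1-b
	fmt.Println(normalizeNodeName("kube-proxy-i-094de2c600c60d0bb")) // kube-proxy-node
}
```

With something along these lines applied before metrics are labeled, both GCE and AWS runs would produce stable per-node labels that perf-dash can compare across days.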

What you expected to happen:
Node metrics keep consistent labels over time, so their trends can be compared across runs. Instead, they don't have consistent trends.

As a result, GCE tests show a trend over time:

https://perf-dash.k8s.io/#/?jobname=gce-5000Nodes&metriccategoryname=E2E&metricname=LoadResources&PodName=kube-proxy-nodes-us-east1-b%2Fkube-proxy&Resource=memory

AWS jobs don't show a trend because each run uses different node names:
https://perf-dash.k8s.io/#/?jobname=aws-5000Nodes&metriccategoryname=E2E&metricname=LoadResources&PodName=kube-proxy-i-094de2c600c60d0bb%2Fkube-proxy&Resource=memory


Context in sig-scalability: https://kubernetes.slack.com/archives/C09QZTRH7/p1771436311523919

Metadata

Labels: kind/bug (Categorizes issue or PR as related to a bug.)
