meet302001/README.md

Hi, I'm Meet Bhanushali

DevOps / SRE Engineer building toward AI Infrastructure. Based in New York City, NY.

I run production infra for a social trading platform — 30+ EC2 nodes, 10+ microservices, EKS, Prometheus/Grafana/Loki, Vault, APISIX. Lately I've been going deep on GPU serving, LLM inference, and Kubernetes at scale because that's where the interesting problems are.

current_role:    DevOps/MLOps @ Traderware Inc.
focus:           LLM inference infra · GPU workload scheduling · EKS at scale
learning:        CKA · KV-cache-aware autoscaling · vLLM internals
open_to:         AI Infrastructure roles (Anthropic, CoreWeave, Together AI, Baseten, Replicate)

Stack I Run in Production

Orchestration & Compute: EKS · Kubernetes · Helm · Karpenter · EC2 g5.xlarge (A10G) · GPU Operator

LLM & GPU Infra: vLLM · Mistral 7B · Flash Attention v2 · CUDA Graphs · DCGM Exporter · NVIDIA NCA-AIIO

Observability: Prometheus · Grafana · Loki · Vector.dev · Wazuh HIDS · cAdvisor · Node Exporter

Data & Storage: Qdrant · QuestDB · Redis · S3 · EBS

Security & Platform: HashiCorp Vault · Keycloak · APISIX · cert-manager · Semgrep · Trivy

CI/CD & IaC: GitHub Actions · Terraform · ArgoCD · Self-hosted runners · OIDC federation


What I'm Building

KV-Cache-Aware Pod Autoscaler for LLMs

Standard HPA scales on CPU/memory, which is a poor signal for LLM inference, where KV-cache pressure is the real bottleneck. I'm building a custom autoscaler that scales vLLM replicas on cache hit rate, GPU memory utilization, and queued requests. The goal is to cut cold-start cost while protecting tail latency.

vLLM · Custom Metrics API · Prometheus Adapter · EKS
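A minimal sketch of what the scaling decision could look like, not the project's actual code. It reuses the standard HPA rule (desired = ceil(current × metric / target), taking the largest proposal across metrics) and applies it to the three signals named above. The class names, the target values, and the choice to treat cache miss rate (1 − hit rate) as the pressure signal are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Signals:
    """Per-replica averages, e.g. scraped from Prometheus (hypothetical shape)."""
    cache_hit_rate: float       # 0.0 to 1.0
    gpu_mem_utilization: float  # 0.0 to 1.0
    queued_per_replica: float   # requests waiting per replica


@dataclass(frozen=True)
class Targets:
    """Illustrative targets; real values would come from load testing."""
    cache_miss_rate: float = 0.5
    gpu_mem_utilization: float = 0.8
    queued_per_replica: float = 4.0


def desired_replicas(current, signals, targets=Targets(),
                     min_replicas=1, max_replicas=8):
    """HPA-style rule: scale by the worst metric/target ratio, then clamp."""
    ratios = [
        # Assumption: a low hit rate is read as cache pressure.
        (1.0 - signals.cache_hit_rate) / targets.cache_miss_rate,
        signals.gpu_mem_utilization / targets.gpu_mem_utilization,
        signals.queued_per_replica / targets.queued_per_replica,
    ]
    proposal = math.ceil(current * max(ratios))
    return max(min_replicas, min(max_replicas, proposal))
```

Taking the maximum ratio means any single signal can trigger a scale-up, while a scale-down needs all three to be below target. That bias toward extra capacity is one way to protect tail latency.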

GPU Workload Manager

Priority-based job queueing + cost-aware scheduling for mixed inference/training workloads on a shared GPU fleet. Spot-aware fallback. Per-team quota enforcement.

Kubernetes · Karpenter · DCGM · Custom Scheduler
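A rough sketch of how priority queueing, per-team quotas, and a cheapest-pool-first fallback could fit together. This is an in-memory toy, not the scheduler itself: the `GpuQueue` class, its method names, and the pool names are hypothetical, and capacity pools are assumed to be listed cheapest first (for example spot before on-demand).

```python
import heapq
import itertools


class GpuQueue:
    """Toy priority queue with per-team GPU quotas and pool fallback."""

    def __init__(self, quotas, pools):
        self.quotas = dict(quotas)   # team -> max GPUs the team may hold
        self.pools = dict(pools)     # pool -> free GPUs, cheapest pool first
        self.in_use = {team: 0 for team in self.quotas}
        self._heap = []
        self._seq = itertools.count()  # tie-breaker: FIFO within a priority

    def submit(self, name, team, priority, gpus):
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._seq), name, team, gpus))

    def pending(self):
        return sorted(entry[2] for entry in self._heap)

    def _pick_pool(self, gpus):
        # First pool with enough free GPUs wins; later pools are the fallback.
        for pool, free in self.pools.items():
            if free >= gpus:
                return pool
        return None

    def schedule(self):
        """Place every job that fits its team quota and some pool."""
        placed, deferred = [], []
        while self._heap:
            entry = heapq.heappop(self._heap)
            _, _, name, team, gpus = entry
            over_quota = self.in_use.get(team, 0) + gpus > self.quotas.get(team, 0)
            pool = self._pick_pool(gpus)
            if over_quota or pool is None:
                deferred.append(entry)  # keep it queued for the next pass
                continue
            self.pools[pool] -= gpus
            self.in_use[team] = self.in_use.get(team, 0) + gpus
            placed.append((name, pool))
        for entry in deferred:
            heapq.heappush(self._heap, entry)
        return placed
```

A deferred job does not block lower-priority jobs behind it. That keeps the fleet busy but can starve large jobs, so a real scheduler would likely add aging or reservations.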

LLM Inference & Eval Platform on EKS

End-to-end: vLLM serving Mistral 7B → RAGAS evaluation → MLflow tracking → ArgoCD GitOps. Models ship the same way services do.

vLLM · RAGAS · MLflow · ArgoCD · EKS

Repos go live as I finish each milestone. Pinned below as they ship.


Certifications

AWS SAA · AWS MLS · AWS DEA · Databricks · NVIDIA · CKA



Always down to talk GPU schedulers, inference serving, or why your Prometheus is OOMing again.

Pinned

  1. ecr-lifecycle (Public, Shell)

  2. gpu-inference-eks (Public, Shell)