sodrooome/fastqueue-playwright

fastqueue-playwright

integration-tests

Internal tooling developed at MauKerja Malaysia for distributed workload scheduling across teams. It orchestrates large-scale Playwright end-to-end UI automation tests for MauKerja Malaysia, Ricebowl Malaysia, internal dashboard platforms, and the like.

After a round of internal discussion, we decided to open-source it in a limited form, so not all of the critical functionality is disclosed.

Background Problems

At a high level, we ran into constraints and inertia that left our QA engineering team waiting in a long queue whenever they ran their automation tests, because the resources on our self-hosted runner are shared and limited. The conditions were:

  • Around a hundred E2E flows
  • Limited concurrency resources
  • Competing test execution demand
  • Multiple teams with heavy systems using Playwright

The problem shifts from "How do we automate tests?" to "How do we orchestrate test execution efficiently?"

Conceptually

In a nutshell, our testing infrastructure looks like this (it's pretty straightforward):

flowchart LR
  A(GitHub CI)-->B[Shared runners]
  B-->C[Playwright Workers]
  C-->D[Browser Instances]
  D-->E[Target Environment]

The major bottleneck and pain point is not Playwright or our UI automation framework. It lies in runner contention, browser resource exhaustion (which crashed the website during automation runs), environment instability that leads to flaky retries (which should never have been accepted as normal), and a heavily imbalanced distribution of test suites.

Imagine that, on a daily basis, we run:

1. 100 E2E flows
2. averaging 1 minute per test suite
3. on shared runners
4. while teams need to deploy immediately

The full pipeline for all E2E flows takes roughly 1-2 hours, which is far from ideal when multiple teams test against different platforms (different test scenarios, environments, and so on). So we are running an experiment around "queue-driven allocation".
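As a rough back-of-the-envelope check on these figures, assuming the suites are independent and ignoring browser startup, retries, and queue wait time (all simplifications), the wall-clock time scales with the number of suites divided by the number of concurrent workers. `estimatePipelineMinutes` is a hypothetical helper for illustration, not part of the repository:

```typescript
// Rough estimate only: ignores browser startup, retries and queue wait time.
function estimatePipelineMinutes(
  suites: number,
  minutesPerSuite: number,
  workers: number
): number {
  if (workers < 1) throw new Error("workers must be >= 1");
  // Each worker processes suites one after another, so the busiest worker
  // handles ceil(suites / workers) of them.
  return Math.ceil(suites / workers) * minutesPerSuite;
}

console.log(estimatePipelineMinutes(100, 1, 1)); // sequential: 100 minutes
console.log(estimatePipelineMinutes(100, 1, 5)); // 5 workers: 20 minutes
```

The sequential figure of 100 minutes lines up with the 1-2 hours we observe; the parallel figure is an upper bound on the gain, since real runs pay for contention on the shared runner.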

In practice, this might improve:

  • Test distribution fairness
  • Test allocation throughput
  • Test execution predictability

Instead of the naive scheduling we have right now, which runs everything sequentially, we are reworking the setup around these steps:

  • discover the test directories
  • enqueue them first
  • workers pull jobs dynamically
  • each worker launches a batch with Playwright

With that, the testing infrastructure looks something like this:
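The pull-based part of these steps can be sketched as a small in-memory worker pool. This is a minimal illustration under our own assumptions, not the repository's actual implementation: `Job` and `runQueue` are hypothetical names, and each job stands in for "run one batch with Playwright".

```typescript
type Job<T> = () => Promise<T>;

// Runs every job with at most `workerCount` in flight at once.
// Each worker pulls the next job from the shared queue as soon as it is free.
async function runQueue<T>(jobs: Job<T>[], workerCount: number): Promise<T[]> {
  if (workerCount < 1) throw new Error("workerCount must be >= 1");
  const results: T[] = new Array(jobs.length);
  let next = 0; // index of the next job to hand out

  async function worker(): Promise<void> {
    while (next < jobs.length) {
      const index = next++; // no lock needed: this step runs synchronously
      results[index] = await jobs[index]();
    }
  }

  const poolSize = Math.min(workerCount, jobs.length);
  await Promise.all(Array.from({ length: poolSize }, () => worker()));
  return results;
}

// Usage: six fake "batches" handled by three workers.
const fakeBatches = [1, 2, 3, 4, 5, 6].map((n) => async () => n * 10);
runQueue(fakeBatches, 3).then((out) => console.log(out)); // results keep queue order
```

Because workers pull instead of receiving a fixed share up front, a slow batch only delays the worker that picked it up, and the rest of the queue keeps draining.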

flowchart LR
  A[Test Discovery] --> B[fastqueue-playwright]
  B--Worker 1-->C[Playwright]
  B--Worker 2-->D[Playwright]
  B--Worker 3-->E[Playwright]
  

Why not use Playwright's native workers?

The biggest difference: Playwright workers optimize test execution inside a single Playwright test run, while this experiment optimizes system-wide scheduling by coordinating limited resources. The two are somewhat related, but fundamentally they sit on different layers.

When we run Playwright with, say, 5 workers, it internally (1) discovers our test specs, (2) partitions them, (3) spawns worker processes, and (4) executes the tests in parallel. This is excellent for local parallelism and straightforward CI pipelines.

However, once you look at the problem from a higher level, relying purely on Playwright workers hits a hard limit, since each run only sees its own execution scope.

It knows nothing about:

  • Other teams' pipeline
  • Shared runner pressure
  • Global resource contention
  • Cross-project/team urgency or prioritization
  • Different target environments

When all of these surface together, they become the actual problems we have run into so far. Put simply, the typical behavior looks like this:

Without queue-driven allocation

CI starts
   ↓
Playwright spawns N workers
   ↓
All test suites start at once, aggressively

With queue-driven allocation

CI starts
   ↓
Tests enter centralized queue
   ↓
Workers pull dynamically
   ↓
Concurrency controlled globally
   ↓
Test suites are prioritized

Run the orchestrator

Running the orchestrator is straightforward. Clone the repo and install the dependencies first, then run the orchestrator with:

npx ts-node src/orchestrator.ts

Assumption: the test suites you want to orchestrate live under the ./tests directory, and the test files follow the *.spec.ts naming pattern.

To limit concurrency, set the WORKERS variable, which acts as the global scheduler limit, in the orchestrator.ts file.

Once you run the command, it discovers the test files, and the workers start pulling batches and running the tests. The runtime output looks something like this:
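For illustration, discovery under that assumption can be sketched with Node's built-in `fs` and `path` modules. `discoverSpecs` is a hypothetical helper written for this README; the real discovery.ts may work differently.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively collects files under `root` whose names end with `suffix`.
function discoverSpecs(root: string, suffix = ".spec.ts"): string[] {
  const found: string[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const fullPath = path.join(root, entry.name);
    if (entry.isDirectory()) {
      found.push(...discoverSpecs(fullPath, suffix));
    } else if (entry.isFile() && entry.name.endsWith(suffix)) {
      found.push(fullPath);
    }
  }
  return found.sort(); // sorted so the queue order is deterministic
}

// Usage, guarded so the sketch does not throw when ./tests is absent.
if (fs.existsSync("./tests")) {
  console.log(discoverSpecs("./tests"));
}
```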

[Worker 1] is running tests with batches
[Worker 2] is running tests with batches
[Worker 3] is running tests with batches

[Worker 4] failed to run tests: Error: Command failed: npx playwright test ...

All batches were completed
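A failed batch such as the `[Worker 4]` line above is where a retry budget comes in. The sketch below is an assumption about how a `MAX_RETRIES`-style setting could be applied, not the repository's actual worker logic; `runWithRetries` is a hypothetical helper, and the `attempt` callback stands in for whatever launches Playwright for one batch.

```typescript
// Runs `attempt` until it succeeds or the retry budget is used up.
// With maxRetries = 3 a batch is tried at most 4 times (1 run + 3 retries).
async function runWithRetries<T>(
  attempt: (tryNumber: number) => Promise<T>,
  maxRetries: number
): Promise<T> {
  let lastError: unknown;
  for (let tryNumber = 0; tryNumber <= maxRetries; tryNumber++) {
    try {
      return await attempt(tryNumber);
    } catch (error) {
      lastError = error; // remember the failure and try again
    }
  }
  throw lastError; // budget exhausted: surface the last failure
}

// Usage: a fake batch that fails twice before passing.
let calls = 0;
runWithRetries(async () => {
  calls++;
  if (calls < 3) throw new Error("flaky batch");
  return "passed";
}, 3).then((status) => console.log(status, "after", calls, "attempts"));
```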

Run the tests

If you want to run the integration tests locally, you can execute the command:

npm test

# or for watch mode
npm run test:watch

This runs 34 test cases, comprising unit tests for individual modules, integration tests around discovery.ts and worker.ts, and an orchestrator test.

Integrating with GitHub Actions

If you want to use this messy codebase in an actual GitHub Actions workflow, add a few things to the workflow .yml file in your repo and you're good to go. For example, you can start from this setup:

env:
  WORKERS: ${{ github.event.inputs.workers || '5' }}
  BATCH_SIZE: ${{ github.event.inputs.batch_size || '5' }}
  MAX_RETRIES: "3"
  SPEC_PATTERN: ".spec.ts"
  WORKER_TIMEOUT: "600000" # set a reasonable worker timeout
 
jobs:
  e2e:
    name: Run E2E orchestrator
    runs-on: self-hosted  # target your shared self-hosted runner pool
 
    # add a job-level timeout as a hard ceiling, meaning even if a worker
    # hangs and WORKER_TIMEOUT somehow doesn't fire, the job won't run forever
    timeout-minutes: 60

### rest of your workflow ###

- name: Run orchestrator
  run: npx ts-node src/orchestrator.ts
  env:
   WORKERS: ${{ env.WORKERS }}
   BATCH_SIZE: ${{ env.BATCH_SIZE }}
   MAX_RETRIES: ${{ env.MAX_RETRIES }}
   SPEC_PATTERN: ${{ env.SPEC_PATTERN }}
   WORKER_TIMEOUT: ${{ env.WORKER_TIMEOUT }}

Important

One thing to watch out for: if you don't use a self-hosted runner, you need to carefully configure how the Playwright browser installation and the cache restore step are ordered. Otherwise, the browsers may be installed twice. For most self-hosted configurations this is not a big deal, since the cache hit rate will be high.

But why does batching matter?

Without batching, each enqueued item corresponds to a single test spec. The problem is that this causes too many Playwright process startups and too many browser initializations, which leads to inefficient CPU and memory utilization. In short: it's an expensive process.
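
To make the saving concrete, here is a small sketch of batching as plain array chunking. `toBatches` is a hypothetical helper, and the spec file names are made up for the example:

```typescript
// Splits `items` into consecutive batches of at most `batchSize` elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 100 hypothetical spec files, grouped five per Playwright invocation.
const specs = Array.from({ length: 100 }, (_, i) => `tests/flow-${i + 1}.spec.ts`);
console.log(toBatches(specs, 5).length); // 20 process startups instead of 100
```

Each batch maps to one Playwright process, so the batch size trades startup overhead against how evenly work can be spread across workers.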

Future considerations that could significantly improve this experiment and the setup we have right now:

  • Priority scheduling based on P0, P1, P2 tagging. The setup would be much more complex, but we don't know how much yet
  • Result aggregator. Collects and unifies all test execution results, including retries and error traces
  • Observability layer. Another complex setup, but one that becomes extremely important later (insya Allah), since it would capture:
    • Queue metrics
    • Worker metrics
    • E2E test metrics
  • A more advanced setup: workers across Kubernetes, VMs, and multiple runners
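
As a first idea for the priority scheduling item, the queue could simply be ordered by tag before workers start pulling. This is a speculative sketch of something that does not exist in the repository yet; `Priority`, `QueuedSuite`, and `orderByPriority` are hypothetical names.

```typescript
type Priority = "P0" | "P1" | "P2";

interface QueuedSuite {
  file: string;
  priority: Priority;
}

const RANK: Record<Priority, number> = { P0: 0, P1: 1, P2: 2 };

// Returns a new array with P0 suites first, then P1, then P2.
// Array.prototype.sort is stable, so suites with the same priority
// keep their original (discovery) order.
function orderByPriority(suites: QueuedSuite[]): QueuedSuite[] {
  return [...suites].sort((a, b) => RANK[a.priority] - RANK[b.priority]);
}

console.log(
  orderByPriority([
    { file: "tests/report.spec.ts", priority: "P2" },
    { file: "tests/login.spec.ts", priority: "P0" },
    { file: "tests/search.spec.ts", priority: "P1" },
  ]).map((suite) => suite.file)
);
```

A static sort like this only covers a single run; prioritizing across teams and pipelines would need a shared queue, which is where the extra complexity comes from.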

Trade-offs

When we started implementing this orchestration around 2 weeks ago, we began to understand that it comes with costs:

  • testing complexity
  • testing infra ownership
  • scheduler logic
  • observability needs

So our rough guess is that it's only worth it when:

  • test suites grow large as the number of teams multiplies
  • shared infra becomes a bottleneck
  • CI costs increase

If you want to avoid this complexity, which could well be considered overkill or overengineering, the best option is either to optimize the test suites or to invest in more powerful infrastructure, and BOOM!

About

Distributed workload scheduling for Playwright UI automation testing
