It’s clearly a bug, but what is the bug exactly?
A couple weeks ago I was updating some of our Playwright tests, when Playwright decided to mess with the execution order of the tests. The tests in question are a group of tests that cover a whole process from start to end. So we run them in serial mode, “to ensure they will always run together and in order”. And yet, they were not running in order.
While that’s obviously a bug, it wasn’t clear what exactly the bug was. Was it the tests not running in the correct order? Or was that only a symptom of some other bug? And what was triggering this bug? Because we’ve had these tests running in serial mode for a while, but had never seen this behavior before.
The problem as we encountered it
The tests that triggered the issue were similar to these:
import { test, expect } from "@playwright/test"; test.describe.configure({ mode: "serial" }); const timestamp = Date.now(); test("first test", async () => { expect(true).toBeTruthy(); }); test(`second test - ${timestamp}`, async () => { expect(true).toBeTruthy(); }); test("third test", async () => { expect(false).toBeTruthy(); });
We have three tests, run in serial mode, with the third test depending on the second test. To keep the example simple, the tests just assert against true and false. (This already tells you the issue is purely within Playwright’s test runner, no page required.) The third test fails to simulate how it would fail if the previous test had not run. Because that’s why you would run tests in serial mode, to manage that dependency between the tests.
And this is what happens, if you run these tests:
$ npx playwright test ./tests/example.spec.ts Running 3 tests using 1 worker ✓ 1 [chromium] › tests/example.spec.ts:7:5 › first test (4ms) ✘ 2 [chromium] › tests/example.spec.ts:15:5 › third test (2ms) - 3 [chromium] › tests/example.spec.ts:11:5 › second test - 1772891032995 1) [chromium] › tests/example.spec.ts:15:5 › third test ─────────────────────── Error: expect(received).toBeTruthy() Received: false 14 | 15 | test("third test", async () => { > 16 | expect(false).toBeTruthy(); | ^ 17 | }); 18 | at ~/playwright-bug/tests/example.spec.ts:16:17 1 failed [chromium] › tests/example.spec.ts:15:5 › third test ──────────────────────── 1 did not run 1 passed (802ms)
So it looks like Playwright runs the first test, then the third test, and then would have run the second test, if the third test hadn’t failed.
It doesn’t make sense, though, for that to be the whole bug. How likely is it that Playwright “just” messes up the execution order, especially in serial mode? And even if that were the bug, the question remains what triggered this bug now. We’ve had these tests for a while, why is this the first time we’re seeing this?
So as I said in the introduction: what exactly is the bug here?1
If you want to play around with this behavior yourself, this is the time to stop reading and start playing, because spoilers are about to follow. And do make sure you are using Playwright v1.58 (or earlier). I reported the bug on February 23rd. The next day there was a PR with a fix, which will be included in v1.59.
Exploring the problem
To debug this issue we tried a number of different things. We moved the test with timestamp in its name higher up in the file. Didn’t make a difference. Originally, the test used crypto.getRandomValues() to generate a random string instead of a timestamp. We made sure that nothing weird was happening there type-wise. Didn’t make a difference. So we made the switch to Date.now(). Didn’t make a difference. We replaced the timestamp with a static string instead of one that’s the result of a function. That did make a difference. The execution order of the tests was as it should be. But why?
It had to have something to do with putting the result of const randomString = Date.now(); into the name of a test. You’d expect that this line of code is evaluated only once in a test run. Turns out that this is not the case.
Better two times than zero times
If you add a console.log(timestamp) right after declaring the timestamp constant, the output you get is:
$ npx playwright test tests/example.log.spec.ts 1772982294612 Running 3 tests using 1 worker 1772982295702 ✓ 1 [chromium] › tests/example.log.spec.ts:8:5 › first test (11ms) ✘ 2 [chromium] › tests/example.log.spec.ts:16:5 › third test (4ms) - 3 [chromium] › tests/example.log.spec.ts:12:5 › second test - 1772982294612 [...]
The output of the console.log() is in there twice. The first output 1772982294612 matches the timestamp in the listed test name. What the second output 1772982295702 is used for, is unclear. But it does look like Playwright runs our test code twice.
This was confirmed by searching the internet for people reporting similar issues. We found three.
What does the internet say?
A reply to issue 24273 explicitly states that Playwright will run code twice: “once when we’re reading all the tests” and a “second time when the worker is actually running the test in a new process”. In issue 28987 there’s a reply stating: “when we load them [the tests], they are called different to when we execute them in the worker.” And in issue 29975 there’s the reply: “Closing since this is working as intended.”
There are two things to note about these three issues.
The first is that two of the three issues mention parametrized tests. And the example in the third one does suggest it’s a parametrized test too. That’s how we ran into this issue as well. I can think of few cases where you’d want to do string interpolation in the name of a test, if not for parametrized tests. So that did help with finding these three similar issues. Yet, as the example above shows, the parametrization is incidental to the bug. You can reproduce it just fine without any parametrized tests.
The second (and more interesting) thing to note about these three issues, is that all three mention an error message being shown by Playwright:
Test not found in the worker process. Make sure test tile does not change.
That error message is not present in the output of the example test I shared above. So that’s odd…
The actual bug underneath it all
To summarize, we’ve figured out that:
- Playwright runs your test code twice.
- It’s not happy when you put something dynamic in the name of a test.
- It should throw an error in that case.2
- It does not throw that error in our case.
So what’s different between our case and the other three?
The one thing we haven’t played with is Playwright’s mode. It has three modes: serial, default, and parallel. If you run the example test above in either default or parallel mode, you’ll actually get the error message: “Test not found in the worker process. Make sure test tile does not change.”. It’s just in serial mode, that you don’t.
So that’s what the actual bug was. It had nothing to do with execution order. The bug is that, for some reason, the error message about the test title changing is not shown when running in serial mode. If you want to know what that reason is, your best bet is reading the fix. You’ll have to dig some deeper into the Playwright code base, though, to figure out how exactly the fix relates to Playwright’s serial mode.
Arguably, there’s also the bug of Playwright executing your test code twice. While that definitely defies user expectations, I don’t want even want to begin imagining what a fix for that behavior would look like…
Have you encountered bugs that were clearly a bug, but it wasn’t clear what exactly the bug was? How often do you encounter such bugs? How do you handle them?
-
The RIMGEN heuristic from the BBST Bug Advocacy course is a great heuristic for cases like this. ↩
-
In care you’re curious, here’s the code that throws that error and the test for that error.. ↩