I noticed this earlier today when checking the upstream test results for web-platform-tests/wpt#41841. The failures reported upstream are all for tests that actually pass downstream. The reason is that the upstream PR gets created immediately when the patches land on autoland. This is not a problem if there are only test changes, but if a new feature is added or a bug is fixed in Firefox, the upstream checks are always going to produce failures, presumably because the Firefox build they run against does not yet contain the change. This is not always noticeable, given that quite a few jobs, like these wdspec tests, do not block landing.
I'm just filing this bug in case we want to make an improvement here, for example by delaying the creation of the PR, or at least by re-running all the checks once the Nightly is ready to be tested. Maybe we should then even block the merge to make sure nothing has regressed?