[PATCH] drm/doc: ci: Require more context for flaky tests

Helen Koike helen.koike at collabora.com
Wed Oct 25 12:47:07 UTC 2023



On 23/10/2023 12:09, Maxime Ripard wrote:
> On Fri, Oct 20, 2023 at 01:33:59AM -0300, Helen Koike wrote:
>> On 19/10/2023 13:51, Helen Koike wrote:
>>> On 19/10/2023 06:46, Maxime Ripard wrote:
>>>> Flaky tests can be very difficult to reproduce after the fact, which
>>>> makes them even harder to ever fix.
>>>>
>>>> Let's document the metadata we agreed on to provide more context to
>>>> anyone trying to address these fixes.
>>>>
>>>> Link: https://lore.kernel.org/dri-devel/CAPj87rPbJ1V1-R7WMTHkDat2A4nwSd61Df9mdGH2PR=ZzxaU=Q@mail.gmail.com/
>>>> Signed-off-by: Maxime Ripard <mripard at kernel.org>
>>>> ---
>>>>    Documentation/gpu/automated_testing.rst | 13 +++++++++++++
>>>>    1 file changed, 13 insertions(+)
>>>>
>>>> diff --git a/Documentation/gpu/automated_testing.rst
>>>> b/Documentation/gpu/automated_testing.rst
>>>> index 469b6fb65c30..2dd0e221c2c3 100644
>>>> --- a/Documentation/gpu/automated_testing.rst
>>>> +++ b/Documentation/gpu/automated_testing.rst
>>>> @@ -67,6 +67,19 @@ Lists the tests that for a given driver on a
>>>> specific hardware revision are
>>>>    known to behave unreliably. These tests won't cause a job to fail
>>>> regardless of
>>>>    the result. They will still be run.
>>>> +Each new flake entry must be associated with a link to a bug report to
>>>
>>> What do you mean by bug report? Is a link to an email on the mailing
>>> list enough?
>>>
>>> Also, I made a mistake in the first flakes lists, which I corrected
>>> with https://www.spinics.net/lists/kernel/msg4959629.html (there was a
>>> bug in my script which ended up erroneously adding a bunch of tests to
>>> the flake list, so I cleaned them up). I would like to kindly ask that
>>> you let me add this documentation in a future patch, so as not to block
>>> that patch series.
>>>
>>> Thanks
>>> Helen
>>>
>>>
>>>> +the author of the affected driver, the board name or Device Tree name of
>>>> +the board, the first kernel version affected, and an approximation of
>>>> +the failure rate.
>>>> +
>>>> +They should be provided under the following format::
>>>> +
>>>> +  # Bug Report: $LORE_OR_PATCHWORK_URL
>>
>> I wonder if the commit adding the test to the flakes.txt file with an
>> Acked-by from the device maintainer shouldn't already be considered the
>> Bug Report.
> 
> I guess it could, yes. I think I'd still prefer the link since it would
> also let us evaluate whether the issue is fixed or not now.
> 
>>>> +  # Board Name: broken-board.dtb
>>
>> Maybe Board Name isn't required, since it is already in the name of the
>> file.
> 
> I have no idea how the i915 naming works, but on ARM at least the name
> of the file contains the name of the SoC, not the board where it was
> observed.

Right, yeah, we could use the dtb to be more precise, no problem.

> 
>>>> +  # Version: 6.6-rc1
>>>> +  # Failure Rate: 100
>>
>> Maybe also:
>>
>>    # Pipeline url:
>> https://gitlab.freedesktop.org/helen.fornazier/linux/-/pipelines/1014435
> 
> Sounds like a good idea yeah :) Are those artifacts archived/deleted at
> some point or do they stick around forever?

Good point. I asked the admins: they stick around for 4 weeks (could be
more, but it is not forever) :(

> 
>> All this info will complicate the update-xfails.py script a bit, but well,
>> we can handle it...
>> (see https://patchwork.kernel.org/project/dri-devel/patch/20231020034124.136295-4-helen.koike@collabora.com/
>> )
>> We need to update that script to make life easier.
> 
> I guess we could just add a template for now? It would keep the script
> easy and yet still hint to its user that we want more data

ack
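
As a rough sketch only of how the metadata could be checked once a
template exists: the snippet below assumes the field names stay as
proposed above and that each comment block applies only to the next test
line. parse_flakes(), missing_fields() and the TEMPLATE constant are
hypothetical names, not part of update-xfails.py.

```python
import re

# Field names taken from the format proposed in this thread.
REQUIRED_FIELDS = ("Bug Report", "Board Name", "Version", "Failure Rate")

# Hypothetical template a script could emit above a new flake entry.
TEMPLATE = "".join(f"# {name}: \n" for name in REQUIRED_FIELDS)

# Matches "# Key: value"; the key stops at the first colon, so URLs in
# the value are kept whole.
FIELD_RE = re.compile(r"^#\s*(?P<key>[^:]+?)\s*:\s*(?P<value>.*)$")


def parse_flakes(text):
    """Return a list of (test_name, metadata) pairs.

    Assumption: metadata comments apply only to the next test line.
    """
    entries = []
    pending = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD_RE.match(line)
        if match:
            pending[match.group("key")] = match.group("value").strip()
        elif line.startswith("#"):
            # Free-form comment without a "Key: value" shape; ignore it.
            continue
        else:
            entries.append((line, pending))
            pending = {}
    return entries


def missing_fields(metadata):
    """List the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]
```

Something like this could warn about incomplete entries without making
the update script itself much more complicated.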

Thanks
Helen

> 
>> Vignesh sent a patch adding at least the pipeline url to the file
>> https://patchwork.kernel.org/project/linux-arm-msm/patch/20231019070650.61159-9-vignesh.raman@collabora.com/
>> but that needs to be updated too in order to match this doc.
> 
> Sure, I'll update it
> 
> Maxime

