✓ CI.BAT: success for Reapply "drm/xe/gsc: define GSC FW for LNL"

Mon Jul 8 22:59:26 UTC 2024

On Thu, Jul 04, 2024 at 04:29:29PM GMT, Ville Syrjälä wrote:
>On Wed, Jul 03, 2024 at 03:22:04PM -0500, Lucas De Marchi wrote:
>> On Wed, Jul 03, 2024 at 09:38:54AM GMT, Daniele Ceraolo Spurio wrote:
>> >
>> >
>> >On 7/2/2024 11:02 AM, Lucas De Marchi wrote:
>> >>On Tue, Jul 02, 2024 at 09:25:31AM GMT, Daniele Ceraolo Spurio wrote:
>> >>>
>> >>>
>> >>>On 7/2/2024 7:29 AM, Lucas De Marchi wrote:
>> >>>>On Tue, Jul 02, 2024 at 01:01:28AM GMT, Patchwork wrote:
>> >>>>>== Series Details ==
>> >>>>>
>> >>>>>Series: Reapply "drm/xe/gsc: define GSC FW for LNL"
>> >>>>>URL   : https://patchwork.freedesktop.org/series/135623/
>> >>>>>State : success
>> >>>>>
>> >>>>>== Summary ==
>> >>>>>
>> >>>>>CI Bug Log - changes from
>> >>>>>xe-1542-886eeb6d89b58f914ee5045fcac54b59a73d8299_BAT ->
>> >>>>>xe-pw-135623v1_BAT
>> >>>>>====================================================
>> >>>>>
>> >>>>>Summary
>> >>>>>-------
>> >>>>>
>> >>>>> **SUCCESS**
>> >>>>>
>> >>>>> No regressions found.
>> >>>>>
>> >>>>>
>> >>>>>
>> >>>>>Participating hosts (5 -> 4)
>> >>>>>------------------------------
>> >>>>>
>> >>>>> Missing    (1): bat-lnl-1
>> >>>>
>> >>>>I guess it didn't really work. +Ryszard +Ewelina: Can we promote LNL to
>> >>>>be considered "a reliable machine from the CI POV" so we don't have
>> >>>>"CI.BAT: success" when LNL execution is missing?  Or is there any other
>> >>>>reason why we report success in this case?
>> >>>
>> >>>Damn. I can't repro the issue anymore with this WA applied and
>> >>>even in CI we weren't seeing it when I sent it for testing before:
>> >>>https://patchwork.freedesktop.org/series/134099/ . I did 3 runs in
>> >>>that case and none of them hit the problem.
>> >>>
>> >>>I've triggered another run to see if we get any better logs.
>> >>
>> >>this change should NOT make the machine "go missing" really. Actually no
>> >>change to xe-only should really make a machine not have any log at
>> >>all....
>> >>
>> >>I believe it was a very unfortunate coincidence and there must be a
>> >>network issue or the like... but we can't consider success when we don't
>> >>have a report for LNL. Particularly for this patch since LNL is
>> >>the only affected platform.
>> >>
>> >>"try again"  in patchwork sounds good for now.
>> >
>> >It died again, but this time we got some logs. It died during an fbdev
>> >test, which makes me think this is due to the display side of the WA
>> >still being missing. I'll try again to repro locally and if I can't
>> >I'll send a patch to move the FB out of stolen and see what happens.
>>
>> I think it needs to be out of stolen, no way around it.  We are fencing
>> the number of writes, but if the fb, that is user-accessible, is
>> allocated in stolen, there's no way to do that.
>>
>> Also, I have no idea why it's currently in stolen. Looking at
>>
> 	https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-135667v5/bat-lnl-1/dmesg0.txt
>>
>> 	<7>[  321.798108] xe 0000:00:02.0: [drm:intelfb_create [xe]] no BIOS fb, allocating a new one
>> 	<6>[  321.798743] xe 0000:00:02.0: [drm] Allocated fbdev into stolen
>> 	<7>[  321.803371] xe 0000:00:02.0: [drm:intelfb_create [xe]] allocated 2880x1800 fb: 0x01621000
>>
>> so, we failed to get the fb from BIOS that would be in stolen (why? not
>> sure), then we go ahead and allocate in stolen, for what benefit?
>
>Using stolen for this is not entirely unreasonable as otherwise it
>might just be completely wasted memory.

well... the problem is that we keep having issues with stolen
when uCs and hardware IPs have their own things in stolen. And then WAs
showing up due to lmembar going back to system memory. Ugh

The stolen is used to pass the initial fb from BIOS to driver, but it's then
expected we won't use it after that anymore. At least it seems the trend
with MTL and LNL. Hence my reaction "if we are just allocating a new FB,
why is it in stolen?"

>
>Though we should probably have a bit better policy than the current
>"1/2 of stolen". A more sensible policy would be "make sure we leave
>enough for FBC". The problem is deciding what is actually reasonable.
>One easy idea would be to just do something along the lines of
>intel_fbc_cfb_size(intel_fbc_max_plane_size()). I'll see about cooking
>up something like that...
>
>>
>> It seems like drivers/gpu/drm/xe/display/intel_fbdev_fb.c is already
>> missing the MTL WA that avoids stolen
>
>Technically we shouldn't need that w/a on MTL as we now bypass
>LMEMBAR and access stolen directly.

c08c364102d07288610734de34111a666e730ae7 and related stolen commits
above it? I'm not sure we can follow the same approach for the LNL WA
though.

Since https://patchwork.freedesktop.org/patch/602631/?series=135861&rev=1
passes CI, I think it would be more reasonable to merge something like
that (maybe changing the WA to do usable_size = 0) and on top come with
a refactor bringing back objects in stolen.

Lucas De Marchi

>
>-- 
>Ville Syrjälä
>Intel