✓ CI.BAT: success for Reapply "drm/xe/gsc: define GSC FW for LNL"

Wed Jul 3 20:22:04 UTC 2024

On Wed, Jul 03, 2024 at 09:38:54AM GMT, Daniele Ceraolo Spurio wrote:
>
>
>On 7/2/2024 11:02 AM, Lucas De Marchi wrote:
>>On Tue, Jul 02, 2024 at 09:25:31AM GMT, Daniele Ceraolo Spurio wrote:
>>>
>>>
>>>On 7/2/2024 7:29 AM, Lucas De Marchi wrote:
>>>>On Tue, Jul 02, 2024 at 01:01:28AM GMT, Patchwork wrote:
>>>>>== Series Details ==
>>>>>
>>>>>Series: Reapply "drm/xe/gsc: define GSC FW for LNL"
>>>>>URL   : https://patchwork.freedesktop.org/series/135623/
>>>>>State : success
>>>>>
>>>>>== Summary ==
>>>>>
>>>>>CI Bug Log - changes from 
>>>>>xe-1542-886eeb6d89b58f914ee5045fcac54b59a73d8299_BAT -> 
>>>>>xe-pw-135623v1_BAT
>>>>>====================================================
>>>>>
>>>>>Summary
>>>>>-------
>>>>>
>>>>> **SUCCESS**
>>>>>
>>>>> No regressions found.
>>>>>
>>>>>
>>>>>
>>>>>Participating hosts (5 -> 4)
>>>>>------------------------------
>>>>>
>>>>> Missing    (1): bat-lnl-1
>>>>
>>>>I guess it didn't really work. +Ryszard +Ewelina: Can we promote LNL to
>>>>be considered "a reliable machine from the CI POV" so we don't have
>>>>"CI.BAT: success" when LNL execution is missing?  Or is there any other
>>>>reason why we report success in this case?
>>>
>>>Damn. I can't repro the issue anymore with this WA applied and 
>>>even in CI we weren't seeing it when I sent it for testing before: 
>>>https://patchwork.freedesktop.org/series/134099/ . I did 3 runs in 
>>>that case and none of them hit the problem.
>>>
>>>I've triggered another run to see if we get any better logs.
>>
>>this change should NOT make the machine "go missing" really. Actually no
>>change to xe-only should really make a machine not have any log at
>>all....
>>
>>I believe it was a very unfortunate coincidence and there must be a
>>network issue or the like... but we can't consider success when we don't
>>have a report for LNL. Particularly for this patch since LNL is
>>the only affected platform.
>>
>>"try again"  in patchwork sounds good for now.
>
>It died again, but this time we got some logs. It died during an fbdev 
>test, which makes me think this is due to the display side of the WA 
>still being missing. I'll try again to repro locally and if I can't 
>I'll send a patch to move the FB out of stolen and see what happens.

I think it needs to be out of stolen, no way around it.  We are fencing
the number of writes, but if the fb, which is user-accessible, is
allocated in stolen, there's no way to fence writes to it.

Also, I have no idea why it's currently in stolen. Looking at

	https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-135667v5/bat-lnl-1/dmesg0.txt

	<7>[  321.798108] xe 0000:00:02.0: [drm:intelfb_create [xe]] no BIOS fb, allocating a new one
	<6>[  321.798743] xe 0000:00:02.0: [drm] Allocated fbdev into stolen
	<7>[  321.803371] xe 0000:00:02.0: [drm:intelfb_create [xe]] allocated 2880x1800 fb: 0x01621000

so, we failed to get the fb from BIOS that would be in stolen (why? not
sure), then we go ahead and allocate in stolen, for what benefit?

It seems like drivers/gpu/drm/xe/display/intel_fbdev_fb.c is already
missing the MTL WA that avoids stolen (see the same function in
drivers/gpu/drm/i915/display/intel_fbdev_fb.c). So, for CI this is probably
sufficient?

diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
index 816ad13821a8..dcac37c33560 100644
--- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
+++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
@@ -37,24 +37,10 @@ struct intel_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
 	size = PAGE_ALIGN(size);
 	obj = ERR_PTR(-ENODEV);
 
-	if (!IS_DGFX(xe)) {
-		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe),
-					   NULL, size,
-					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
-					   XE_BO_FLAG_STOLEN |
-					   XE_BO_FLAG_PINNED);
-		if (!IS_ERR(obj))
-			drm_info(&xe->drm, "Allocated fbdev into stolen\n");
-		else
-			drm_info(&xe->drm, "Allocated fbdev into stolen failed: %li\n", PTR_ERR(obj));
-	}
-	if (IS_ERR(obj)) {
-		obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, size,
-					   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
-					   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
-					   XE_BO_FLAG_PINNED);
-	}
-
+	obj = xe_bo_create_pin_map(xe, xe_device_get_root_tile(xe), NULL, size,
+				   ttm_bo_type_kernel, XE_BO_FLAG_SCANOUT |
+				   XE_BO_FLAG_VRAM_IF_DGFX(xe_device_get_root_tile(xe)) |
+				   XE_BO_FLAG_PINNED);
 	if (IS_ERR(obj)) {
 		drm_err(&xe->drm, "failed to allocate framebuffer (%pe)\n", obj);
 		fb = ERR_PTR(-ENOMEM);
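
To spell out what the diff above changes, here is a rough standalone model
of the placement decision before and after. This is only a sketch: the enum
and both function names are made up for illustration and are not driver
code, and it assumes XE_BO_FLAG_VRAM_IF_DGFX() picks VRAM on discrete parts
and system memory otherwise.

```c
#include <stdbool.h>

/* Hypothetical placements, for illustration only (not xe driver types). */
enum fb_placement {
	FB_IN_STOLEN,
	FB_IN_VRAM,
	FB_IN_SYSTEM,
};

/*
 * Model of the current logic: integrated parts try stolen first and fall
 * back to the VRAM-if-dgfx placement only when the stolen allocation fails.
 */
static enum fb_placement fbdev_placement_before(bool is_dgfx, bool stolen_alloc_ok)
{
	if (!is_dgfx && stolen_alloc_ok)
		return FB_IN_STOLEN;

	/* Assumed behavior of XE_BO_FLAG_VRAM_IF_DGFX(). */
	return is_dgfx ? FB_IN_VRAM : FB_IN_SYSTEM;
}

/* Model of the logic with the diff applied: stolen is never tried. */
static enum fb_placement fbdev_placement_after(bool is_dgfx)
{
	return is_dgfx ? FB_IN_VRAM : FB_IN_SYSTEM;
}
```

Under that model, the only case that changes is the integrated one where the
stolen allocation succeeds (the LNL case in the dmesg above): the fb would
land in system memory instead of stolen.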

+Jani +Maarten

Lucas De Marchi

>
>Daniele
>
>>
>>thanks
>>Lucas De Marchi
>>
>>>
>>>Daniele
>>>
>>>>
>>>>thanks
>>>>Lucas De Marchi
>>>
>