[Intel-xe] [PATCH 1/4] drm/xe: Fix MTL+ stolen memory mapping

Lucas De Marchi lucas.demarchi at intel.com
Wed Jul 26 15:13:28 UTC 2023


On Tue, Jul 25, 2023 at 03:47:18PM -0700, Matt Roper wrote:
>On Tue, Jul 25, 2023 at 01:28:33PM -0700, Lucas De Marchi wrote:
>> Based on commit 8d8d062be6b9 ("drm/i915/mtl: Fix MTL stolen memory GGTT
>> mapping"). For stolen on MTL and beyond, the address in the PTE is the
>> offset from DSM base. While at it, update the comments explaining each
>> part of the calculation.
>>
>> v2: Also change resource_is_stolen_vram() so the right PTE bit is set
>
>With a bit more testing locally I noticed that I'm seeing some warnings
>and crashes on driver unload when display is enabled.  I'm not 100% sure
>they're tied to this, but it seems like they could be.
>
>The first warning is
>
>    WARNING: CPU: 10 PID: 4710 at drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:324 ttm_vram_mgr_fini+0x15e/0x170 [xe]
>
>which is
>
>        WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>
>From there things go downhill --- circular locking detected, slab cache
>problems, and KASAN use-after-free reports.

I can only reproduce the warnings above, not the crashes, circular
lockings etc. They don't seem related to this patch. More below.

>
>Without digging through the code in depth, my worry is that by just
>marking this stolen memory as vram everywhere in the driver, we try to
>return the framebuffer to the otherwise empty/uninitialized VRAM pool,
>even though it isn't truly VRAM.
>
>Maybe it would be better to make a change with less side effects
>elsewhere in the code.  E.g., add a "resource_is_stolen_devmem" that
>only gets called from the pagetable encoding functions to set the DM=1
>bit and not anywhere else?

I think it would make it clearer about the intent since the stolen
memory is not actually vram, it's just accessed through the PCI BAR. I
did typed this all into new patches for this series, but it I still
reproduce the warnings above. They seem related to the small BAR
addition that added the tracking for the visible portion of the BAR.

For stolen, it's always 100% visible. Loading the driver without
display (no warning on module removal):

root at mtl: ~# cat  /sys/kernel/debug/dri/0/stolen_mm
   use_type: 1
   use_tt: 0
   size: 62914560
   usage: 0
default_page_size: 4KiB
visible_avail: 60MiB
visible_size: 60MiB
chunk_size: 4KiB, total: 60MiB, free: 60MiB
...

vs loading it with display:

root at mtl # cat  /sys/kernel/debug/dri/0/stolen_mm
   use_type: 1
   use_tt: 0
   size: 62914560
   usage: 25165824
default_page_size: 4KiB
visible_avail: 36MiB
visible_size: 60MiB
chunk_size: 4KiB, total: 60MiB, free: 36MiB

 From dmesg:
[ 3026.756075] xe 0000:00:02.0: [drm:xe_ttm_stolen_mgr_init [xe]] Initialized stolen memory support with 62914560 bytes
[ 3029.670666] xe 0000:00:02.0: [drm] Allocated fbdev into stolen
[ 3030.009260] xe 0000:00:02.0: [drm:intel_fbc_update [xe]] reserved 16777216 bytes of contiguous stolen space for FBC, limit: 1

So the allocations that remain on stolen seem to be the fbdev + FBC.
I wonder if the fini order is correct wrt display and we are returning
those before destroying the ttm mananger. Cc'ing Matt Auld and Maarten
to check if they have additional clues. I will send the next version
since I believe the approach without overriding is_vram is clearer and I
have several other fixes/cleanups piling there.

I will try it on TGL or ADL-P to see if I can reproduce there too.

Lucas De Marchi

>
>Or maybe these warnings are unrelated; I haven't tried unloading with
>display on any other platforms recently, so maybe it's a pre-existing
>problem.
>
>
>Matt
>
>>
>> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>> ---
>>  drivers/gpu/drm/xe/xe_bo.c             |  3 ++-
>>  drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 15 +++++++++++++--
>>  2 files changed, 15 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>> index 10858fcf4f62..5379d8e5affb 100644
>> --- a/drivers/gpu/drm/xe/xe_bo.c
>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>> @@ -61,7 +61,8 @@ bool mem_type_is_vram(u32 mem_type)
>>
>>  static bool resource_is_stolen_vram(struct xe_device *xe, struct ttm_resource *res)
>>  {
>> -	return res->mem_type == XE_PL_STOLEN && IS_DGFX(xe);
>> +	return res->mem_type == XE_PL_STOLEN &&
>> +		(IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1270);
>>  }
>>
>>  static bool resource_is_vram(struct ttm_resource *res)
>> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>> index 21ecc734f10a..271b3fba4129 100644
>> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>> @@ -94,11 +94,22 @@ static u32 detect_bar2_integrated(struct xe_device *xe, struct xe_ttm_stolen_mgr
>>
>>  	ggc = xe_mmio_read32(xe_root_mmio_gt(xe), GGC);
>>
>> -	/* check GGMS, should be fixed 0x3 (8MB) */
>> +	/*
>> +	 * Check GGMS: it should be fixed 0x3 (8MB), which corresponds to the
>> +	 * GTT size
>> +	 */
>>  	if (drm_WARN_ON(&xe->drm, (ggc & GGMS_MASK) != GGMS_MASK))
>>  		return 0;
>>
>> -	mgr->stolen_base = mgr->io_base = pci_resource_start(pdev, 2) + SZ_8M;
>> +	/*
>> +	 * Graphics >= 1270 uses the offset to the GSMBASE as address in the
>> +	 * PTEs, together with the DM flag being set. Previously there was no
>> +	 * such flag so the address was the io_base.
>> +	 *
>> +	 * DSMBASE = GSMBASE + 8MB
>> +	 */
>> +	mgr->stolen_base = SZ_8M;
>> +	mgr->io_base = pci_resource_start(pdev, 2) + mgr->stolen_base;
>>
>>  	/* return valid GMS value, -EIO if invalid */
>>  	gms = REG_FIELD_GET(GMS_MASK, ggc);
>> --
>> 2.40.1
>>
>
>-- 
>Matt Roper
>Graphics Software Engineer
>Linux GPU Platform Enablement
>Intel Corporation


More information about the Intel-xe mailing list