[Intel-xe] [PATCH 1/4] drm/xe: Fix MTL+ stolen memory mapping

Matthew Auld matthew.auld at intel.com
Wed Jul 26 15:19:43 UTC 2023


On 26/07/2023 16:13, Lucas De Marchi wrote:
> On Tue, Jul 25, 2023 at 03:47:18PM -0700, Matt Roper wrote:
>> On Tue, Jul 25, 2023 at 01:28:33PM -0700, Lucas De Marchi wrote:
>>> Based on commit 8d8d062be6b9 ("drm/i915/mtl: Fix MTL stolen memory GGTT
>>> mapping"). For stolen on MTL and beyond, the address in the PTE is the
>>> offset from DSM base. While at it, update the comments explaining each
>>> part of the calculation.
>>>
>>> v2: Also change resource_is_stolen_vram() so the right PTE bit is set
>>
>> With a bit more testing locally I noticed that I'm seeing some warnings
>> and crashes on driver unload when display is enabled.  I'm not 100% sure
>> they're tied to this, but it seems like they could be.
>>
>> The first warning is
>>
>>    WARNING: CPU: 10 PID: 4710 at 
>> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c:324 ttm_vram_mgr_fini+0x15e/0x170 
>> [xe]
>>
>> which is
>>
>>        WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size);
>>
>> From there things go downhill --- circular locking detected, slab cache
>> problems, and KASAN use-after-free reports.
> 
> I can only reproduce the warnings above, not the crashes, circular
> lockings etc. They don't seem related to this patch. More below.
> 
>>
>> Without digging through the code in depth, my worry is that by just
>> marking this stolen memory as vram everywhere in the driver, we try to
>> return the framebuffer to the otherwise empty/uninitialized VRAM pool,
>> even though it isn't truly VRAM.
>>
>> Maybe it would be better to make a change with less side effects
>> elsewhere in the code.  E.g., add a "resource_is_stolen_devmem" that
>> only gets called from the pagetable encoding functions to set the DM=1
>> bit and not anywhere else?
> 
> I think it would make it clearer about the intent since the stolen
> memory is not actually vram, it's just accessed through the PCI BAR. I
> did typed this all into new patches for this series, but it I still
> reproduce the warnings above. They seem related to the small BAR
> addition that added the tracking for the visible portion of the BAR.
> 
> For stolen, it's always 100% visible. Loading the driver without
> display (no warning on module removal):
> 
> root at mtl: ~# cat  /sys/kernel/debug/dri/0/stolen_mm
>    use_type: 1
>    use_tt: 0
>    size: 62914560
>    usage: 0
> default_page_size: 4KiB
> visible_avail: 60MiB
> visible_size: 60MiB
> chunk_size: 4KiB, total: 60MiB, free: 60MiB
> ...
> 
> vs loading it with display:
> 
> root at mtl # cat  /sys/kernel/debug/dri/0/stolen_mm
>    use_type: 1
>    use_tt: 0
>    size: 62914560
>    usage: 25165824
> default_page_size: 4KiB
> visible_avail: 36MiB
> visible_size: 60MiB
> chunk_size: 4KiB, total: 60MiB, free: 36MiB
> 
>  From dmesg:
> [ 3026.756075] xe 0000:00:02.0: [drm:xe_ttm_stolen_mgr_init [xe]] 
> Initialized stolen memory support with 62914560 bytes
> [ 3029.670666] xe 0000:00:02.0: [drm] Allocated fbdev into stolen
> [ 3030.009260] xe 0000:00:02.0: [drm:intel_fbc_update [xe]] reserved 
> 16777216 bytes of contiguous stolen space for FBC, limit: 1
> 
> So the allocations that remain on stolen seem to be the fbdev + FBC.
> I wonder if the fini order is correct wrt display and we are returning
> those before destroying the ttm mananger. Cc'ing Matt Auld and Maarten
> to check if they have additional clues. I will send the next version
> since I believe the approach without overriding is_vram is clearer and I
> have several other fixes/cleanups piling there.

Sounds like maybe:
https://patchwork.freedesktop.org/series/120310/

I have been cherry picking that on my local branches, otherwise module 
unload is quite broken.

> 
> I will try it on TGL or ADL-P to see if I can reproduce there too.
> 
> Lucas De Marchi
> 
>>
>> Or maybe these warnings are unrelated; I haven't tried unloading with
>> display on any other platforms recently, so maybe it's a pre-existing
>> problem.
>>
>>
>> Matt
>>
>>>
>>> Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
>>> ---
>>>  drivers/gpu/drm/xe/xe_bo.c             |  3 ++-
>>>  drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 15 +++++++++++++--
>>>  2 files changed, 15 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c
>>> index 10858fcf4f62..5379d8e5affb 100644
>>> --- a/drivers/gpu/drm/xe/xe_bo.c
>>> +++ b/drivers/gpu/drm/xe/xe_bo.c
>>> @@ -61,7 +61,8 @@ bool mem_type_is_vram(u32 mem_type)
>>>
>>>  static bool resource_is_stolen_vram(struct xe_device *xe, struct 
>>> ttm_resource *res)
>>>  {
>>> -    return res->mem_type == XE_PL_STOLEN && IS_DGFX(xe);
>>> +    return res->mem_type == XE_PL_STOLEN &&
>>> +        (IS_DGFX(xe) || GRAPHICS_VERx100(xe) >= 1270);
>>>  }
>>>
>>>  static bool resource_is_vram(struct ttm_resource *res)
>>> diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c 
>>> b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> index 21ecc734f10a..271b3fba4129 100644
>>> --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c
>>> @@ -94,11 +94,22 @@ static u32 detect_bar2_integrated(struct 
>>> xe_device *xe, struct xe_ttm_stolen_mgr
>>>
>>>      ggc = xe_mmio_read32(xe_root_mmio_gt(xe), GGC);
>>>
>>> -    /* check GGMS, should be fixed 0x3 (8MB) */
>>> +    /*
>>> +     * Check GGMS: it should be fixed 0x3 (8MB), which corresponds 
>>> to the
>>> +     * GTT size
>>> +     */
>>>      if (drm_WARN_ON(&xe->drm, (ggc & GGMS_MASK) != GGMS_MASK))
>>>          return 0;
>>>
>>> -    mgr->stolen_base = mgr->io_base = pci_resource_start(pdev, 2) + 
>>> SZ_8M;
>>> +    /*
>>> +     * Graphics >= 1270 uses the offset to the GSMBASE as address in 
>>> the
>>> +     * PTEs, together with the DM flag being set. Previously there 
>>> was no
>>> +     * such flag so the address was the io_base.
>>> +     *
>>> +     * DSMBASE = GSMBASE + 8MB
>>> +     */
>>> +    mgr->stolen_base = SZ_8M;
>>> +    mgr->io_base = pci_resource_start(pdev, 2) + mgr->stolen_base;
>>>
>>>      /* return valid GMS value, -EIO if invalid */
>>>      gms = REG_FIELD_GET(GMS_MASK, ggc);
>>> -- 
>>> 2.40.1
>>>
>>
>> -- 
>> Matt Roper
>> Graphics Software Engineer
>> Linux GPU Platform Enablement
>> Intel Corporation


More information about the Intel-xe mailing list