[Intel-xe] [PATCH 0/2] Fix deadlock issue on d3cold

Riana Tauro riana.tauro at intel.com
Mon Dec 4 11:57:38 UTC 2023



On 12/4/2023 4:27 PM, Matthew Auld wrote:
> Hi,
> 
> On Mon, 4 Dec 2023 at 05:18, Riana Tauro <riana.tauro at intel.com> wrote:
>>
>> kernel BOs need to be restored to the same place in VRAM, and with
>> d3cold that means that any VRAM allocation can
>> potentially steal the spot from kernel BOs which then blows up when
>> waking the device up.
>>
>> However if we end up moving xe_device_mem_access_get() much higher
>> up in the hierarchy (start of the gem_create_ioctl) then
>> this is no longer possible.
>>
>> This patch fixes the deadlock issue seen in
>> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/256
>> Also enables d3cold to get CI results
>>
>> Riana Tauro (2):
>>    RFC drm/xe: Move xe_device_mem_access_get to the top of
>>      gem_create_ioctl
>>    CI drm/xe: Enable d3cold
> 
Hi Matthew


> Tried this locally on DG2 and it triggers lockdep splats for me when
> loading the module, so it looks like a lot more is needed before
> turning on d3cold. 
The lockdep splat seen on load when d3cold is enabled has the stack
trace below.

xe_tile_init_noalloc is called before runtime suspend is initialized
by xe_pm_init, so this seems to be a false positive.

[  150.900520]
                -> #1 (xe_device_mem_access_lockdep_map){+.+.}-{0:0}:
[  150.908078]        lock_acquire+0x169/0x3d0
[  150.912276]        xe_device_mem_access_get+0x53/0x220 [xe]
[  150.918067]        __xe_ggtt_insert_bo_at+0x12a/0x3e0 [xe]
[  150.923760]        __xe_bo_create_locked+0x2f5/0x6e0 [xe]
[  150.929353]        xe_bo_create_pin_map_at+0x42/0x270 [xe]
[  150.935033]        xe_bo_create_pin_map+0x1a/0x20 [xe]
[  150.940366]        xe_sa_bo_manager_init+0xac/0x300 [xe]
[  150.945884]        xe_tile_init_noalloc+0x74/0x110 [xe]
[  150.951316]        xe_device_probe+0x765/0xaa0 [xe]
[  150.956392]        xe_pci_probe+0x53d/0x860 [xe]
[  150.961220]        local_pci_probe+0x7d/0xe0

                -> #0 (reservation_ww_class_mutex){+.+.}-{3:3}:
[  151.049443]        check_prev_add+0x1ba/0x14a0
[  151.053886]        __lock_acquire+0x203e/0x2ff0
[  151.058413]        lock_acquire+0x169/0x3d0
[  151.062596]        __ww_mutex_lock.constprop.0+0x164/0x1e50
[  151.068161]        ww_mutex_lock+0x42/0x1a0
[  151.072343]        xe_bo_lock+0x2f/0x40 [xe]
[  151.076817]        xe_bo_evict_all+0x57d/0x610 [xe]
[  151.081893]        xe_pm_runtime_suspend+0x38f/0x3b0 [xe]

This does not affect the functionality of d3cold.

> However I also had to manually set the
> d3cold.capable=true. Wondering if we have machines in CI that are
> d3cold capable, since BAT results are reporting success?

Yeah, I didn't see this lockdep splat on load on the CI DG2. It also has
display enabled, so it won't enter runtime suspend.

Thanks
Riana Tauro

> 
>>
>>   drivers/gpu/drm/xe/xe_bo.c | 26 ++++++++++++++++++++------
>>   drivers/gpu/drm/xe/xe_pm.h |  2 +-
>>   2 files changed, 21 insertions(+), 7 deletions(-)
>>
>> --
>> 2.40.0
>>

