[Intel-gfx] [PATCH] drm/i915: Fix a lockdep warning at error capture

Thu Jun 23 15:59:45 UTC 2022

Hi G.G.

On 6/23/2022 5:31 PM, Gwan-gyeong Mun wrote:
> Commit message and code changes look good to me.
>
> Reviewed-by: Gwan-gyeong Mun <gwan-gyeong.mun at intel.com>
>
> This is a question about the issue mentioned in this patch, not the 
> patch. The tests 
> (igt at gem_ctx_persistence@legacy-engines-hostile at render) mentioned in 
> this issue ( https://gitlab.freedesktop.org/drm/intel/-/issues/5595 ) 
> are dealing with a test that causes gpu reset / hang? Or did the reset 
> happen during the test?

If the test  resets the GPU then APL should hit this. From a quick look 
into the igt, the test seems to me can trigger reset. The issue also 
mentions other tests like igt at i915_hangman@engine-error-state-capture 
which might sounds more reasonable.

Regards,

Nirmoy

>
> br,
> G.G.
>
> On 6/17/22 4:21 PM, Das, Nirmoy wrote:
>> Missed the fdo issue ref:
>>
>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5595
>>
>> On 6/17/2022 1:55 PM, Nirmoy Das wrote:
>>> For some platfroms we use stop_machine version of
>>> gen8_ggtt_insert_page/gen8_ggtt_insert_entries to avoid a
>>> concurrent GGTT access bug but this causes a circular locking
>>> dependency warning:
>>>
>>>    Possible unsafe locking scenario:
>>>          CPU0                    CPU1
>>>          ----                    ----
>>>     lock(&ggtt->error_mutex);
>>>                                  lock(dma_fence_map);
>>> lock(&ggtt->error_mutex);
>>>     lock(cpu_hotplug_lock);
>>>
>>> Fix this by calling gen8_ggtt_insert_page/gen8_ggtt_insert_entries
>>> directly at error capture which is concurrent GGTT access safe because
>>> reset path make sure of that.
>>>
>>> Suggested-by: Chris Wilson <chris at chris-wilson.co.uk>
>>> Signed-off-by: Nirmoy Das <nirmoy.das at intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/gt/intel_gt_gmch.c  | 2 ++
>>>   drivers/gpu/drm/i915/gt/intel_gtt.h      | 9 +++++++++
>>>   drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 5 ++++-
>>>   drivers/gpu/drm/i915/i915_gpu_error.c    | 8 ++++++--
>>>   4 files changed, 21 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c 
>>> b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c
>>> index 18e488672d1b..2260ed576776 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gt_gmch.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gt_gmch.c
>>> @@ -629,6 +629,8 @@ int intel_gt_gmch_gen8_probe(struct i915_ggtt 
>>> *ggtt)
>>>       if (intel_vm_no_concurrent_access_wa(i915)) {
>>>           ggtt->vm.insert_entries = bxt_vtd_ggtt_insert_entries__BKL;
>>>           ggtt->vm.insert_page    = bxt_vtd_ggtt_insert_page__BKL;
>>> +        ggtt->vm.raw_insert_page = gen8_ggtt_insert_page;
>>> +        ggtt->vm.raw_insert_entries = gen8_ggtt_insert_entries;
>>>           ggtt->vm.bind_async_flags =
>>>               I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND;
>>>       }
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h 
>>> b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> index a40d928b3888..f9a1f3d17272 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> +++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
>>> @@ -306,6 +306,15 @@ struct i915_address_space {
>>>                      struct i915_vma_resource *vma_res,
>>>                      enum i915_cache_level cache_level,
>>>                      u32 flags);
>>> +    void (*raw_insert_page)(struct i915_address_space *vm,
>>> +                dma_addr_t addr,
>>> +                u64 offset,
>>> +                enum i915_cache_level cache_level,
>>> +                u32 flags);
>>> +    void (*raw_insert_entries)(struct i915_address_space *vm,
>>> +                   struct i915_vma_resource *vma_res,
>>> +                   enum i915_cache_level cache_level,
>>> +                   u32 flags);
>>>       void (*cleanup)(struct i915_address_space *vm);
>>>       void (*foreach)(struct i915_address_space *vm,
>>> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c 
>>> b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> index d2c5c9367cc4..c06e83872c34 100644
>>> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c
>>> @@ -493,7 +493,10 @@ static void uc_fw_bind_ggtt(struct intel_uc_fw 
>>> *uc_fw)
>>>       if (i915_gem_object_is_lmem(obj))
>>>           pte_flags |= PTE_LM;
>>> -    ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, 
>>> pte_flags);
>>> +    if (ggtt->vm.raw_insert_entries)
>>> +        ggtt->vm.raw_insert_entries(&ggtt->vm, dummy, 
>>> I915_CACHE_NONE, pte_flags);
>>> +    else
>>> +        ggtt->vm.insert_entries(&ggtt->vm, dummy, I915_CACHE_NONE, 
>>> pte_flags);
>>>   }
>>>   static void uc_fw_unbind_ggtt(struct intel_uc_fw *uc_fw)
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c 
>>> b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> index bff8a111424a..f9b1969ed7ed 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> @@ -1104,8 +1104,12 @@ i915_vma_coredump_create(const struct 
>>> intel_gt *gt,
>>>           for_each_sgt_daddr(dma, iter, vma_res->bi.pages) {
>>>               mutex_lock(&ggtt->error_mutex);
>>> -            ggtt->vm.insert_page(&ggtt->vm, dma, slot,
>>> -                         I915_CACHE_NONE, 0);
>>> +            if (ggtt->vm.raw_insert_page)
>>> +                ggtt->vm.raw_insert_page(&ggtt->vm, dma, slot,
>>> +                             I915_CACHE_NONE, 0);
>>> +            else
>>> +                ggtt->vm.insert_page(&ggtt->vm, dma, slot,
>>> +                             I915_CACHE_NONE, 0);
>>>               mb();
>>>               s = io_mapping_map_wc(&ggtt->iomap, slot, PAGE_SIZE);