[Intel-gfx] [PATCH] drm/i915: Do not set cache_dirty for DGFX

Wed Nov 2 16:35:05 UTC 2022

On Wed, Nov 02, 2022 at 05:09:27PM +0100, Das, Nirmoy wrote:
>
>On 11/2/2022 11:36 AM, Matthew Auld wrote:
>>On 02/11/2022 07:39, Das, Nirmoy wrote:
>>>
>>>On 11/2/2022 6:14 AM, Niranjana Vishwanathapura wrote:
>>>>Currently on DG1, which does not have LLC, we hit the below
>>>>warning while rebinding a userptr invalidated object.
>>>>
>>>>WARNING: CPU: 4 PID: 13008 at 
>>>>drivers/gpu/drm/i915/gem/i915_gem_pages.c:34 
>>>>__i915_gem_object_set_pages+0x296/0x2d0 [i915]
>>>>...
>>>>RIP: 0010:__i915_gem_object_set_pages+0x296/0x2d0 [i915]
>>>>...
>>>>Call Trace:
>>>>  <TASK>
>>>>  i915_gem_userptr_get_pages+0x175/0x1a0 [i915]
>>>>  ____i915_gem_object_get_pages+0x32/0xb0 [i915]
>>>>  i915_gem_object_userptr_submit_init+0x286/0x470 [i915]
>>>>  eb_lookup_vmas+0x2ff/0xcf0 [i915]
>>>>  ? __intel_wakeref_get_first+0x55/0xb0 [i915]
>>>>  i915_gem_do_execbuffer+0x785/0x21d0 [i915]
>>>>  i915_gem_execbuffer2_ioctl+0xe7/0x3d0 [i915]
>>>>
>>>>We shouldn't be setting the obj->cache_dirty for DGFX,
>>>>fix it.
>>>
>>>With Fixes: d70af57944 ("drm/i915/shmem: ensure flush during 
>>>swap-in on non-LLC")
>>>

Ok, will add.

>>>Acked-by: Nirmoy Das <nirmoy.das at intel.com>
>>
>>Any idea why this escaped our testing in CI? Perhaps something to 
>>improve.
>
>
>I ran some userptr-related IGT tests; none hit
>__i915_gem_object_release_shmem. So I think we are missing
>coverage here, or neither I nor CI is running such a test.
>
>Niranjana, what test did you run to hit this WARN?
>

I hit this issue with a modified gem_userptr_blits@vma-merge, where
I added an additional execbuf call after the userptr invalidation, as
below, to test that rebind happens properly after a userptr invalidation.

         igt_spin_end(spin);
+       igt_spin_reset(spin);
+
+       gem_execbuf_wr(i915, &spin->execbuf);
+       igt_spin_end(spin);
+
         gem_close(i915, handle);

         munmap(addr, sz);

Note that the vma-merge subtest fails due to some other issue, but it is
still good enough to reproduce this issue and test the fix.

Niranjana

>
>Regards,
>
>Nirmoy
>
>
>>
>>Reviewed-by: Matthew Auld <matthew.auld at intel.com>
>>
>>>
>>>>Suggested-by: Matthew Auld<matthew.auld at intel.com>
>>>>Reported-by: Niranjana 
>>>>Vishwanathapura<niranjana.vishwanathapura at intel.com>
>>>>Signed-off-by: Niranjana 
>>>>Vishwanathapura<niranjana.vishwanathapura at intel.com>
>>>>---
>>>>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c 
>>>>b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>>>>index 11125c32dd35..2f7804492cd5 100644
>>>>--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>>>>+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
>>>>@@ -369,14 +369,14 @@ __i915_gem_object_release_shmem(struct 
>>>>drm_i915_gem_object *obj,
>>>>        __start_cpu_write(obj);
>>>>      /*
>>>>-     * On non-LLC platforms, force the flush-on-acquire if this 
>>>>is ever
>>>>+     * On non-LLC igfx platforms, force the flush-on-acquire if 
>>>>this is ever
>>>>       * swapped-in. Our async flush path is not trust worthy 
>>>>enough yet(and
>>>>       * happens in the wrong order), and with some tricks it's 
>>>>conceivable
>>>>       * for userspace to change the cache-level to 
>>>>I915_CACHE_NONE after the
>>>>       * pages are swapped-in, and since execbuf binds the 
>>>>object before doing
>>>>       * the async flush, we have a race window.
>>>>       */
>>>>-    if (!HAS_LLC(i915))
>>>>+    if (!HAS_LLC(i915) && !IS_DGFX(i915))
>>>>          obj->cache_dirty = true;
>>>>  }
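
For anyone skimming the thread, the effect of the one-line condition change
in the hunk above can be modelled in isolation. This is only a minimal
sketch in plain C, not driver code: struct model_device, struct model_object,
marks_dirty_before(), marks_dirty_after(), model_release_shmem() and
model_check_dg1() are all hypothetical stand-ins for the real i915
structures and the HAS_LLC()/IS_DGFX() macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the device info behind HAS_LLC()/IS_DGFX(). */
struct model_device {
	bool has_llc;
	bool is_dgfx;
};

/* Hypothetical stand-in for the single object field the patch touches. */
struct model_object {
	bool cache_dirty;
};

/* Pre-patch condition: every non-LLC device gets cache_dirty set. */
static bool marks_dirty_before(const struct model_device *dev)
{
	return !dev->has_llc;
}

/* Post-patch condition: only non-LLC, non-discrete devices qualify. */
static bool marks_dirty_after(const struct model_device *dev)
{
	return !dev->has_llc && !dev->is_dgfx;
}

/* Models the tail of the release path using the patched condition. */
static void model_release_shmem(const struct model_device *dev,
				struct model_object *obj)
{
	if (marks_dirty_after(dev))
		obj->cache_dirty = true;
}

/* Sanity check for the reported case: a non-LLC discrete part such as DG1. */
static void model_check_dg1(void)
{
	const struct model_device dg1 = { .has_llc = false, .is_dgfx = true };
	struct model_object obj = { .cache_dirty = false };

	assert(marks_dirty_before(&dg1));	/* old condition would fire */
	model_release_shmem(&dg1, &obj);
	assert(!obj.cache_dirty);		/* patched condition leaves it clear */
}
```

In this model, only the DGFX case changes behaviour: a non-LLC integrated
device is still marked dirty, and an LLC device is unaffected either way.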