[Intel-gfx] [PATCH] gen8_ppgtt: Use correct huge page manager for MTL

Tue Feb 21 16:32:54 UTC 2023

On 21/02/2023 16:28, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Auld, Matthew <matthew.auld at intel.com>
> Sent: Tuesday, February 21, 2023 8:06 AM
> To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; intel-gfx at lists.freedesktop.org
> Cc: Dutt, Sudeep <sudeep.dutt at intel.com>; Siddiqui, Ayaz A <ayaz.siddiqui at intel.com>
> Subject: Re: [PATCH] gen8_ppgtt: Use correct huge page manager for MTL
>>
>> On 17/02/2023 19:18, Jonathan Cavitt wrote:
>>> MTL currently uses gen8_ppgtt_insert_huge when managing huge pages.  This is because
>>> MTL reports as not supporting 64K pages, or more accurately, the system that reports
>>> whether a platform has 64K pages reports false for MTL.  This is only half correct,
>>> as the 64K page support reporting system only cares about 64K page support for LMEM,
>>> which MTL doesn't have.
>>>
>>> MTL should be using xehpsdv_ppgtt_insert_huge.  However, simply changing over to
>>> using that manager doesn't resolve the issue because MTL is expecting the virtual
>>> address space for the page table to be flushed after initialization, so we must also
>>> add a flush statement there.
>>>
>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
>> Reviewed-by: Matthew Auld <matthew.auld at intel.com>
>>
>> Although it looks like the hugepage mock tests are failing with this. I
>> assume the mock device just uses some "max" gen version or so, which now
>> triggers this path. Any ideas for that?
> 
> With this patch applied, multiple calls to the hugepages live selftest result in a kernel panic.
> If the mock tests are run immediately after the live ones, that would explain this behavior.
> I was informed when this was initially debugged that the error was a known IOMMU issue
> rather than some novel regression, though it's hard to tell if that was just hopeful optimism
> or not at this point.

In the test results we now get:

6> [183.420316] i915: Running 
i915_gem_huge_page_mock_selftests/igt_mock_exhaust_device_supported_pages
<6> [183.436978] i915: Running 
i915_gem_huge_page_mock_selftests/igt_mock_memory_region_huge_pages
<6> [183.445777] i915: Running 
i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_misaligned_dma
<6> [183.904531] i915: Running 
i915_gem_huge_page_mock_selftests/igt_mock_ppgtt_huge_fill
<3> [183.912658] gtt=69632, expected=4096, size=69632, single=yes
<3> [183.912784] i915/i915_gem_huge_page_mock_selftests: 
igt_mock_ppgtt_huge_fill failed with error -22

I didn't look any deeper than that though. Note that this a just a 
mock/fake device. I don't think its IOMMU related.

> -Jonathan Cavitt
> 
>>
>>> ---
>>>    drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 3 ++-
>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> index 4daaa6f55668..9c571185395f 100644
>>> --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c
>>> @@ -570,6 +570,7 @@ xehpsdv_ppgtt_insert_huge(struct i915_address_space *vm,
>>>    			}
>>>    		} while (rem >= page_size && index < max);
>>>    
>>> +		drm_clflush_virt_range(vaddr, PAGE_SIZE);
>>>    		vma_res->page_sizes_gtt |= page_size;
>>>    	} while (iter->sg && sg_dma_len(iter->sg));
>>>    }
>>> @@ -707,7 +708,7 @@ static void gen8_ppgtt_insert(struct i915_address_space *vm,
>>>    	struct sgt_dma iter = sgt_dma(vma_res);
>>>    
>>>    	if (vma_res->bi.page_sizes.sg > I915_GTT_PAGE_SIZE) {
>>> -		if (HAS_64K_PAGES(vm->i915))
>>> +		if (GRAPHICS_VER_FULL(vm->i915) >= IP_VER(12, 50))
>>>    			xehpsdv_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>>>    		else
>>>    			gen8_ppgtt_insert_huge(vm, vma_res, &iter, cache_level, flags);
>>