[Intel-xe] [PATCH i-g-t] tests/xe/xe_intel_bb: ensure valid next page

Matthew Auld matthew.auld at intel.com
Wed Jun 7 11:14:00 UTC 2023


On 07/06/2023 12:09, Zbigniew Kempczyński wrote:
> On Wed, Jun 07, 2023 at 11:27:41AM +0100, Matthew Auld wrote:
>> On 07/06/2023 10:55, Zbigniew Kempczyński wrote:
>>> On Fri, Jun 02, 2023 at 12:48:17PM +0100, Matthew Auld wrote:
>>>> Due to over-fetch, the recommendation is to ensure we have a single valid
>>>> extra page beyond the batch. We currently lack this, which seems to
>>>> explain why xe_intel_bb@full-batch generates CAT errors.
>>>>
>>>> Currently we allow using the last GTT page, but this looks to be a no-go,
>>>> since the next page will be beyond the actual GTT in the case of
>>>> full-batch. The i915 path looks to already account for this. However,
>>>> even with that fixed, Xe doesn't use scratch pages by default, so the
>>>> next page will still not be valid.
>>>>
>>>> With Xe, rather expect callers to know about HW over-fetch and ensure
>>>> that the batch has an extra page, if needed. Alternatively we could
>>>> apply DRM_XE_VM_CREATE_SCRATCH_PAGE when creating the vm, but really
>>>> we want to get away from such things.
>>>>
>>>> Bspec: 60223
>>>> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/262
>>>> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
>>>> Cc: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
>>>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
>>>
>>> I observe this not only on the last page. I've introduced:
>>>
>>> index 3cd680072e..ef3f782df7 100644
>>> --- a/lib/intel_batchbuffer.c
>>> +++ b/lib/intel_batchbuffer.c
>>> @@ -947,7 +947,7 @@ __intel_bb_create(int fd, uint32_t ctx, uint32_t vm, const intel_ctx_cfg_t *cfg,
>>>                   /* Limit to 48-bit due to MI_* address limitation */
>>>                   ibb->gtt_size = 1ull << min_t(uint32_t, xe_va_bits(fd), 48);
>>> -               end = ibb->gtt_size;
>>> +               end = ibb->gtt_size - xe_get_default_alignment(fd)*2;
>>>
>>> or even:
>>>
>>> +               end = ibb->gtt_size/2;
>>>
>>> Which gives:
>>>
>>> ADLP:~/igt-upstream/build/tests# ./xe_intel_bb --run full-batch --debug
>>> IGT-Version: 1.27.1-NO-GIT (x86_64) (Linux: 6.3.0-xe+ x86_64)
>>> Opened device: /dev/dri/card0
>>> (xe_intel_bb:11420) drmtest-DEBUG: Test requirement passed: !(fd<0)
>>> (xe_intel_bb:11420) intel_bufops-DEBUG: generation: 12, supported tiles: 0x3f, driver: xe
>>> Starting subtest: full-batch
>>> (xe_intel_bb:11420) intel_allocator_simple-DEBUG: Using simple allocator
>>> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: Run on DRM_XE_ENGINE_CLASS_COPY
>>> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: bind: MAP
>>> (xe_intel_bb:11420) intel_batchbuffer-DEBUG:   handle: 1, offset: ffffffffd000, size: 1000
>>> (xe_intel_bb:11420) intel_batchbuffer-DEBUG: bind: UNMAP
>>> (xe_intel_bb:11420) intel_batchbuffer-DEBUG:   offset: ffffffffd000, size: 1000
>>>
>>> And I still observe the failure, which means the problem is with prefetching
>>> from the next page, regardless of its location.
>>
>> Yeah, AFAICT the CAT error will trigger for any overfetch that hits a bogus
>> GTT address, either because it goes beyond the GTT boundary or because the
>> PTE is marked as invalid and we aren't able to fault it in.
>>
>>>
>>> That means, IMO, the UMD and any other consumer have to overallocate to
>>> avoid hitting:
>>>
>>> [ 1406.062731] xe 0000:00:02.0: [drm] Engine memory cat error: guc_id=2
>>> [ 1406.070519] xe 0000:00:02.0: [drm] Timedout job: seqno=4294967169, guc_id=2, flags=0x4
>>> [ 1406.070535] xe 0000:00:02.0: [drm:xe_devcoredump [xe]] Multiple hangs are occurring, but only the first snapshot was taken
>>> [ 1406.071182] xe 0000:00:02.0: [drm] Engine reset: guc_id=2
>>>
>>> So from my perspective you've avoided the failure, but that wasn't my
>>> intention when I wrote this test. Do you happen to know whether the UMD has
>>> mitigation code which prevents it from entering the area where prefetching
>>> from the next page starts?
>>
>> Userspace should be aware of such HW behaviour, I think. With Mesa + Xe I
>> think it still currently forces scratch pages, but AFAIK they want to get
>> rid of that once they are sure that no potential overfetch can trigger CAT
>> errors or similar. If the vm doesn't have scratch pages, then I think it's
>> up to userspace to deal with overfetch, and that must involve inflating the
>> batch size or somehow ensuring the next page always has something valid
>> bound.
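>>
>> I.e. roughly what the patch at the bottom of this mail ends up doing for
>> full_batch() (just a sketch, reusing intel_bb_create() and PAGE_SIZE,
>> nothing new API-wise):
>>
>>     /*
>>      * The payload (NOOPs + BB_END) fills one page; the second page is
>>      * only there so the ~512 byte over-fetch past the end still reads
>>      * from a valid, bound page.
>>      */
>>     struct intel_bb *ibb = intel_bb_create(xe, 2 * PAGE_SIZE);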
>>
>> Do you know if this test is specifically trying to poke at the overfetch
>> behaviour, or does this test just want a PAGE_SIZE worth of NOOPS + BB_END?
>> I wasn't sure given that currently on DG2/ATS-M the batch is always going to
>> be 64K underneath, and so the overfetch is never going to find issues on
>> such platforms.
> 
> You're right, for discrete I should pick 64K (xe_get_default_alignment())
> instead of PAGE_SIZE.
> 
>>
>> If the test cares specifically about triggering overfetch we could maybe
>> enable scratch pages for the vm in full-batch, and on DG2/ATS-M also do
>> s/PAGE_SIZE/64K/ or use system memory for the batch. And then ofc ensure the
>> last GTT page is never given out. If we later decide to remove scratch pages
>> completely in Xe then I guess we just remove the test. What do you think?
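>>
>> Rough sketch of that idea (untested; assumes the IGT xe_vm_create() helper
>> and an intel_bb_create_with_context() that takes a vm handle, along the
>> lines of the __intel_bb_create() signature quoted above):
>>
>>     /* vm with scratch pages, so any stray over-fetch hits something valid */
>>     uint32_t vm = xe_vm_create(xe, DRM_XE_VM_CREATE_SCRATCH_PAGE, 0);
>>
>>     /* 64K on DG2/ATS-M, 4K elsewhere */
>>     uint32_t bb_size = xe_get_default_alignment(xe);
>>
>>     struct intel_bb *ibb = intel_bb_create_with_context(xe, 0, vm, NULL, bb_size);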
> 
> The test was written to exercise the last page on i915. It looks like we have
> no influence on prefetching; we always get the error when we enter the last
> 512B, if I'm not wrong. Scratch pages are not the default option for the
> intel-bb vm, and I think they never will be, so I would just remove the test.

Ok, will remove instead. Thanks for taking a look.

>   
> --
> Zbigniew
> 
>>
>>>
>>> --
>>> Zbigniew
>>>
>>>
>>>> ---
>>>>    lib/intel_batchbuffer.c | 6 ++++++
>>>>    tests/xe/xe_intel_bb.c  | 8 +++++++-
>>>>    2 files changed, 13 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/lib/intel_batchbuffer.c b/lib/intel_batchbuffer.c
>>>> index 3cd680072..035facfc4 100644
>>>> --- a/lib/intel_batchbuffer.c
>>>> +++ b/lib/intel_batchbuffer.c
>>>> @@ -881,6 +881,12 @@ static inline uint64_t __intel_bb_get_offset(struct intel_bb *ibb,
>>>>     * passed in. If this is the case, it copies the information over to the
>>>>     * newly created batch buffer.
>>>>     *
>>>> + * NOTE: On Xe scratch pages are not used by default. Due to over-fetch (~512
>>>> + * bytes) there might need to be a valid next page to avoid hangs or CAT
>>>> + * errors if the batch contents are large enough to approach the end boundary
>>>> + * of the batch itself. Inflate the @size to ensure there is a valid next page
>>>> + * in such cases.
>>>> + *
>>>>     * Returns:
>>>>     *
>>>>     * Pointer the intel_bb, asserts on failure.
>>>> diff --git a/tests/xe/xe_intel_bb.c b/tests/xe/xe_intel_bb.c
>>>> index 755cc530e..af8462af5 100644
>>>> --- a/tests/xe/xe_intel_bb.c
>>>> +++ b/tests/xe/xe_intel_bb.c
>>>> @@ -952,7 +952,13 @@ static void full_batch(struct buf_ops *bops)
>>>>    	struct intel_bb *ibb;
>>>>    	int i;
>>>> -	ibb = intel_bb_create(xe, PAGE_SIZE);
>>>> +	/*
>>>> +	 * Add an extra page to ensure over-fetch always sees a valid next page,
>>>> +	 * which includes not going beyond the actual GTT, and ensuring we have
>>>> +	 * a valid GTT entry, given that on xe we don't use scratch pages by
>>>> +	 * default.
>>>> +	 */
>>>> +	ibb = intel_bb_create(xe, 2 * PAGE_SIZE);
>>>>    	if (debug_bb)
>>>>    		intel_bb_set_debug(ibb, true);
>>>> -- 
>>>> 2.40.1
>>>>

