[Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is enabled

Andrzej Hajda andrzej.hajda at intel.com
Fri Nov 3 21:18:04 UTC 2023



On 03.11.2023 16:53, Sripada, Radhakrishna wrote:
> Hi Tvrtko,
>
>> -----Original Message-----
>> From: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>> Sent: Friday, November 3, 2023 1:30 AM
>> To: Sripada, Radhakrishna <radhakrishna.sripada at intel.com>; Hajda, Andrzej
>> <andrzej.hajda at intel.com>; intel-gfx at lists.freedesktop.org
>> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is
>> enabled
>>
>>
>> On 02/11/2023 22:14, Sripada, Radhakrishna wrote:
>>> Hi Tvrtko,
>>>
>>>> -----Original Message-----
>>>> From: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
>>>> Sent: Thursday, November 2, 2023 10:41 AM
>>>> To: Hajda, Andrzej <andrzej.hajda at intel.com>; Sripada, Radhakrishna
>>>> <radhakrishna.sripada at intel.com>; intel-gfx at lists.freedesktop.org
>>>> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
>>>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d
>> is
>>>> enabled
>>>>
>>>>
>>>> On 02/11/2023 16:58, Andrzej Hajda wrote:
>>>>> On 02.11.2023 17:06, Radhakrishna Sripada wrote:
>>>>>> Experiments were conducted with different multipliers to VTD_GUARD
>> macro
>>>>>> with multiplier of 185 we were observing occasional pipe faults when
>>>>>> running kms_cursor_legacy --run-subtest single-bo
>>>>>>
>>>>>> There could possibly be an underlying issue that is being
>>>>>> investigated, for
>>>>>> now bump the guard pages for MTL.
>>>>>>
>>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
>>>>>> Cc: Gustavo Sousa <gustavo.sousa at intel.com>
>>>>>> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
>>>>>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada at intel.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
>>>>>>     1 file changed, 3 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>>> b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>>> index 3770828f2eaf..b65f84c6bb3f 100644
>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>>>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct
>>>>>> drm_i915_gem_object *obj,
>>>>>>         if (intel_scanout_needs_vtd_wa(i915)) {
>>>>>>             unsigned int guard = VTD_GUARD;
>>>>>> +        if (IS_METEORLAKE(i915))
>>>>>> +            guard *= 200;
>>>>>> +
>>>>> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB
>>>>>
>>>>> Looks insanely high, 131MB for padding, if this is before and after it
>>>>> becomes even 262MB of wasted address per plane. Just signalling, I do
>>>>> not know if this actually hurts.
>>>> Yeah this feels crazy. There must be some other explanation which is
>>>> getting hidden by the crazy amount of padding so I'd rather we figured
>>>> it out.
>>>>
>>>> With 262MiB per fb how many fit in GGTT before eviction hits? N screens
>>>> with double/triple buffering?
>>> I believe with this method we will have to limit the no of frame buffers in the
>> system. One alternative
>>> that worked is to do a proper clear range for the ggtt instead of doing a nop.
>> Although it adds marginal
>>> time during suspend/resume/boot it does not add restrictions to the no of fb's
>> that can be used.
>>
>> And if we remember the guard pages replaced clearing to scratch, to
>> improve suspend resume times, exactly for improving user experience. :(
>>
>> Luckily there is time to fix this properly on MTL one way or the other.
>> Is it just kms_cursor_legacy --run-subtest single-bo that is affected?
> I am trying to dump the page table entries at the time of failure for bot the fame buffer and if required
> For the guard pages. Will see if I get some info from there.
>
> Yes the test kms_cursor_legacy is used to reliably reproduce. Looking at public CI, I also see pipe errors
> being reported with varying occurrences while running kms_cursor_crc, kms_pipe_crc_basic,
> and kms_plane_scaling. More details on the occurrence can be found here [1].
>
> Thanks,
> RK
>
> 1. http://gfx-ci.igk.intel.com/cibuglog-ng/results/knownfailures?query_key=d9c3297dd17dda35a6c638eb96b3139bd1a6633c

Could you check if [1] helps?

[1]: https://patchwork.freedesktop.org/series/125926/

Regards
Andrzej

>> Regards,
>>
>> Tvrtko
>>
>>
>>>> Regards,
>>>>
>>>> Tvrtko
>>>>
>>>> P.S. Where did the 185 from the commit message come from?
>>> 185 came from experiment to increase the guard size. It is not a standard
>> number.
>>> Regards,
>>> Radhakrishna(RK) Sripada



More information about the Intel-gfx mailing list