[Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is enabled

Sripada, Radhakrishna radhakrishna.sripada at intel.com
Fri Nov 3 22:23:29 UTC 2023


Hi Andrzej,

The patch mentioned below does not help with the issue.

Thanks,
RK

> -----Original Message-----
> From: Hajda, Andrzej <andrzej.hajda at intel.com>
> Sent: Friday, November 3, 2023 2:18 PM
> To: Sripada, Radhakrishna <radhakrishna.sripada at intel.com>; Tvrtko Ursulin
> <tvrtko.ursulin at linux.intel.com>; intel-gfx at lists.freedesktop.org
> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>; Vivi, Rodrigo
> <rodrigo.vivi at intel.com>
> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is
> enabled
> 
> 
> 
> On 03.11.2023 16:53, Sripada, Radhakrishna wrote:
> > Hi Tvrtko,
> >
> >> -----Original Message-----
> >> From: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> >> Sent: Friday, November 3, 2023 1:30 AM
> >> To: Sripada, Radhakrishna <radhakrishna.sripada at intel.com>; Hajda, Andrzej
> >> <andrzej.hajda at intel.com>; intel-gfx at lists.freedesktop.org
> >> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
> >> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is enabled
> >>
> >>
> >> On 02/11/2023 22:14, Sripada, Radhakrishna wrote:
> >>> Hi Tvrtko,
> >>>
> >>>> -----Original Message-----
> >>>> From: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> >>>> Sent: Thursday, November 2, 2023 10:41 AM
> >>>> To: Hajda, Andrzej <andrzej.hajda at intel.com>; Sripada, Radhakrishna
> >>>> <radhakrishna.sripada at intel.com>; intel-gfx at lists.freedesktop.org
> >>>> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
> >>>> Subject: Re: [Intel-gfx] [PATCH] drm/i915/mtl: Increase guard pages when vt-d is enabled
> >>>>
> >>>>
> >>>> On 02/11/2023 16:58, Andrzej Hajda wrote:
> >>>>> On 02.11.2023 17:06, Radhakrishna Sripada wrote:
> >>>>>> Experiments were conducted with different multipliers for the VTD_GUARD
> >>>>>> macro. With a multiplier of 185 we were observing occasional pipe faults
> >>>>>> when running kms_cursor_legacy --run-subtest single-bo.
> >>>>>>
> >>>>>> There could be an underlying issue that is still being investigated; for
> >>>>>> now, bump the guard pages for MTL.
> >>>>>>
> >>>>>> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2017
> >>>>>> Cc: Gustavo Sousa <gustavo.sousa at intel.com>
> >>>>>> Cc: Chris Wilson <chris.p.wilson at linux.intel.com>
> >>>>>> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada at intel.com>
> >>>>>> ---
> >>>>>>     drivers/gpu/drm/i915/gem/i915_gem_domain.c | 3 +++
> >>>>>>     1 file changed, 3 insertions(+)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>>>> index 3770828f2eaf..b65f84c6bb3f 100644
> >>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
> >>>>>> @@ -456,6 +456,9 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj,
> >>>>>>         if (intel_scanout_needs_vtd_wa(i915)) {
> >>>>>>             unsigned int guard = VTD_GUARD;
> >>>>>> +        if (IS_METEORLAKE(i915))
> >>>>>> +            guard *= 200;
> >>>>>> +
> >>>>> 200 * VTD_GUARD = 200 * 168 * 4K = 131MB
> >>>>>
> >>>>> Looks insanely high: 131MB for padding, and if this is applied both before
> >>>>> and after it becomes 262MB of wasted address space per plane. Just
> >>>>> signalling; I do not know if this actually hurts.
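(For reference, a quick standalone sketch of the arithmetic above, assuming
VTD_GUARD is 168 pages of 4 KiB as defined in i915_gem_domain.c; the program
below is purely illustrative and not part of the patch.)

    /* Illustrative only: reproduces the padding arithmetic quoted above. */
    #include <stdio.h>

    #define GTT_PAGE_SIZE 4096u
    #define VTD_GUARD     (168u * GTT_PAGE_SIZE)  /* assumed, per i915_gem_domain.c */

    int main(void)
    {
            unsigned long guard = 200ul * VTD_GUARD;   /* the MTL multiplier   */
            unsigned long per_plane = 2ul * guard;     /* guard before + after */

            printf("guard, one side:  %lu MiB\n", guard / (1024 * 1024));     /* ~131 MiB */
            printf("guard, per plane: %lu MiB\n", per_plane / (1024 * 1024)); /* ~262 MiB */
            return 0;
    }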
> >>>> Yeah this feels crazy. There must be some other explanation which is
> >>>> getting hidden by the crazy amount of padding, so I'd rather we figured
> >>>> it out.
> >>>>
> >>>> With 262MiB per fb, how many fit in the GGTT before eviction hits? N screens
> >>>> with double/triple buffering?
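(A rough estimate, assuming a 4 GiB GGTT: at 262 MiB of guard per scanout
buffer, roughly 4096 / 262 ≈ 15 such buffers exhaust the address space, so
e.g. three triple-buffered screens, i.e. 9 fbs, would already spend about
2.3 GiB on padding alone, before counting the framebuffer contents.)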
> >>> I believe with this method we will have to limit the number of frame buffers
> >>> in the system. One alternative that worked is to do a proper clear range for
> >>> the ggtt instead of doing a nop. Although it adds marginal time during
> >>> suspend/resume/boot, it does not add restrictions to the number of fbs that
> >>> can be used.
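(A minimal sketch of the alternative described above, using the
nop_clear_range()/gen8_ggtt_clear_range() helper names from older
i915_ggtt.c; the names and placement are illustrative assumptions, not the
actual proposed change.)

    /* Illustrative sketch only: when the VT-d scanout workaround is needed,
     * point clear_range at a real PTE-clearing implementation instead of the
     * nop, trading some suspend/resume/boot time for no limit on the number
     * of framebuffers that fit in the GGTT.
     */
    ggtt->vm.clear_range = nop_clear_range;
    if (intel_scanout_needs_vtd_wa(i915))
            ggtt->vm.clear_range = gen8_ggtt_clear_range;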
> >>
> >> And if we remember, the guard pages replaced clearing to scratch in order to
> >> improve suspend/resume times, exactly to improve the user experience. :(
> >>
> >> Luckily there is time to fix this properly on MTL one way or the other.
> >> Is it just kms_cursor_legacy --run-subtest single-bo that is affected?
> > I am trying to dump the page table entries at the time of failure for both the
> > frame buffer and, if required, for the guard pages. Will see if I get some info
> > from there.
> >
> > Yes, the test kms_cursor_legacy is used to reliably reproduce. Looking at public
> > CI, I also see pipe errors being reported with varying occurrences while running
> > kms_cursor_crc, kms_pipe_crc_basic, and kms_plane_scaling. More details on the
> > occurrences can be found here [1].
> >
> > Thanks,
> > RK
> >
> > 1. http://gfx-ci.igk.intel.com/cibuglog-ng/results/knownfailures?query_key=d9c3297dd17dda35a6c638eb96b3139bd1a6633c
> 
> Could you check if [1] helps?
> 
> [1]: https://patchwork.freedesktop.org/series/125926/
> 
> Regards
> Andrzej
> 
> >> Regards,
> >>
> >> Tvrtko
> >>
> >>
> >>>> Regards,
> >>>>
> >>>> Tvrtko
> >>>>
> >>>> P.S. Where did the 185 from the commit message come from?
> >>> 185 came from experiments with increasing the guard size. It is not a
> >>> standard number.
> >>> Regards,
> >>> Radhakrishna(RK) Sripada


