â Fi.CI.BAT: failure for drm/i915: (stolen) memory region related fixes

Ville Syrjälä ville.syrjala at linux.intel.com
Thu Dec 14 23:01:39 UTC 2023


On Thu, Dec 14, 2023 at 12:06:58PM +0100, Andrzej Hajda wrote:
> 
> 
> On 13.12.2023 17:13, Andrzej Hajda wrote:
> > On 13.12.2023 02:37, Patchwork wrote:
> >> *Patch Details*
> >> *Series:*    drm/i915: (stolen) memory region related fixes
> >> *URL:*    https://patchwork.freedesktop.org/series/127721/ 
> >> <https://patchwork.freedesktop.org/series/127721/>
> >> *State:*    failure
> >> *Details:* 
> >> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_127721v1/index.html 
> >> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_127721v1/index.html>
> >>
> >>
> >>   CI Bug Log - changes from CI_DRM_14010 -> Patchwork_127721v1
> >>
> >>
> >>     Summary
> >>
> >> *FAILURE*
> >>
> >> Serious unknown changes coming with Patchwork_127721v1 absolutely need 
> >> to be
> >> verified manually.
> >>
> >> If you think the reported changes have nothing to do with the changes
> >> introduced in Patchwork_127721v1, please notify your bug team 
> >> (I915-ci-infra at lists.freedesktop.org) to allow them
> >> to document this new failure mode, which will reduce false positives 
> >> in CI.
> >>
> >> External URL: 
> >> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_127721v1/index.html
> >>
> >>
> >>     Participating hosts (31 -> 33)
> >>
> >> Additional (4): bat-dg2-8 bat-dg2-9 bat-mtlp-8 fi-pnv-d510
> >> Missing (2): bat-adlp-11 fi-snb-2520m
> >>
> >>
> >>     Possible new issues
> >>
> >> Here are the unknown changes that may have been introduced in 
> >> Patchwork_127721v1:
> >>
> >>
> >>       IGT changes
> >>
> >>
> >>         Possible regressions
> >>
> >>   *
> >>
> >>     igt at i915_module_load@load:
> >>
> >>       o bat-mtlp-8: NOTRUN -> INCOMPLETE
> >>         
> >> <https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_127721v1/bat-mtlp-8/igt@i915_module_load@load.html>
> > 
> > 
> > This is due to overlapping initial fb's vma with GuC reserved area on 
> > MTL, see ggtt_reserve_guc_top.
> > 
> > My dirty quick_fix proposed:
> > @@ -143,6 +143,9 @@ initial_plane_vma(struct drm_i915_private *i915,
> >          if (IS_ERR(vma))
> >                  goto err_obj;
> > 
> > +       if (base + size > GUC_GGTT_TOP)
> > +               base = min(base, GUC_GGTT_TOP) - size;
> > +
> >          pinctl = PIN_GLOBAL | PIN_OFFSET_FIXED | base;
> >          if (HAS_GMCH(i915))
> >                  pinctl |= PIN_MAPPABLE;
> 
> 
> Copy/Paste Ville response for this proposition from another thread:
> 
> On 13.12.2023 17:03, Ville Syrjälä wrote:
>  >
>  > This is not a solution. The correct fix is either:
>  > 1. fix the guc code to not insist on a fixed chunk of the ggtt
>  >    and instead allocate it normally like a good boy. It could
>  >    still try to allocate as high as possible to ideally
>  >    land in the GUC_GGTT_TOP range
> 
> This would be the best solution from initial plane PoV for sure, I am 
> not sure about GuC PoV.

I was going to submit a patch to simply change this to insert(PIN_HIGH)
as that is all that is needed from the GuC FW POV. And whether the
GOP framebuffer or the GuC FW lives there doesn't matter one bit,
both fit the bill of not needing to be accessed by the GuC.

But I suspect this function might be serving a double duty, despite
not saying so anywhere. Basically I think it's perhaps used as
a quick and dirty way to restrict everything below GUC_GGTT_TOP,
presumably because we don't otherwise restrict allocations
below that even if the GuC needs to access them.

So the insert(PIN_HIGH) approach could result in some troubles
iff the GOP framebuffer is preventing the uc_fw to be placed at
GUC_GGTT_TOP and we later free the GOP framebuffer (happens if
we find it unsuitable due to one reason or another). This opens
up a hole above GUC_GGTT_TOP again and something else could
get bound there.

Is my assumption about ggtt_reserve_guc_top() correct? And
if so do we know what are all the things the GuC needs to access
so that we could properly restrict those below GUC_GGTT_TOP
and get rid of this weird way of achieving that same result?

> 
>  >
>  > 2. relocate the display vma to a different part of the ggtt
> 
> I think this point is what I have proposed.
> 
>  >
>  >
>  > 1. should be far simpler by the looks of it, as 2. would involve:
>  > a) pinning the same object into two places in the ggtt simultanously
>  >    I don't think we have support for that except for special ggtt views,
>  >    and xe doesn't have even that (should we need this trick there as 
> well)
> 
> AFAIU the fb is not yet pinned in terms of kernel structures, so it 
> doesn't seems to be an issue.

Hmm. Yeah, I suppose we don't really care where in the ggtt the thing
is bound as we rewrite the PTEs anyway when we bind the vma. So
whether we bind it into its current place or not doesn't really
matter. Well, assuming we move the thing to a non-overlapping section
of the ggtt. Any overlap with the current positon would clobber the
currently used PTEs and cause display corruption.

If we did try this we'd also have to think where to move it. Due
the requirement of no overlaps I suppose offset 0 might be the most
obvious choice. That would also avoid permanently fragmentating
the ggtt.

Though I suppose one has to wonder why the GOP doesn't just use
offset 0 anyway. Is there perhaps some other weird requirement
that needs the bottom end of ggtt to be available for something
else?

> Of course there is problem with PLANE_SURF still pointing to old VA, it 
> should be replaced with new VA before ggtt will be populated with new stuff.
> 
>  >
>  > b) flip the plane(s) over to the relocated vma
>  > c) wait for vblank(s)
>  > d) discard the original vma
>  > e) all of that would need to happen pretty early so we may not have
>  >    a lot of the required normal machinery fully up and running yet
> 
> Async update to PLANE_SURF shouldn't be enough?

Async flips:
- Can't be assume to be universally available
- Don't really help anyway as they don't happen instaneously.
  Ie. you still need to wait for them to have happened
  before clobbering old the PTEs.

-- 
Ville Syrjälä
Intel


More information about the Intel-gfx mailing list