[Intel-gfx] [PATCH] drm/i915: Refine VT-d scanout workaround

Thu Feb 11 17:17:46 UTC 2021

Quoting Matthew Auld (2021-02-11 17:00:20)
> Throwing some color_adjust at it might be another option to consider. 
> Maybe something like:
> 
> +static void i915_ggtt_color_adjust_vdt(const struct drm_mm_node *node,
> +                                      unsigned long color,
> +                                      u64 *start,
> +                                      u64 *end)
> +{
> +       if (color == COLOR_VDT) {
> +               *start += VDT_PADDING;
> +               *end -= VDT_PADDING;
> +               return;
> +       }
> +
> +       if (node->allocated && node->color == COLOR_VDT)
> +               *start += VDT_PADDING;
> +
> +       node = list_next_entry(node, node_list);
> +       if (node->allocated && node->color == COLOR_VDT)
> +               *end -= VDT_PADDING;
> +}
> 
> But not sure if I would call that simpler. Plus we need to add more 
> special casing in the eviction code. And I guess the cache coloring 
> might also be active here, which might be nasty. Also we end up with a 
> bunch of holes in the address space that are unusable, yet the mm search 
> will keep hitting them anyway. Ok, I guess that's what you meant with 
> not introducing "extra nodes". Hmmm.

I considered trying to use coloring, but the problem I found was in
knowing whether or not to fill outside of the vma with scratch pages. We
would have to lookup each PTE to check it is not in use, then fill with
scratch. And if we removed a neighbour, it would have to check to see if
it should replace the guard pages. (And it's more complicated by the
wrap-around of the VT-d, an object at beginning of the GGTT would
overfetch into the pages at the other end of the GGTT.)

I felt that make an explicit reservation and accounting the VT-d
overfetch to the scanout vma would save a lot of hassle. The PTE are
accounted for (will not be reused, and safe to clear after resume),
dedicated as scratch for the overfetch and the overfetch cannot wrap
around as we force the vma to be away from the edges.

Packing the information into the single scanout i915_vma so that we do a
single padded i915_vma_insert seemed to be much easier to manage than
having to do additional i915_vma_inserts on either side (and so we have
to somehow manage searching for enough space for all 3 in the first call
etc). Plus i915_vma is already massive :|
-Chris