[Intel-gfx] [RFC] drm/i915/tgl: Advanced preparser support for GPU relocs

Chris Wilson chris at chris-wilson.co.uk
Fri Aug 23 17:28:38 UTC 2019


Quoting Daniele Ceraolo Spurio (2019-08-23 18:01:03)
> 
> 
> On 8/23/19 9:31 AM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-08-23 16:56:54)
> >>
> >>
> >> On 8/23/19 8:52 AM, Chris Wilson wrote:
> >>> Quoting Daniele Ceraolo Spurio (2019-08-23 16:39:14)
> >>>>
> >>>>
> >>>> On 8/23/19 8:28 AM, Chris Wilson wrote:
> >>>>> Quoting Chris Wilson (2019-08-23 16:10:48)
> >>>>>> Quoting Daniele Ceraolo Spurio (2019-08-23 16:05:45)
> >>>>>>>
> >>>>>>>
> >>>>>>> On 8/23/19 7:26 AM, Chris Wilson wrote:
> >>>>>>>> Quoting Chris Wilson (2019-08-23 08:27:25)
> >>>>>>>>> Quoting Daniele Ceraolo Spurio (2019-08-23 03:09:09)
> >>>>>>>>>> TGL has an improved CS pre-parser that can now pre-fetch commands across
> >>>>>>>>>> batch boundaries. This improves performances when lots of small batches
> >>>>>>>>>> are used, but has an impact on self-modifying code. If we want to modify
> >>>>>>>>>> the content of a batch from another ring/batch, we need to either
> >>>>>>>>>> guarantee that the memory location is updated before the pre-parser gets
> >>>>>>>>>> to it or we need to turn the pre-parser off around the modification.
> >>>>>>>>>> In i915, we use self-modifying code only for GPU relocations.
> >>>>>>>>>>
> >>>>>>>>>> The pre-parser fetches across memory synchronization commands as well,
> >>>>>>>>>> so the only way to guarantee that the writes land before the parser gets
> >>>>>>>>>> to it is to have more instructions between the sync and the destination
> >>>>>>>>>> than the parser FIFO depth, which is not an optimal solution.
> >>>>>>>>>
> >>>>>>>>> Well, our ABI is that memory is coherent before the breadcrumb of *each*
> >>>>>>>>> batch. That is a fundamental requirement for our signaling to userspace.
> >>>>>>>>> Please tell me that there is a context flag to turn this off, or we else
> >>>>>>>>> we need to emit 32x flushes or whatever it takes.
> >>>>>>>>
> >>>>>>> Are you referring to the specific case where we have a request modifying
> >>>>>>> an object that is then used as a batch in the next request? Because
> >>>>>>> coherency of objects that are not executed as batches is not impacted.
> >>>>>>
> >>>>>> "Fetches across memory sync" sounds like a major ABI break. The batches
> >>>>>> are a hard serialisation barrier, with memory coherency guaranteed prior
> >>>>>> to the signaling at the end of one batch and clear caches guaranteed at
> >>>>>> the start of the next.
> >>>>>
> >>>>> We have relocs, oa and sseu all using self-modifying code. I expect we
> >>>>> will have PTE modifications and much more done via the GPU in the near
> >>>>> future. All rely on the CS_STALL doing exactly what it says on the tin.
> >>>>> -Chris
> >>>>>
> >>>>
> >>>> I guess the easiest solution is then to keep the parser off outside of
> >>>> user batches. We can default to off and then restore what the user has
> >>>> programmed before the BBSTART. It's not a breach of contract if we say
> >>>> that if you opt-in to the parser then you need to make sure your batches
> >>>> are not self-modifying, right?
> >>>
> >>> Is it just the MI_ARB_ONOFF bits, and is that still a privileged
> >>> command? i.e. can userspace change mode by itself, or it is a
> >>> context-param?
> >>
> >> It's the ARB_CHECK, not the ARB_ONOFF, so yes, it is not privileged and
> >> userspace can modify it itself. It would've been easier if it was a
> >> context param :)
> > 
> > Does it go across a context switch? That might be an easy solution for
> > our internal requests (already true for oa/sseu where we use one context
> > to modify another). I do worry though if there might be leakage
> > across our flush-invalidate barriers between userspace batches.
> 
> The pre-fetching? no, AFAIK that's confined to the context, so moving 
> the relocs to another context would work; the status of the parser is 
> also ctx save/restored. What leakage case are you worried about? The 
> memory synchronization between contexts is unchanged, a context can only 
> mess up its own instructions. AFAIU the only thing that's possible is 
> that if a batch modifies the next batch in the same context then the CS 
> won't see the update.

Our mission is to ensure that one user cannot mess with another. So if
the context boundary is solid, and the user cannot interfere with the
TLBs (or at least not able to manipulate an invalid lookup into fetching
other data), then the flush-invalidate should be solid between batches. 
(One always expects the worst though, and we already know the horrors of
speculative fetches from the CPU filling the cache with stale data and
now it seems like the GPU may be learning similar evil tricks...)

I'm still unnerved though about the prospect of a soft bottom-of-pipe
sync. But that's just standard fare for pipecontrols.

Ok, it should be easy enough to create a context on demand for the
"rare" GPU relocs and I never have to worry about this again. :)
-Chris


More information about the Intel-gfx mailing list