[Intel-gfx] [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps

Thu Aug 13 07:56:13 PDT 2015

On Thu, Aug 13, 2015 at 01:03:30PM +0100, Dave Gordon wrote:
> On 13/08/15 12:42, Dave Gordon wrote:
> >On 13/08/15 11:12, Michel Thierry wrote:
> >>On 8/13/2015 5:08 PM, Zhiyuan Lv wrote:
> >>>Hi Michel,
> >>>
> >>>Thanks for the reply!
> >>>
> >>>I have yet another question: right now mark_tlbs_dirty() is
> >>>called if any level of the PPGTT tables changes. But for EXECLIST
> >>>context submission, we only need LRI commands when the L3 PDP root
> >>>pointers change, right? Thanks!
> >>
> >>mark_tlbs_dirty is not only for execlists mode; we re-used it since it
> >>was already there.
> >>
> >>The update is only required when a PDP is allocated.
> >>
> >>-Michel
> >
> >Doesn't that depend on whether the context is running? The LRI reload
> >has the side effect of flushing all current knowledge of mappings, so
> >every level of PD gets refreshed from memory.
> >
> >If we're not updating the top level PDPs, and we know the context isn't
> >active, then we *assume* that lower-level PDs will be refreshed when the
> >context is next loaded. (This hasn't been true on all hardware, some of
> >which cached previously-retrieved PDs across ctx save-and-reload, and
> >that's one reason why there's a "Force PD Restore" bit, but we've been
> >told not to use it on current h/w). AFAICT, current chips don't cache
> >previous PDs and don't need the "Force" bit for this case.
> >
> >OTOH, if we don't know whether the context is running, then we can't be
> >sure when (or whether) any PD updates will be seen. As long as the
> >changes of interest are only ever *from* NULL *to* non-NULL, we *expect*
> >it to work, because (we *assume*) the GPU won't cache negative results
> >from PD lookups; so any lookup that previously hit an invalid mapping
> >will be re-fetched next time it's required (and may now be good).
> >
> >If we don't reload the PDPs with LRIs, then perhaps to be safe we need
> >to inject some other instruction that will just force a re-fetch of the
> >lower-level PDs from memory, without altering any top-level PDPs? In
> >conjunction with preallocating the top-level entries, that ought to
> >guarantee that the updates would be seen just before the point where
> >they're about to be used?
> >
> >.Dave.
> >_______________________________________________
> >Intel-gfx mailing list
> >Intel-gfx at lists.freedesktop.org
> >http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> I found the following comment in the BSpec:
> 
> "Pre-loading of Page Directory Entries (PD load) for 32b legacy mode
> is not supported from Gen9 onwards.  PD entries are loaded on demand
> when there is a miss in the PDE cache of the corresponding page
> walker.  Any new page additions by the driver are transparent to the
> HW, and the new page translations will be fetched on demand.
> However, any removal of the pages by the driver should initiate a
> TLB invalidation to remove the stale entries."
> 
> So, I think that confirms that we should inject some form of TLB
> invalidation into the ring before the next batch uses any updated
> PDs. Presumably an MI_FLUSH_DW with TLB_INVALIDATE would do?

Hi Dave and Michel,

So the conclusion is still the same: for 32b legacy mode, emit_pdps()
is only needed when the PDPs themselves change. Changes to the
lower-level page tables can be handled by a TLB_INVALIDATE issued
through ring buffer commands. Is that correct? Thanks!
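
To make the question concrete, here is a minimal sketch of the decision as I
understand it. All names below (pt_sync_action, ppgtt_changes,
pick_sync_action) are hypothetical and only for illustration; they are not
i915 driver code, and the sketch assumes the only two mechanisms are a PDP
reload via LRIs and a TLB invalidation in the ring.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not the i915 driver's API. */
enum pt_sync_action {
	PT_SYNC_NONE,           /* no page table level changed */
	PT_SYNC_TLB_INVALIDATE, /* invalidate TLBs via a ring buffer command */
	PT_SYNC_RELOAD_PDPS,    /* reload the PDPs with LRIs, as emit_pdps() does */
};

struct ppgtt_changes {
	bool pdp_changed;         /* a top-level PDP entry was allocated or replaced */
	bool lower_level_changed; /* a PD or PT below the PDPs was updated */
};

/*
 * Pick the heaviest action that is needed. Per the discussion in this
 * thread, the LRI reload also flushes current knowledge of the lower-level
 * mappings, so it is assumed to cover lower-level changes as well.
 */
static enum pt_sync_action pick_sync_action(const struct ppgtt_changes *c)
{
	if (c->pdp_changed)
		return PT_SYNC_RELOAD_PDPS;
	if (c->lower_level_changed)
		return PT_SYNC_TLB_INVALIDATE;
	return PT_SYNC_NONE;
}
```

With this, a context whose only change is a new page table under an existing
PDP would get PT_SYNC_TLB_INVALIDATE rather than a full PDP reload.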

Regards,
-Zhiyuan

> 
> .Dave.