[Intel-gfx] [PATCH 02/21] drm/i915/gtt: Workaround for HW preload not flushing pdps

Thu Aug 13 05:03:30 PDT 2015

On 13/08/15 12:42, Dave Gordon wrote:
> On 13/08/15 11:12, Michel Thierry wrote:
>> On 8/13/2015 5:08 PM, Zhiyuan Lv wrote:
>>> Hi Michel,
>>>
>>> Thanks for the reply!
>>>
>>> I have yet another question: right now mark_tlbs_dirty() is
>>> called if any level of the PPGTT tables is changed. But for EXECLIST
>>> context submission, we only need LRI commands if there are L3 PDP root
>>> pointer changes, right? Thanks!
>>
>> mark_tlbs_dirty is not only for execlists mode, we re-used it since it
>> was already there.
>>
>> The update is only required when a PDP is allocated.
>>
>> -Michel
>
> Doesn't that depend on whether the context is running? The LRI reload
> has the side effect of flushing all current knowledge of mappings, so
> every level of PD gets refreshed from memory.
>
> If we're not updating the top level PDPs, and we know the context isn't
> active, then we *assume* that lower-level PDs will be refreshed when the
> context is next loaded. (This hasn't been true on all hardware, some of
> which cached previously-retrieved PDs across ctx save-and-reload, and
> that's one reason why there's a "Force PD Restore" bit, but we've been
> told not to use it on current h/w). AFAICT, current chips don't cache
> previous PDs and don't need the "Force" bit for this case.
>
> OTOH, if we don't know whether the context is running, then we can't be
> sure when (or whether) any PD updates will be seen. As long as the
> changes of interest are only ever *from* NULL *to* non-NULL, we *expect*
> it to work, because (we *assume*) the GPU won't cache negative results
> from PD lookups; so any lookup that previously hit an invalid mapping
> will be re-fetched next time it's required (and may now be good).
>
> If we don't reload the PDPs with LRIs, then perhaps to be safe we need
> to inject some other instruction that will just force a re-fetch of the
> lower-level PDs from memory, without altering any top-level PDPs? In
> conjunction with preallocating the top-level entries, that ought to
> guarantee that the updates would be seen just before the point where
> they're about to be used?
>
> .Dave.

I found the following comment in the BSpec:

"Pre-loading of Page Directory Entries (PD load) for 32b legacy mode is 
not supported from Gen9 onwards.  PD entries are loaded on demand when 
there is a miss in the PDE cache of the corresponding page walker.  Any 
new page additions by the driver are transparent to the HW, and the new 
page translations will be fetched on demand.  However, any removal of 
the pages by the driver should initiate a TLB invalidation to remove the 
stale entries."
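
To make the rule in that comment concrete, here is a minimal sketch of
how an update could be classified. Everything in it is hypothetical: the
toy_* names, and the choice of bit 0 as the "present" bit, are
illustration only and not taken from the driver. Treating a change from
one valid entry to a different valid entry like a removal is my own
assumption; the BSpec text only talks about additions and removals.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: bit 0 of a PD entry marks it as present/valid. */
#define TOY_ENTRY_PRESENT 0x1ull

static inline bool toy_entry_present(uint64_t entry)
{
	return (entry & TOY_ENTRY_PRESENT) != 0;
}

/*
 * Returns true if changing a PD entry from old_entry to new_entry should
 * be followed by a TLB invalidation, per the BSpec comment quoted above:
 *  - invalid -> anything: an addition, fetched on demand, no invalidation
 *  - valid   -> invalid:  a removal, stale entries must be invalidated
 *  - valid   -> other:    ASSUMED to behave like a removal
 */
static inline bool toy_update_needs_tlb_invalidate(uint64_t old_entry,
						   uint64_t new_entry)
{
	if (!toy_entry_present(old_entry))
		return false;

	return old_entry != new_entry;
}
```

With this model, filling in a previously empty entry returns false,
while clearing or repointing a live entry returns true.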

So, I think that confirms that we should inject some form of TLB 
invalidation into the ring before the next batch uses any updated PDs. 
Presumably an MI_FLUSH_DW with TLB_INVALIDATE would do?
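
As a rough sketch of what that could look like, the following builds the
four dwords of a gen8-style MI_FLUSH_DW with the TLB invalidate bit set.
The command encodings are written from memory of i915_reg.h and should
be treated as assumptions to check against the tree. struct toy_ring,
toy_ring_emit() and toy_emit_tlb_invalidate() are hypothetical stand-ins
for illustration, not driver functions. The post-sync store is included
on the assumption that the invalidation needs one to take effect, as the
existing flush code appears to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror i915_reg.h; verify before relying on these values. */
#define MI_INSTR(opcode, flags)	(((uint32_t)(opcode) << 23) | (flags))
#define MI_FLUSH_DW		MI_INSTR(0x26, 1)
#define   MI_FLUSH_DW_STORE_INDEX	(1u << 21)
#define   MI_INVALIDATE_TLB		(1u << 18)
#define   MI_FLUSH_DW_OP_STOREDW	(1u << 14)
#define   MI_FLUSH_DW_USE_GTT		(1u << 2)

/* Hypothetical stand-in for a ring buffer: a small array of dwords. */
struct toy_ring {
	uint32_t dw[16];
	size_t tail;
};

static inline void toy_ring_emit(struct toy_ring *ring, uint32_t dword)
{
	/* The caller is expected to have reserved enough space. */
	assert(ring->tail < sizeof(ring->dw) / sizeof(ring->dw[0]));
	ring->dw[ring->tail++] = dword;
}

/*
 * Emit MI_FLUSH_DW with TLB invalidation and a post-sync dword store.
 * post_sync_addr is a hypothetical scratch location for that store.
 */
static inline void toy_emit_tlb_invalidate(struct toy_ring *ring,
					   uint32_t post_sync_addr)
{
	/* +1 selects the longer gen8 form that carries a 64-bit address. */
	uint32_t cmd = MI_FLUSH_DW + 1;

	cmd |= MI_INVALIDATE_TLB;
	cmd |= MI_FLUSH_DW_STORE_INDEX | MI_FLUSH_DW_OP_STOREDW;

	toy_ring_emit(ring, cmd);
	toy_ring_emit(ring, post_sync_addr | MI_FLUSH_DW_USE_GTT);
	toy_ring_emit(ring, 0);	/* upper address bits */
	toy_ring_emit(ring, 0);	/* value written by the post-sync op */
}
```

The idea would be to emit this ahead of the batch start whenever PDs
have changed, rather than unconditionally on every submission.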

.Dave.