[Intel-gfx] [PATCH v2] drm/i915: prevent out of range pt in the PDE macros (take 3)

Tue Oct 6 03:19:25 PDT 2015

On Tue, Oct 06, 2015 at 12:09:51PM +0200, Daniel Vetter wrote:
> On Tue, Oct 06, 2015 at 10:43:48AM +0100, Chris Wilson wrote:
> > On Tue, Oct 06, 2015 at 10:38:22AM +0200, Daniel Vetter wrote:
> > > On Mon, Oct 05, 2015 at 05:59:50PM +0100, Michel Thierry wrote:
> > > > On 10/5/2015 5:36 PM, Dave Gordon wrote:
> > > > >On 02/10/15 14:16, Michel Thierry wrote:
> > > > >>We tried to fix this in commit fdc454c1484a ("drm/i915: Prevent out of
> > > > >>range pt in gen6_for_each_pde").
> > > > >>
> > > > >>But the static analyzer still complains that, just before we break due
> > > > >>to "iter < I915_PDES", we do "pt = (pd)->page_table[iter]" with an
> > > > >>iter value that is bigger than I915_PDES. Of course, this isn't really
> > > > >>a problem since no one uses pt outside the macro. Still, every single
> > > > >>new usage of the macro will create a new issue for us to mark as a
> > > > >>false positive.
> > > > >>
> > > > >>Also, Paulo re-started the discussion a while ago [1], but didn't end up
> > > > >>implemented.
> > > > >>
> > > > >>In order to "solve" this "problem", this patch takes the ideas from
> > > > >>Chris and Dave, but that check would change the desired behavior of the
> > > > >>code, because the object (for example pdp->page_directory[iter]) can be
> > > > >>null during init/alloc, and C would take this as false, breaking the for
> > > > >>loop immediately.
> > > > >>
> > > > >>This has been already verified with "static analysis tools".
> > > > >>
> > > > >>[1]http://lists.freedesktop.org/archives/intel-gfx/2015-June/068548.html
> > > > >>
> > > > >>v2: Make it a single statement, while preventing the common subexpression
> > > > >>elimination (Chris)
> > > > >>
> > > > >>Cc: Paulo Zanoni <paulo.r.zanoni at intel.com>
> > > > >>Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > > > >>Cc: Dave Gordon <david.s.gordon at intel.com>
> > > > >>Signed-off-by: Michel Thierry <michel.thierry at intel.com>
> > 
> > > Yeah, since ?: is a ternary operator parsing implicitly adds the () in the
> > > middle and always parses it as a ? (b) : c. If lower-level operators in
> > > the middle could split the ternary operator then it would result in
> > > parsing fail (sinc ? without the : is just useless). So lgtm. Someone
> > > willing to smack an r-b onto the patch?
> > 
> > I think it's good enough.
> > Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
> 
> Queued for -next, thanks for the patch.
> 
> > Something to consider is that ppgtt_insert is 10x slower than
> > ppgtt_clear, and that some workloads (admittedly not 48b!) spend a
> > disproportionate amount of time changing PTE. If you have ideas for
> > spending up insertion, feel free to experiment!
> 
> Hm, where do we waste all that time? 10x slower is pretty impressive since
> on a quick look I can only see the sg table walk as the additional bit of
> memory traversals insert does on top of clear ...

The sg_page_iter claims top spot, followed by the memory dereferences
(at a guess, it doesn't seem the individual sg struct isn't dense enough
to be cache friendly or maybe we should just cachealign it?). We can make
substantial improvement by opencoding the page iter (under the assumption
that we do not have page coallescing) like:

http://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=nightly&id=f9baf155c3096058ef9aeeb39b586713291efc56

with that and eliminating the ivb_pte_encode() indirection gets us to
only be ~5x slower than clearing. It is not pretty (and is rather
cavalier about the last page), but at that point it seems to be memory bound.

However, in this particular benchmark where inserting ppgtt dominates
the kernel profile, we are GPU bound and improving the insert is lost in
the noise. (Unless we disable the GPU load and assume an infinitely fast
GPU like Skylake!)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre