Large coordinates in the X server

Robert Morell rmorell at nvidia.com
Fri Dec 12 17:58:51 PST 2014


On Fri, Dec 12, 2014 at 04:40:24PM -0800, Keith Packard wrote:
> Robert Morell <rmorell at nvidia.com> writes:
> 
> > Hmm, I guess that depends on whether we're already fetching coordinates
> > directly out of the request "fresh off the wire", or copying them.  If
> > there is already a copy, then scaling up to 2x wider integers may
> > admittedly trash 2x more cachelines on write, but I wouldn't expect
> > anything too drastic.
> 
> The driver is handed the raw request buffer, so widening it would
> require allocating a new buffer and copying the data.
> 
> > I think "a couple of lines of code" is a bit optimistic for how much
> > libfb would need to change :)
> 
> It really is just fbSetSpans and fbFillSpans, each just needing to add
> the drawable origin to the span coordinate.

Yeah, it's less than I was afraid it would be.  After going through mi,
it looks like, in addition to SetSpans and FillSpans, skipping the
miTranslate logic also affects the arguments to PushPixels and a bunch
of damage code that I haven't gone through yet.

However, even though other drivers and acceleration architectures don't
explicitly enable pGC->miTranslate, pretty much everything loads libfb,
so pGC->miTranslate is set implicitly.  I'll also need to change all of
those drivers to handle translation themselves.  Still, I don't expect
a huge amount of churn.
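
As a rough illustration of the translation step discussed above, here is
a minimal sketch of adding the drawable origin to each span coordinate.
This is not actual libfb or mi code: the WirePoint and ScreenPoint types
and the translate_spans() helper are hypothetical stand-ins, and widening
to 32 bits is an assumption made here so that a 16-bit coordinate plus
the origin cannot wrap.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the X server's own point or drawable types. */
typedef struct { int16_t x, y; } WirePoint;    /* drawable-relative, 16-bit */
typedef struct { int32_t x, y; } ScreenPoint;  /* origin added, 32-bit */

/*
 * Add the drawable origin (org_x, org_y) to each of the n span start
 * points.  The result is stored in 32-bit fields, so a sum such as
 * 32767 + 100 is kept exactly instead of wrapping around.
 */
static void
translate_spans(const WirePoint *in, ScreenPoint *out, size_t n,
                int32_t org_x, int32_t org_y)
{
    for (size_t i = 0; i < n; i++) {
        out[i].x = (int32_t)in[i].x + org_x;
        out[i].y = (int32_t)in[i].y + org_y;
    }
}
```

The point of the sketch is only that the per-span work is two integer
additions; where the translated values are stored is a separate design
choice.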

> > In the case of miFillPolygon, for example, the polygon is decomposed
> > into spans by miFill{Convex,General}Poly().  The number of points which
> > need to be translated by the lower rendering layers after it's been
> > decomposed is potentially much larger than the initial polygon
> > coordinates.
> 
> You're talking about core polygons and wide primitives. Nothing else
> goes through spans at this point. None of those operations are going to
> see a significant impact from having the CPU do a couple more integer
> operations per scanline.

I think you're right.  I ran a few preliminary experiments with x11perf
and although I haven't done anything rigorous with error bars yet, it
looks like making libfb do the translation is actually faster (at least
for -64poly10convex).  This makes sense because libfb is already pulling
the points into the cache, so the translation is pretty close to free.
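
To show why the translation can be nearly free when it is folded into
the fill loop, here is a simplified sketch.  It assumes an 8-bit
framebuffer and simple clipping to the framebuffer bounds; the SpanStart
type and fill_spans_translated() are hypothetical names, not the real
fbFillSpans.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical span start point, relative to the drawable origin. */
typedef struct { int16_t x, y; } SpanStart;

/*
 * Fill n horizontal spans into an 8-bit framebuffer that is `width` by
 * `height` pixels with `stride` bytes per row.  The drawable origin is
 * added as each span is read, so no separate translation pass over the
 * point list is needed.  Returns the number of pixels written.
 */
static size_t
fill_spans_translated(uint8_t *fb, int width, int height, int stride,
                      const SpanStart *pts, const int *widths, size_t n,
                      int org_x, int org_y, uint8_t pixel)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        int y  = pts[i].y + org_y;   /* translate while the point is hot */
        int x1 = pts[i].x + org_x;
        int x2 = x1 + widths[i];

        if (y < 0 || y >= height)
            continue;                /* span is entirely off-screen */
        if (x1 < 0)
            x1 = 0;
        if (x2 > width)
            x2 = width;
        if (x1 >= x2)
            continue;                /* nothing left after clipping */

        memset(fb + (size_t)y * (size_t)stride + (size_t)x1, pixel,
               (size_t)(x2 - x1));
        written += (size_t)(x2 - x1);
    }
    return written;
}
```

Each span costs two extra additions on data the loop has already
loaded, which is consistent with the x11perf observation above, though
this sketch itself has not been benchmarked.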

> > Anyway, there are some performance tradeoffs here that aren't
> > immediately obvious.  Unless anyone has any better ideas, I'll look into
> > writing up some preliminary changes to compare.
> 
> The tradeoff is pretty obvious to me -- core operations that currently
> go directly to hardware acceleration (thin lines, points and rectangles)
> would take a performance hit while operations rendered with spans would
> do a translation per span instead of a translation per vertex.

Yeah, I agree.  I'll go ahead with this for now, thanks.

- Robert

