EXA prepare/finish hooks & random X questions

Benjamin Herrenschmidt benh at kernel.crashing.org
Wed Aug 31 23:12:29 PDT 2005


I've started looking into adding a pair of hooks to Exa to wrap all
direct access to objects in vram (front buffer or pixmaps). This is
necessary to be able to handle endian swappers properly. I've started
digging into the Exa source, and that led me to digging into "fb" and a
bit of "mi" source as well, but my lack of knowledge of X guts is really
showing, so I'm throwing various questions here in no specific order,
hoping somebody will answer ;)

First of all, my initial goal was to write the hooks so they look like

(Ptr is a random type for now, see below)

Ptr (*PrepareAccess)(PixmapPtr pixmap, int direction);
void (*FinishAccess)(PixmapPtr pixmap, int direction, Ptr p);

The idea behind the Ptr and direction parameters is that the hook would
return the actual pointer to be used to access the pixels. That allows a
couple of interesting features:

 - If the framebuffer isn't entirely accessible in a linear way, the
driver can play tricks with aperture windows, or possibly do a fast
local blit to some reserved accessible area

 - It gives us a way to play with cache tricks in the low level driver
if we ever want to experiment with cacheable framebuffers, by doing
flush & invalidate at the right time and possibly using alternate
mappings to the framebuffer

 - It gives us a way to catch bugs where X might tap the fb
without sync'ing by always giving it pointers to some non mapped memory
for pixmaps & front buffer, and having the low level hook "translate"
those to proper pointers

 - By selectively returning "NULL" from PrepareAccess(), the driver can
force operating in main memory for performance experiments (see blurb
about swappers below for an explanation)

 - Whatever else :)
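
For illustration, here is a minimal sketch of how a software fallback could
use such a hook pair. Only the PrepareAccess/FinishAccess names come from
the proposal above; MockPixmap, MockHooks, the ACCESS_* values and the
trivial driver hooks are made-up stand-ins, not the real X server types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real X types, for illustration only. */
typedef void *Ptr;
typedef struct { void *pixels; size_t size; } MockPixmap;
typedef MockPixmap *PixmapPtr;

enum { ACCESS_READ, ACCESS_WRITE };   /* assumed direction values */

typedef struct {
    Ptr  (*PrepareAccess)(PixmapPtr pixmap, int direction);
    void (*FinishAccess)(PixmapPtr pixmap, int direction, Ptr p);
} MockHooks;

/* Fill the pixmap through whatever pointer the driver hands back.
 * Returns 0 on success, -1 if the driver refused direct access. */
static int fallback_fill(const MockHooks *hooks, PixmapPtr pix,
                         unsigned char value)
{
    assert(hooks != NULL && pix != NULL);

    Ptr p = hooks->PrepareAccess(pix, ACCESS_WRITE);
    if (p == NULL)
        return -1;               /* caller has to work in main memory */

    memset(p, value, pix->size); /* all CPU access goes through p */
    hooks->FinishAccess(pix, ACCESS_WRITE, p);
    return 0;
}

/* A driver with a fully linear framebuffer: just hand back the pointer. */
static int finish_calls;

static Ptr trivial_prepare(PixmapPtr pixmap, int direction)
{
    (void)direction;
    return pixmap->pixels;
}

static void trivial_finish(PixmapPtr pixmap, int direction, Ptr p)
{
    (void)pixmap; (void)direction; (void)p;
    finish_calls++;
}

/* A driver that always refuses, forcing the main-memory path. */
static Ptr refusing_prepare(PixmapPtr pixmap, int direction)
{
    (void)pixmap; (void)direction;
    return NULL;
}
```

The point of returning the pointer from the hook is that the caller never
looks at the pixmap's own pointer, so the driver is free to substitute an
aperture window or an alternate mapping.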

However, from looking at the actual code, I noticed that for most normal
drawable operations, what we do when we fall back is to call the "fb" or
"mi" layers which will themselves pick the pointer out of the pixmap.

So if we want those low level functions to actually use the
pointer returned by PrepareAccess() to access the data, we should either

 - Save & modify devPrivate.ptr before calling fbXXX() and restore it on
return

 - Have PrepareAccess allocate & return a full pixmap structure instead
with a new pointer that we pass down as a drawable to fb()

 - Whatever better idea you might have ?
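
The first option (save, override, call, restore) could look roughly like
the sketch below. MockPixmap only mimics the devPrivate.ptr field, and
mock_fb_op() and call_fb_with_pointer() are hypothetical names standing in
for an fbXXX() routine and the Exa wrapper around it.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the only pixmap field that matters here (hypothetical layout). */
typedef union { void *ptr; long val; } MockDevUnion;
typedef struct { MockDevUnion devPrivate; } MockPixmap;

/* Stand-in for an fbXXX() routine: it picks the pointer out of the
 * pixmap by itself, which is exactly the problem described above. */
static void mock_fb_op(MockPixmap *pix)
{
    *(unsigned char *)pix->devPrivate.ptr = 0x5A;
}

/* Temporarily point the pixmap at the address returned by the access
 * hook, run the fb routine, then put the original pointer back. */
static void call_fb_with_pointer(MockPixmap *pix, void *access_ptr,
                                 void (*fb_op)(MockPixmap *))
{
    assert(pix != NULL && access_ptr != NULL && fb_op != NULL);

    void *saved = pix->devPrivate.ptr;
    pix->devPrivate.ptr = access_ptr;
    fb_op(pix);
    pix->devPrivate.ptr = saved;
}
```

This keeps the fb code untouched, at the cost of the pixmap holding a
temporary pointer for the duration of the call.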

Also, I was thinking that the call to exaWaitSync() that is done by
almost every fb wrapper in Exa could be moved to the PrepareAccess()
hook when present. That is, in absence of the hook, the "default" would
do exaWaitSync(), but when present, leave it to the hook. That gives a
bit of flexibility to the driver to have smarter sync policy.
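
As a rough sketch of that default behaviour, assuming a per-screen record
with an optional hook: MockScreen, mock_wait_sync() and prepare_or_sync()
are invented for this example, and mock_wait_sync() merely counts calls
where the real code would call exaWaitSync().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-screen record; the hook is NULL when not provided. */
typedef struct MockScreen {
    int sync_count;
    void *(*PrepareAccess)(struct MockScreen *screen, void *pixels,
                           int direction);
} MockScreen;

/* Stand-in for exaWaitSync(): only records that a sync happened. */
static void mock_wait_sync(MockScreen *screen)
{
    screen->sync_count++;
}

/* With a hook, the sync policy is entirely the driver's business.
 * Without one, fall back to the unconditional sync done today. */
static void *prepare_or_sync(MockScreen *screen, void *pixels, int direction)
{
    assert(screen != NULL);

    if (screen->PrepareAccess != NULL)
        return screen->PrepareAccess(screen, pixels, direction);

    mock_wait_sync(screen);
    return pixels;
}

/* Example driver hook that decides no sync is needed for this access. */
static void *lazy_prepare(MockScreen *screen, void *pixels, int direction)
{
    (void)screen; (void)direction;
    return pixels;
}
```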

While we are there, I notice we don't wrap some mi calls. From my quick
look at the code, I suspect that is because they just call back other GC
ops that are already wrapped, am I right ?

That leads to my worry that we might wrap an fb call with
PrepareAccess/FinishAccess, but the fbXXX call itself might call back
through the GCOps, thus causing us to stack PrepareAccess/FinishAccess
calls, which would be bad, especially if they have to keep track of the
surfaces they use/allocate for the swappers (see discussion on swappers
below). Can that actually happen ? If yes, I can see various strategies,
best being to be able to "mark" the pixmap already prepared and keep a
counter, but do we have some field in there we can use for that ? Worst
is to keep a global array of currently prepared pixmaps...
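
The counter strategy could be sketched as below, assuming there is
somewhere per-pixmap to keep the count (which is the open question). The
MockPixmap fields and the nested_prepare()/nested_finish() names are
hypothetical; the hw_* counters stand for the calls that would reach the
driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pixmap bookkeeping. */
typedef struct {
    void *pixels;
    int   prepare_depth;     /* how many Prepare calls are outstanding */
    int   hw_prepare_calls;  /* times the driver hook would really run */
    int   hw_finish_calls;
} MockPixmap;

/* Only the outermost Prepare reaches the driver. */
static void *nested_prepare(MockPixmap *pix)
{
    assert(pix != NULL);
    if (pix->prepare_depth++ == 0)
        pix->hw_prepare_calls++;
    return pix->pixels;
}

/* Only the Finish matching the outermost Prepare reaches the driver. */
static void nested_finish(MockPixmap *pix)
{
    assert(pix != NULL && pix->prepare_depth > 0);
    if (--pix->prepare_depth == 0)
        pix->hw_finish_calls++;
}
```

With this, a wrapped fb call that re-enters another wrapped GC op on the
same pixmap only bumps the counter and leaves the swapper setup alone.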

Can somebody give me a quick overview of what the "fb" and "mi" layers
are ? fb seems to do the actual drawing to the framebuffer. I haven't
looked at "mi" in much detail. We seem to use fb most of the time, but we
fall back to mi in a couple of cases, sometimes based on the value of
pGC->lineWidth (whether it's 0 or not).

While looking at the code, I also noticed something a bit odd in Exa:

A bunch of wrappers do:

    if (pGC->lineWidth == 0) {
	exaWaitSync(pDrawable->pScreen);
	exaDrawableDirty (pDrawable);
    }
    exaDrawableDirty (pDrawable);
    fbPolyLine (pDrawable, pGC, mode, npt, ppt);

What is the point of doing exaDrawableDirty() twice in the lineWidth
== 0 case ? Also, why do we _not_ sync with lineWidth non-0 ? Because it
will just call us back with lineWidth == 0 ? In that case, we hit the
problem I've mentioned above with dealing with recursive Prepare on the
same pixmap. I would appreciate a little bit of insight here ;)

Note that this is not consistent across all functions. For example:

void
ExaCheckPolyRectangle (DrawablePtr pDrawable, GCPtr pGC,
		      int nrects, xRectangle *prect)
{
    if (pGC->lineWidth == 0) {
	exaWaitSync(pDrawable->pScreen);
	exaDrawableDirty (pDrawable);
    }
    fbPolyRectangle (pDrawable, pGC, nrects, prect);
}

doesn't mark dirty when lineWidth is not 0 at all (and doesn't sync
either).

Now, a note about the strategy for dealing with the swappers. We might
have PrepareAccess() called for up to 3 pixmaps at the same time in the
case of composite. For cards like radeon, we can use the surface
mechanism to have separate sets of swappers for every pixmap (the swapper
has to be set differently based on the bpp).

But not all cards can do that. My idea was that if your driver is asked
to do more than what it can, it would just return NULL to the
PrepareAccess() call.

When Exa gets that NULL return, the idea was to then allocate a local
Pixmap in RAM for the operation, DownloadFromScreen() the vram pixmap to
that local pixmap, and perform the operation in RAM. If necessary, once
the operation is completed, the result can be pushed back into vram (if
it's the front buffer, but I don't expect PrepareAccess() to fail on the
front buffer anyway.
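
A minimal sketch of that NULL fallback path, under simplifying
assumptions: the hook signatures are reduced to bare buffers, the upload
hook is an assumed counterpart of DownloadFromScreen(), and MockPixmap,
MockDriver and fallback_op() are names invented for the example. Unlike
the description above, the sketch always pushes the result back after a
write, to keep it short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins; real hooks take far richer arguments. */
typedef struct { unsigned char *vram; size_t size; } MockPixmap;

typedef struct {
    void *(*PrepareAccess)(MockPixmap *pix, int direction);
    void  (*FinishAccess)(MockPixmap *pix, int direction, void *p);
    void  (*DownloadFromScreen)(MockPixmap *src, unsigned char *dst);
    void  (*UploadToScreen)(MockPixmap *dst, const unsigned char *src);
} MockDriver;

/* Run op on the pixmap, directly if the driver allows it, otherwise on
 * a temporary copy in main memory. Returns 0 on success, -1 on OOM. */
static int fallback_op(const MockDriver *drv, MockPixmap *pix, int writes,
                       void (*op)(unsigned char *buf, size_t len))
{
    assert(drv != NULL && pix != NULL && op != NULL);

    void *p = drv->PrepareAccess(pix, writes);
    if (p != NULL) {
        op(p, pix->size);
        drv->FinishAccess(pix, writes, p);
        return 0;
    }

    /* Driver refused: work on a local copy in RAM instead. */
    unsigned char *tmp = malloc(pix->size);
    if (tmp == NULL)
        return -1;
    drv->DownloadFromScreen(pix, tmp);
    op(tmp, pix->size);
    if (writes)
        drv->UploadToScreen(pix, tmp);  /* push the result back to vram */
    free(tmp);
    return 0;
}

/* Mock driver that cannot set up a swapper and always refuses. */
static void *refuse_prepare(MockPixmap *pix, int direction)
{
    (void)pix; (void)direction;
    return NULL;
}

static void noop_finish(MockPixmap *pix, int direction, void *p)
{
    (void)pix; (void)direction; (void)p;
}

static void copy_down(MockPixmap *src, unsigned char *dst)
{
    memcpy(dst, src->vram, src->size);
}

static void copy_up(MockPixmap *dst, const unsigned char *src)
{
    memcpy(dst->vram, src, dst->size);
}

/* Example operation: invert every byte. */
static void invert_bytes(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)~buf[i];
}
```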

Note that in many cases, the driver will not need to do that. The
majority of pixmaps dealt with seem to have the same endian as the main
framebuffer, which means no special setting is needed.

This can also be "hijacked" as a means to force EXA to work in RAM
instead of VRAM for some operations, which might be faster,
depending on various things. The "direction" parameter could be extended
to provide more information about the operation here to help make a better
decision, but I'll leave that to further discussions.

Ben.




