[Libdlo] [PATCH] udlfb: high-throughput urb pool

Bernie Thompson bernie.thompson at gmail.com
Sat Dec 18 10:35:49 PST 2010


Hi Andrew,

>> > On Fri, Dec 17, 2010 at 02:17:34PM -0600, Andrew Kephart wrote:
>> > > udlfb: add parameters for URB pool size and for nonblocking URB pool

Thanks for submitting this patch!  And I'm very glad the defio path is
getting this attention.

Some background: when I originally looked at merging the Roberto
(damage) and Jaya (defio) implementations, I hoped to get the best of
both worlds: use defio to map the framebuffer for page fault handling,
so any fbdev client could work without modification, then switch to
the much more efficient damage notification approach when a client let
us know it was capable. Note that defio is not less efficient from a
DisplayLink protocol perspective (it should result in the same USB
traffic to the device); it just requires more MMU and CPU work to get
to those same results.  But it's wonderful from a convenience point of
view to magically use page faults to detect dirty framebuffer regions.
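
For reference, the defio hookup has roughly this shape (a sketch only,
not udlfb's exact code; dlfb_dirty_pages and dlfb_render_region here
are stand-ins for whatever walks the dirty pages and encodes/sends
them to the device):

static void dlfb_dirty_pages(struct fb_info *info,
			     struct list_head *pagelist)
{
	struct page *page;

	/* each dirty page maps to a framebuffer region; encode and
	 * send it to the device just as the damage path would */
	list_for_each_entry(page, pagelist, lru)
		dlfb_render_region(info, page->index << PAGE_SHIFT,
				   PAGE_SIZE);
}

static struct fb_deferred_io dlfb_defio = {
	.delay       = HZ / 5,          /* coalesce faults for ~200ms */
	.deferred_io = dlfb_dirty_pages,
};

/* at probe time, once framebuffer memory is allocated: */
info->fbdefio = &dlfb_defio;
fb_deferred_io_init(info);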

A couple problems prevented this from coming to full fruition:

1) defio isn't really set up to handle turning it on/off dynamically
(e.g. remapping the framebuffer page tables on the fly), and
separately, faults triggered from certain kernel contexts are a
problem.
2) there was a bug or two in defio, and several in udlfb's defio
client, which made the defio path unstable until recently.

#2 is largely resolved now (as of defio in 2.6.35, and the latest
udlfb in 2.6.37), so defio is becoming a viable and appealing path.
However, #1 is still a source of heartache.

I'm emailing all this background to put the value of the defio path in
perspective (since it's currently an option that's off by default).  It
would be great to have this new attention lead to it being turned on
by default, without the need for an option, if possible.

So the first important point is: these high-level issues of
"can we enable defio by default?" are more important than any
relatively minor differences in performance, like the extra scheduling
latency of releasing buffers via a work item in the defio path.  I'm
very excited to have you looking at defio, in hopes this will lead to
clarity on those high-level issues, since I think we're close.

Then on to this particular patch: the nonblocking URB pool is
fundamentally problematic, unfortunately.  Graphics generates so much
data that some pretty standard user scenarios (full-screen video
playback at high resolutions) can flood a pool of any size.  So we
must deal with the pool running out.  In the choice between dropping
pixels (without telling the client) and blocking the client until the
next buffer comes free, blocking has some big simplicity advantages.
We can't drop pixels without scheduling a full-screen repaint for some
point in the future - and it just gets tricky to work that out.  Right
now in udlfb, dropped pixels are rare enough (basically only errors)
that we haven't had to walk into any of that complexity.  But I'd be
happy to have more back-and-forth on these options.
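
To make the blocking option concrete: the pool can be a fixed set of
pre-allocated URBs behind a counting semaphore, so a caller simply
sleeps until an in-flight URB completes and is returned.  A sketch
along these lines (names like struct urb_node and GET_URB_TIMEOUT
only approximate the driver's internals):

static struct urb *dlfb_get_urb(struct dlfb_data *dev)
{
	struct urb_node *unode;
	unsigned long flags;

	/* sleep until a completed URB is returned to the free list */
	if (down_timeout(&dev->urbs.limit_sem, GET_URB_TIMEOUT))
		return NULL;	/* timed out - count it as an error */

	spin_lock_irqsave(&dev->urbs.lock, flags);
	unode = list_first_entry(&dev->urbs.list, struct urb_node,
				 entry);
	list_del_init(&unode->entry);
	spin_unlock_irqrestore(&dev->urbs.lock, flags);

	return unode->urb;
}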

In terms of pool size, I'd agree that we should not torture the user
with module options for this.

What I'd recommend is this: four 64K buffers work great for the damage
path in all the scenarios I've observed; we only rarely make it to the
3rd or 4th URB before the earlier ones in flight come back.

The defio path should be the same, except for that scheduling latency
to free the buffer (and transfer control to the current waiter) in a
normal context via delayed work (defio introduces difficult context
considerations - we crash if we don't get it right).  First off,
perhaps there's a solution to defio's context limitations that doesn't
involve an extra context switch.
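
The delayed-work dance I'm referring to looks roughly like this (a
sketch; the field and function names approximate the driver's
internals):

static void dlfb_release_urb_work(struct work_struct *work)
{
	struct urb_node *unode = container_of(work, struct urb_node,
					      release_urb_work.work);

	/* runs in process context, which the defio path tolerates */
	up(&unode->dev->urbs.limit_sem);
}

static void dlfb_urb_completion(struct urb *urb)
{
	struct urb_node *unode = urb->context;
	unsigned long flags;

	spin_lock_irqsave(&unode->dev->urbs.lock, flags);
	list_add_tail(&unode->entry, &unode->dev->urbs.list);
	spin_unlock_irqrestore(&unode->dev->urbs.lock, flags);

	/* defer the up() out of interrupt context via the workqueue;
	 * this is the extra context switch discussed above */
	schedule_delayed_work(&unode->release_urb_work, 0);
}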

Barring that, there's a point where that scheduling latency shouldn't
matter (where we'd be waiting anyway).  I would bump up the number of
buffers from 4 until we find that point of diminishing returns.  I
suspect it's not too many more.  Can you run a few tests to see
whether an adjustment of this parameter in the header can get us what
we need?  Also, if we're not filling the 64K buffers (if defio is
sending us just a few pages at a time), then try reducing the 64K
buffer size and increasing the number of buffers accordingly.
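
For clarity, the knobs in question are compile-time constants in the
driver header, roughly like the following (names and values only
approximate udlfb.h, and the alternate values are just one example of
the trade-off, not a recommendation):

#define WRITES_IN_FLIGHT (4)	/* number of URBs in the pool */
#define MAX_TRANSFER (PAGE_SIZE*16 - BULK_SIZE)	/* ~64K per URB */

/* if defio hands us only a few pages per pass, e.g. 16 URBs of ~16K
 * keep the same total pool memory while letting more writes overlap:
 *
 * #define WRITES_IN_FLIGHT (16)
 * #define MAX_TRANSFER (PAGE_SIZE*4 - BULK_SIZE)
 */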

So, again, thanks so much for this patch!  While it's not one that can
be applied as-is, the focus it puts on defio is great (what do we need
to finish to enable defio by default?), and hopefully the discussion
it triggers will help us find a good, simple way to optimize
performance in the defio path.

Thanks!
Bernie
