DMA scheduling

Felix Kühling fxkuehl at gmx.de
Thu Mar 16 17:42:42 PST 2006


Hi Keith,

I have some more thoughts and questions about your proposed DMA
scheduler.

It's not clear to me whether you're planning to do this for a specific
driver or whether you're thinking of a device-independent scheduler that
uses some device-specific hooks.

You outlined some data structure that applications would pass to the
scheduler. If the scheduler is going to be hardware independent, that
structure should be opaque to the scheduler and the internals should be
handled by hardware-specific code. The following is my idea of how this
would be done. Is that more or less what you have in mind, or am I
way off here? The driver would build some opaque scheduler atom and
queue it up to be scheduled on behalf of a process. It would attach some
meta-data to the atom that contains the necessary data for the scheduler
to make informed scheduling decisions. When a command atom is due to be
submitted to the hardware, the scheduler invokes a hardware-specific
hook provided by the driver that would extract the hardware-specific
commands and data and submit them to the hardware.
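A minimal C sketch of the split described above. It assumes a hypothetical `struct sched_atom` that carries scheduler-visible metadata next to an opaque driver payload and a driver-supplied `submit` hook. None of these names come from existing DRM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scheduler metadata: the only part of an atom that the
 * device-independent scheduler is allowed to interpret. */
struct sched_meta {
    int      pid;            /* process the atom was queued for */
    uint32_t est_gpu_usecs;  /* driver's estimate of GPU time */
    uint32_t vram_bytes;     /* size of the working set it references */
};

/* An opaque atom: the scheduler never looks inside 'payload'. */
struct sched_atom {
    struct sched_meta meta;
    void *payload;                 /* driver-private commands and data */
    int (*submit)(void *payload);  /* driver hook; returns 0 on success */
};

/* Device-independent part: once the atom is chosen, delegate. */
static int sched_fire(struct sched_atom *atom)
{
    assert(atom != NULL && atom->submit != NULL);
    return atom->submit(atom->payload);
}

/* Example driver hook (hypothetical), counting what it "submits". */
static int demo_submitted;
static int demo_driver_submit(void *payload)
{
    demo_submitted += *(int *)payload;
    return 0;
}
```

The scheduler only ever reads `meta`; everything hardware-specific stays behind `submit`.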

Also you make references to some memory manager. Do you mean the work on
the texman_0_1_branch? AFAICT the Intel drivers are the only ones
affected by this. How does your proposed scheduler interact with that
memory manager? Does it need to interact with it at all? I think all
buffer ageing etc. could be done in the driver-specific submit hook
outlined above. Or are you planning to tie buffer ageing to one unified
memory manager that is (hopefully) going to be used by all drivers?
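If buffer ageing lives in the submit hook, one common way to do it is with sequence numbers. The hook stamps every referenced buffer with the sequence number of the submission, and a buffer is idle once the hardware reports that it has completed that sequence. The sketch below uses hypothetical names and assumes the GPU writes the last completed sequence number back to memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device state: the driver emits an increasing sequence
 * number with each submission, and the GPU writes back the last one it
 * has completed (a "breadcrumb"). */
struct dev_state {
    uint32_t last_emitted;
    volatile uint32_t last_completed; /* updated by the hardware */
};

struct buffer {
    uint32_t age; /* sequence number of the last submission using it */
};

/* Called from the driver's submit hook for the buffers a submission uses.
 * Returns the sequence number to emit after the commands. */
static uint32_t age_buffers(struct dev_state *dev, struct buffer **bufs, int n)
{
    uint32_t seq = ++dev->last_emitted;
    for (int i = 0; i < n; i++)
        bufs[i]->age = seq;
    return seq;
}

/* Wrap-safe check: idle once the GPU has completed a sequence >= age. */
static bool buffer_idle(const struct dev_state *dev, const struct buffer *b)
{
    return (int32_t)(dev->last_completed - b->age) >= 0;
}
```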

Regards,
  Felix

On Thursday, 16.03.2006, at 15:52 +0000, Keith Whitwell wrote:
> 
> I've been thinking a little bit about DMA scheduling for the graphics 
> hardware.
> 
> Currently we have a situation where any 3d app which happens to be
> able to grab the DRI lock can enqueue as many commands on the hardware
> dma queue as it sees fit.  The Linux scheduler is the only arbiter
> between competing 3D clients, and it has no information regarding the
> GPU usage of these clients.
> 
> Even if it did, there are benefits to be reaped from keeping the 3d
> DMA streams separate and explicitly scheduling the dma rather than
> allowing clients to inject it in arbitrary quantities and orders.
> 
> Why do we want a GPU scheduler?
> 
> 1) Fairness.  We can currently have situations where one 3d
>     application manages to dominate the GPU while a second app in
>     another window is locked out entirely.
> 
> 2) Interactivity.  It is quite possible to have one application which
>     does so little rendering per frame that it can run at 3000fps while
>     another (e.g. video-based) application does a lot more and can just
>     about keep up a 30fps framerate.  Consider a situation where both
>     applications are running at once.  Simple fairness criteria would
>     have them running at 1500fps and 15fps respectively, but it seems
>     that fairness isn't what is required here.  It would be preferable
>     to give the slower application a greater percentage of the GPU, so
>     that it manages e.g. 27fps, while the other is scaled down to "only"
>     300fps or so.
> 
>     Note that we currently don't even have the "fair" situation...
> 
> 3) Resource management.  Imagine two applications each of which has a
>     texture working set of 90% of the available video ram.  Even a
>     smart replacement algorithm will end up thrashing if the
>     applications are able to rapidly grab the DRI lock from each other
>     and enqueue buffer loads and rendering.  A scheduler could
>     recognize the thrashing and improve matters by giving each app a
>     longer timeslice on the GPU to minimize texture reloads.
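The figures in item 2 follow from a simple model. It assumes that each application's GPU cost per frame is constant and that the GPU is the only bottleneck. Let t_X be the GPU time per frame and s_X the share of GPU time given to application X:

```latex
\begin{aligned}
t_A &= \tfrac{1}{3000}\ \text{s/frame}, \qquad t_B = \tfrac{1}{30}\ \text{s/frame} \\
\text{fps}_X &= \frac{s_X}{t_X}, \qquad s_A + s_B = 1 \\
\text{equal shares } (s_A = s_B = 0.5):&\quad \text{fps}_A = 0.5 \cdot 3000 = 1500,\quad \text{fps}_B = 0.5 \cdot 30 = 15 \\
\text{weighted } (s_A = 0.1,\ s_B = 0.9):&\quad \text{fps}_A = 0.1 \cdot 3000 = 300,\quad \text{fps}_B = 0.9 \cdot 30 = 27
\end{aligned}
```

In other words, the preferred outcome gives the video application 90% of the GPU.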
> 
> There are probably other reasons as well, but these are the ones that
> spring to mind.  A scheduler should help to provide a graphics
> environment that degrades gracefully under load, rather than the
> current situation where apps stutter in and out of life, where one app
> sits dead while another renders frantically.
> 
> 
> 
> Scheduling is a kernel activity
> -------------------------------
> 
> The first point, and I think currently the most relevant one, is
> that a scheduler is going to be a largely kernel-space entity.  That
> is, processes will package up dma command buffers with all the
> information needed to fire them, and hand them off (somehow) to a
> scheduler which will live in the kernel.  I think this is likely to be
> the only sensible place for a scheduler to live.
> 
> Scheduling and memory management
> --------------------------------
> 
> What I see processes handing to the scheduler is something like a
> struct of:
> 
>      bmBufferList *bufferlist;
>      bmBuffer dmaCommandBuffer;
>      bmFixupList *dmaFixups;
>      bool  apply_cliprects;
>      DrawableID cliprectDrawable;
> 
> This is basically the information we pass to the memory manager now.
> The fixup list is the set of relocations which must be applied to the
> command buffer once the memory manager has loaded all the referenced
> buffers into locations in vram.
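To make the fixup step concrete, here is a C sketch of the request and of applying relocations. The field layouts of `bmBuffer`, the command buffer and the fixup entries are assumptions for illustration, and the real memory-manager types may differ. In this sketch a fixup means "write the final vram address of buffer N, plus an offset, into dword P of the command buffer".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the memory-manager types named above. */
typedef struct {
    uint32_t vram_offset; /* meaningful only once the buffer is resident */
    bool     resident;
} bmBuffer;

typedef struct {
    uint32_t *dwords;     /* command buffer contents */
    size_t    count;
} bmCommandBuffer;

typedef struct {
    size_t   pos;         /* dword index in the command buffer */
    size_t   buf;         /* index into the buffer list */
    uint32_t delta;       /* offset within that buffer */
} bmFixup;

struct sched_request {
    bmBuffer        *buffers;            /* bufferlist */
    size_t           nbuffers;
    bmCommandBuffer  cmds;               /* dmaCommandBuffer */
    const bmFixup   *fixups;             /* dmaFixups */
    size_t           nfixups;
    bool             apply_cliprects;
    uint32_t         cliprect_drawable;  /* DrawableID */
};

/* Patch the command buffer after validation has placed every buffer.
 * Returns false (leaving cmds untouched) on a bad or unplaced fixup. */
static bool apply_fixups(struct sched_request *req)
{
    /* First pass: check everything before modifying anything. */
    for (size_t i = 0; i < req->nfixups; i++) {
        const bmFixup *f = &req->fixups[i];
        if (f->pos >= req->cmds.count || f->buf >= req->nbuffers ||
            !req->buffers[f->buf].resident)
            return false;
    }
    /* Second pass: write the final vram address into each location. */
    for (size_t i = 0; i < req->nfixups; i++) {
        const bmFixup *f = &req->fixups[i];
        req->cmds.dwords[f->pos] = req->buffers[f->buf].vram_offset + f->delta;
    }
    return true;
}
```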
> 
> The departure from the current code is that we are now asking for this
> to be done at some point in the future.  The current code actually
> pulls in the buffers before physically enqueuing the command buffer on
> the hardware.
> 
> Allowing this to be deferred to the future will allow the scheduler to
> optimize the usage pattern of buffers, to reduce thrashing and to
> attempt to fairly divide GPU resources (time, bandwidth, framerate,
> etc) between competing 3d applications.
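Here is one example of a policy the scheduler could apply once submission is deferred. The sketch below picks the runnable client that has consumed the least weighted GPU time. This weighted fair-share rule is a choice made here for illustration, not something specified in this thread. The weights are one way to express the interactivity preference from item 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-client scheduler state. */
struct client_queue {
    int      pending;    /* queued command buffers */
    uint64_t gpu_usecs;  /* GPU time consumed so far */
    uint32_t weight;     /* relative share, >= 1 */
};

/* Pick the runnable client with the least weighted GPU time; -1 if idle.
 * Cross-multiplying avoids division:
 *   a.used / a.w < b.used / b.w  <=>  a.used * b.w < b.used * a.w */
static int pick_next(const struct client_queue *q, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++) {
        if (q[i].pending == 0)
            continue;
        if (best < 0 ||
            q[i].gpu_usecs * q[best].weight < q[best].gpu_usecs * q[i].weight)
            best = i;
    }
    return best;
}

/* Charge a client for one completed command buffer. */
static void charge(struct client_queue *c, uint64_t usecs)
{
    c->gpu_usecs += usecs;
    c->pending--;
}
```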
> 
> The scheduler will at some point in the future do the equivalent of:
> 
>   - choose a particular dma command buffer for execution.
>   - effectively:
>      LOCK_HARDWARE
>      validateBuffers()
>      applyFixups()
>      retrieveClipRects()
>      foreach cliprect {
>          set cliprect;
>          fire command buffer;
>      }
>      UNLOCK_HARDWARE
> 
> At this point, note that validateBuffers() is primarily used within
> kernelspace.  This may mean that optimizations aimed at improving the
> userspace behaviour of this call may not be important in the longer
> term.
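The firing sequence above can be written as a C function over a table of hypothetical driver hooks. The `apply_cliprects == false` path, which fires once without cliprects, is an assumption based on the flag in the request struct. The demo stubs exist only to exercise the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct rect { int x1, y1, x2, y2; };

/* Hypothetical driver hooks; the real ones are hardware specific. */
struct hw_ops {
    void   (*lock)(void);
    void   (*unlock)(void);
    bool   (*validate_buffers)(void *req);
    bool   (*apply_fixups)(void *req);
    size_t (*get_cliprects)(void *req, const struct rect **out);
    void   (*set_cliprect)(const struct rect *r);
    void   (*fire)(void *req);
};

/* Returns how many times the command buffer was fired, or -1 if
 * validation or fixups failed. The lock is always released. */
static int sched_execute(const struct hw_ops *hw, void *req, bool apply_cliprects)
{
    int fired = -1;
    hw->lock();
    if (hw->validate_buffers(req) && hw->apply_fixups(req)) {
        fired = 0;
        if (!apply_cliprects) {
            hw->fire(req);
            fired = 1;
        } else {
            const struct rect *rects;
            size_t n = hw->get_cliprects(req, &rects);
            for (size_t i = 0; i < n; i++) {  /* foreach cliprect */
                hw->set_cliprect(&rects[i]);
                hw->fire(req);
                fired++;
            }
        }
    }
    hw->unlock();
    return fired;
}

/* Trivial stand-ins so the sketch can be exercised. */
static int lock_depth, fire_count;
static void demo_lock(void) { lock_depth++; }
static void demo_unlock(void) { lock_depth--; }
static bool demo_ok(void *req) { (void)req; return true; }
static const struct rect demo_rects[2] = {{0, 0, 10, 10}, {20, 0, 30, 10}};
static size_t demo_rects_for(void *req, const struct rect **out)
{
    (void)req;
    *out = demo_rects;
    return 2;
}
static void demo_set(const struct rect *r) { (void)r; }
static void demo_fire(void *req) { (void)req; fire_count++; }
```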
> 
> That aside, what is missing before we can implement the scheduler?  I
> see only a couple of (small?) items.
> 
> 1) retrieveClipRects()
> 
> - The way that other driver architectures have done this is to create
>    a regular shared memory region that the X server and kernel module
>    can access which holds the cliprects of all active drawables.  The
>    memory region doesn't have to be pinned or anything special, just
>    readable and understandable by both parties.  Access is probably
>    protected by the DRI lock.
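Here is a sketch of what such a shared region might look like, together with the kernel-side lookup. The layout, the fixed limits and the function name are hypothetical. The X server writes this memory, so the reader bounds-checks every index instead of trusting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DRAWABLES 64   /* arbitrary limits for the sketch */
#define MAX_RECTS     1024

struct clip_rect { int16_t x1, y1, x2, y2; };

struct drawable_entry {
    uint32_t id;     /* DrawableID; 0 marks an unused slot */
    uint32_t first;  /* index into rects[] */
    uint32_t count;
};

/* Hypothetical layout shared by the X server (writer) and the kernel
 * module (reader). Plain memory; both sides hold the DRI lock. */
struct cliprect_region {
    struct drawable_entry drawables[MAX_DRAWABLES];
    struct clip_rect      rects[MAX_RECTS];
};

/* Kernel side: find the current cliprects for a drawable. Returns the
 * rect count, or 0 (with *out = NULL) if the drawable is unknown. */
static uint32_t lookup_cliprects(const struct cliprect_region *r, uint32_t id,
                                 const struct clip_rect **out)
{
    for (size_t i = 0; i < MAX_DRAWABLES; i++) {
        const struct drawable_entry *d = &r->drawables[i];
        if (id != 0 && d->id == id &&
            d->first <= MAX_RECTS && d->count <= MAX_RECTS - d->first) {
            *out = &r->rects[d->first];
            return d->count;
        }
    }
    *out = NULL;
    return 0;
}
```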
> 
> 2) 2D blits.
> 
> - The regular mechanism of setting a cliprect and firing a command
>    buffer works for most hardware we know about, but only for 3d
>    commands.  2d commands tend not to be affected by the 3d state used
>    for the cliprect.
> 
> - The i830 may or may not have a way to set a cliprect which affects
>    2d state, but that's probably not helpful for solving the general
>    problem.
> 
> If you look at the i915 driver, there are quite a few places where we
> lock hardware and then use the cliprect list to emit 2d blits, eg for
> screen clearing, swapBuffers, copyPixels, etc.
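Emitting a 2D copy-blit against the cliprect list amounts to intersecting the destination with each cliprect and shifting the source by the same amount. The C sketch below assumes half-open rectangles. It ignores the ordering needed when source and destination overlap on the same surface, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct box { int x1, y1, x2, y2; };  /* half-open: [x1,x2) x [y1,y2) */

struct blit { int src_x, src_y; struct box dst; };

/* Split a copy-blit by the drawable's cliprects, since 2D commands ignore
 * the 3D cliprect state. Writes at most 'max' blits; returns the count. */
static size_t clip_blit(const struct blit *b, const struct box *clips, size_t n,
                        struct blit *out, size_t max)
{
    size_t k = 0;
    for (size_t i = 0; i < n && k < max; i++) {
        struct box r = {
            b->dst.x1 > clips[i].x1 ? b->dst.x1 : clips[i].x1,
            b->dst.y1 > clips[i].y1 ? b->dst.y1 : clips[i].y1,
            b->dst.x2 < clips[i].x2 ? b->dst.x2 : clips[i].x2,
            b->dst.y2 < clips[i].y2 ? b->dst.y2 : clips[i].y2,
        };
        if (r.x1 >= r.x2 || r.y1 >= r.y2)
            continue;  /* this cliprect does not overlap the blit */
        out[k].dst = r;
        /* Move the source origin by the same amount as the destination. */
        out[k].src_x = b->src_x + (r.x1 - b->dst.x1);
        out[k].src_y = b->src_y + (r.y1 - b->dst.y1);
        k++;
    }
    return k;
}
```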
> 
> This could work with the scheduler if the application cleared out all
> previous dma from the scheduler queue before being allowed to emit
> those hard-coded blits.  The blits themselves would have to go to
> hardware immediately, before UNLOCK_HARDWARE, and not be subject to
> scheduling, otherwise the X server might change the cliprects before
> they are scheduled.
> 
> While the above would work, and would certainly be fine as a first
> step, it seems to reduce the utility of the scheduler as clients can
> still skew the behaviour of the dma stream significantly just by
> issuing lots of blits.
> 
> I'm considering a system where operations that don't respect the
> standard method of setting a cliprect are passed to the scheduler as
> special tokens.  These tokens are scheduled as usual, the same as dma
> command buffers, but when it comes time to fire them, are passed to
> the hardware's drm component to be turned into real dma commands.  At
> the moment, I think the two tokens would be "copy-blit" and
> "fill-blit", and that is pretty much all the 3d drivers need.
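Here is a sketch of how such tokens might sit in the same queue as ordinary DMA buffers. It uses a tagged union and hypothetical per-driver expansion hooks. The fields chosen for the two blit tokens are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scheduled item: an ordinary DMA command buffer or one of
 * the two special tokens, which are expanded only when fired. */
enum item_kind { ITEM_DMA, ITEM_COPY_BLIT, ITEM_FILL_BLIT };

struct sched_item {
    enum item_kind kind;
    uint32_t drawable;  /* whose cliprects apply at fire time */
    union {
        struct { const uint32_t *dwords; uint32_t count; } dma;
        struct { int sx, sy, dx, dy, w, h; } copy;
        struct { int x, y, w, h; uint32_t color; } fill;
    } u;
};

/* Hypothetical hooks provided by the hardware's drm component. */
struct drm_blit_ops {
    void (*emit_dma)(const struct sched_item *it);
    void (*emit_copy)(const struct sched_item *it);
    void (*emit_fill)(const struct sched_item *it);
};

/* Tokens become real hardware commands only here, so they are clipped
 * against the cliprects in force when they actually run. */
static void fire_item(const struct drm_blit_ops *ops, const struct sched_item *it)
{
    switch (it->kind) {
    case ITEM_DMA:       ops->emit_dma(it);  break;
    case ITEM_COPY_BLIT: ops->emit_copy(it); break;
    case ITEM_FILL_BLIT: ops->emit_fill(it); break;
    }
}

/* Counters standing in for a real driver, for demonstration only. */
static int n_dma, n_copy, n_fill;
static void demo_dma(const struct sched_item *it)  { (void)it; n_dma++; }
static void demo_copy(const struct sched_item *it) { (void)it; n_copy++; }
static void demo_fill(const struct sched_item *it) { (void)it; n_fill++; }
```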
> 
> 3) The X server.
> 
> Is the X server command stream scheduled?  I would like to think it
> was, but see the above.  The X server would want more and more varied
> control over the 2d and video hardware and command streams.  For Xgl,
> it is a lot easier to see how this would work.  What about regular X
> servers?
> 
> It has been pointed out that you can divide X server drawing into two
> components:
> 
> 	1) Drawing on behalf of clients.  This includes 2D xlib
> 	   drawing as well as 3D commands arising from indirect GLX
> 	   clients.
> 
> 	2) Drawing as a result of window management operations, such
>	   as mapping, unmapping and moving windows.
> 
> From the point of view of the scheduler, it may be advantageous to
> treat these separately.  The drawing commands from (1) can effectively
> be scheduled normally, maybe even as multiple streams, one per
> client/context.
> 
> The window-management drawing operations are associated with changes to
> cliprect lists and these may benefit from being scheduled differently.
> They may also be subject to different constraints based on how easy or
> difficult it is to propagate the cliprect changes to other queues.  If
> cliprect changes cannot be propagated, it will be necessary to drain
> the other queues before executing the window-management drawing.
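The drain rule in the last paragraph might look like the sketch below. It assumes, hypothetically, that each queue records whether its pending work looks up cliprects at fire time and will therefore pick up a cliprect change on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue state. */
struct queue {
    int  pending;         /* items not yet executed */
    bool late_cliprects;  /* true: cliprects resolved at fire time */
};

/* Stand-in for "run everything in q to completion". */
static void drain(struct queue *q) { q->pending = 0; }

/* Before window-management drawing that changes cliprects, drain every
 * queue whose pending work cannot see the change. Returns how many
 * queues had to be drained. */
static int prepare_wm_drawing(struct queue *qs, int n)
{
    int drained = 0;
    for (int i = 0; i < n; i++) {
        if (qs[i].pending > 0 && !qs[i].late_cliprects) {
            drain(&qs[i]);
            drained++;
        }
    }
    return drained;
}
```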
> 
> Keith
> _______________________________________________
> xorg mailing list
> xorg at lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/xorg
> 
-- 
| Felix Kühling <fxkuehl at gmx.de>                     http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3  B152 151C 5CC1 D888 E595 |



