DMA scheduling
Felix Kühling
fxkuehl at gmx.de
Fri Mar 17 12:28:03 PST 2006
On Friday, 17.03.2006 at 09:17 +0000, Keith Whitwell wrote:
> Felix Kühling wrote:
> > Hi Keith,
> >
> > I have some more thoughts and questions about your proposed DMA
> > scheduler.
> >
> > It's not clear to me whether you're planning to do this for a specific
> > driver or whether you're thinking of a device-independent scheduler that
> > uses some device-specific hooks.
>
> Device independent. See below...
>
> > You outlined some data structure that applications would pass to the
> > scheduler. If the scheduler is going to be hardware independent, that
> > structure should be opaque to the scheduler and the internals should be
> > handled by hardware-specific code. The following is my idea of how this
> > would be done. Is that more or less what you're having in mind or am I
> > way off here: The driver would build some opaque scheduler atom and
> > queue it up to be scheduled on behalf of a process. It would attach some
> > meta-data to the atom that contains the necessary data for the scheduler
> > to make informed scheduling decisions. When a command atom is due to be
> > submitted to the hardware, the scheduler invokes a hardware-specific
> > hook provided by the driver that would extract the hardware-specific
> > commands and data and submit them to the hardware.
>
> The structure passed is really a device-independent construct defined by
> the needs of the memory manager. It includes a reference to a command
> (DMA) buffer which is in hardware format and largely opaque to the
> scheduler and memory manager.
>
> The current exception is that the memory manager will know how to apply
> the list of relocations to fixup the command buffer with the current
> locations of the referenced buffers (eg texture image locations). I can
> imagine obscure hardware might have weird ways of packing buffer offsets
> that might not be expressible in the format we've used to describe
> relocations, which may mean that applying the relocation fixups is a
> task that gets handed off to the device-dependent driver.
>
> This design assumes that there *is* a command buffer concept. Hardware
> which is purely MMIO, like tdfx, isn't so amenable to the concept of DMA
> scheduling (funnily enough). That driver could either continue as it
> is, or fake out DMA buffers and submit them to the scheduler. My guess
> is the former.
>
> > Also you make references to some memory manager. Do you mean the work on
> > the texman_0_1_branch? AFAICT the intel drivers are the only ones
> > affected by this. How does your proposed scheduler interact with that
> > memory manager? Does it need to interact with it at all? I think all
> > buffer ageing etc. could be done in the driver specific submit-hook
> > outlined above. Or are you planning to tie buffer ageing to one unified
> > memory manager that is (hopefully) going to be used by all drivers?
>
> While the memory manager is being developed on i915, the concept and
> code is largely device-independent. Most of the guts are in
> device-independent libdrm.so and drm.ko.
>
> The intention is for other drivers to pull it in once it is finished and
> proven. Just a quick look at what we've been able to implement on i915
> using the memory manager (fast copytexsubimage, read/draw/copypixels,
> FBOs, etc.) should be enough to convince most people that it's worth the
> effort...
>
> Memory management and scheduling are intimately tied. Specifically, the
> buffers that the memory manager pulls into video memory are determined
> by the scheduler's choice of which dma stream to schedule. Likewise,
> the scheduler will base its choice of which stream to favor based (in
> part) on the amount of dma to/from video ram required for each of the
> active streams.
I should have read your original post more carefully. But thanks for the
explanation. Now I understand what the fixups are for. You wrote:
- choose a particular dma command buffer for execution.
- effectively:

      LOCK_HARDWARE
      validateBuffers()
      applyFixups()
      retrieveClipRects()
      foreach cliprect {
          set cliprect;
          fire command buffer;
      }
      UNLOCK_HARDWARE
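The applyFixups() step in this sequence could look roughly like the following C sketch, assuming a simple relocation-record layout; the struct names and fields are illustrative, not the actual DRM interface:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation record: patches one dword in the command
 * buffer with a referenced buffer's final offset, once the memory
 * manager has placed that buffer. */
struct drm_reloc {
    size_t   cmd_offset;   /* dword index into the command buffer */
    uint32_t delta;        /* byte offset within the target buffer */
    int      target;       /* index into the validated-buffer list */
};

/* Apply relocations after validateBuffers() has assigned final
 * offsets for every referenced buffer. */
void apply_fixups(uint32_t *cmd, const struct drm_reloc *relocs,
                  size_t nrelocs, const uint32_t *final_offsets)
{
    for (size_t i = 0; i < nrelocs; i++)
        cmd[relocs[i].cmd_offset] =
            final_offsets[relocs[i].target] + relocs[i].delta;
}
```

As the quoted mail notes, hardware that packs offsets in an unusual way would not fit this flat record format, which is why the fixup step may need to be handed off to the device-dependent driver.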
Isn't there a step missing for uploading/mapping buffers to VRAM/GART?
Or is that implied in one of the steps?
Also I think it may make sense to have a distinction between commands
and state. Most state needs to be applied only once for all clip rects.
BTW, the savage DRM takes advantage of that fact:
- choose a particular dma command buffer for execution.
- effectively:

      LOCK_HARDWARE
      validateBuffers()
      upload/map data into VRAM/GART
      applyFixups()
      update hardware state;
      retrieveClipRects()
      foreach cliprect {
          set cliprect;
          fire command buffer; (no state changes)
      }
      UNLOCK_HARDWARE
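The savage-style sequence above, with state emitted once and only the cliprect changing per iteration, might be sketched in C as follows. The fake hardware hooks exist only to make the ordering observable and are not the real savage DRM entry points:

```c
#include <stddef.h>

struct cliprect { int x1, y1, x2, y2; };

/* Minimal fake "hardware" so the dispatch order can be checked. */
struct fake_hw { int state_emits; int fires; };

static void emit_state(struct fake_hw *hw) { hw->state_emits++; }
static void set_cliprect(struct fake_hw *hw, const struct cliprect *r)
{ (void)r; }
static void fire(struct fake_hw *hw) { hw->fires++; }

/* Savage-style dispatch: hardware state is applied once, then the
 * command buffer is replayed per cliprect with only the scissor
 * changing between iterations. */
void dispatch(struct fake_hw *hw, const struct cliprect *rects, size_t n)
{
    emit_state(hw);                    /* update hardware state, once */
    for (size_t i = 0; i < n; i++) {
        set_cliprect(hw, &rects[i]);   /* per-rect scissor only */
        fire(hw);                      /* no state changes here */
    }
}
```

With three cliprects, the state is emitted once while the buffer is fired three times, which is the saving the mail describes over re-emitting state per rect.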
>
> I imagine that the architecture will work out with the scheduler
> separate from but strongly interacting with the memory manager through a
> fairly clear interface. Both components are expected to be
> device-independent with a small number of hooks.
Sounds good. And exciting. :)
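As a toy illustration of the policy Keith describes, where the scheduler favors a stream based in part on how much DMA to/from video RAM it would require, a scoring function might look like the following; the weighting and field names are invented for the sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-client stream descriptor: how much data the memory
 * manager would have to move into VRAM/GART before this stream's next
 * command buffer could run, and how long the stream has waited. */
struct dma_stream {
    uint64_t pending_upload;  /* bytes to pull into video memory */
    uint64_t wait_time;       /* time the stream has been queued */
};

/* Pick the stream to run next: prefer streams that are cheap to
 * schedule, but age waiting streams so none starves. The >> 10
 * weighting is arbitrary, purely for illustration. */
int pick_stream(const struct dma_stream *s, size_t n)
{
    int best = -1;
    int64_t best_score = INT64_MIN;
    for (size_t i = 0; i < n; i++) {
        int64_t score = (int64_t)s[i].wait_time
                      - (int64_t)(s[i].pending_upload >> 10);
        if (score > best_score) {
            best_score = score;
            best = (int)i;
        }
    }
    return best;
}
```

This also shows why the two components are "intimately tied": the scheduler's choice determines which buffers the memory manager pulls into video memory, and the memory manager's placement costs feed back into the scheduler's choice.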
Thanks,
Felix
>
> Keith
>
--
| Felix Kühling <fxkuehl at gmx.de> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 |