DMA scheduling

Keith Whitwell keith at
Fri Mar 17 14:11:55 PST 2006

Felix Kühling wrote:
> On Friday, 17.03.2006 at 09:17 +0000, Keith Whitwell wrote:
>> Felix Kühling wrote:
>>> Hi Keith,
>>> I have some more thoughts and questions about your proposed DMA
>>> scheduler.
>>> It's not clear to me whether you're planning to do this for a specific
>>> driver or whether you're thinking of a device-independent scheduler that
>>> uses some device-specific hooks.
>> Device independent.  See below...
>>> You outlined some data structure that applications would pass to the
>>> scheduler. If the scheduler is going to be hardware independent, that
>>> structure should be opaque to the scheduler and the internals should be
>>> handled by hardware-specific code. The following is my idea of how this
>>> would be done. Is that more or less what you're having in mind or am I
>>> way off here: The driver would build some opaque scheduler atom and
>>> queue it up to be scheduled on behalf of a process. It would attach some
>>> meta-data to the atom that contains the necessary data for the scheduler
>>> to make informed scheduling decisions. When a command atom is due to be
>>> submitted to the hardware, the scheduler invokes a hardware-specific
>>> hook provided by the driver that would extract the hardware-specific
>>> commands and data and submit them to the hardware.
>> The structure passed is really a device-independent construct defined by 
>> the needs of the memory manager.  It includes a reference to a command 
>> (DMA) buffer which is in hardware format and largely opaque to the 
>> scheduler and memory manager.
>> The current exception is that the memory manager will know how to apply 
>> the list of relocations to fixup the command buffer with the current 
>> locations of the referenced buffers (e.g. texture image locations).  I can 
>> imagine obscure hardware might have weird ways of packing buffer offsets 
>> that might not be expressible in the format we've used to describe 
>> relocations, which may mean that applying the relocation fixups is a 
>> task that gets handed off to the device-dependent driver.
>> This design assumes that there *is* a command buffer concept.  Hardware 
>> which is purely MMIO, like tdfx, isn't so amenable to the concept of DMA 
>> scheduling (funnily enough).  That driver could either continue as it 
>> is, or fake out DMA buffers and submit them to the scheduler.  My guess 
>> is the former.
>>> Also you make references to some memory manager. Do you mean the work on
>>> the texman_0_1_branch? AFAICT the intel drivers are the only ones
>>> affected by this. How does your proposed scheduler interact with that
>>> memory manager? Does it need to interact with it at all? I think all
>>> buffer ageing etc. could be done in the driver specific submit-hook
>>> outlined above. Or are you planning to tie buffer ageing to one unified
>>> memory manager that is (hopefully) going to be used by all drivers?
>> While the memory manager is being developed on i915, the concept and 
>> code is largely device-independent.  Most of the guts are in 
>> device-independent code and in drm.ko.
>> The intention is for other drivers to pull it in once it is finished and 
>> proven.  Just a quick look at what we've been able to implement on i915 
>> using the memory manager (fast copytexsubimage, read/draw/copypixels, 
>> FBOs, etc.) should be enough to convince most people that it's worth the 
>> effort...
>> Memory management and scheduling are intimately tied.  Specifically, the 
>> buffers that the memory manager pulls into video memory are determined 
>> by the scheduler's choice of which dma stream to schedule.  Likewise, 
>> the scheduler will base its choice of which stream to favor based (in 
>> part) on the amount of dma to/from video ram required for each of the 
>> active streams.
> I should have read your original post more carefully. But thanks for the
> explanation. Now I understand what the fixups are for. You wrote:
>   - choose a particular dma command buffer for execution.
>   - effectively:
>      validateBuffers()
>      applyFixups()
>      retrieveClipRects()
>      foreach cliprect {
>          set cliprect;
>          fire command buffer;
>      }
> Isn't there a step missing for uploading/mapping buffers to vram/GART?
> Or is that implied in one of the steps?

Yep, "validateBuffers" does this.

> Also I think it may make sense to have a distinction between commands
> and state. Most state needs to be applied only once for all clip rects.
> BTW, the savage DRM takes advantage of that fact:
>   - choose a particular dma command buffer for execution.
>   - effectively:
>      validateBuffers()
>      upload/map data into VRAM/GART
>      applyFixups()
>      update hardware state;
>      retrieveClipRects()
>      foreach cliprect {
>          set cliprect;
>          fire command buffer; (no state changes)
>      }

This is the sort of thing I'd like to see addressed once we've got 
something generally working.

Based on IRC discussions, I'm not sure that the multiple cliprect case 
is that important any more, particularly with reference to composited 
environments.  I'd be happiest to see the general framework up and 
solid, then see if there is a win from these sorts of changes.


More information about the xorg mailing list