DMA scheduling
Felix Kühling
fxkuehl at gmx.de
Fri Mar 17 12:28:03 PST 2006
On Friday, 17.03.2006 at 09:17 +0000, Keith Whitwell wrote:
> Felix Kühling wrote:
> > Hi Keith,
> >
> > I have some more thoughts and questions about your proposed DMA
> > scheduler.
> >
> > It's not clear to me whether you're planning to do this for a specific
> > driver or whether you're thinking of a device-independent scheduler that
> > uses some device-specific hooks.
>
> Device independent. See below...
>
> > You outlined some data structure that applications would pass to the
> > scheduler. If the scheduler is going to be hardware independent, that
> > structure should be opaque to the scheduler and the internals should be
> > handled by hardware-specific code. The following is my idea of how this
> > would be done. Is that more or less what you're having in mind or am I
> > way off here: The driver would build some opaque scheduler atom and
> > queue it up to be scheduled on behalf of a process. It would attach some
> > meta-data to the atom that contains the necessary data for the scheduler
> > to make informed scheduling decisions. When a command atom is due to be
> > submitted to the hardware, the scheduler invokes a hardware-specific
> > hook provided by the driver that would extract the hardware-specific
> > commands and data and submit them to the hardware.
>
> The structure passed is really a device-independent construct defined by
> the needs of the memory manager. It includes a reference to a command
> (DMA) buffer which is in hardware format and largely opaque to the
> scheduler and memory manager.
>
> The current exception is that the memory manager will know how to apply
> the list of relocations to fixup the command buffer with the current
> locations of the referenced buffers (eg texture image locations). I can
> imagine obscure hardware might have weird ways of packing buffer offsets
> that might not be expressible in the format we've used to describe
> relocations, which may mean that applying the relocation fixups is a
> task that gets handed off to the device-dependent driver.
>
> This design assumes that there *is* a command buffer concept. Hardware
> which is purely MMIO, like tdfx, isn't so amenable to the concept of DMA
> scheduling (funnily enough). That driver could either continue as it
> is, or fake out DMA buffers and submit them to the scheduler. My guess
> is the former.
>
> > Also you make references to some memory manager. Do you mean the work on
> > the texman_0_1_branch? AFAICT the intel drivers are the only ones
> > affected by this. How does your proposed scheduler interact with that
> > memory manager? Does it need to interact with it at all? I think all
> > buffer ageing etc. could be done in the driver specific submit-hook
> > outlined above. Or are you planning to tie buffer ageing to one unified
> > memory manager that is (hopefully) going to be used by all drivers?
>
> While the memory manager is being developed on i915, the concept and
> code is largely device-independent. Most of the guts are in
> device-independent libdrm.so and drm.ko.
>
> The intention is for other drivers to pull it in once it is finished and
> proven. Just a quick look at what we've been able to implement on i915
> using the memory manager (fast copytexsubimage, read/draw/copypixels,
> FBOs, etc.) should be enough to convince most people that it's worth the
> effort...
>
> Memory management and scheduling are intimately tied. Specifically, the
> buffers that the memory manager pulls into video memory are determined
> by the scheduler's choice of which dma stream to schedule. Likewise,
> the scheduler will base its choice of which stream to favor based (in
> part) on the amount of dma to/from video ram required for each of the
> active streams.
I should have read your original post more carefully. But thanks for the
explanation. Now I understand what the fixups are for. You wrote:
- choose a particular dma command buffer for execution.
- effectively:

      LOCK_HARDWARE
      validateBuffers()
      applyFixups()
      retrieveClipRects()
      foreach cliprect {
          set cliprect;
          fire command buffer;
      }
      UNLOCK_HARDWARE
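The applyFixups() step in this sequence could look roughly like the following C sketch, assuming a simple relocation-record layout; the struct names and fields are illustrative, not the actual DRM interface:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical relocation record: patches one dword in the command
 * buffer with a referenced buffer's final offset, once the memory
 * manager has placed that buffer. */
struct drm_reloc {
    size_t   cmd_offset;   /* dword index into the command buffer */
    uint32_t delta;        /* byte offset within the target buffer */
    int      target;       /* index into the validated-buffer list */
};

/* Apply relocations after validateBuffers() has assigned final
 * offsets for every referenced buffer. */
void apply_fixups(uint32_t *cmd, const struct drm_reloc *relocs,
                  size_t nrelocs, const uint32_t *final_offsets)
{
    for (size_t i = 0; i < nrelocs; i++)
        cmd[relocs[i].cmd_offset] =
            final_offsets[relocs[i].target] + relocs[i].delta;
}
```

As the quoted mail notes, hardware that packs offsets in an unusual way would not fit this flat record format, which is why the fixup step may need to be handed off to the device-dependent driver.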
Isn't there a step missing for uploading/mapping buffers to VRAM/GART?
Or is that implied in one of the steps?
Also I think it may make sense to have a distinction between commands
and state. Most state needs to be applied only once for all clip rects.
BTW, the savage DRM takes advantage of that fact:
- choose a particular dma command buffer for execution.
- effectively:

      LOCK_HARDWARE
      validateBuffers()
      upload/map data into VRAM/GART
      applyFixups()
      update hardware state;
      retrieveClipRects()
      foreach cliprect {
          set cliprect;
          fire command buffer; (no state changes)
      }
      UNLOCK_HARDWARE
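The savage-style sequence above, with state emitted once and only the cliprect changing per iteration, might be sketched in C as follows. The fake hardware hooks exist only to make the ordering observable and are not the real savage DRM entry points:

```c
#include <stddef.h>

struct cliprect { int x1, y1, x2, y2; };

/* Minimal fake "hardware" so the dispatch order can be checked. */
struct fake_hw { int state_emits; int fires; };

static void emit_state(struct fake_hw *hw) { hw->state_emits++; }
static void set_cliprect(struct fake_hw *hw, const struct cliprect *r)
{ (void)r; }
static void fire(struct fake_hw *hw) { hw->fires++; }

/* Savage-style dispatch: hardware state is applied once, then the
 * command buffer is replayed per cliprect with only the scissor
 * changing between iterations. */
void dispatch(struct fake_hw *hw, const struct cliprect *rects, size_t n)
{
    emit_state(hw);                    /* update hardware state, once */
    for (size_t i = 0; i < n; i++) {
        set_cliprect(hw, &rects[i]);   /* per-rect scissor only */
        fire(hw);                      /* no state changes here */
    }
}
```

With three cliprects, the state is emitted once while the buffer is fired three times, which is the saving the mail describes over re-emitting state per rect.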
>
> I imagine that the architecture will work out with the scheduler
> separate from but strongly interacting with the memory manager through a
> fairly clear interface. Both components are expected to be
> device-independent with a small number of hooks.
Sounds good. And exciting. :)
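As a toy illustration of the policy Keith describes, where the scheduler favors a stream based in part on how much DMA to/from video RAM it would require, a scoring function might look like the following; the weighting and field names are invented for the sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-client stream descriptor: how much data the memory
 * manager would have to move into VRAM/GART before this stream's next
 * command buffer could run, and how long the stream has waited. */
struct dma_stream {
    uint64_t pending_upload;  /* bytes to pull into video memory */
    uint64_t wait_time;       /* time the stream has been queued */
};

/* Pick the stream to run next: prefer streams that are cheap to
 * schedule, but age waiting streams so none starves. The >> 10
 * weighting is arbitrary, purely for illustration. */
int pick_stream(const struct dma_stream *s, size_t n)
{
    int best = -1;
    int64_t best_score = INT64_MIN;
    for (size_t i = 0; i < n; i++) {
        int64_t score = (int64_t)s[i].wait_time
                      - (int64_t)(s[i].pending_upload >> 10);
        if (score > best_score) {
            best_score = score;
            best = (int)i;
        }
    }
    return best;
}
```

This also shows why the two components are "intimately tied": the scheduler's choice determines which buffers the memory manager pulls into video memory, and the memory manager's placement costs feed back into the scheduler's choice.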
Thanks,
Felix
>
> Keith
>
--
| Felix Kühling <fxkuehl at gmx.de> http://fxk.de.vu |
| PGP Fingerprint: 6A3C 9566 5B30 DDED 73C3 B152 151C 5CC1 D888 E595 |