thomas at shipmail.org
Sun Nov 28 12:37:34 PST 2010
On 11/28/2010 05:11 PM, Francisco Jerez wrote:
> Francisco Jerez<currojerez at riseup.net> writes:
>> Thomas Hellstrom<thomas at shipmail.org> writes:
>>> I'm looking at a way to make TTM memory management asynchronous with
>>> the CPU. The idea is that you should basically be able to DMA data to
>>> and from memory regions without waiting for idle, as long as the GPU
>>> has a means to provide operation ordering.
>> Sounds good. I guess you're mainly dealing with BO eviction
>> synchronization? The only problem I see on our side is that calls to our
>> move() hook aren't guaranteed to be carried out in order (because of the
>> multiple hardware channels). I'm thinking that move() could be extended
>> with an optional sync_obj argument, that way move() would be able to
>> make sure that evictions are strictly ordered with respect to the fence
The way evictions will work is that they appear to take place
"instantly", but are scheduled on a channel, and there will be a data
structure that keeps track of which fences need to be signaled before
a managed area can be reused.
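As a rough sketch of what such per-area tracking could look like: an area keeps the fences of evictions scheduled out of it, and may be reused only once all of them have signaled. Everything here (`pending_fence`, `managed_area`, `area_add_fence`, `area_can_reuse`, the fixed bound of 8) is hypothetical and for illustration only, not existing TTM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AREA_MAX_FENCES 8 /* arbitrary bound chosen for this sketch */

/* Hypothetical fence: 'signaled' is set once the hardware has passed it. */
struct pending_fence {
    bool signaled;
};

/* Hypothetical managed area plus the fences that guard its reuse. */
struct managed_area {
    const struct pending_fence *fences[AREA_MAX_FENCES];
    size_t nfences;
};

/*
 * Record that an eviction guarded by fence f was scheduled out of this area.
 * Returns false when the table is full; a caller would then have to wait.
 */
static bool area_add_fence(struct managed_area *area,
                           const struct pending_fence *f)
{
    assert(area != NULL && f != NULL);
    if (area->nfences == AREA_MAX_FENCES)
        return false;
    area->fences[area->nfences++] = f;
    return true;
}

/*
 * Drop fences that have already signaled and report whether the area
 * can be reused, i.e. whether no unsignaled fence remains.
 */
static bool area_can_reuse(struct managed_area *area)
{
    size_t kept = 0;

    assert(area != NULL);
    for (size_t i = 0; i < area->nfences; i++) {
        if (!area->fences[i]->signaled)
            area->fences[kept++] = area->fences[i];
    }
    area->nfences = kept;
    return kept == 0;
}
```

The eviction itself returns immediately after `area_add_fence()`; only a later attempt to place something in the same area has to consult `area_can_reuse()`.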
The driver will need to provide a function that, given a list of fences,
returns a fence that, when it signals, guarantees that all other fences
in the list have signaled.
Single-channel hardware will just return the fence with the highest
sequence number. Multi-channel hardware may need to insert command stream
barriers, if available, and create a new sync object to return, or resort
to simply waiting to determine which fence signals last.
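A minimal sketch of the single-channel case, assuming each fence carries a channel id and a 32-bit per-channel sequence number that may wrap. The type `sketch_fence` and the function names are hypothetical, not the TTM sync_obj interface. For fences on different channels the function returns NULL, standing in for the "insert a barrier or wait" path, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fence type; not a real TTM or driver structure. */
struct sketch_fence {
    uint32_t channel; /* hardware channel the fence was emitted on */
    uint32_t seq;     /* per-channel sequence number, grows on each emit */
};

/* Wrap-safe "a is later than b"; valid while they are < 2^31 apart. */
static int seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/*
 * Return the fence whose signaling implies that every fence in the list
 * has signaled. Returns NULL if the fences span more than one channel,
 * in which case the caller would need a barrier or a CPU wait instead.
 */
static const struct sketch_fence *
last_fence_single_channel(const struct sketch_fence *const *fences,
                          size_t count)
{
    const struct sketch_fence *last;

    assert(fences != NULL && count > 0);
    last = fences[0];
    for (size_t i = 1; i < count; i++) {
        if (fences[i]->channel != last->channel)
            return NULL;
        if (seq_after(fences[i]->seq, last->seq))
            last = fences[i];
    }
    return last;
}
```

The signed-difference comparison is a design choice for the sketch so that a sequence counter wrapping past zero still orders correctly; a driver with 64-bit sequence numbers could compare them directly.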
I guess Nouveau can do command stream barriers (waiting for other
channels to reach a certain command before progressing)?
Needless to say, drivers need not activate async operation if they don't
want to, but for single-channel hardware it will hopefully be very simple.
>>> While doing that I looked a bit at the Nouveau fencing. It appears
>>> like waiting for fences is polling only (no irq to signal fences)? Is
>>> that correct?
>> That's right, nvidia hardware has no nice way to schedule a fence-like
>> interrupt we could selectively turn on and off around the sync_obj_wait
>> hook. There's a bunch of (more or less) chipset-specific hacks that
>> could be used to get an equivalent effect, but polling has seemed good
>> enough so far (in the typical case we only take the "lazy" path so CPU
>> usage is still OK).
Indeed, I saw the same with Unichromes: lazy for throttling and not lazy
for other waits, although I ended up with an hrtimer polling loop in the
non-lazy case, since software fallbacks tended to eat a lot of CPU while
waiting for buffer idle.