Nouveau fences?

Sun Nov 28 12:37:34 PST 2010

On 11/28/2010 05:11 PM, Francisco Jerez wrote:
> Francisco Jerez <currojerez at riseup.net> writes:
>
>    
>> Thomas Hellstrom <thomas at shipmail.org> writes:
>>
>>      
>>> Ben,
>>>
>>> I'm looking at a way to make TTM memory management asynchronous with
>>> the CPU. The idea is that you should basically be able to DMA data to
>>> and from memory regions without waiting for idle, as long as the GPU
>>> has a means to provide operation ordering.
>>>
>>>        
>> Sounds good. I guess you're mainly dealing with BO eviction
>> synchronization? The only problem I see on our side is that calls to our
>> move() hook aren't guaranteed to be carried out in order (because of the
>> multiple hardware channels). I'm thinking that move() could be extended
>> with an optional sync_obj argument; that way, move() would be able to
>> make sure that evictions are strictly ordered with respect to the fence
>> specified.
>>      
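To make the suggested move() extension concrete, here is a toy model in C. Everything in it is hypothetical (`sketch_move()`, the `sketch_*` types and the three-opcode command stream); it is not the TTM or Nouveau API. It only illustrates the idea that a sync_obj emitted on the copying channel needs no extra work, while one from another channel turns into an explicit wait placed ahead of the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model types; not the real TTM/Nouveau structures. */
struct sketch_fence {
    int channel;    /* channel the fence was emitted on */
    uint32_t seqno; /* sequence number on that channel */
};

enum sketch_op { SK_WAIT, SK_COPY, SK_FENCE };

struct sketch_cmd {
    enum sketch_op op;
    int wait_channel;    /* SK_WAIT only: channel to wait for */
    uint32_t wait_seqno; /* SK_WAIT only: sequence number to wait for */
};

#define SKETCH_MAX_CMDS 16

struct sketch_channel {
    int id;
    uint32_t last_seqno;
    struct sketch_cmd cmds[SKETCH_MAX_CMDS];
    size_t ncmds;
};

/* Hypothetical move() with an optional sync_obj (NULL: no ordering
 * constraint). Schedules the copy and returns the fence that signals
 * once the copy is done. */
static struct sketch_fence sketch_move(struct sketch_channel *ch,
                                       const struct sketch_fence *sync_obj)
{
    assert(ch->ncmds + 3 <= SKETCH_MAX_CMDS);

    /* Work on the same channel is already ordered; a fence from another
     * channel needs an explicit wait ahead of the copy. */
    if (sync_obj != NULL && sync_obj->channel != ch->id)
        ch->cmds[ch->ncmds++] =
            (struct sketch_cmd){ SK_WAIT, sync_obj->channel, sync_obj->seqno };

    ch->cmds[ch->ncmds++] = (struct sketch_cmd){ SK_COPY, -1, 0 };
    ch->cmds[ch->ncmds++] = (struct sketch_cmd){ SK_FENCE, -1, 0 };

    return (struct sketch_fence){ ch->id, ++ch->last_seqno };
}
```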
The way evictions will work is that they appear to take place
"instantly", but are scheduled on a channel, and there will be a data
structure that keeps track of which fences need to be signaled before
a managed area can be reused.
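A minimal sketch of that bookkeeping, assuming each channel exposes a "last completed sequence number" that can be read back (simulated here by the `channel_completed` array). `struct managed_area`, `area_add_fence()` and `area_reusable()` are hypothetical names, not TTM code: the eviction only records the fence of the scheduled copy, and the area becomes reusable once every recorded fence has signaled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CHANNELS 4
#define MAX_PENDING 8

struct fence {
    int channel;
    uint32_t seqno;
};

/* Simulated per-channel "last completed sequence number". */
static uint32_t channel_completed[MAX_CHANNELS];

static bool fence_signaled(const struct fence *f)
{
    /* "completed >= seqno", modulo 2^32 so wraparound is harmless. */
    return (int32_t)(channel_completed[f->channel] - f->seqno) >= 0;
}

struct managed_area {
    struct fence pending[MAX_PENDING]; /* fences that must signal before reuse */
    size_t npending;
};

/* The eviction "takes place instantly": only the copy's fence is recorded. */
static void area_add_fence(struct managed_area *a, struct fence f)
{
    assert(a->npending < MAX_PENDING);
    a->pending[a->npending++] = f;
}

/* Drops fences that have signaled; the area is reusable once none remain. */
static bool area_reusable(struct managed_area *a)
{
    size_t kept = 0;
    for (size_t i = 0; i < a->npending; i++)
        if (!fence_signaled(&a->pending[i]))
            a->pending[kept++] = a->pending[i];
    a->npending = kept;
    return kept == 0;
}
```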

The driver will need to provide a function that, given a list of fences,
returns a fence that, when it signals, guarantees that all other fences
in the list have signaled.
Single-channel hardware will just return the fence with the highest
sequence number. Multi-channel hardware may need to insert command stream
barriers, if available, and create a new sync object to return, or else
resort to simply waiting to determine which fence signals last.
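The driver hook described above could look roughly like this. `fence_last()` and `struct fence` are hypothetical; sequence numbers are compared modulo 2^32 so that wraparound does not break the ordering. The multi-channel branch shows only the "simply wait" fallback, with a counting stand-in (`demo_wait`) where a real blocking wait would go.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct fence {
    int channel;
    uint32_t seqno;
};

typedef void (*fence_wait_fn)(const struct fence *f);

/* Nonzero if sequence number a is later than b, modulo 2^32. */
static int seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Returns a fence whose signaling guarantees that all n fences in list
 * have signaled. */
static const struct fence *fence_last(const struct fence *list, size_t n,
                                      fence_wait_fn wait)
{
    assert(n > 0);

    int single_channel = 1;
    for (size_t i = 1; i < n; i++)
        if (list[i].channel != list[0].channel)
            single_channel = 0;

    if (single_channel) {
        /* One channel executes in order: the highest sequence number
         * signals last. */
        const struct fence *last = &list[0];
        for (size_t i = 1; i < n; i++)
            if (seq_after(list[i].seqno, last->seqno))
                last = &list[i];
        return last;
    }

    /* Multi-channel fallback without barriers: wait for every fence.
     * Afterwards all have signaled, so any of them satisfies the contract. */
    for (size_t i = 0; i < n; i++)
        wait(&list[i]);
    return &list[0];
}

/* Stand-in for a real blocking wait; it only counts calls. */
static unsigned demo_wait_calls;
static void demo_wait(const struct fence *f)
{
    (void)f;
    demo_wait_calls++;
}
```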

I guess Nouveau can do command stream barriers (i.e. making a channel wait
for other channels to reach a certain command before progressing)?
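As an illustration of what such a barrier buys, the following deterministic toy scheduler models channels whose command streams contain only FENCE and WAIT commands; a WAIT stalls its channel until another channel's completed sequence number reaches a target. This is an assumed model, not actual Nouveau hardware methods. With the barrier in place, a fence emitted after the WAIT signals only after the fences it waited for, so it could serve as the single returned fence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of GPU channels; not real Nouveau methods. */
enum op { OP_FENCE, OP_WAIT };

struct cmd {
    enum op op;
    int wait_ch;       /* OP_WAIT: channel to wait for */
    uint32_t wait_seq; /* OP_WAIT: sequence number to wait for */
};

struct channel {
    const struct cmd *cmds;
    size_t ncmds;
    size_t pc;          /* next command to execute */
    uint32_t completed; /* sequence number of the last fence passed */
};

/* Executes the next command of channel id unless it is stalled on a
 * barrier. Fence completions are appended to order[]. Returns 1 on
 * progress. */
static int channel_step(struct channel *chs, int id, int *order, size_t *norder)
{
    struct channel *ch = &chs[id];
    if (ch->pc == ch->ncmds)
        return 0;

    const struct cmd *c = &ch->cmds[ch->pc];
    if (c->op == OP_WAIT) {
        if ((int32_t)(chs[c->wait_ch].completed - c->wait_seq) < 0)
            return 0; /* barrier not yet satisfied: stall */
    } else {
        ch->completed++;
        order[(*norder)++] = id;
    }
    ch->pc++;
    return 1;
}

/* Round-robin scheduler visiting higher-numbered channels first, so that
 * without a barrier channel 1 would race ahead. Returns 1 if every channel
 * ran to completion, 0 if some channel is stuck. */
static int run_channels(struct channel *chs, int nch, int *order, size_t *norder)
{
    int progress = 1;
    while (progress) {
        progress = 0;
        for (int id = nch - 1; id >= 0; id--)
            progress |= channel_step(chs, id, order, norder);
    }
    for (int id = 0; id < nch; id++)
        if (chs[id].pc != chs[id].ncmds)
            return 0;
    return 1;
}
```

With channel 0 running {FENCE, FENCE} and channel 1 running {WAIT(0, 2), FENCE}, the fences complete in the order 0, 0, 1 even though channel 1 is visited first each round. Dropping the WAIT makes channel 1's fence complete first (order 1, 0, 0), which is exactly the case where one fence cannot stand in for the whole list.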

Needless to say, drivers need not activate async operation if they don't 
want to, but for single-channel hardware it will hopefully be very simple.

>>      
>>> While doing that I looked a bit at the Nouveau fencing. It appears
>>> that waiting for fences is polling-only (no IRQ to signal fences)? Is
>>> that correct?
>>>
>>>        
>> That's right, NVIDIA hardware has no nice way to schedule a fence-like
>> interrupt we could selectively turn on and off around the sync_obj_wait
>> hook. There's a bunch of (more or less) chipset-specific hacks that
>> could be used to get an equivalent effect, but polling has seemed good
>> enough so far (in the typical case we only take the "lazy" path, so CPU
>> usage is still OK).
>>      
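The lazy/non-lazy split can be sketched in user space as follows. The hardware sequence read is simulated (`read_hw_seq()` advances by one per call), and the 1 ms nap for the lazy path is an arbitrary assumed value; a kernel implementation would use its own sleeping and timeout primitives instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Simulated hardware: the channel's completed sequence number advances by
 * one on every read. */
static uint32_t sim_hw_seq;
static uint32_t read_hw_seq(void)
{
    return ++sim_hw_seq;
}

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Polls until seqno has passed. lazy: sleep between polls (cheap on CPU,
 * higher latency); !lazy: busy-poll. Returns 0 when signaled, -1 on
 * timeout; *polls receives the number of reads performed. */
static int fence_wait_poll(uint32_t seqno, bool lazy, double timeout_s,
                           unsigned *polls)
{
    const struct timespec nap = { 0, 1000000L }; /* 1 ms, arbitrary */
    const double deadline = now_sec() + timeout_s;

    for (*polls = 1;; ++*polls) {
        if ((int32_t)(read_hw_seq() - seqno) >= 0)
            return 0;
        if (now_sec() > deadline)
            return -1;
        if (lazy)
            nanosleep(&nap, NULL); /* give the CPU away between polls */
    }
}
```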

Indeed, I saw the same with the Unichromes: lazy waits for throttling and
non-lazy waits for everything else, although I ended up with an hrtimer
polling loop in the non-lazy case, since software fallbacks tended to eat a
lot of CPU while waiting for buffer idle.
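A user-space analogue of such a timer-driven polling loop, assuming POSIX `clock_nanosleep()` with absolute `CLOCK_MONOTONIC` deadlines in place of a kernel hrtimer: the fence is checked once per short period instead of in a tight spin, which bounds CPU usage while keeping wake-up latency low. `read_hw_seq()` is again a simulated stand-in, and the period and tick limit are illustrative parameters, not values from the Unichrome driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Simulated hardware sequence counter: advances by one per read. */
static uint32_t sim_hw_seq;
static uint32_t read_hw_seq(void)
{
    return ++sim_hw_seq;
}

static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= 1000000000L) {
        t->tv_nsec -= 1000000000L;
        t->tv_sec++;
    }
}

/* Checks the fence once every period_ns, giving up after max_ticks sleeps.
 * Returns 0 when signaled, -1 on timeout; *ticks counts the sleeps taken. */
static int fence_wait_timed(uint32_t seqno, long period_ns, unsigned max_ticks,
                            unsigned *ticks)
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (*ticks = 0; *ticks < max_ticks; ++*ticks) {
        if ((int32_t)(read_hw_seq() - seqno) >= 0)
            return 0;
        /* Absolute deadlines keep the polling cadence from drifting. */
        timespec_add_ns(&next, period_ns);
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;
    }
    return -1;
}
```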

/Thomas