Nouveau fences?

Sun Nov 28 23:28:27 PST 2010

On 11/28/2010 10:55 PM, Francisco Jerez wrote:
> Thomas Hellstrom<thomas at shipmail.org>  writes:
>
>    
>> On 11/28/2010 05:11 PM, Francisco Jerez wrote:
>>      
>>> Francisco Jerez<currojerez at riseup.net>   writes:
>>>
>>>
>>>        
>>>> Thomas Hellstrom<thomas at shipmail.org>   writes:
>>>>
>>>>
>>>>          
>>>>> Ben,
>>>>>
>>>>> I'm looking at a way to make TTM memory management asynchronous with
>>>>> the CPU. The idea is that you should basically be able to DMA data to
>>>>> and from memory regions without waiting for idle, as long as the GPU
>>>>> has a means to provide operation ordering.
>>>>>
>>>>>
>>>>>            
>>>> Sounds good. I guess you're mainly dealing with BO eviction
>>>> synchronization? The only problem I see on our side is that calls to our
>>>> move() hook aren't guaranteed to be carried out in order (because of the
>>>> multiple hardware channels). I'm thinking that move() could be extended
>>>> with an optional sync_obj argument, that way move() would be able to
>>>> make sure that evictions are strictly ordered with respect to the fence
>>>> specified.
>>>>
>>>>          
>> The way evictions will work is that they appear to take place
>> "instantly", but are scheduled on a channel, and there will be a data
>> structure that keeps track of which fences need to be signaled
>> before a managed area can be reused.
>>
>> The driver will need to provide a function that, given a list of
>> fences, returns a fence that, when it signals, guarantees that all
>> other fences in the list have signaled.
>>      
> Ah, so, evictions made in response to ttm_bo_mem_force_space() are still
> going to be synchronous after the changes you have in mind (because in
> that case you need to reuse the freed memory immediately), right?
>    
No and yes. Evictions will be asynchronous, but the new user of the
memory area needs to take appropriate action to make sure it doesn't
overwrite the old contents before the eviction has completed. If it's a
CPU upload, it needs to wait on a fence. A single-channel GPU with DMA
uploads needs to do nothing. A multi-channel GPU needs to insert a
barrier before uploading that waits on the eviction DMA.
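
As a rough illustration of the three cases above, here is a minimal
sketch. All names in it (upload_path, reuse_action, reuse_action_for)
are hypothetical and are not existing TTM or Nouveau interfaces; it
only encodes the decision, not the actual waiting or barrier code.

```c
#include <stdbool.h>

/* Hypothetical: how the new user fills the reused memory area. */
enum upload_path {
    UPLOAD_CPU,      /* CPU writes directly into the area */
    UPLOAD_GPU_DMA   /* upload is a DMA scheduled on a GPU channel */
};

/* Hypothetical: what the new user must do before touching the area. */
enum reuse_action {
    REUSE_WAIT_FENCE,      /* block the CPU until the eviction fence signals */
    REUSE_NOTHING,         /* nothing extra is needed */
    REUSE_INSERT_BARRIER   /* make the upload channel wait on the eviction DMA */
};

static enum reuse_action reuse_action_for(enum upload_path path,
                                          bool multi_channel)
{
    if (path == UPLOAD_CPU)
        return REUSE_WAIT_FENCE;

    /* DMA upload: only multi-channel hardware needs an explicit barrier. */
    return multi_channel ? REUSE_INSERT_BARRIER : REUSE_NOTHING;
}
```

The single-channel DMA case returns REUSE_NOTHING on the assumption
that the eviction and the upload are queued on the same channel, which
executes them in submission order.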

So you're right in that we need to give the new move function 
information on what to wait on / insert barriers for. I was initially 
thinking of a single fence object (and that's why the order function is 
needed).
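
For the single-channel case, where the order function only has to pick
the fence with the highest sequence, a sketch could look like the one
below. The struct and function names are made up for illustration (the
real sync objects are driver-specific), and the wraparound-safe
comparison is my assumption about how sequence numbers would be
compared, not something taken from an existing driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a driver sync object. */
struct sketch_fence {
    uint32_t sequence;   /* position in the single channel's command stream */
};

/* True if sequence a comes after b, tolerating 32-bit wraparound. */
static int sketch_seq_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/*
 * Given a list of fences from one channel, return the one that signals
 * last. Once it has signaled, all others in the list have signaled too.
 * Returns NULL for an empty list.
 */
static struct sketch_fence *sketch_fence_last(struct sketch_fence **fences,
                                              size_t count)
{
    struct sketch_fence *last = NULL;
    size_t i;

    for (i = 0; i < count; i++) {
        if (!last || sketch_seq_after(fences[i]->sequence, last->sequence))
            last = fences[i];
    }
    return last;
}
```

Multi-channel hardware can't do this, since sequences on different
channels aren't comparable; there the function would have to emit
barriers and return a new sync object, or fall back to waiting.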

> In other cases (e.g. evictions triggered by BO validation), what exactly
> would we gain from this function? I mean, why can't we just push waiting
> down to ttm_bo_move_ttm/memcpy?
>    

That's essentially what's going to happen, but those functions also need 
to know what exactly to wait on.

>    
>> Single-channel hardware will just return the fence with the highest
>> sequence. Multi-channel hardware may need to insert command stream
>> barriers, if available, and create a new sync object to return, or
>> resort to simply waiting to determine which fence signals last.
>>
>> I guess Nouveau can do command stream barriers (waiting for other
>> channels to reach a certain command before progressing)?
>>
>>      
> Yep, that's what nouveau_fence_sync() does.
>    

OK, thanks.

/Thomas