[PATCH] drm/ttm: pass buffer object for bind/unbind callback

Fri Nov 18 07:06:05 PST 2011

On 11/18/2011 03:56 PM, Jerome Glisse wrote:
> On Fri, Nov 18, 2011 at 03:30:03PM +0100, Thomas Hellstrom wrote:
>    
>> On 11/18/2011 02:15 PM, Ben Skeggs wrote:
>>      
>>> On Fri, 2011-11-18 at 08:57 +0100, Thomas Hellstrom wrote:
>>>        
>>>> Jerome,
>>>>
>>>> I don't like this change for the following reasons
>>>>          
>>> -snip-
>>>
>>>        
>>>>> One can take advantage of move notify callback but, there are
>>>>> corner case where bind/unbind might be call without move notify
>>>>> being call (in error path mostly). So to make sure that each
>>>>> virtual address space have a consistent view of wether a buffer
>>>>> object is backed or not by system page it's better to pass the
>>>>> buffer object to bind/unbind.
>>>>>            
>>> As I discussed previously with Jerome on IRC, I believe the move_notify
>>> hook is sufficient.  I fixed a couple of issues back with it back when I
>>> implemented support for this in nouveau.
>>>
>>> Jerome did point out two issues however, which I believe can be solved
>>> easily enough.
>>>
>>> The first was that the error path after move_notify() has been called
>>> doesn't reverse the move_notify() too, leaving incorrect page table
>>> entries.  This, I think, could easily be solved by swapping bo->mem and
>>> mem, and re-calling move_notify() to have the driver reverse whatever it
>>> did.
>>>
>>> The second issue is that apparently move_notify() doesn't get called in
>>> the destroy path.  Nouveau's GEM layer takes care of this for our
>>> userspace bos, and we don't use any kernel bos that are mapped into a
>>> channel's address space so we don't hit it.  This can probably be solved
>>> easily enough too I expect.
>>>
>>> Ben.
>>>
>>>        
>> OK. I understand. Surely if a move_notify is missing somewhere, that
>> can easily be fixed.
>>
>> It might be good if we can outline how the virtual tables are set
>> up. In my world, the following would happen:
>>
>> 1) Pre command submission:
>>
>> a) reserve bo
>> b) call ttm_bo_validate to set placement. move_notify will tear down
>> any existing GPU page tables for the bo.
>> c) Set up GPU page tables.
>> d) Command submission
>> e) unreserve_bo().
>>
>>
>> 2) When eviction happens:
>> a) reserve bo
>> b) move_notify is called to tear down page tables
>> c) change placement
>> d) Unreserve bo.
>>
>> Let's say an error occurs in 1d) Why would you need to undo 1c?
>>
>> Similarly if an error occurs in 2c) Why would you  need to undo 2b)?
>>
>>      
> Error is in ttm_bo_handle_move_mem move_notify is call before we do the
> move but the move might fail.
>    
But even if the move fails in 1b) isn't it safe to just leave the 
virtual GPU map unbound, since GPU page tables will *always* be set up 
when placement is complete in 1c?

> For destroy issue is when destroying a gtt buffer, it will just be
> unbind, no call to ttm_bo_handle_move_mem which is the only function
> calling back move_notify. (see ttm_bo_release_list).
>    

Yes, here a fix is needed.

> I will fix this 2 corner case. I just wanted to make things symetrical.
> By hooking up the virtual address space update with bind/unbind
>
> Note that at one point in the future the dream i have is no more
> reserve, iommu page fault which is coming up (PASID process address
> space id, with ATS address translation service and PRI page request
> interface) would allow such things. Only synchronization will be
> needed when moving object from system ram to vram or vice versa.
>    

Indeed, but then TTM will probably be overkill, and something simpler 
can be implemented.

> Cheers,
> Jerome
>    

Thanks,
Thomas