[PATCH 2/5] drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep

Thomas Hellstrom thellstrom at vmware.com
Wed Jan 22 04:52:51 PST 2014


On 01/22/2014 01:38 PM, Maarten Lankhorst wrote:
> On 22-01-14 13:11, Thomas Hellstrom wrote:
>> On 01/22/2014 11:58 AM, Maarten Lankhorst wrote:
>>> On 22-01-14 11:27, Thomas Hellstrom wrote:
>>>> On 01/22/2014 10:55 AM, Maarten Lankhorst wrote:
>>>>> On 22-01-14 10:40, Thomas Hellstrom wrote:
>>>>>> On 01/22/2014 09:19 AM, Maarten Lankhorst wrote:
>>>>>>> On 21-01-14 18:44, Thomas Hellstrom wrote:
>>>>>>>> On 01/21/2014 04:29 PM, Maarten Lankhorst wrote:
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> On 21-01-14 16:17, Thomas Hellstrom wrote:
>>>>>>>>>> Maarten, for this and the other patches in this series,
>>>>>>>>>>
>>>>>>>>>> I seem to recall we had this discussion before?
>>>>>>>>>> IIRC I stated that reservation was too heavyweight a lock to
>>>>>>>>>> hold while determining whether a buffer is idle. It's a pretty
>>>>>>>>>> nasty thing to build in.
>>>>>>>>> I sent this patch after determining that it didn't end up
>>>>>>>>> being heavyweight.
>>>>>>>>> Most places were already taking both the fence_lock and a
>>>>>>>>> reservation; I just fixed up the few places that didn't hold a
>>>>>>>>> reservation while waiting. Converting those few places ended up
>>>>>>>>> being trivial, so I thought I'd submit it.
>>>>>>>> Actually, the only *valid* reasons for holding a reservation
>>>>>>>> while waiting for idle are:
>>>>>>>> 1) You want to block further command submission on the buffer.
>>>>>>>> 2) You want to switch GPU engines and don't have access to GPU
>>>>>>>> semaphores / barriers.
>>>>>>>>
>>>>>>>> Reservation has the nasty side effect that it blocks command
>>>>>>>> submission and pins the buffer (and in addition now makes the
>>>>>>>> evict list traversals skip the buffer), which in general is
>>>>>>>> *not* necessary for most wait cases. So if we have performance
>>>>>>>> and latency reduction in mind, we should instead convert the
>>>>>>>> wait cases that don't fulfill 1) and 2) above in the other
>>>>>>>> direction. I can't see how a spinlock protecting a fence pointer
>>>>>>>> or fence list stops you from using RW fences, as long as the
>>>>>>>> spinlock is held while manipulating the fence list?
>>>>>>>>
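For illustration, the kind of spinlock-protected idle check meant above
might look roughly like this (a sketch only; it assumes the current
bdev->fence_lock and the sync_obj driver hooks):

  struct ttm_bo_driver *driver = bo->bdev->driver;
  void *sync_obj = NULL;
  int ret = 0;

  /* Snapshot the fence under the spinlock; no reservation is taken. */
  spin_lock(&bo->bdev->fence_lock);
  if (bo->sync_obj)
          sync_obj = driver->sync_obj_ref(bo->sync_obj);
  spin_unlock(&bo->bdev->fence_lock);

  if (sync_obj) {
          /* Wait for idle without blocking command submission. */
          ret = driver->sync_obj_wait(sync_obj, false, true);
          driver->sync_obj_unref(&sync_obj);
  }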
>>>>>>> You wish. Fine, I'll enumerate all the callers of ttm_bo_wait
>>>>>>> (with the patchset applied, though) and note whether or not each
>>>>>>> can be changed to work without reservation.
>>>>>>>
>>>>>>> ttm/ttm_bo.c
>>>>>>> ttm_bo_cleanup_refs_or_queue: needs the reservation and ttm_bo_wait
>>>>>>> to finish for the direct-destroy fastpath; if either fails, the
>>>>>>> buffer needs to be queued. Cannot work without reservation.
>>>>>> Doesn't block and no significant reservation contention expected.
>>>>>>
>>>>>>> ttm_bo_cleanup_refs_and_unlock: already drops reservation to wait,
>>>>>>> doesn't need to re-acquire. Simply reordering ttm_bo_wait until
>>>>>>> after
>>>>>>> re-reserve is enough.
>>>>>> Currently follows the above rules.
>>>>>>
>>>>>>> ttm_bo_evict: already holds the reservation, which cannot be
>>>>>>> dropped since only trylock is allowed. Dropping the reservation
>>>>>>> would cause badness, so this cannot be converted.
>>>>>> Follows rule 2 above. We're about to move the buffer, and if that
>>>>>> can't be pipelined using the GPU (which TTM currently doesn't
>>>>>> allow), we need to wait. Eviction should be low priority compared
>>>>>> to new command submission, though, so I can't really see why we
>>>>>> couldn't wait before trying to reserve here?
>>>>>>
>>>>>>> ttm_bo_move_buffer: called from ttm_bo_validate, cannot drop
>>>>>>> reservation for same reason as ttm_bo_evict. It might be part of a
>>>>>>> ticketed reservation so really don't drop lock here.
>>>>>> Part of command submission and as such follows rule 2 above. If
>>>>>> we can
>>>>>> pipeline the move with the GPU, no need to wait (but needs to be
>>>>>> implemented, of course).
>>>>>>
>>>>>>> ttm_bo_synccpu_write_grab: the wait could be converted to be done
>>>>>>> afterwards, without fence_lock. But in this case reservation could
>>>>>>> take over the role of fence_lock too, so no separate fence_lock
>>>>>>> would be needed.
>>>>>> With the exception that reservation is more likely to be contended.
>>>>> True, but that falls under rule 1.
>>>>>>> ttm_bo_swapout: see ttm_bo_evict.
>>>>>>>
>>>>>>> ttm/ttm_bo_util.c:
>>>>>>> ttm_bo_move_accel_cleanup: calls ttm_bo_wait and cannot drop the
>>>>>>> lock; see ttm_bo_move_buffer, from which this can be called.
>>>>>> Rule 2.
>>>>>>
>>>>>>> ttm/ttm_bo_vm.c
>>>>>>> ttm_bo_vm_fault_idle: I guess you COULD drop the reservation here,
>>>>>>> but you already had the reservation, so a similar optimization to
>>>>>>> ttm_bo_synccpu_write_grab could be done without requiring
>>>>>>> fence_lock. If you wrote it like that, you would end up with a
>>>>>>> patch similar to "drm/nouveau: add reservation to
>>>>>>> nouveau_gem_ioctl_cpu_prep". I think we should do this.
>>>>>>>
>>>>>>> Ok, so the core does NOT need fence_lock because we never drop
>>>>>>> reservations except in synccpu_write_grab and maybe
>>>>>>> ttm_bo_vm_fault_idle, but even in those cases a reservation is
>>>>>>> taken. So that could be used instead of fence_lock.
>>>>>>>
>>>>>>> nouveau_gem_ioctl_cpu_prep:
>>>>>>> Either block on a global spinlock or a local reservation lock.
>>>>>>> Doesn't matter much which; I don't see the need to keep a global
>>>>>>> lock for this function...
>>>>>>> Two cases can happen with the reservation trylock: either the
>>>>>>> buffer is not reserved, so it's not in the process of being
>>>>>>> evicted; or the buffer is reserved, which means it's being used in
>>>>>>> command submission right now, or in one of the functions described
>>>>>>> above (i.e. not idle).
>>>>>>>
>>>>>>> nouveau_gem_pushbuf_reloc_apply:
>>>>>>> has to call ttm_bo_wait with the reservation held; it cannot be
>>>>>>> dropped.
>>>>>>>
>>>>>>> So for core TTM and nouveau the fence_lock is never needed; radeon
>>>>>>> has only one function that calls ttm_bo_wait, and it holds a
>>>>>>> reservation too, so it doesn't need the fence_lock either.
>>>>>> And vmwgfx now also has a synccpu IOCTL (see drm-next).
>>>>>>
>>>>>> So assuming we converted the functions that can be converted to
>>>>>> wait outside of reservation, the same way you have done with
>>>>>> nouveau, leaving the ones that fall under 1) and 2) above, I would
>>>>>> still argue that a spinlock should be used, because taking a
>>>>>> reservation may implicitly mean waiting for the GPU, and could give
>>>>>> bad performance and latency characteristics. You shouldn't need to
>>>>>> wait for the GPU to check whether a buffer is idle.
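The conversion argued for here would, roughly, wait first and reserve
only afterwards. A sketch under those assumptions, using the existing
ttm_bo_wait() and ttm_bo_reserve() conventions:

  int ret;

  /* Wait for idle without holding the reservation ... */
  spin_lock(&bo->bdev->fence_lock);
  ret = ttm_bo_wait(bo, false, true, false);
  spin_unlock(&bo->bdev->fence_lock);
  if (ret)
          return ret;

  /* ... and take the reservation only for the short state update. */
  ret = ttm_bo_reserve(bo, true, false, false, NULL);
  if (ret)
          return ret;
  atomic_inc(&bo->cpu_writers);
  ttm_bo_unreserve(bo);

The buffer can of course become busy again between the wait and the
reserve; the point is only that checking for idle need not block
command submission.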
>>>>> Except that without reservation you can't tell if the buffer is
>>>>> really idle, or is currently being used as part of some command
>>>>> submission/eviction before the fence pointer is set.
>>>>>
>>>> Yes, but when that matters, you're either in case 1 or case 2 again.
>>>> Otherwise, when you release the reservation, you still don't know.
>>>> A typical example of this is the vmwgfx synccpu ioctl, where you can
>>>> either choose to block command submission (not used currently) or
>>>> not (user-space inter-process synchronization). The former is a
>>>> case-1 wait: it holds the reservation while waiting for idle and
>>>> then ups cpu_writers. The latter waits without reservation for
>>>> previously submitted rendering to finish.
>>> Yeah, you could, but what exactly are you waiting on then? If it's
>>> some specific existing rendering, I would argue that you should
>>> create an Android userspace fence during command submission, or
>>> provide your own API to block on a specific fence in userspace.
>>>
>>> If you don't, then I think taking a reservation is not unreasonable.
>>> In the most common case the buffer is idle and not reserved, so the
>>> lock isn't contended. The actual waiting itself can be done without
>>> the reservation held, by taking a reference on the fence.
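The shape of that, roughly as in the nouveau patch this thread is about
(a sketch; error paths and the ioctl's no_wait handling are trimmed):

  struct nouveau_bo *nvbo = nouveau_gem_object(gem);
  struct nouveau_fence *fence = NULL;
  int ret;

  /* Uncontended in the common case: buffer is idle and unreserved. */
  ret = ttm_bo_reserve(&nvbo->bo, true, false, false, NULL);
  if (ret)
          return ret;

  /* Take a reference on the fence so the wait can run unreserved. */
  spin_lock(&nvbo->bo.bdev->fence_lock);
  if (nvbo->bo.sync_obj)
          fence = nouveau_fence_ref(nvbo->bo.sync_obj);
  spin_unlock(&nvbo->bo.bdev->fence_lock);
  ttm_bo_unreserve(&nvbo->bo);

  if (fence) {
          ret = nouveau_fence_wait(fence, true, true);
          nouveau_fence_unref(&fence);
  }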
>> Yeah, here is where we disagree. I'm afraid people will start getting
>> sloppy with reservations and use them to protect more stuff, and after a
>> while they start wondering why the GPU command queue drains...
>>
>> Perhaps we could agree on a solution (building on one of your original
>> ideas) where we require reservation to modify the fence pointers and
>> the buffer-object moving flag, but the structure holding the fence
>> pointer(s) is RCU-safe, so that the pointers can be safely read under
>> an RCU read lock.
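Concretely, a reader under that scheme might look something like the
following sketch. The RCU-managed pointer (here bo->fence) and the kref
inside the fence are placeholders, assuming the struct fence from the
cross-device fencing series:

  struct fence *fence;
  int ret = 0;

  rcu_read_lock();
  fence = rcu_dereference(bo->fence);
  /* The fence may be replaced and freed after a grace period, so
   * only a kref_get_unless_zero() reference is safe here. */
  if (fence && !kref_get_unless_zero(&fence->refcount))
          fence = NULL;
  rcu_read_unlock();

  if (fence) {
          ret = fence_wait(fence, true);
          fence_put(fence);
  }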
> I think not modifying the fence pointer without reservation would be
> safest. I also don't think readers need the capability to clear
> sync_obj, which might simplify the implementation some.
>
> But my preferred option is getting rid of sync_obj completely and
> moving to reservation_object->fence_shared/exclusive, like the
> incomplete proof-of-concept conversion done in nouveau. But then I do
> need to grab the reservation lock to touch things, because fences may
> be set by the i915 driver I share the reservation_object with.
>
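For reference, the reservation_object in that proof of concept carries
the fences directly, roughly along these lines (a sketch; field names
follow the early cross-device series and may differ):

  struct reservation_object {
          struct ww_mutex lock;

          /* One exclusive (write) fence plus a growable array of
           * shared (read) fences, all updated with 'lock' held. */
          struct fence *fence_excl;
          u32 fence_shared_count, fence_shared_max;
          struct fence **fence_shared;
  };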
> Alternatively, could vmwgfx hold a spinlock when decrementing the
> fence refcount instead?
> Then we wouldn't need this in the core, and vmwgfx could use:
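Presumably something along these lines (a hypothetical sketch; the
quoted snippet is missing, and the manager lock and destructor names
are assumed):

  /* Hypothetical: take the fence manager's lock across the final
   * decrement, so a reader holding fman->lock can still kref_get()
   * the fence safely. */
  static void vmw_fence_obj_unref_locked(struct vmw_fence_obj **fence_p)
  {
          struct vmw_fence_obj *fence = *fence_p;

          *fence_p = NULL;
          kref_put_spinlock_irqsave(&fence->kref, vmw_fence_obj_destroy,
                                    &fence->fman->lock);
  }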

Maarten,
requiring reservation to access the fence pointers really turns my gut!
Being able to read them under RCU is a remedy, but one I figure should
be the default and recommended thing to do, not a vmware exception.
This is about as far as I'm prepared to go.

/Thomas

