[Intel-gfx] [PATCH] drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation.
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu Oct 14 13:25:57 UTC 2021
On 14/10/2021 13:05, Maarten Lankhorst wrote:
> Op 14-10-2021 om 10:37 schreef Tvrtko Ursulin:
>>
>> On 13/10/2021 11:41, Maarten Lankhorst wrote:
>>> No memory should be allocated when calling i915_gem_object_wait,
>>> because it may be called to idle a BO when evicting memory.
>>>
>>> Fix this by using dma_resv_iter helpers to call
>>> i915_gem_object_wait_fence() on each fence, which cleans up the code a lot.
>>> Also remove dma_resv_prune, it's questionably.
>>>
>>> This will result in the following lockdep splat.
>>
>> <snip>
>>
>>> @@ -37,56 +36,17 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
>>> unsigned int flags,
>>> long timeout)
>>> {
>>> - struct dma_fence *excl;
>>> - bool prune_fences = false;
>>> -
>>> - if (flags & I915_WAIT_ALL) {
>>> - struct dma_fence **shared;
>>> - unsigned int count, i;
>>> - int ret;
>>> + struct dma_resv_iter cursor;
>>> + struct dma_fence *fence;
>>> - ret = dma_resv_get_fences(resv, &excl, &count, &shared);
>>> - if (ret)
>>> - return ret;
>>> -
>>> - for (i = 0; i < count; i++) {
>>> - timeout = i915_gem_object_wait_fence(shared[i],
>>> - flags, timeout);
>>> - if (timeout < 0)
>>> - break;
>>> + dma_resv_iter_begin(&cursor, resv, flags & I915_WAIT_ALL);
>>> + dma_resv_for_each_fence_unlocked(&cursor, fence) {
>>> - dma_fence_put(shared[i]);
>>> - }
>>> -
>>> - for (; i < count; i++)
>>> - dma_fence_put(shared[i]);
>>> - kfree(shared);
>>> -
>>> - /*
>>> - * If both shared fences and an exclusive fence exist,
>>> - * then by construction the shared fences must be later
>>> - * than the exclusive fence. If we successfully wait for
>>> - * all the shared fences, we know that the exclusive fence
>>> - * must all be signaled. If all the shared fences are
>>> - * signaled, we can prune the array and recover the
>>> - * floating references on the fences/requests.
>>> - */
>>> - prune_fences = count && timeout >= 0;
>>> - } else {
>>> - excl = dma_resv_get_excl_unlocked(resv);
>>> + timeout = i915_gem_object_wait_fence(fence, flags, timeout);
>>> + if (timeout <= 0)
>>> + break;
>>
>> You have another change in behaviour here, well a bug really. When userspace passes in zero timeout you fail to report activity in other than the first fence.
>
> Hmm, not necessarily, passing 0 to i915_gem_object_wait_fence timeout = 0 is a special case and means test only. It will return 1 on success.
I tried to enumerate the whole chain here. All for timeout == 0. Please double check I did not make a mistake somewhere since there are many return code inversions here.
As building blocks for the whole "game" we have:
1. dma_fence_default_wait, it returns for states:
not signaled -> 0
signaled -> 1
2. i915_request_wait
not signaled -> -ETIME
signaled -> 0
Then i915_gem_object_wait_fence builds on top of it and has therefore these possible outputs:
signaled -> 0
not signaled:
i915 path -> -ETIME
ext fence -> 0
So this looks a like problem already with 0 for signaled and not signaled. Unless it is by design that the return value does not want to report external fences? But it is not documented and it still waits on them so odd.
Then in i915_gem_object_wait_reservation we have a loop:
for (i = 0; i < count; i++) {
timeout = i915_gem_object_wait_fence(shared[i],
flags, timeout);
if (timeout < 0)
break;
So short circuit happens only for i915 fences, by virtue of no negative return codes otherwise.
If we focus for i915 fences only for a moment. It means it keeps skipping signaled to check if any is not, therefore returning -ETIME if any is not signaled. i915_gem_object_wait passes the negative return on.
With your patch you have:
+ timeout = i915_gem_object_wait_fence(fence, flags, timeout);
+ if (timeout <= 0)
+ break;
Which means you break on first signaled fence (i915 or external), therefore missing to report any possible subsequent unsignaled fences. So gem_wait ioctl breaks unless I am missing something.
Regards,
Tvrtko
>
> Of course it is still broken, I sent a reply to könig about it, hope it will get fixed and respun. :)
>
> ~Maarten
>
More information about the dri-devel
mailing list