[Intel-gfx] [PATCH v3 08/17] drm/i915: Call i915_gem_evict_vm in vm_fault_gtt to prevent new ENOSPC errors
Maarten Lankhorst
maarten.lankhorst at linux.intel.com
Fri Dec 17 15:29:34 UTC 2021
On 17-12-2021 12:58, Matthew Auld wrote:
> On Thu, 16 Dec 2021 at 14:28, Maarten Lankhorst
> <maarten.lankhorst at linux.intel.com> wrote:
>> Now that we cannot unbind kill the currently locked object directly
> "unbind kill"
>
>> because we're removing short term pinning, we may have to unbind the
>> object from gtt manually, using a i915_gem_evict_vm() call.
>>
>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at linux.intel.com>
> Maybe mention that this is only in preparation for some future patches,
> once the actual eviction is trylock and evict_for_vm can also handle
> shared dma-resv? At this point in the series we shouldn't expect to
> hit -ENOSPC, right?
>
>> ---
>> drivers/gpu/drm/i915/gem/i915_gem_mman.c | 18 ++++++++++++++++--
>> 1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> index af81d6c3332a..00cd9642669a 100644
>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>> @@ -358,8 +358,22 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>  			vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
>>  		}
>>
>> -		/* The entire mappable GGTT is pinned? Unexpected! */
>> -		GEM_BUG_ON(vma == ERR_PTR(-ENOSPC));
>> +		/*
>> +		 * The entire mappable GGTT is pinned? Unexpected!
>> +		 * Try to evict the object we locked too, as normally we skip it
>> +		 * due to lack of short term pinning inside execbuf.
>> +		 */
>> +		if (vma == ERR_PTR(-ENOSPC)) {
>> +			ret = mutex_lock_interruptible(&ggtt->vm.mutex);
>> +			if (!ret) {
>> +				ret = i915_gem_evict_vm(&ggtt->vm);
>> +				mutex_unlock(&ggtt->vm.mutex);
>> +			}
>> +			if (ret)
>> +				goto err_reset;
>> +			vma = i915_gem_object_ggtt_pin_ww(obj, &ww, &view, 0, 0, flags);
>> +		}
>> +		GEM_WARN_ON(vma == ERR_PTR(-ENOSPC));
> Looks like this is being triggered in CI, I assume because the trylock
> could easily fail, due to contention? Is this expected for now? Do we
> keep the WARN and track it as a known issue?
I think it makes sense. I can probably fix i915_gem_evict_vm() to attempt to lock all objects in a blocking way.
I had some primitives that could lock an object for eviction and keep a refcount on it; i915_gem_evict_vm() could probably be changed to use them.
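
To illustrate the failure mode discussed above, here is a minimal userspace sketch of the "pin, evict everything on -ENOSPC, retry the pin" fallback. It is a toy model and not kernel code: every toy_* name is hypothetical, the "aperture" is just a fixed array of slots, and the "locked" flag stands in for an object whose lock is held elsewhere, on the assumption that a trylock-style eviction skips such objects.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 4

/* Toy stand-in for one bound object; not an i915 structure. */
struct toy_slot {
	bool used;	/* slot currently holds a binding */
	bool locked;	/* object lock is held by someone else */
};

/* Toy stand-in for the mappable aperture. */
struct toy_aperture {
	struct toy_slot slot[TOY_SLOTS];
};

/* Mark every slot as used; optionally mark them all as contended. */
static void toy_fill(struct toy_aperture *ap, bool locked)
{
	assert(ap != NULL);
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		ap->slot[i].used = true;
		ap->slot[i].locked = locked;
	}
}

/* Return a free slot index, or -ENOSPC when every slot is in use. */
static int toy_pin(struct toy_aperture *ap)
{
	assert(ap != NULL);
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!ap->slot[i].used) {
			ap->slot[i].used = true;
			return (int)i;
		}
	}
	return -ENOSPC;
}

/* Trylock-style eviction: contended slots are skipped, not waited on. */
static int toy_evict_all(struct toy_aperture *ap)
{
	int freed = 0;

	assert(ap != NULL);
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (ap->slot[i].used && !ap->slot[i].locked) {
			ap->slot[i].used = false;
			freed++;
		}
	}
	return freed;
}

/* Same shape as the hunk: on -ENOSPC, evict everything and retry once. */
static int toy_pin_with_evict(struct toy_aperture *ap)
{
	int ret = toy_pin(ap);

	if (ret == -ENOSPC) {
		toy_evict_all(ap);
		ret = toy_pin(ap);
	}
	return ret;
}
```

In this model the retry succeeds whenever at least one slot can be evicted, and it still returns -ENOSPC when every slot is contended, which is the case a blocking variant of the eviction would be meant to close.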
>>  	}
>>  	if (IS_ERR(vma)) {
>>  		ret = PTR_ERR(vma);
>> --
>> 2.34.1
>>