[RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close
Daniel Vetter
daniel at ffwll.ch
Thu Oct 29 18:20:54 UTC 2020
On Thu, Oct 29, 2020 at 03:38:21PM +0100, Lucas Stach wrote:
> Hi Guido,
>
> Am Donnerstag, den 29.10.2020, 15:20 +0100 schrieb Guido Günther:
> > So far the unmap from gpu address space only happened when dropping the
> > last ref in gem_free_object_unlocked, however that is skipped if there's
> > still multiple handles to the same GEM object.
> >
> > Since userspace (here mesa) in the case of softpin hands back the memory
> > region to the pool of available GPU virtual memory closing the handle
> > via DRM_IOCTL_GEM_CLOSE this can lead to etnaviv_iommu_insert_exact
> > failing later since userspace thinks the vaddr is available while the
> > kernel thinks it isn't making the submit fail like
> >
> > [E] submit failed: -14 (No space left on device) (etna_cmd_stream_flush:244)
> >
> > Fix this by unmapping the memory via the .gem_close_object callback.
> >
> > Signed-off-by: Guido Günther <agx at sigxcpu.org>
> > ---
> > drivers/gpu/drm/etnaviv/etnaviv_drv.c | 1 +
> > drivers/gpu/drm/etnaviv/etnaviv_drv.h | 1 +
> > drivers/gpu/drm/etnaviv/etnaviv_gem.c | 32 +++++++++++++++++++++++++++
> > 3 files changed, 34 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > index a9a3afaef9a1..fdcc6405068c 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > @@ -491,6 +491,7 @@ static struct drm_driver etnaviv_drm_driver = {
> > .open = etnaviv_open,
> > .postclose = etnaviv_postclose,
> > .gem_free_object_unlocked = etnaviv_gem_free_object,
> > + .gem_close_object = etnaviv_gem_close_object,
> > .gem_vm_ops = &vm_ops,
> > .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> > .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > index 4d8dc9236e5f..2226a9af0d63 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > @@ -65,6 +65,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> > struct drm_etnaviv_timespec *timeout);
> > int etnaviv_gem_cpu_fini(struct drm_gem_object *obj);
> > void etnaviv_gem_free_object(struct drm_gem_object *obj);
> > +void etnaviv_gem_close_object(struct drm_gem_object *obj, struct drm_file *file);
> > int etnaviv_gem_new_handle(struct drm_device *dev, struct drm_file *file,
> > u32 size, u32 flags, u32 *handle);
> > int etnaviv_gem_new_userptr(struct drm_device *dev, struct drm_file *file,
> > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > index f06e19e7be04..5aec4a23c77e 100644
> > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > @@ -515,6 +515,38 @@ static const struct etnaviv_gem_ops etnaviv_gem_shmem_ops = {
> > .mmap = etnaviv_gem_mmap_obj,
> > };
> >
> > +void etnaviv_gem_close_object(struct drm_gem_object *obj, struct drm_file *unused)
> > +{
> > + struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> > + struct etnaviv_vram_mapping *mapping, *tmp;
> > +
> > + /* Handle this via etnaviv_gem_free_object */
> > + if (obj->handle_count == 1)
> > + return;
> > +
> > + WARN_ON(is_active(etnaviv_obj));
> > +
> > + /*
> > + * userspace wants to release the handle so we need to remove
> > + * the mapping from the gpu's virtual address space to stay
> > + * in sync.
> > + */
> > + list_for_each_entry_safe(mapping, tmp, &etnaviv_obj->vram_list,
> > + obj_node) {
> > + struct etnaviv_iommu_context *context = mapping->context;
> > +
> > + WARN_ON(mapping->use);
> > +
> > + if (context) {
> > + etnaviv_iommu_unmap_gem(context, mapping);
> > + etnaviv_iommu_context_put(context);
>
> I see the issue you are trying to fix here, but this is not a viable
> fix. While userspace may close the handle, the GPU may still be
> processing jobs referening the BO, so we can't just unmap it from the
> address space.
>
> I think what we need to do here is waiting for the current jobs/fences
> of the BO when we hit the case where userspace tries to assign a new
> address to the BO. Basically wait for current jobs -> unamp from the
> address space -> map at new userspace assigned address.
Yeah was about to say the same. There's two solutions to this:
- let the kernel manage the VA space. This is what amdgpu does in some
cases (but still no relocations)
- pipeline the VA/PTE updates in your driver, because userspace has a
somewhat hard time figuring out when a buffer is done. Doing that would
either at complexity or stalls when the kernel is doing all the tracking
already anyway. Minimal fix is to do what Lucas explained above, but
importantly with the kernel solution we have the option to fully
pipeline everything and avoid stalls. I think this is what everyone else
who lets userspace manage VA does in their kernel side.
-Daniel
>
> Regards,
> Lucas
>
> > + }
> > +
> > + list_del(&mapping->obj_node);
> > + kfree(mapping);
> > + }
> > +}
> > +
> > void etnaviv_gem_free_object(struct drm_gem_object *obj)
> > {
> > struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the etnaviv
mailing list