[RFC PATCH 2/2] drm: etnaviv: Unmap gems on gem_close
Daniel Vetter
daniel at ffwll.ch
Fri Oct 30 09:33:48 UTC 2020
On Fri, Oct 30, 2020 at 10:19:54AM +0100, Lucas Stach wrote:
> Am Donnerstag, den 29.10.2020, 19:20 +0100 schrieb Daniel Vetter:
> > On Thu, Oct 29, 2020 at 03:38:21PM +0100, Lucas Stach wrote:
> > > Hi Guido,
> > >
> > > Am Donnerstag, den 29.10.2020, 15:20 +0100 schrieb Guido Günther:
> > > > So far the unmap from gpu address space only happened when dropping the
> > > > last ref in gem_free_object_unlocked, however that is skipped if there's
> > > > still multiple handles to the same GEM object.
> > > >
> > > > Since userspace (here mesa) in the case of softpin hands back the memory
> > > > region to the pool of available GPU virtual memory closing the handle
> > > > via DRM_IOCTL_GEM_CLOSE this can lead to etnaviv_iommu_insert_exact
> > > > failing later since userspace thinks the vaddr is available while the
> > > > kernel thinks it isn't making the submit fail like
> > > >
> > > > [E] submit failed: -14 (No space left on device) (etna_cmd_stream_flush:244)
> > > >
> > > > Fix this by unmapping the memory via the .gem_close_object callback.
> > > >
> > > > Signed-off-by: Guido Günther <agx at sigxcpu.org>
> > > > ---
> > > > drivers/gpu/drm/etnaviv/etnaviv_drv.c | 1 +
> > > > drivers/gpu/drm/etnaviv/etnaviv_drv.h | 1 +
> > > > drivers/gpu/drm/etnaviv/etnaviv_gem.c | 32 +++++++++++++++++++++++++++
> > > > 3 files changed, 34 insertions(+)
> > > >
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.c b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > > > index a9a3afaef9a1..fdcc6405068c 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.c
> > > > @@ -491,6 +491,7 @@ static struct drm_driver etnaviv_drm_driver = {
> > > > .open = etnaviv_open,
> > > > .postclose = etnaviv_postclose,
> > > > .gem_free_object_unlocked = etnaviv_gem_free_object,
> > > > + .gem_close_object = etnaviv_gem_close_object,
> > > > .gem_vm_ops = &vm_ops,
> > > > .prime_handle_to_fd = drm_gem_prime_handle_to_fd,
> > > > .prime_fd_to_handle = drm_gem_prime_fd_to_handle,
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_drv.h b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > > > index 4d8dc9236e5f..2226a9af0d63 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_drv.h
> > > > @@ -65,6 +65,7 @@ int etnaviv_gem_cpu_prep(struct drm_gem_object *obj, u32 op,
> > > > struct drm_etnaviv_timespec *timeout);
> > > > int etnaviv_gem_cpu_fini(struct drm_gem_object *obj);
> > > > void etnaviv_gem_free_object(struct drm_gem_object *obj);
> > > > +void etnaviv_gem_close_object(struct drm_gem_object *obj, struct drm_file *file);
> > > > int etnaviv_gem_new_handle(struct drm_device *dev, struct drm_file *file,
> > > > u32 size, u32 flags, u32 *handle);
> > > > int etnaviv_gem_new_userptr(struct drm_device *dev, struct drm_file *file,
> > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > > > index f06e19e7be04..5aec4a23c77e 100644
> > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
> > > > @@ -515,6 +515,38 @@ static const struct etnaviv_gem_ops etnaviv_gem_shmem_ops = {
> > > > .mmap = etnaviv_gem_mmap_obj,
> > > > };
> > > >
> > > > +void etnaviv_gem_close_object(struct drm_gem_object *obj, struct drm_file *unused)
> > > > +{
> > > > + struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
> > > > + struct etnaviv_vram_mapping *mapping, *tmp;
> > > > +
> > > > + /* Handle this via etnaviv_gem_free_object */
> > > > + if (obj->handle_count == 1)
> > > > + return;
> > > > +
> > > > + WARN_ON(is_active(etnaviv_obj));
> > > > +
> > > > + /*
> > > > + * userspace wants to release the handle so we need to remove
> > > > + * the mapping from the gpu's virtual address space to stay
> > > > + * in sync.
> > > > + */
> > > > + list_for_each_entry_safe(mapping, tmp, &etnaviv_obj->vram_list,
> > > > + obj_node) {
> > > > + struct etnaviv_iommu_context *context = mapping->context;
> > > > +
> > > > + WARN_ON(mapping->use);
> > > > +
> > > > + if (context) {
> > > > + etnaviv_iommu_unmap_gem(context, mapping);
> > > > + etnaviv_iommu_context_put(context);
> > >
> > > I see the issue you are trying to fix here, but this is not a viable
> > > fix. While userspace may close the handle, the GPU may still be
> > > processing jobs referening the BO, so we can't just unmap it from the
> > > address space.
> > >
> > > I think what we need to do here is waiting for the current jobs/fences
> > > of the BO when we hit the case where userspace tries to assign a new
> > > address to the BO. Basically wait for current jobs -> unamp from the
> > > address space -> map at new userspace assigned address.
> >
> > Yeah was about to say the same. There's two solutions to this:
> > - let the kernel manage the VA space. This is what amdgpu does in some
> > cases (but still no relocations)
> > - pipeline the VA/PTE updates in your driver, because userspace has a
> > somewhat hard time figuring out when a buffer is done. Doing that would
> > either at complexity or stalls when the kernel is doing all the tracking
> > already anyway. Minimal fix is to do what Lucas explained above, but
> > importantly with the kernel solution we have the option to fully
> > pipeline everything and avoid stalls. I think this is what everyone else
> > who lets userspace manage VA does in their kernel side.
>
> I thought a bit more about this and the issue is only about userspace
> reusing regions of its _private_ address space, before they are GPU
> inactive. So the problem of tracking BO active for figuring out when it
> is safe to close the handle (and thus forget about the last placement
> of the BO in the AS) goes from "is this BO globally active anywhere?"
> to "is this BO still active in one of the jobs I submitted?", which is
> not hard at all for the userspace to figure out.
>
> With this in mind I think we can reasonably declare the kernel behavior
> of rejecting the submit as okay and just add the tiny bit of userspace
> fencing needed for the userspace driver to not close the handle until
> the last submit referencing this BO has finished.
Hm why don't you have the same issue for shared buffers? Are these handled
with relocs?
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the etnaviv
mailing list