[PATCH v3 01/12] drm: Add dummy page per device or GEM object

Thu Jan 7 16:30:52 UTC 2021

On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> 
> On 1/7/21 11:21 AM, Daniel Vetter wrote:
> > On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> > > On 11/23/20 3:01 AM, Christian König wrote:
> > > > Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> > > > > On 11/21/20 9:15 AM, Christian König wrote:
> > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > Will be used to reroute CPU mapped BO's page faults once
> > > > > > > device is removed.
> > > > > > Uff, one page for each exported DMA-buf? That's not something we can do.
> > > > > > 
> > > > > > We need to find a different approach here.
> > > > > > 
> > > > > > Can't we call alloc_page() on each fault and link them together
> > > > > > so they are freed when the device is finally reaped?
> > > > > 
> > > > > For sure better to optimize and allocate on demand when we reach
> > > > > this corner case, but why the linking ?
> > > > > Shouldn't drm_prime_gem_destroy be good enough place to free ?
> > > > I want to avoid keeping the page in the GEM object.
> > > > 
> > > > What we can do is to allocate a page on demand for each fault and link
> > > > the together in the bdev instead.
> > > > 
> > > > And when the bdev is then finally destroyed after the last application
> > > > closed we can finally release all of them.
> > > > 
> > > > Christian.
> > > 
> > > Hey, started to implement this and then realized that by allocating a page
> > > for each fault indiscriminately
> > > we will be allocating a new page for each faulting virtual address within a
> > > VA range belonging the same BO
> > > and this is obviously too much and not the intention. Should I instead use
> > > let's say a hashtable with the hash
> > > key being faulting BO address to actually keep allocating and reusing same
> > > dummy zero page per GEM BO
> > > (or for that matter DRM file object address for non imported BOs) ?
> > Why do we need a hashtable? All the sw structures to track this should
> > still be around:
> > - if gem_bo->dma_buf is set the buffer is currently exported as a dma-buf,
> >    so defensively allocate a per-bo page
> > - otherwise allocate a per-file page
> 
> 
> That exactly what we have in current implementation
> 
> 
> > 
> > Or is the idea to save the struct page * pointer? That feels a bit like
> > over-optimizing stuff. Better to have a simple implementation first and
> > then tune it if (and only if) any part of it becomes a problem for normal
> > usage.
> 
> 
> Exactly - the idea is to avoid adding extra pointer to drm_gem_object,
> Christian suggested to instead keep a linked list of dummy pages to be
> allocated on demand once we hit a vm_fault. I will then also prefault the entire
> VA range from vma->vm_end - vma->vm_start to vma->vm_end and map them
> to that single dummy page.

This strongly feels like premature optimization. If you're worried about
the overhead on amdgpu, pay down the debt by removing one of the redundant
pointers between gem and ttm bo structs (I think we still have some) :-)

Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
pointer just because" games with hashtables.
-Daniel

> 
> Andrey
> 
> 
> > -Daniel
> > 
> > > Andrey
> > > 
> > > 
> > > > > Andrey
> > > > > 
> > > > > 
> > > > > > Regards,
> > > > > > Christian.
> > > > > > 
> > > > > > > Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
> > > > > > > ---
> > > > > > >    drivers/gpu/drm/drm_file.c  |  8 ++++++++
> > > > > > >    drivers/gpu/drm/drm_prime.c | 10 ++++++++++
> > > > > > >    include/drm/drm_file.h      |  2 ++
> > > > > > >    include/drm/drm_gem.h       |  2 ++
> > > > > > >    4 files changed, 22 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> > > > > > > index 0ac4566..ff3d39f 100644
> > > > > > > --- a/drivers/gpu/drm/drm_file.c
> > > > > > > +++ b/drivers/gpu/drm/drm_file.c
> > > > > > > @@ -193,6 +193,12 @@ struct drm_file *drm_file_alloc(struct drm_minor *minor)
> > > > > > >                goto out_prime_destroy;
> > > > > > >        }
> > > > > > >    +    file->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > > > > > > +    if (!file->dummy_page) {
> > > > > > > +        ret = -ENOMEM;
> > > > > > > +        goto out_prime_destroy;
> > > > > > > +    }
> > > > > > > +
> > > > > > >        return file;
> > > > > > >      out_prime_destroy:
> > > > > > > @@ -289,6 +295,8 @@ void drm_file_free(struct drm_file *file)
> > > > > > >        if (dev->driver->postclose)
> > > > > > >            dev->driver->postclose(dev, file);
> > > > > > >    +    __free_page(file->dummy_page);
> > > > > > > +
> > > > > > >        drm_prime_destroy_file_private(&file->prime);
> > > > > > >          WARN_ON(!list_empty(&file->event_list));
> > > > > > > diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
> > > > > > > index 1693aa7..987b45c 100644
> > > > > > > --- a/drivers/gpu/drm/drm_prime.c
> > > > > > > +++ b/drivers/gpu/drm/drm_prime.c
> > > > > > > @@ -335,6 +335,13 @@ int drm_gem_prime_fd_to_handle(struct drm_device *dev,
> > > > > > >          ret = drm_prime_add_buf_handle(&file_priv->prime,
> > > > > > >                dma_buf, *handle);
> > > > > > > +
> > > > > > > +    if (!ret) {
> > > > > > > +        obj->dummy_page = alloc_page(GFP_KERNEL | __GFP_ZERO);
> > > > > > > +        if (!obj->dummy_page)
> > > > > > > +            ret = -ENOMEM;
> > > > > > > +    }
> > > > > > > +
> > > > > > >        mutex_unlock(&file_priv->prime.lock);
> > > > > > >        if (ret)
> > > > > > >            goto fail;
> > > > > > > @@ -1020,6 +1027,9 @@ void drm_prime_gem_destroy(struct
> > > > > > > drm_gem_object *obj, struct sg_table *sg)
> > > > > > >            dma_buf_unmap_attachment(attach, sg, DMA_BIDIRECTIONAL);
> > > > > > >        dma_buf = attach->dmabuf;
> > > > > > >        dma_buf_detach(attach->dmabuf, attach);
> > > > > > > +
> > > > > > > +    __free_page(obj->dummy_page);
> > > > > > > +
> > > > > > >        /* remove the reference */
> > > > > > >        dma_buf_put(dma_buf);
> > > > > > >    }
> > > > > > > diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h
> > > > > > > index 716990b..2a011fc 100644
> > > > > > > --- a/include/drm/drm_file.h
> > > > > > > +++ b/include/drm/drm_file.h
> > > > > > > @@ -346,6 +346,8 @@ struct drm_file {
> > > > > > >         */
> > > > > > >        struct drm_prime_file_private prime;
> > > > > > >    +    struct page *dummy_page;
> > > > > > > +
> > > > > > >        /* private: */
> > > > > > >    #if IS_ENABLED(CONFIG_DRM_LEGACY)
> > > > > > >        unsigned long lock_count; /* DRI1 legacy lock count */
> > > > > > > diff --git a/include/drm/drm_gem.h b/include/drm/drm_gem.h
> > > > > > > index 337a483..76a97a3 100644
> > > > > > > --- a/include/drm/drm_gem.h
> > > > > > > +++ b/include/drm/drm_gem.h
> > > > > > > @@ -311,6 +311,8 @@ struct drm_gem_object {
> > > > > > >         *
> > > > > > >         */
> > > > > > >        const struct drm_gem_object_funcs *funcs;
> > > > > > > +
> > > > > > > +    struct page *dummy_page;
> > > > > > >    };
> > > > > > >      /**
> > > > > _______________________________________________
> > > > > amd-gfx mailing list
> > > > > amd-gfx at lists.freedesktop.org
> > > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C3997b06c55f64db960ee08d8b3285ad4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637456333209139294%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=Org1D62C0GXyVn6rW8SnAkhhX8xvJXFCqA5zqyaR%2BeU%3D&reserved=0
> > > > > 
> > > > _______________________________________________
> > > > amd-gfx mailing list
> > > > amd-gfx at lists.freedesktop.org
> > > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=04%7C01%7CAndrey.Grodzovsky%40amd.com%7C3997b06c55f64db960ee08d8b3285ad4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637456333209149289%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=WcnuJGZg%2B8jysOk2nTN9jXeyFkhauxMr4ajQYjP39zQ%3D&reserved=0
> > > > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch