[PATCH v3 01/12] drm: Add dummy page per device or GEM object

Tue Jan 12 09:13:41 UTC 2021

On Tue, Jan 12, 2021 at 9:12 AM Christian König
<christian.koenig at amd.com> wrote:
>
> Am 11.01.21 um 17:13 schrieb Daniel Vetter:
> > On Fri, Jan 08, 2021 at 04:49:55PM +0000, Grodzovsky, Andrey wrote:
> >> OK then, I guess I will proceed with the dummy page list implementation.
> >>
> >> Andrey
> >>
> >> ________________________________
> >> From: Koenig, Christian <Christian.Koenig at amd.com>
> >> Sent: 08 January 2021 09:52
> >> To: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Daniel Vetter <daniel at ffwll.ch>
> >> Cc: amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>; dri-devel at lists.freedesktop.org <dri-devel at lists.freedesktop.org>; daniel.vetter at ffwll.ch <daniel.vetter at ffwll.ch>; robh at kernel.org <robh at kernel.org>; l.stach at pengutronix.de <l.stach at pengutronix.de>; yuq825 at gmail.com <yuq825 at gmail.com>; eric at anholt.net <eric at anholt.net>; Deucher, Alexander <Alexander.Deucher at amd.com>; gregkh at linuxfoundation.org <gregkh at linuxfoundation.org>; ppaalanen at gmail.com <ppaalanen at gmail.com>; Wentland, Harry <Harry.Wentland at amd.com>
> >> Subject: Re: [PATCH v3 01/12] drm: Add dummy page per device or GEM object
> >>
> >> Mhm, I'm not aware of any leftover pointer between TTM and GEM and we
> >> worked quite hard on reducing the size of the amdgpu_bo, so another
> >> extra pointer just for that corner case would suck quite a bit.
> > We have a ton of other pointers in struct amdgpu_bo (or any of its lower
> > things) which are fairly single-use, so I'm really not much seeing the
> > point in making this a special case. It also means the lifetime management
> > becomes a bit iffy, since we can't throw away the dummy page when the last
> > reference to the bo is released (since we don't track it there), but only
> > when the last pointer to the device is released. Potentially this means a
> > pile of dangling pages hanging around for too long.
>
> Yeah, all of them are already on my TODO list, but see below.
>
> > If you need some ideas for redundant pointers:
> > - destroy callback (kinda not cool to not have this const anyway), we
> >    could refcount it all with the overall gem bo. Quite a bit of work.
>
> The bigger problem is that TTM based drivers are using the destroy
> callback pointer to distinguish ghost objects from real ones.
>
> We first need to get rid of those. I already have a plan for that and
> ~20% of it implemented, but it is more complicated because of the driver
> specific backends in Nouveau, Amdgpu and vmwgfx.
>
> > - bdev pointer, if we move the device ttm stuff into struct drm_device, or
> >    create a common struct ttm_device, we can ditch that
>
> Yes, exactly that's what my device structure rename patch set is aiming
> for :)

Hm, is that already on the list and did I miss it?

> > - We could probably merge a few of the fields and find 8 bytes somewhere
>
> Please point out where.

Flags and bool deleted looked compressible at a glance. Not sure
that's worth it.
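
As a rough illustration of the kind of compression meant here, the sketch
below folds a separate bool into a spare bit of a flags word. It is a
userspace toy and not the real TTM or amdgpu layout: struct bo_before,
struct bo_after, BO_FLAG_DELETED and the field types are all assumptions
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "before" layout: a flags word plus a separate bool. */
struct bo_before {
	void *resource;
	uint64_t flags;
	bool deleted;		/* 1 byte of payload, the rest is padding */
};

/* Hypothetical "after" layout: "deleted" is just one bit in flags. */
#define BO_FLAG_DELETED (UINT64_C(1) << 63)

struct bo_after {
	void *resource;
	uint64_t flags;
};

/* The trailing bool drags in alignment padding, so dropping it shrinks the struct. */
static_assert(sizeof(struct bo_after) < sizeof(struct bo_before),
	      "folding the bool into flags should shrink the struct");

static inline void bo_mark_deleted(struct bo_after *bo)
{
	bo->flags |= BO_FLAG_DELETED;
}

static inline bool bo_is_deleted(const struct bo_after *bo)
{
	return (bo->flags & BO_FLAG_DELETED) != 0;
}
```

Whether the saved bytes can actually be reused for a new pointer depends on
where the padding sits in the real structure, which this toy does not model.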

> > - we still have 2 krefs, would probably need to fix that before we can
> >    merge the destroy callbacks
>
> Yes, already on my TODO list as well. But the last time I looked into
> this I was blocked by the struct_mutex once more.

Uh, struct_mutex? I thought we'd killed that for good. How is it
getting in the way?

> > So there's plenty of room still, if the size of a bo struct is really that
> > critical. Imo it's not.
>
> It is. See, struct amdgpu_bo had grown to over 1500 bytes because
> we stopped caring about that; now we are down to 816 at the moment.
>
> We really need to get rid of this duplication of functionality and
> structure between TTM and GEM.

Yeah, and if you have patches nag me, happy to review them anytime really.

Cheers, Daniel

>
> Christian.
>
> > -Daniel
> >
> >
> >> Christian.
> >>
> >> Am 08.01.21 um 15:46 schrieb Andrey Grodzovsky:
> >>> Daniel had some objections to this (see below) and so I guess I need
> >>> you both to agree on the approach before I proceed.
> >>>
> >>> Andrey
> >>>
> >>> On 1/8/21 9:33 AM, Christian König wrote:
> >>>> Am 08.01.21 um 15:26 schrieb Andrey Grodzovsky:
> >>>>> Hey Christian, just a ping.
> >>>> Was there any question for me here?
> >>>>
> >>>> As far as I can see the best approach would still be to fill the VMA
> >>>> with a single dummy page and avoid pointers in the GEM object.
> >>>>
> >>>> Christian.
> >>>>
> >>>>> Andrey
> >>>>>
> >>>>> On 1/7/21 11:37 AM, Andrey Grodzovsky wrote:
> >>>>>> On 1/7/21 11:30 AM, Daniel Vetter wrote:
> >>>>>>> On Thu, Jan 07, 2021 at 11:26:52AM -0500, Andrey Grodzovsky wrote:
> >>>>>>>> On 1/7/21 11:21 AM, Daniel Vetter wrote:
> >>>>>>>>> On Tue, Jan 05, 2021 at 04:04:16PM -0500, Andrey Grodzovsky wrote:
> >>>>>>>>>> On 11/23/20 3:01 AM, Christian König wrote:
> >>>>>>>>>>> Am 23.11.20 um 05:54 schrieb Andrey Grodzovsky:
> >>>>>>>>>>>> On 11/21/20 9:15 AM, Christian König wrote:
> >>>>>>>>>>>>> Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> >>>>>>>>>>>>>> Will be used to reroute CPU mapped BO's page faults once
> >>>>>>>>>>>>>> device is removed.
> >>>>>>>>>>>>> Uff, one page for each exported DMA-buf? That's not
> >>>>>>>>>>>>> something we can do.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> We need to find a different approach here.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Can't we call alloc_page() on each fault and link them together
> >>>>>>>>>>>>> so they are freed when the device is finally reaped?
> >>>>>>>>>>>> For sure it is better to optimize and allocate on demand when we
> >>>>>>>>>>>> reach this corner case, but why the linking?
> >>>>>>>>>>>> Shouldn't drm_prime_gem_destroy be a good enough place to free?
> >>>>>>>>>>> I want to avoid keeping the page in the GEM object.
> >>>>>>>>>>>
> >>>>>>>>>>> What we can do is to allocate a page on demand for each fault
> >>>>>>>>>>> and link them together in the bdev instead.
> >>>>>>>>>>>
> >>>>>>>>>>> And when the bdev is then finally destroyed after the last
> >>>>>>>>>>> application closed we can finally release all of them.
> >>>>>>>>>>>
> >>>>>>>>>>> Christian.
> >>>>>>>>>> Hey, I started to implement this and then realized that by
> >>>>>>>>>> allocating a page for each fault indiscriminately we will be
> >>>>>>>>>> allocating a new page for each faulting virtual address within a
> >>>>>>>>>> VA range belonging to the same BO, and this is obviously too much
> >>>>>>>>>> and not the intention. Should I instead use, let's say, a
> >>>>>>>>>> hashtable with the hash key being the faulting BO address to
> >>>>>>>>>> actually keep allocating and reusing the same dummy zero page per
> >>>>>>>>>> GEM BO (or for that matter the DRM file object address for
> >>>>>>>>>> non-imported BOs)?
> >>>>>>>>> Why do we need a hashtable? All the sw structures to track this
> >>>>>>>>> should still be around:
> >>>>>>>>> - if gem_bo->dma_buf is set the buffer is currently exported as
> >>>>>>>>>   a dma-buf, so defensively allocate a per-bo page
> >>>>>>>>> - otherwise allocate a per-file page
> >>>>>>>> That's exactly what we have in the current implementation.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>> Or is the idea to save the struct page * pointer? That feels a
> >>>>>>>>> bit like over-optimizing stuff. Better to have a simple
> >>>>>>>>> implementation first and then tune it if (and only if) any part
> >>>>>>>>> of it becomes a problem for normal usage.
> >>>>>>>> Exactly - the idea is to avoid adding an extra pointer to
> >>>>>>>> drm_gem_object. Christian suggested to instead keep a linked list
> >>>>>>>> of dummy pages to be allocated on demand once we hit a vm_fault.
> >>>>>>>> I will then also prefault the entire VA range from vma->vm_start
> >>>>>>>> to vma->vm_end and map it to that single dummy page.
> >>>>>>> This strongly feels like premature optimization. If you're worried
> >>>>>>> about the overhead on amdgpu, pay down the debt by removing one of
> >>>>>>> the redundant pointers between gem and ttm bo structs (I think we
> >>>>>>> still have some) :-)
> >>>>>>>
> >>>>>>> Until we've nuked these easy&obvious ones we shouldn't play "avoid 1
> >>>>>>> pointer just because" games with hashtables.
> >>>>>>> -Daniel
> >>>>>>
> >>>>>> Well, if you and Christian can agree on this approach and suggest
> >>>>>> which pointer is redundant and can be removed from the GEM struct,
> >>>>>> so we can use the 'credit' to add the dummy page to GEM, I will be
> >>>>>> happy to follow through.
> >>>>>>
> >>>>>> P.S. The hash table is off the table anyway and we are talking only
> >>>>>> about a linked list here, since by prefaulting the entire VA range
> >>>>>> for a vmf->vma I will be avoiding redundant page faults to the same
> >>>>>> VMA VA range, and so I don't need to search for and reuse an
> >>>>>> existing dummy page but can simply create a new one for each next
> >>>>>> fault.
> >>>>>>
> >>>>>> Andrey
>
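
For readers skimming the quoted thread, the linked-list variant discussed
above can be modelled roughly as follows. This is a userspace toy under
stated assumptions, not kernel code and not the posted patch: struct
fake_device, struct fake_vma, fake_fault() and fake_device_release() are
hypothetical names, calloc() stands in for page allocation, and a plain
array of pointers stands in for the page tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FAKE_PAGE_SIZE 4096UL

/* One dummy page, linked into a per-device list. */
struct dummy_page {
	struct dummy_page *next;
	void *mem;
};

/* Stand-in for the device (bdev) that owns all dummy pages. */
struct fake_device {
	struct dummy_page *dummy_pages;
};

/* Stand-in for a VMA: [vm_start, vm_end) with one slot per page. */
struct fake_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	void **ptes;
};

/*
 * Model of a fault after device removal: allocate one zeroed dummy page,
 * link it to the device, and point every page slot of the VMA at it, so
 * the same VMA does not fault again. Returns 0 on success, -1 on failure.
 */
static int fake_fault(struct fake_device *dev, struct fake_vma *vma)
{
	size_t i, npages;
	struct dummy_page *dp;

	assert(vma->vm_end > vma->vm_start);
	npages = (vma->vm_end - vma->vm_start) / FAKE_PAGE_SIZE;

	dp = malloc(sizeof(*dp));
	if (!dp)
		return -1;
	dp->mem = calloc(1, FAKE_PAGE_SIZE);
	if (!dp->mem) {
		free(dp);
		return -1;
	}

	/* Link the page to the device so it is freed with the device. */
	dp->next = dev->dummy_pages;
	dev->dummy_pages = dp;

	/* "Prefault" the whole range onto the single dummy page. */
	for (i = 0; i < npages; i++)
		vma->ptes[i] = dp->mem;
	return 0;
}

/* Model of the final device release: free every dummy page on the list. */
static size_t fake_device_release(struct fake_device *dev)
{
	size_t freed = 0;

	while (dev->dummy_pages) {
		struct dummy_page *dp = dev->dummy_pages;

		dev->dummy_pages = dp->next;
		free(dp->mem);
		free(dp);
		freed++;
	}
	return freed;
}
```

The toy also shows the lifetime concern raised earlier in the thread: pages
stay on the list until fake_device_release() runs, even if the object that
faulted is long gone.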

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch