[RFC PATCH] drm/panfrost: Add support for mapping BOs on GPU page faults

Rob Herring robh at kernel.org
Thu Jun 27 16:38:40 UTC 2019


On Thu, Jun 27, 2019 at 4:57 AM Steven Price <steven.price at arm.com> wrote:
>
> Sorry for the slow response, I've been on holiday for a few weeks.

Welcome back.

>
> On 20/06/2019 06:50, Tomeu Vizoso wrote:
> > On Mon, 17 Jun 2019 at 16:56, Rob Herring <robh at kernel.org> wrote:
> >>
> >> On Sun, Jun 16, 2019 at 11:15 PM Tomeu Vizoso
> >> <tomeu.vizoso at collabora.com> wrote:
> >>>
> >>> On Fri, 14 Jun 2019 at 23:22, Rob Herring <robh at kernel.org> wrote:
> >>>>
> >>>> On Wed, Jun 12, 2019 at 6:55 AM Tomeu Vizoso <tomeu at tomeuvizoso.net> wrote:
> >>>>>
> >>>>> On Mon, 10 Jun 2019 at 19:06, Rob Herring <robh at kernel.org> wrote:
> >>>>>>
> >>>>>> The midgard/bifrost GPUs need GPU memory which is allocated on GPU
> >>>>>> page faults rather than pinned up front. The vendor driver calls
> >>>>>> this functionality GROW_ON_GPF.
> >>>>>>
> >>>>>> This implementation assumes that BOs allocated with the
> >>>>>> PANFROST_BO_NOMAP flag are never mmapped or exported. Both of those
> >>>>>> may actually work, but I'm unsure if there's some interaction there.
> >>>>>> Either would cause the whole object to be pinned in memory, which
> >>>>>> would defeat the point of this.
>
> Although in normal usage user space will never care about the contents
> of growable memory, it can be useful to be able to access it for
> debugging (although not critical to have it working immediately). In
> particular, it allows submitting the jobs in a job chain separately.
> Exporting I can't see a use-case for.
>
> So personally I'd prefer not using a "NOMAP" flag to mean "grow on fault".

NOMAP means 'no GPU map on alloc'. The CPU mapping part is just a
limitation of the implementation which could be handled if needed.

NOPIN? It's not really 'growing' either, as the total/max size is
fixed. Not sure if that's the same for kbase. Maybe faults happen to be
sequential in addresses and it grows in that sense.

Maybe just saying what the buffer is used for (HEAP) would be best?
Speaking of alloc flags, Alyssa also mentioned we need a way to align
shader buffers. My suggestion there is an executable flag. That way we
can also mark non-executable pages as XN. Though maybe alignment
requirements need to be explicit?
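
To make that concrete, the uapi could grow something like the below.
This is just a sketch for discussion; the flag names and bit values are
placeholders, not a final proposal (the struct layout is the existing
create_bo ioctl as I remember it):

    /* Hypothetical flags for drm_panfrost_create_bo.flags -- illustrative only */
    #define PANFROST_BO_HEAP       (1 << 0) /* no pages pinned at alloc; map on GPU fault */
    #define PANFROST_BO_EXECUTABLE (1 << 1) /* shader code: extra alignment; everything else gets XN */

    struct drm_panfrost_create_bo {
            __u32 size;
            __u32 flags;   /* in: PANFROST_BO_* */
            __u32 handle;  /* out: GEM handle */
            __u32 pad;
            __u64 offset;  /* out: GPU virtual address of the BO */
    };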

> >>>>>> Issues/questions/thoughts:
> >>>>>>
> >>>>>> What's the difference between i_mapping and f_mapping?
> >>>>>>
> >>>>>> What kind of clean-up on close is needed? Based on vgem faults, there
> >>>>>> doesn't seem to be any refcounting. Assume userspace is responsible for
> >>>>>> not freeing the BO while a page fault can occur?
> >>>>>
> >>>>> Aren't we taking a reference on all BOs that a job relates to and
> >>>>> unreferencing them once the job is done? I would think that that's
> >>>>> enough, or am I missing something?
> >>>>
> >>>> No, I think we're fine.
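
For reference, the pattern is roughly the below (a sketch using the
generic GEM helpers, not the exact panfrost code). The handle lookup at
submit time returns with a reference held on each BO, and we drop those
when the job completes:

    /* submit: looking up the handles takes a reference on each BO */
    ret = drm_gem_objects_lookup(file_priv,
                                 (void __user *)(uintptr_t)args->bo_handles,
                                 job->bo_count, &job->bos);

    /* ... job runs; the BOs can't be freed out from under it ... */

    /* completion/cleanup: drop the references again */
    for (i = 0; i < job->bo_count; i++)
            drm_gem_object_put(job->bos[i]);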
> >>>>
> >>>>>> What about evictions? Need to call mapping_set_unevictable()? Maybe we
> >>>>>> want these pages to be swappable, but then we need some notification to
> >>>>>> unmap them.
> >>>>>
> >>>>> I'm not sure there's much point in swapping out pages with lifetimes
> >>>>> of a few milliseconds.
> >>>>
> >>>> The lifetime is *forever* though. If we don't allow swapping, then the
> >>>> heap is grow-only until the FD is closed. IIRC, the maximum size is on
> >>>> the order of 1GB. Seems like you'd want to shrink it with some
> >>>> trigger.
> >>>
> >>> I thought that the lifetime of the *contents* of the heap was that of
> >>> the job chain that wrote them? Otherwise, only the GPU would know what
> >>> can be discarded.
> >>
> >> Yes, that's probably true. To take that to the extreme, we could
> >> allocate and free the heap BO on each job chain. But we don't do that
> >> because of the overhead. So mapping and unmapping is a similar
> >> trade-off of frequency vs. overhead. The question is when do we allow
> >> pages to be swapped out (as that is unhandled in the current patch)?
> >> Our choices are:
> >>
> >> - at any time. This is what the patch currently does, as we don't
> >> prevent eviction. Though we'd need some mechanism to be notified when
> >> a page is evicted, which is currently missing.
> >> - when a job finishes. We'd have to iterate through BOs and mark pages
> >> evictable on NOMAP BOs. Not sure where we'd do that in the driver.
> >
> > My understanding is that any contents of NOMAP memory aren't expected
> > to persist across jobs. So, when a job finishes we can unmap all pages
> > that the job faulted on.
> >
> > If we do that, then we won't have a strong need to allow NOMAP pages
> > to be swapped out because jobs aren't expected to take that long.
>
> It would certainly seem reasonable that the contents of NOMAP memory can
> be thrown away when the job chain has completed. But there is a
> potential performance improvement in not immediately unmapping/freeing
> the memory and instead leaving it mapped, on the assumption that a
> similar job requiring roughly the same amount of memory will be
> submitted later.
>
> Arm's blob/kernel have various mechanisms for freeing memory either
> after a period of being idle (in the blob) or when a shrinker is called
> (in kbase). The idea is that the heap memory is grown once to whatever
> the content needs and then the same buffer (or small set of buffers) is
> reused repeatedly. kbase has a mechanism called "ephemeral memory" (or
> evictable): memory which normally remains mapped on the GPU but which,
> under memory pressure, can be freed (and later faulted back in with
> empty pages if accessed again). A pinning mechanism is used to ensure
> that this doesn't happen in the middle of a job chain which uses the
> buffer. This mechanism is referred to as "JIT" (Just In Time
> allocation) in places.

That's a bit simpler than what I assumed JIT was.

So there are two different cases of memory not pinned at alloc. The
first is the heap memory, which is just faulted on demand (i.e. during
jobs), and the second is JIT, which is pinned some time between alloc
and job submit. Is that correct? Is that two different allocation
flags, or one flag with two different ways to get pages pinned?
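
For the heap case, the flow in this patch is roughly the below. This is
a heavily simplified sketch: panfrost_bo_from_gpu_va() and
panfrost_mmu_map_pages() are assumed helper names, and the 2MB fault
granule is an assumption of the sketch rather than something set in
stone:

    /* GPU page fault handler path for a heap BO (simplified sketch) */
    static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev,
                                           int as, u64 addr)
    {
            struct panfrost_gem_object *bo;

            /* find the BO whose GPU VA range covers the faulting address */
            bo = panfrost_bo_from_gpu_va(pfdev, as, addr); /* assumed helper */
            if (!bo || !bo->is_heap)
                    return -EINVAL; /* a fault in non-heap memory is fatal */

            /* back and map only the faulting 2MB region, not the whole BO */
            addr &= ~((u64)SZ_2M - 1);
            return panfrost_mmu_map_pages(pfdev, bo, addr, SZ_2M); /* assumed */
    }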


> > I could very well be missing something that is needed by Arm's blob
> > and not by Panfrost atm, but I don't see in kbase any mechanism for
> > the kernel to know when the GPU is done with a page, other than the
> > job that mapped it having finished.
>
> Much of the memory management is done by the user space blob. The kernel
> driver usually doesn't actually know what memory a job will access.
> There are exceptions though, in particular: ephemeral memory (through
> JIT) and imported memory.

Presumably that's a difference. We have a complete list of BOs for each job.
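
That per-job list is also what would make the "mark pages evictable
when a job finishes" option above workable. Something along these lines
(panfrost_gem_shrink_bo() is a hypothetical helper, just to illustrate
where the hook would sit):

    /* on job completion: allow heap BO backing pages to be reclaimed */
    static void panfrost_job_release_heap_pages(struct panfrost_job *job)
    {
            int i;

            for (i = 0; i < job->bo_count; i++) {
                    struct panfrost_gem_object *bo =
                                    to_panfrost_bo(job->bos[i]);

                    /* only heap/NOMAP BOs; pinned BOs are left alone */
                    if (bo->is_heap)
                            panfrost_gem_shrink_bo(bo); /* hypothetical */
            }
    }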

Rob

