[PATCH v2 6/7] drm/panfrost: Add support for GPU heap allocations

Tue Jul 30 18:49:00 UTC 2019

On Mon, Jul 29, 2019 at 1:18 AM Steven Price <steven.price at arm.com> wrote:
>
> On 25/07/2019 18:40, Alyssa Rosenzweig wrote:
> >> Sorry, I was being sloppy again![1] I meant CPU mmapped.
> >
> > No worries, just wanted to check :)
> >
> >> Apparently the blob in some cases creates a SAME_VA GROW_ON_GPF buffer -
> >> since SAME_VA means permanently mapped on the CPU this translated to
> >> mmapping a HEAP object. Why it does this I've no idea.
> >
> > I'm not sure I follow. Conceptually, if you're permanently mapped,
> > there's nothing to grow, right? Is there a reason not to just disable
> > HEAP in this case, i.e.:
> >
> >       if (flags & SAME_VA)
> >               flags &= ~GROW_ON_GPF;
> >
> > It may not be fully optimal, but that way the legacy code keeps working
> > and upstream userspace isn't held back :)
>
> Yes, that's my hack at the moment and it works. It looks like the driver
> might be allocating a depth or stencil buffer without knowing whether it
> will be used. The buffer is then "grown" if it is needed by the GPU. The
> problem is it is possible for the application to access it later.
>
> >> The main use in the blob for
> >> this is being able to dump buffers when debugging (i.e. dump buffers
> >> before/after every GPU job).
> >
> > Could we disable HEAP support in userspace (not setting the flags) for
> > debug builds that need to dump buffers? In production the extra memory
> > usage matters, hence this patch, but in dev, there's plenty of memory to
> > spare.
> >
> >> Ideally you also need a way of querying which pages have been backed
> >> by faults (much easier with kbase where that's always just the number
> >> of pages).
> >
> > Is there a use case for this with one of the userland APIs? (Maybe
> > Vulkan?)
>
> I'm not aware of OpenGL(ES) APIs that expose functionality like this.
> But e.g. allocating a buffer ahead of time for depth/stencil "just in
> case" would need something like this because you may need CPU access to it.
>
> Vulkan has the concept of "sparse" bindings/residency. As far as I'm
> aware there's no requirement that memory is allocated on demand, but a
> page-by-page approach to populating memory is expected. There's quite a
> bit of complexity and the actual way this is represented on the GPU
> doesn't necessarily match the user visible API. Also I believe it's an
> optional feature.
>
> Panfrost, of course, doesn't yet have a good mechanism for supporting
> anything like SAME_VA. My hack so far is to keep allocating BOs until it
> happens to land at an address currently unused in user space.
>
> OpenCL does require something like SAME_VA ("Shared Virtual Memory" or
> SVM). This is apparently useful because the same pointer can be used on
> both CPU and GPU.
>
> I can see two approaches for integrating that:
>
> * Use HMM: CPU VA==GPU VA. This nicely solves the problem, but falls
> over badly when the GPU VA size is smaller than the user space VA size -
> which is sadly true on many 64 bit integrations.

If mmap limits CPU addresses to the GPU VA size to start with,
wouldn't that solve the problem (at least to the extent it is
solvable)?

From a brief read, while HMM supports page table mirroring, it seems
more geared towards supporting discrete graphics memory. It also
mentions that it avoids pinning all pages in memory, but that's kind
of an assumption of GEM objects (I'm kind of working against that with
the heap support). Or at least all the common GEM helpers work that
way.

> * Provide an allocation flag which causes the kernel driver to not pick
> a GPU address until the buffer is mapped on the CPU. The mmap callback
> would then need to look for a region that is free on both the CPU and GPU.

Using mmap seems like a neat solution compared to how other drivers
handle this, which is yet another ioctl to set the GPU VA. In those
cases, all the GPU VA space management is in userspace which seems
backwards to me. I'm not sure if there are any downsides to using mmap
though.
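
A minimal sketch of the search the mmap callback would need, assuming
(hypothetically) that each address space is tracked as a flat list of
busy ranges. None of the names below (va_range, va_busy,
find_common_va, PAGE_SZ) come from panfrost or the kernel; a real
driver would presumably use drm_mm on the GPU side and the mm's
unmapped-area search on the CPU side. Passing the GPU VA size as
`limit` also covers the idea of capping CPU addresses to what the GPU
can address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 0x1000ULL	/* assumed 4 KiB pages, for illustration */

/* Hypothetical: a half-open busy range [start, end) in one address space. */
struct va_range {
	uint64_t start;
	uint64_t end;
};

/* Return nonzero if [addr, addr + size) overlaps any busy range in r[]. */
static int va_busy(const struct va_range *r, size_t n,
		   uint64_t addr, uint64_t size)
{
	for (size_t i = 0; i < n; i++)
		if (addr < r[i].end && r[i].start < addr + size)
			return 1;
	return 0;
}

/*
 * Find the lowest page-aligned address in [lo, limit) at which `size`
 * bytes are free in both the CPU and the GPU address space.
 * `lo` must be non-zero and page aligned; 0 is returned on failure.
 * This is a naive page-by-page scan, fine for a sketch only.
 */
static uint64_t find_common_va(const struct va_range *cpu, size_t ncpu,
			       const struct va_range *gpu, size_t ngpu,
			       uint64_t lo, uint64_t limit, uint64_t size)
{
	assert(lo != 0 && lo % PAGE_SZ == 0);
	assert(size != 0 && size % PAGE_SZ == 0);

	for (uint64_t addr = lo; addr + size <= limit; addr += PAGE_SZ) {
		if (!va_busy(cpu, ncpu, addr, size) &&
		    !va_busy(gpu, ngpu, addr, size))
			return addr;
	}
	return 0;
}
```

The kernel would then map the BO at the returned address on both
sides, so userspace never has to manage the GPU VA space itself.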

In any case, per process AS is a prerequisite to all this. That's
probably the bigger chunk of work and still lower priority than not
running out of memory. :)

Rob