[PATCH v2 6/7] drm/panfrost: Add support for GPU heap allocations

Steven Price steven.price at arm.com
Fri Jul 26 10:43:00 UTC 2019


On 25/07/2019 18:40, Alyssa Rosenzweig wrote:
>> Sorry, I was being sloppy again![1] I meant CPU mmapped.
> 
> No worries, just wanted to check :)
> 
>> Apparently the blob in some cases creates a SAME_VA GROW_ON_GPF buffer -
>> since SAME_VA means permanently mapped on the CPU this translated to
>> mmapping a HEAP object. Why it does this I've no idea.
> 
> I'm not sure I follow. Conceptually, if you're permanently mapped,
> there's nothing to grow, right? Is there a reason not to just disable
> HEAP in these cases, i.e.:
> 
> 	if (flags & SAME_VA)
> 		flags &= ~GROW_ON_GPF;
> 
> It may not be fully optimal, but that way the legacy code keeps working
> and upstream userspace isn't held back :)

Yes, that's my hack at the moment and it works. It looks like the driver 
might allocate a depth or stencil buffer without knowing whether it will 
actually be used; the buffer is then "grown" if the GPU needs it. The 
problem is that the application can still access it from the CPU later.
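For reference, the workaround amounts to a one-line mask in the 
allocation path. A minimal sketch (the flag names and bit values here 
are illustrative, not the real panfrost UAPI):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits - not the real UAPI values. */
#define BO_FLAG_SAME_VA     (1u << 0)
#define BO_FLAG_GROW_ON_GPF (1u << 1)

/*
 * SAME_VA means the buffer is permanently mapped on the CPU, so
 * on-demand (heap) backing can't be used safely: strip GROW_ON_GPF
 * and back the whole buffer up front.
 */
static uint32_t sanitize_bo_flags(uint32_t flags)
{
	if (flags & BO_FLAG_SAME_VA)
		flags &= ~BO_FLAG_GROW_ON_GPF;
	return flags;
}
```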

>> The main use in the blob for
>> this is being able to dump buffers when debugging (i.e. dump buffers
>> before/after every GPU job).
> 
> Could we disable HEAP support in userspace (not setting the flags) for
> debug builds that need to dump buffers? In production the extra memory
> usage matters, hence this patch, but in dev, there's plenty of memory to
> spare.
> 
>> Ideally you also need a way of querying which pages have been backed
>> by faults (much easier with kbase where that's always just the number
>> of pages).
> 
> Is there a use case for this with one of the userland APIs? (Maybe
> Vulkan?)

I'm not aware of any OpenGL(ES) APIs that expose functionality like 
this. But, e.g., allocating a depth/stencil buffer ahead of time "just 
in case" would need something like this, because you may need CPU access 
to it.

Vulkan has the concept of "sparse" bindings/residency. As far as I'm 
aware there's no requirement that memory is allocated on demand, but a 
page-by-page approach to populating memory is expected. There's quite a 
bit of complexity, and the way this is actually represented on the GPU 
doesn't necessarily match the user-visible API. It's also, I believe, an 
optional feature.

Panfrost, of course, doesn't yet have a good mechanism for supporting 
anything like SAME_VA. My hack so far is to keep allocating BOs until 
one happens to land at an address that is currently unused in user space.

OpenCL does require something like SAME_VA ("Shared Virtual Memory" or 
SVM). This is apparently useful because the same pointer can be used on 
both CPU and GPU.

I can see two approaches for integrating that:

* Use HMM: CPU VA == GPU VA. This nicely solves the problem, but falls 
over badly when the GPU VA space is smaller than the user-space VA space 
- which is sadly true on many 64-bit integrations.

* Provide an allocation flag which causes the kernel driver to not pick 
a GPU address until the buffer is mapped on the CPU. The mmap callback 
would then need to look for a region that is free on both the CPU and GPU.
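A toy sketch of the second option's address search, under the (big) 
simplification that both VA spaces are modelled as page bitmaps rather 
than the real VMA tree / drm_mm allocator - everything here is 
illustrative, not actual driver code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VA_PAGES 64	/* toy address-space size, in pages */

static bool range_free(const bool *used, size_t start, size_t npages)
{
	for (size_t i = 0; i < npages; i++)
		if (used[start + i])
			return false;
	return true;
}

/*
 * At mmap time, find the first page index that is free in BOTH the
 * CPU and GPU address spaces, or -1 if no such range exists.
 */
static long find_same_va(const bool *cpu_used, const bool *gpu_used,
			 size_t npages)
{
	for (size_t base = 0; base + npages <= VA_PAGES; base++)
		if (range_free(cpu_used, base, npages) &&
		    range_free(gpu_used, base, npages))
			return (long)base;
	return -1;
}
```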

The second is obviously most similar to the kbase approach. kbase 
simplifies things because the kernel driver has the ultimate say over 
whether a buffer is SAME_VA or not. So on 64-bit user space we upgrade 
everything to be SAME_VA - which means the GPU VA space just follows the 
CPU VA (similar to HMM).

Steve


More information about the dri-devel mailing list