[RFC PATCH] drm/panfrost: Add support for mapping BOs on GPU page faults

Thu Jun 27 10:57:07 UTC 2019

Sorry for the slow response, I've been on holiday for a few weeks.

On 20/06/2019 06:50, Tomeu Vizoso wrote:
> On Mon, 17 Jun 2019 at 16:56, Rob Herring <robh at kernel.org> wrote:
>>
>> On Sun, Jun 16, 2019 at 11:15 PM Tomeu Vizoso
>> <tomeu.vizoso at collabora.com> wrote:
>>>
>>> On Fri, 14 Jun 2019 at 23:22, Rob Herring <robh at kernel.org> wrote:
>>>>
>>>> On Wed, Jun 12, 2019 at 6:55 AM Tomeu Vizoso <tomeu at tomeuvizoso.net> wrote:
>>>>>
>>>>> On Mon, 10 Jun 2019 at 19:06, Rob Herring <robh at kernel.org> wrote:
>>>>>>
>>>>>> The midgard/bifrost GPUs need to allocate GPU memory which is allocated
>>>>>> on GPU page faults and not pinned in memory. The vendor driver calls
>>>>>> this functionality GROW_ON_GPF.
>>>>>>
>>>>>> This implementation assumes that BOs allocated with the
>>>>>> PANFROST_BO_NOMAP flag are never mmapped or exported. Both of those may
>>>>>> actually work, but I'm unsure if there's some interaction there. It
>>>>>> would cause the whole object to be pinned in memory which would defeat
>>>>>> the point of this.

Although in normal usage user space will never care about the contents
of growable memory, it can be useful to be able to access it for
debugging (although it's not critical to have that working immediately).
In particular, it allows submitting the jobs in a job chain separately.
I can't see a use-case for exporting.

So personally I'd prefer not using a "NOMAP" flag to mean "grow on fault".
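
As a rough illustration of what I mean, the two properties could be
independent flag bits. This is only a sketch: the EXAMPLE_* names and
values and both helpers are made up for this mail, and are not taken
from the patch or from the panfrost UAPI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, invented for illustration only. */
#define EXAMPLE_BO_NOMAP    (1u << 0) /* user space never mmaps the BO */
#define EXAMPLE_BO_GROWABLE (1u << 1) /* pages are allocated on GPU page fault */

/* A BO grows on fault only if it was explicitly created as growable. */
static bool bo_is_growable(uint32_t flags)
{
	return (flags & EXAMPLE_BO_GROWABLE) != 0;
}

/* Whether mmap is allowed is decided separately from growability. */
static bool bo_may_mmap(uint32_t flags)
{
	return (flags & EXAMPLE_BO_NOMAP) == 0;
}
```

With that split, a growable BO could still be mapped for debugging,
and NOMAP would keep its literal meaning.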

>>>>>> Issues/questions/thoughts:
>>>>>>
>>>>>> What's the difference between i_mapping and f_mapping?
>>>>>>
>>>>>> What kind of clean-up on close is needed? Based on vgem faults, there
>>>>>> doesn't seem to be any refcounting. Assume userspace is responsible for
>>>>>> not freeing the BO while a page fault can occur?
>>>>>
>>>>> Aren't we taking a reference on all BOs that a job relates to and
>>>>> unreferencing them once the job is done? I would think that that's
>>>>> enough, or am I missing something?
>>>>
>>>> No, I think we're fine.
>>>>
>>>>>> What about evictions? Need to call mapping_set_unevictable()? Maybe we
>>>>>> want these pages to be swappable, but then we need some notification to
>>>>>> unmap them.
>>>>>
>>>>> I'm not sure there's much point in swapping out pages with lifetimes
>>>>> of a few milliseconds.
>>>>
>>>> The lifetime is *forever* though. If we don't allow swapping, then the
>>>> heap is grow only until the FD is closed. IIRC, the maximum size is on
>>>> the order of 1GB. Seems like you'd want to shrink it with some
>>>> trigger.
>>>
>>> I thought that the lifetime of the *contents* of the heap was that of
>>> the job chain that wrote them? Otherwise, only the GPU would know what
>>> can be discarded.
>>
>> Yes, that's probably true. To take that to the extreme, we could
>> allocate and free the heap BO on each job chain. But we don't do that
>> because of the overhead. So mapping and unmapping is a similar trade
>> off of frequency vs. overhead. The question is when do we allow pages
>> to be swapped out (as that is unhandled in the current patch)? Our
>> choices are:
>>
>> - at any time. This is what the patch currently does as we don't
>> prevent eviction. Though we'd need some mechanism to be notified when
>> a page is evicted which is currently missing.
>> - when a job finishes. We'd have to iterate thru BO's and mark pages
>> evict-able on NOMAP BOs. Not sure where we do that in the driver.
> 
> My understanding is that any contents of NOMAP memory aren't expected
> to persist across jobs. So, when a job finishes we can unmap all pages
> that the job faulted on.
> 
> If we do that, then we won't have a strong need to allow NOMAP pages
> to be swapped out because jobs aren't expected to take that long.

It would certainly seem reasonable that the contents of NOMAP memory can
be thrown away when the job chain has completed. But there is a
potential performance improvement in not immediately unmapping/freeing
the memory, and instead leaving it in place on the assumption that a
similar job will be submitted later requiring roughly the same amount of
memory.

Arm's blob and kernel driver have various mechanisms for freeing memory,
either after a period of being idle (in the blob) or when a shrinker is
called (in kbase). The idea is that the heap memory is grown once to
whatever the content needs and then the same buffer (or small set of
buffers) is reused repeatedly. kbase has a mechanism called "ephemeral
memory" (or evictable memory): memory that normally remains mapped on
the GPU, but which can be freed under memory pressure (and later faulted
in with empty pages if accessed again). A pinning mechanism is used to
ensure that this doesn't happen in the middle of a job chain which uses
the buffer. This mechanism is referred to as "JIT" (Just In Time
allocation) in places.
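
To make the pin/evict interaction above concrete, here is a small
user-space model of it. It is a sketch under my own assumptions, not
kbase code: struct heap_buf and the heap_*() helpers are hypothetical
names, and growth is simplified to a contiguous page count rather than
individually mapped pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one evictable ("ephemeral") heap buffer. */
struct heap_buf {
	size_t max_pages;   /* upper bound on growth */
	size_t backed;      /* pages currently backed by memory */
	unsigned pin_count; /* > 0 while a job chain is using the buffer */
};

/* Taken when a job chain that uses the buffer is submitted. */
static void heap_pin(struct heap_buf *b)
{
	b->pin_count++;
}

/* Dropped when that job chain completes; pages stay backed for reuse. */
static void heap_unpin(struct heap_buf *b)
{
	assert(b->pin_count > 0);
	b->pin_count--;
}

/* GPU page fault on page idx: grow the backing to cover it. */
static bool heap_fault(struct heap_buf *b, size_t idx)
{
	if (idx >= b->max_pages)
		return false; /* outside the buffer: a real fault */
	if (idx >= b->backed)
		b->backed = idx + 1;
	return true;
}

/* Shrinker pass: free the backing of every unpinned buffer. */
static size_t heap_shrink(struct heap_buf *bufs, size_t n)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (bufs[i].pin_count > 0)
			continue; /* in use by a running job chain */
		freed += bufs[i].backed;
		bufs[i].backed = 0;
	}
	return freed;
}
```

The point of the model is that job completion only drops the pin. The
pages are released later, and only if the shrinker actually runs, so a
similar job submitted shortly afterwards finds the heap already grown.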

> I could very well be missing something that is needed by Arm's blob
> and not by Panfrost atm, but I don't see in kbase any mechanism for
> the kernel to know when the GPU is done with a page, other than the
> job that mapped it having finished.

Much of the memory management is done by the user space blob. The kernel
driver usually doesn't actually know what memory a job will access.
There are exceptions though, in particular: ephemeral memory (through
JIT) and imported memory.

Steve

> Btw, I tested this patch locally and things seemed to work just fine.
> 
> Thanks,
> 
> Tomeu
> 
>> - when the BO is freed. This is the easiest to implement...
>>
>> Rob
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel