[PATCH 14/25] drm/amdkfd: Populate DRM render device minor

Wed Feb 14 07:49:05 UTC 2018

Am 13.02.2018 um 20:22 schrieb Felix Kuehling:
> On 2018-02-13 01:45 PM, Christian König wrote:
>> Am 13.02.2018 um 17:56 schrieb Felix Kuehling:
>>> [SNIP]
>>> Each process gets a whole page of the doorbell aperture assigned to it.
>>> The assumption is that amdgpu only uses the first page of the doorbell
>>> aperture, so KFD uses all the rest. On GFX8 and before, the queue ID is
>>> used as the offset into the doorbell page. On GFX9 the hardware does
>>> some engine-specific doorbell routing, so we added another layer of
>>> doorbell management that's decoupled from the queue ID.
>>>
>>> Either way, an entire doorbell page gets mapped into user mode and user
>>> mode knows the offset of the doorbells for specific queues. The mapping
>>> is currently handled by kfd_mmap in kfd_chardev.c.
>> Ok, wait a second. Taking a look at kfd_doorbell_mmap() it almost
>> looks like you map different doorbells with the same offset depending
>> on which process is calling this.
>>
>> Is that correct? If yes then that would be illegal and a problem if
>> I'm not completely mistaken.
> Why is that a problem. Each process has its own file descriptor. The
> mapping is done using io_remap_pfn_range in kfd_doorbell_mmap.

Yeah, but all share the same file address space from the inode, doesn't 
they?

E.g. imagine that you map an offset from a normal file into process A 
and the same offset from the same file into process B, what do you get? 
Exactly, the same page mapped into both processes.

Now take a look at the KFD interface, you map the same offset from the 
same file (or rather inode) into two different processes and get two 
different things.

> This is nothing new. It's been done like this forever even on Kaveri and Carrizo.

It most likely doesn't cause any problems of hand or otherwise we would 
have noticed already, but at bare minimum that totally confuses the 
reverse mapping code.

Need to think about it if that is really an issue or not and if yes how 
to fix/mitigate it.

Regards,
Christian.

>
>>>> Do you simply assume that after evicting a process it always needs to
>>>> be restarted without checking if it actually does something? Or how
>>>> does that work?
>>> Exactly.
>> Ok, understood. Well that limits the usefulness of the whole eviction
>> drastically.
>>
>>> With later addition of GPU self-dispatch a page-fault based
>>> mechanism wouldn't work any more. We have to restart the queues blindly
>>> with a timer. See evict_process_worker, which schedules the restore with
>>> a delayed worker.
>>> which was send either by the GPU o
>>> The user mode queue ABI specifies that user mode update both the
>>> doorbell and a WPTR in memory. When we restart queues we (or the CP
>>> firmware) use the WPTR to make sure we catch up with any work that was
>>> submitted while the queues were unmapped.
>> Putting cross process work dispatch aside for a moment GPU
>> self-dispatch works only when there is work on the GPU running.
>>
>> So you can still check if there are some work pending after you
>> unmapped everything and only restart the queues when there is new work
>> based on the page fault.
>>
>> In other words either there is work pending and it doesn't matter if
>> it was send by the GPU or by the CPU or there is no work pending and
>> we can delay restarting everything until there is.
> That sounds like a useful optimization.
>
> Regards,
>    Felix
>
>> Regards,
>> Christian.
>>
>>> Regards,
>>>     Felix
>>>
>>>> Regards,
>>>> Christian.
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx