[PATCH 14/25] drm/amdkfd: Populate DRM render device minor

Tue Feb 13 18:45:10 UTC 2018

On 13.02.2018 at 17:56, Felix Kuehling wrote:
> [SNIP]
> Each process gets a whole page of the doorbell aperture assigned to it.
> The assumption is that amdgpu only uses the first page of the doorbell
> aperture, so KFD uses all the rest. On GFX8 and before, the queue ID is
> used as the offset into the doorbell page. On GFX9 the hardware does
> some engine-specific doorbell routing, so we added another layer of
> doorbell management that's decoupled from the queue ID.
>
> Either way, an entire doorbell page gets mapped into user mode and user
> mode knows the offset of the doorbells for specific queues. The mapping
> is currently handled by kfd_mmap in kfd_chardev.c.

Ok, wait a second. Taking a look at kfd_doorbell_mmap(), it almost looks
like you map different doorbells at the same offset depending on which
process is calling it.

Is that correct? If yes, then that would be illegal and a problem if I'm
not completely mistaken.
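
To make the layout under discussion concrete, here is a minimal sketch of
how a doorbell address could be derived on GFX8 and earlier, going only by
the description quoted above: page 0 of the aperture belongs to amdgpu,
each process gets one whole page after it, and the queue ID selects the
doorbell within that page. The page size, the doorbell size, the
process_index parameter and both function names are assumptions made for
illustration, not the actual KFD code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define DOORBELL_PAGE_SIZE 4096u /* one doorbell page per process */
#define DOORBELL_SIZE      4u    /* bytes per doorbell */

/*
 * Byte offset of a process's doorbell page inside the aperture.
 * Page 0 is assumed to be reserved for amdgpu, so process pages
 * start at page 1. process_index is a hypothetical per-device index.
 */
static uint64_t process_doorbell_page_offset(unsigned int process_index)
{
    return ((uint64_t)process_index + 1) * DOORBELL_PAGE_SIZE;
}

/*
 * Byte offset of one queue's doorbell inside the aperture, using the
 * queue ID as the index into the process's page (the GFX8 scheme).
 */
static uint64_t queue_doorbell_offset(unsigned int process_index,
                                      unsigned int queue_id)
{
    /* The queue ID must fit inside a single doorbell page. */
    assert(queue_id < DOORBELL_PAGE_SIZE / DOORBELL_SIZE);

    return process_doorbell_page_offset(process_index) +
           (uint64_t)queue_id * DOORBELL_SIZE;
}
```

Under these assumptions, two processes using the same queue ID end up
with the same offset inside their own page but with different offsets
inside the aperture, which is the behavior I'm asking about.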

>> Do you simply assume that after evicting a process it always needs to
>> be restarted without checking if it actually does something? Or how
>> does that work?
> Exactly.

Ok, understood. Well, that limits the usefulness of the whole eviction
drastically.

> With the later addition of GPU self-dispatch, a page-fault based
> mechanism wouldn't work any more. We have to restart the queues blindly
> with a timer. See evict_process_worker, which schedules the restore with
> a delayed worker.
> The user mode queue ABI specifies that user mode update both the
> doorbell and a WPTR in memory. When we restart queues we (or the CP
> firmware) use the WPTR to make sure we catch up with any work that was
> submitted while the queues were unmapped.

Putting cross-process work dispatch aside for a moment, GPU self-dispatch
works only when there is work running on the GPU.

So you can still check if there is some work pending after you have
unmapped everything, and only restart the queues when there is new work,
based on the page fault.

In other words, either there is work pending, and it doesn't matter
whether it was sent by the GPU or by the CPU, or there is no work pending
and we can delay restarting everything until there is.
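
A rough sketch of the check I have in mind, assuming each user mode queue
exposes a read pointer and the in-memory WPTR mentioned above. The struct
and function names are made up for illustration and do not match the real
KFD data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue state, not the real KFD structures. */
struct uq_state {
    uint64_t rptr; /* read pointer: how far the hardware has consumed */
    uint64_t wptr; /* write pointer that user mode updates in memory */
};

/* A queue has pending work when the two pointers differ. */
static bool uq_has_pending_work(const struct uq_state *q)
{
    return q->rptr != q->wptr;
}

/*
 * Decide what to do after all queues of a process were unmapped.
 * Returns true if at least one queue has pending work, so the queues
 * should be restarted. Returns false if everything is idle, so the
 * restart can be delayed until a doorbell page fault signals new work.
 */
static bool process_needs_restart(const struct uq_state *queues, size_t n)
{
    assert(queues != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (uq_has_pending_work(&queues[i]))
            return true;
    }
    return false;
}
```

Who submitted the pending work does not matter here: a packet written by
the GPU and one written by the CPU both show up as a WPTR that differs
from the read pointer.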

Regards,
Christian.

>
> Regards,
>    Felix
>
>> Regards,
>> Christian.