[PATCH 1/1] drm/amdkfd: Do not ignore requested queue size during allocation

Felix Kuehling felix.kuehling at amd.com
Wed Nov 29 21:58:14 UTC 2017


You can see the state of the queues in debugfs:
/sys/kernel/debug/kfd/... There you can look at the MQDs (memory queue
descriptors) and HQDs (hardware queue descriptors).

If your application isn't stopping queues deliberately, queues get
disabled by evictions, usually temporarily. You'll see kernel messages
when that happens.

A VM fault will result in queues of the offending process getting
disabled permanently. Again, you'll see messages about that in the
kernel log.

The RPTR can also stop advancing if you have an infinite loop in a
shader program, if a shader just takes a very long time to execute, or
if some dependencies (barriers) in your AQL packets never get satisfied.
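
As a rough illustration of what "RPTR stops advancing" means, here is a
minimal user-space sketch that compares two snapshots of a queue's read
and write pointers. The struct and function names are hypothetical and
are not taken from the KFD sources; how you obtain the snapshots (for
example by reading the HQD dump twice) is left open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of a queue's ring pointers. The field names
 * are illustrative only and do not mirror any KFD structure. */
struct queue_sample {
	uint32_t rptr;	/* read pointer: how far the CP has consumed */
	uint32_t wptr;	/* write pointer: how far the producer has written */
};

/* A queue looks stalled when work is still pending (rptr != wptr) and
 * the read pointer did not move between the two samples. */
static bool queue_looks_stalled(const struct queue_sample *prev,
				const struct queue_sample *curr)
{
	bool work_pending = curr->rptr != curr->wptr;
	bool rptr_moved = curr->rptr != prev->rptr;

	return work_pending && !rptr_moved;
}
```

A positive result only says that nothing was consumed between the two
samples. It cannot tell a hung shader from a long-running one or from an
unsatisfied barrier, so the sampling interval has to be chosen with the
expected kernel run time in mind.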

The function you changed only affects the HIQ, the queue that KFD uses
to control the HWS. It does not affect user mode queues. If your problem
is with a user mode queue, your change should have no effect at all.
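
For reference, the quoted patch sizes the EOP buffer with
ALIGN(queue_size, PAGE_SIZE). The sketch below is a user-space stand-in
for that rounding, not the kernel macro itself; align_up and
SKETCH_PAGE_SIZE are made-up names, and the 4096-byte page size is an
assumption (the kernel's PAGE_SIZE depends on the architecture).

```c
/* Assumed page size for illustration only. */
#define SKETCH_PAGE_SIZE 4096u

/* Round x up to the next multiple of a. Like the kernel's ALIGN(),
 * this only works when a is a power of two. */
static unsigned int align_up(unsigned int x, unsigned int a)
{
	return (x + a - 1) & ~(a - 1);
}
```

With these assumptions, any requested size from 1 to 4096 bytes still
rounds to a single page, so the allocation only grows once queue_size
exceeds one page.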

Regards,
  Felix


On 2017-11-29 04:43 PM, Jan Vesely wrote:
> On Mon, 2017-11-20 at 14:22 -0500, Felix Kuehling wrote:
>> I think this patch is not correct. The EOP-mem is not associated with
>> the queue size. The EOP buffer is a separate buffer used by the firmware
>> to handle command completion. As I understand it, this allows more
>> concurrency, while still making it look like all commands in the queue
>> are completing in order.
> thanks for the explanation. I was looking for a source of a CP hang
> (rptr stops advancing), but bumping the eop size actually made things
> worse. Is there a way to find out if a queue got disabled and for what
> reason? (I'm running ROCK-1.6.x based kernel)
>
> thanks,
> Jan
>
>> Regards,
>>   Felix
>>
>>
>> On 2017-11-19 03:19 AM, Oded Gabbay wrote:
>>> On Thu, Nov 16, 2017 at 11:36 PM, Jan Vesely <jan.vesely at rutgers.edu> wrote:
>>>> Signed-off-by: Jan Vesely <jan.vesely at rutgers.edu>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c | 5 +++--
>>>>  1 file changed, 3 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
>>>> index f1d48281e322..b3bee39661ab 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
>>>> @@ -37,15 +37,16 @@ static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
>>>>                         enum kfd_queue_type type, unsigned int queue_size)
>>>>  {
>>>>         int retval;
>>>> +       unsigned int size = ALIGN(queue_size, PAGE_SIZE);
>>>>
>>>> -       retval = kfd_gtt_sa_allocate(dev, PAGE_SIZE, &kq->eop_mem);
>>>> +       retval = kfd_gtt_sa_allocate(dev, size, &kq->eop_mem);
>>>>         if (retval != 0)
>>>>                 return false;
>>>>
>>>>         kq->eop_gpu_addr = kq->eop_mem->gpu_addr;
>>>>         kq->eop_kernel_addr = kq->eop_mem->cpu_ptr;
>>>>
>>>> -       memset(kq->eop_kernel_addr, 0, PAGE_SIZE);
>>>> +       memset(kq->eop_kernel_addr, 0, size);
>>>>
>>>>         return true;
>>>>  }
>>>> --
>>>> 2.13.6
>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>> Thanks!
>>> Applied to -next tree
>>> Oded
>>