Change queue/pipe split between amdkfd and amdgpu

Andres Rodriguez andresx7 at gmail.com
Thu Feb 9 20:38:09 UTC 2017


Hey Oded,

Sorry to be a nuisance, but if you still have everything set up, could you 
give this fix a quick go?

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 5321d18..9f70ee0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -667,7 +667,7 @@ static int set_sched_resources(struct device_queue_manager *dqm)
                 /* This situation may be hit in the future if a new HW
                  * generation exposes more than 64 queues. If so, the
                  * definition of res.queue_mask needs updating */
-               if (WARN_ON(i > sizeof(res.queue_mask))) {
+               if (WARN_ON(i > (sizeof(res.queue_mask)*8))) {
                         pr_err("Invalid queue enabled by amdgpu: %d\n", i);
                         break;
                 }
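
For reference, here is a minimal sketch of why the old check fires on queue 9.
It assumes res.queue_mask is a 64-bit integer, which is what the "more than 64
queues" comment suggests. The struct and function names below are hypothetical
stand-ins for illustration, not the actual amdkfd definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scheduling resources struct; only a
 * 64-bit queue_mask field is assumed here. */
struct sched_resources_sketch {
	uint64_t queue_mask;
};

/* Old check: sizeof() yields the mask size in BYTES (8 for a 64-bit
 * mask), so any queue index above 8 is rejected. */
static bool old_check_rejects(unsigned int i)
{
	struct sched_resources_sketch res;

	return i > sizeof(res.queue_mask);
}

/* Patched check: multiply by 8 to compare against the mask size in
 * BITS (64 for a 64-bit mask). */
static bool new_check_rejects(unsigned int i)
{
	struct sched_resources_sketch res;

	return i > sizeof(res.queue_mask) * 8;
}
```

Under that assumption, old_check_rejects(9) is true, which matches the
"Invalid queue enabled by amdgpu: 9" message in the report below, while
new_check_rejects(9) is false. Note that with ">" an index of exactly 64 still
passes the patched check; whether that can happen depends on how the
surrounding loop bounds i, which is not shown in this hunk.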

John/Felix,

Any chance I could borrow a carrizo/kaveri for a few days? Or maybe you 
could help me run some final tests on this patch series?

- Andres


On 2017-02-09 03:11 PM, Oded Gabbay wrote:
>   Andres,
>
> I tried your patches on Kaveri with airlied's drm-next branch.
> I used radeon+amdkfd
>
> The following test failed: KFDQMTest.CreateMultipleCpQueues
> However, I can't debug it because I don't have the sources of kfdtest.
>
> In dmesg, I saw the following warning during boot:
> WARNING: CPU: 0 PID: 150 at
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c:670
> start_cpsch+0xc5/0x220 [amdkfd]
> [    4.393796] Modules linked in: hid_logitech_hidpp hid_logitech_dj
> hid_generic usbhid hid uas usb_storage amdkfd amd_iommu_v2 radeon(+)
> i2c_algo_bit ttm drm_kms_helper syscopyarea ahci sysfillrect sysimgblt
> libahci fb_sys_fops drm r8169 mii fjes video
> [    4.393811] CPU: 0 PID: 150 Comm: systemd-udevd Not tainted 4.10.0-rc5+ #1
> [    4.393811] Hardware name: Gigabyte Technology Co., Ltd. To be
> filled by O.E.M./F2A88XM-D3H, BIOS F5 01/09/2014
> [    4.393812] Call Trace:
> [    4.393818]  dump_stack+0x63/0x90
> [    4.393822]  __warn+0xcb/0xf0
> [    4.393823]  warn_slowpath_null+0x1d/0x20
> [    4.393830]  start_cpsch+0xc5/0x220 [amdkfd]
> [    4.393836]  ? initialize_cpsch+0xa0/0xb0 [amdkfd]
> [    4.393841]  kgd2kfd_device_init+0x375/0x490 [amdkfd]
> [    4.393883]  radeon_kfd_device_init+0xaf/0xd0 [radeon]
> [    4.393911]  radeon_driver_load_kms+0x11e/0x1f0 [radeon]
> [    4.393933]  drm_dev_register+0x14a/0x200 [drm]
> [    4.393946]  drm_get_pci_dev+0x9d/0x160 [drm]
> [    4.393974]  radeon_pci_probe+0xb8/0xe0 [radeon]
> [    4.393976]  local_pci_probe+0x45/0xa0
> [    4.393978]  pci_device_probe+0x103/0x150
> [    4.393981]  driver_probe_device+0x2bf/0x460
> [    4.393982]  __driver_attach+0xdf/0xf0
> [    4.393984]  ? driver_probe_device+0x460/0x460
> [    4.393985]  bus_for_each_dev+0x6c/0xc0
> [    4.393987]  driver_attach+0x1e/0x20
> [    4.393988]  bus_add_driver+0x1fd/0x270
> [    4.393989]  ? 0xffffffffc05c8000
> [    4.393991]  driver_register+0x60/0xe0
> [    4.393992]  ? 0xffffffffc05c8000
> [    4.393993]  __pci_register_driver+0x4c/0x50
> [    4.394007]  drm_pci_init+0xeb/0x100 [drm]
> [    4.394008]  ? 0xffffffffc05c8000
> [    4.394031]  radeon_init+0x98/0xb6 [radeon]
> [    4.394034]  do_one_initcall+0x53/0x1a0
> [    4.394037]  ? __vunmap+0x81/0xd0
> [    4.394039]  ? kmem_cache_alloc_trace+0x152/0x1c0
> [    4.394041]  ? vfree+0x2e/0x70
> [    4.394044]  do_init_module+0x5f/0x1ff
> [    4.394046]  load_module+0x24cc/0x29f0
> [    4.394047]  ? __symbol_put+0x60/0x60
> [    4.394050]  ? security_kernel_post_read_file+0x6b/0x80
> [    4.394052]  SYSC_finit_module+0xdf/0x110
> [    4.394054]  SyS_finit_module+0xe/0x10
> [    4.394056]  entry_SYSCALL_64_fastpath+0x1e/0xad
> [    4.394058] RIP: 0033:0x7f9cda77c8e9
> [    4.394059] RSP: 002b:00007ffe195d3378 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000139
> [    4.394060] RAX: ffffffffffffffda RBX: 00007f9cdb8dda7e RCX: 00007f9cda77c8e9
> [    4.394061] RDX: 0000000000000000 RSI: 00007f9cdac7ce2a RDI: 0000000000000013
> [    4.394062] RBP: 00007ffe195d2450 R08: 0000000000000000 R09: 0000000000000000
> [    4.394063] R10: 0000000000000013 R11: 0000000000000246 R12: 00007ffe195d245a
> [    4.394063] R13: 00007ffe195d1378 R14: 0000563f70cc93b0 R15: 0000563f70cba4d0
> [    4.394091] ---[ end trace 9c5af17304d998bb ]---
> [    4.394092] Invalid queue enabled by amdgpu: 9
>
> I suggest you get a Kaveri/Carrizo machine to debug these issues.
>
> Until then, I don't think we should merge this patch-set.
>
> Oded
>
> On Wed, Feb 8, 2017 at 9:47 PM, Andres Rodriguez <andresx7 at gmail.com> wrote:
>> Thank you Oded.
>>
>> - Andres
>>
>>
>> On 2017-02-08 02:32 PM, Oded Gabbay wrote:
>>> On Wed, Feb 8, 2017 at 6:23 PM, Andres Rodriguez <andresx7 at gmail.com> wrote:
>>>> Hey Felix,
>>>>
>>>> Thanks for the pointer to the ROCm mqd commit. I like that the workarounds
>>>> are easy to spot. I'll add that to a new patch series I'm working on for
>>>> some bug-fixes for perf being lower on pipes other than pipe 0.
>>>>
>>>> I haven't tested this yet on kaveri/carrizo. I'm hoping someone with the HW
>>>> will be able to give it a go. I put in a few small hacks to get KFD to boot
>>>> but do nothing on polaris10.
>>>>
>>>> Regards,
>>>> Andres
>>>>
>>>>
>>>> On 2017-02-06 03:20 PM, Felix Kuehling wrote:
>>>>> Hi Andres,
>>>>>
>>>>> Thank you for tackling this task. It's more involved than I expected,
>>>>> mostly because I didn't have much awareness of the MQD management in
>>>>> amdgpu.
>>>>>
>>>>> I made one comment in a separate message about the unified MQD commit
>>>>> function, if you want to bring that more in line with our latest ROCm
>>>>> release on github.
>>>>>
>>>>> Also, were you able to test the upstream KFD with your changes on a
>>>>> Kaveri or Carrizo?
>>>>>
>>>>> Regards,
>>>>>     Felix
>>>>>
>>>>>
>>>>> On 17-02-03 11:51 PM, Andres Rodriguez wrote:
>>>>>> The current queue/pipe split policy is for amdgpu to take the first pipe
>>>>>> of MEC0 and leave the rest for amdkfd to use. This policy is taken as an
>>>>>> assumption in a few areas of the implementation.
>>>>>>
>>>>>> This patch series aims to allow for flexible/tunable queue/pipe split
>>>>>> policies between kgd and kfd. It also updates the queue/pipe split policy
>>>>>> to one that allows better compute app concurrency for both drivers.
>>>>>>
>>>>>> In the process some duplicate code and hardcoded constants were removed.
>>>>>>
>>>>>> Any suggestions or feedback on improvements welcome.
>>>>>>
>>>> _______________________________________________
>>>> amd-gfx mailing list
>>>> amd-gfx at lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>> Hi Andres,
>>> I will try to find some time to test it on my Kaveri machine.
>>>
>>> Oded
>>


