vcn regression on raven1

Zhang, Jerry (Junwei) Jerry.Zhang at amd.com
Thu May 3 00:42:23 UTC 2018


Hi Tom,

Thanks for your update. That's good news.
If necessary, please also send out your patch to improve the functionality.
Thanks.

Jerry
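
For readers following the thread, the failure mode Tom describes below (separate function tables for decode and encode, with the callback present only in the encode table) can be sketched in plain C. This is a minimal illustration under stated assumptions, not amdgpu code: `ring_funcs`, `enc_funcs`, `dec_funcs`, `emit_checked` and `ring_demo` are hypothetical names, and the real driver structures differ. Calling through the empty slot without a check would jump to address 0, which would match the "IP: (null)" in the quoted dmesg.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-ring callback table (not the amdgpu struct). */
struct ring_funcs {
    const char *name;
    void (*emit_reg_write_reg_wait)(unsigned reg0, unsigned reg1);
};

static int emitted; /* counts successful emits, for illustration only */

static void enc_emit(unsigned reg0, unsigned reg1)
{
    (void)reg0;
    (void)reg1;
    emitted++;
}

/* Encode table has the callback; decode table was left without it. */
static const struct ring_funcs enc_funcs = { "enc", enc_emit };
static const struct ring_funcs dec_funcs = { "dec", NULL };

/* Returns 0 on success, -1 if the table has no callback in this slot.
 * Without the NULL check, the decode case would call address 0. */
static int emit_checked(const struct ring_funcs *f, unsigned reg0, unsigned reg1)
{
    if (!f->emit_reg_write_reg_wait)
        return -1;
    f->emit_reg_write_reg_wait(reg0, reg1);
    return 0;
}

/* Exercises both tables: encode succeeds, decode reports the missing slot. */
static int ring_demo(void)
{
    int before = emitted;

    assert(emit_checked(&enc_funcs, 1, 2) == 0);
    assert(emit_checked(&dec_funcs, 1, 2) == -1);
    return emitted - before; /* number of callbacks actually invoked */
}
```

In this sketch the fix corresponds to filling the empty slot in the decode table, which is what Tom's patch is described as doing for the real decode ring.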

On 05/02/2018 06:21 PM, Tom St Denis wrote:
> Hi Jerry,
>
> Just got up and going (6am ... ugh early).  I see the confusion.  Yes there is a
> patch on drm-next but the problem is there is a table for both decode and
> encode.  The patch that is already on drm-next only adds the callback for encode.
>
> My patch adds the callback for decode as well.  :-)
>
> Cheers,
> Tom
>
>
>
> On 05/01/2018 09:44 PM, Zhang, Jerry (Junwei) wrote:
>> Hi Tom,
>>
>> Ha, got your meaning.
>> Please check it with the latest drm-next from gerrit tomorrow.
>>
>> Jerry
>>
>> On 05/02/2018 09:41 AM, StDenis, Tom wrote:
>>> Hi Jerry,
>>>
>>> Like I said it's (now well) past EOD (meaning my workstation is powered off)
>>> so I'll have to check tomorrow.  But I do pull from gerrit daily and build
>>> from that.
>>>
>>> I'll take a look in the morning.
>>>
>>> Cheers,
>>> Tom
>>> ________________________________________
>>> From: Zhang, Jerry
>>> Sent: Tuesday, May 1, 2018 21:39
>>> To: StDenis, Tom; Deucher, Alexander
>>> Cc: Koenig, Christian; amd-gfx at lists.freedesktop.org
>>> Subject: Re: vcn regression on raven1
>>>
>>> Hi Tom,
>>>
>>> Do you mean you cannot find the patch from gerrit/amd-staging-dkms-next either?
>>>
>>> I do find it.
>>>
>>> the tip of gerrit/amd-staging-drm-next is
>>>     * bb54e82 2018-04-30 12:17:07 -0400 drm/amdgpu: Switch to interruptable wait
>>> to recover from ring hang. <Andrey Grodzovsky>
>>>
>>> while the tip of freedesktop is
>>>     * a11008c 2018-04-25 20:32:05 -0500 drm/powerplay: Add powertune table for
>>> VEGAM <Eric Huang>
>>>
>>> Jerry
>>>
>>> On 05/02/2018 09:29 AM, StDenis, Tom wrote:
>>>> I pull from gerrit.  I'm just pointing out that it's not on drm-next
>>>> upstream either.
>>>>
>>>> It may have been missed in a rebase or something.
>>>>
>>>> Tom
>>>> ________________________________________
>>>> From: Zhang, Jerry
>>>> Sent: Tuesday, May 1, 2018 21:07
>>>> To: StDenis, Tom; Deucher, Alexander
>>>> Cc: Koenig, Christian; amd-gfx at lists.freedesktop.org
>>>> Subject: Re: vcn regression on raven1
>>>>
>>>> Hi Tom,
>>>>
>>>> It sounds like you got the code from freedesktop rather than the internal drm-next.
>>>> Unfortunately, freedesktop appears to lag behind in syncing code from the
>>>> internal drm-next. That gap is what caused the issue in your test.
>>>>
>>>> Hi Alex,
>>>>
>>>> Is this a problem with code syncing between freedesktop and the internal drm-next?
>>>> Or is the sync delay a known issue?
>>>>
>>>> Jerry
>>>>
>>>> On 05/02/2018 08:57 AM, StDenis, Tom wrote:
>>>>> Hi Jerry,
>>>>>
>>>>> It's well past EOD for me, so I'll pick this up in the morning.
>>>>>
>>>>> I'm fairly certain I wrote my patch against the tip of amd-staging-drm-next
>>>>> as of my pull this morning though.
>>>>>
>>>>> If it's in there and I missed it somehow, I apologize; otherwise it'd be nice
>>>>> to make sure it's in there.
>>>>>
>>>>> Based on the public copy of the tree, it's not there:
>>>>>
>>>>> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c?h=amd-staging-drm-next#n1110
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Tom
>>>>> ________________________________________
>>>>> From: Zhang, Jerry
>>>>> Sent: Tuesday, May 1, 2018 20:52
>>>>> To: StDenis, Tom; Deucher, Alexander
>>>>> Cc: Koenig, Christian; amd-gfx at lists.freedesktop.org
>>>>> Subject: Re: vcn regression on raven1
>>>>>
>>>>> Hi Tom,
>>>>>
>>>>> It landed in the latest drm-next, as
>>>>>       * 964933a 2018-04-27 10:26:09 +0800 drm/amdgpu/uvd7: add
>>>>> emit_reg_write_reg_wait ring callback <Xiaojie Yuan>
>>>>>
>>>>> Did you test with that included?
>>>>> If not, please try the latest drm-next.
>>>>> From the log, they look like the same issue.
>>>>>
>>>>> Jerry
>>>>>
>>>>> On 05/02/2018 08:47 AM, StDenis, Tom wrote:
>>>>>> Hi Jerry,
>>>>>>
>>>>>> So far as I know this wasn't included on the tip of drm-next.  I hit this
>>>>>> this morning in my semi-regular pull/build/test cycle.
>>>>>>
>>>>>> Was this missed in a recent rebase?
>>>>>>
>>>>>> Tom
>>>>>> ________________________________________
>>>>>> From: Zhang, Jerry
>>>>>> Sent: Tuesday, May 1, 2018 20:43
>>>>>> To: StDenis, Tom; Deucher, Alexander
>>>>>> Cc: Koenig, Christian; amd-gfx at lists.freedesktop.org
>>>>>> Subject: Re: vcn regression on raven1
>>>>>>
>>>>>> On 05/01/2018 09:34 PM, Tom St Denis wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I've noticed that on the tip of drm-next vcn playback of video is broken
>>>>>>> (see
>>>>>>> dmesg below).  I've bisected it to this commit
>>>>>>
>>>>>> It may be fixed here as a common issue.
>>>>>>
>>>>>>        * https://patchwork.freedesktop.org/patch/218909/
>>>>>>
>>>>>> Jerry
>>>>>>
>>>>>>>
>>>>>>> [root at raven linux]# git bisect good
>>>>>>> 701372349fd55b5396b335580e979ac4dde3dd02 is the first bad commit
>>>>>>> commit 701372349fd55b5396b335580e979ac4dde3dd02
>>>>>>> Author: Alex Deucher <alexander.deucher at amd.com>
>>>>>>> Date:   Tue Mar 27 17:10:56 2018 -0500
>>>>>>>
>>>>>>>          drm/amdgpu/gmc9: use amdgpu_ring_emit_reg_write_reg_wait in gpu
>>>>>>> tlb flush
>>>>>>>
>>>>>>>          Use amdgpu_ring_emit_reg_write_reg_wait.  On engines that
>>>>>>> support it,
>>>>>>>          it provides a write and wait in a single packet which avoids a
>>>>>>> missed
>>>>>>>          ack if a world switch happens between the request and waiting
>>>>>>> for the
>>>>>>>          ack.
>>>>>>>
>>>>>>>          Reviewed-by: Huang Rui <ray.huang at amd.com>
>>>>>>>          Reviewed-by: Christian König <christian.koenig at amd.com>
>>>>>>>          Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>>>>>>>
>>>>>>> :040000 040000 4e4312de03f4b34abd65f4bb12dba4c7093055ba
>>>>>>> ccc4abc78c0b6f24328fd998f998fa06bf0618b1 M      drivers
>>>>>>>
>>>>>>> Which is odd because the commit before this is the vcn change and it
>>>>>>> works fine
>>>>>>> (playing BBB right now).
>>>>>>>
>>>>>>> Here's the dmesg:
>>>>>>>
>>>>>>> [ 2925.640102] BUG: unable to handle kernel NULL pointer dereference at
>>>>>>> 0000000000000000
>>>>>>> [ 2925.640113] IP:           (null)
>>>>>>> [ 2925.640116] PGD 0 P4D 0
>>>>>>> [ 2925.640122] Oops: 0010 [#1] SMP KASAN NOPTI
>>>>>>> [ 2925.640126] Modules linked in: tun fuse amdkfd amdgpu mfd_core chash
>>>>>>> gpu_sched ttm ax88179_178a usbnet
>>>>>>> [ 2925.640139] CPU: 4 PID: 3791 Comm: vcn_dec Not tainted 4.16.0-rc7+ #20
>>>>>>> [ 2925.640142] Hardware name: System manufacturer System Product Name/TUF
>>>>>>> B350M-PLUS GAMING, BIOS 3803 01/22/2018
>>>>>>> [ 2925.640146] RIP: 0010:          (null)
>>>>>>> [ 2925.640148] RSP: 0018:ffff8801d54f7790 EFLAGS: 00010206
>>>>>>> [ 2925.640153] RAX: 0000000000000000 RBX: ffff8801d8b38420 RCX:
>>>>>>> 00000000007c0080
>>>>>>> [ 2925.640156] RDX: 000000000001a6fa RSI: 000000000001a6e8 RDI:
>>>>>>> ffff8801d8b38420
>>>>>>> [ 2925.640159] RBP: 000000000001a6fa R08: 0000000000000080 R09:
>>>>>>> ffffed003aa9eef9
>>>>>>> [ 2925.640162] R10: 0000000009c74f08 R11: fffffbfff0f5d1e7 R12:
>>>>>>> ffff8801d8b3277c
>>>>>>> [ 2925.640164] R13: ffff8801d8b3001c R14: 0000000000000005 R15:
>>>>>>> 0000000000000000
>>>>>>> [ 2925.640168] FS:  0000000000000000(0000) GS:ffff8801dcf00000(0000)
>>>>>>> knlGS:0000000000000000
>>>>>>> [ 2925.640171] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> [ 2925.640174] CR2: 0000000000000000 CR3: 00000001d9712000 CR4:
>>>>>>> 00000000003406e0
>>>>>>> [ 2925.640176] Call Trace:
>>>>>>> [ 2925.640272]  ? gmc_v9_0_emit_flush_gpu_tlb+0x260/0x2a0 [amdgpu]
>>>>>>> [ 2925.640368]  ? vcn_v1_0_dec_ring_insert_start+0x360/0x360 [amdgpu]
>>>>>>> [ 2925.640459]  ? mmhub_v1_0_get_clockgating+0xc0/0xc0 [amdgpu]
>>>>>>> [ 2925.640545]  ? amdgpu_vmid_had_gpu_reset+0x89/0xc0 [amdgpu]
>>>>>>> [ 2925.640640]  ? vcn_v1_0_dec_ring_emit_vm_flush+0x64/0xb0 [amdgpu]
>>>>>>> [ 2925.640725]  ? amdgpu_vm_flush+0xb43/0xcc0 [amdgpu]
>>>>>>> [ 2925.640810]  ? amdgpu_vm_need_pipeline_sync+0x260/0x260 [amdgpu]
>>>>>>> [ 2925.640897]  ? amdgpu_vmid_had_gpu_reset+0xc0/0xc0 [amdgpu]
>>>>>>> [ 2925.641003]  ? vcn_v1_0_dec_ring_insert_start+0x2d7/0x360 [amdgpu]
>>>>>>> [ 2925.641095]  ? amdgpu_ib_schedule+0x1b5/0x800 [amdgpu]
>>>>>>> [ 2925.641102]  ? dma_fence_add_callback+0x15f/0x360
>>>>>>> [ 2925.641201]  ? amdgpu_job_run+0x32f/0x370 [amdgpu]
>>>>>>> [ 2925.641297]  ? amdgpu_job_free_resources+0xd0/0xd0 [amdgpu]
>>>>>>> [ 2925.641302]  ? __queue_delayed_work+0x144/0x1d0
>>>>>>> [ 2925.641306]  ? delayed_work_timer_fn+0x40/0x40
>>>>>>> [ 2925.641312]  ? prepare_to_wait_exclusive+0x1d0/0x1d0
>>>>>>> [ 2925.641318]  ? drm_sched_main+0x68c/0x940 [gpu_sched]
>>>>>>> [ 2925.641323]  ? drm_sched_entity_fini+0x60/0x60 [gpu_sched]
>>>>>>> [ 2925.641328]  ? save_stack+0x89/0xb0
>>>>>>> [ 2925.641332]  ? wait_woken+0x110/0x110
>>>>>>> [ 2925.641337]  ? ret_from_fork+0x22/0x40
>>>>>>> [ 2925.641343]  ? __schedule+0xd30/0xd30
>>>>>>> [ 2925.641346]  ? remove_wait_queue+0x150/0x150
>>>>>>> [ 2925.641353]  ? rcu_note_context_switch+0x2a0/0x2a0
>>>>>>> [ 2925.641359]  ? __lock_text_start+0x8/0x8
>>>>>>> [ 2925.641367]  ? drm_sched_entity_fini+0x60/0x60 [gpu_sched]
>>>>>>> [ 2925.641371]  ? kthread+0x19b/0x1c0
>>>>>>> [ 2925.641376]  ? kthread_create_worker_on_cpu+0xc0/0xc0
>>>>>>> [ 2925.641382]  ? ret_from_fork+0x22/0x40
>>>>>>> [ 2925.641387] Code:  Bad RIP value.
>>>>>>> [ 2925.641397] RIP:           (null) RSP: ffff8801d54f7790
>>>>>>> [ 2925.641400] CR2: 0000000000000000
>>>>>>> [ 2925.641405] ---[ end trace 0684cc0468f60fb1 ]---
>>>>>>>
>>>>>>>
>>>>>>> Note that regular compute/gfx workflows work fine on the tip of drm-next;
>>>>>>> only vcn playback triggers this (haven't tried encode yet...).
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Tom
>>>>>>> _______________________________________________
>>>>>>> amd-gfx mailing list
>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
