[Mesa-dev] [PATCH v2 0/8] The 2nd version for UVD HEVC encode

Mark Thompson sw at jkqxz.net
Tue Feb 13 22:03:30 UTC 2018


On 13/02/18 16:38, James Zhu wrote:
> Hi Mark,
> 
> Did you still encounter hung issue?
> 
> If yes, could you share me with your play and transcode streams and command line,
> then I can try to reproduce at my side.
> 
> Thanks & Best Regards!
> 
> James Zhu

Yes, it does still happen with the latest patches and vanilla kernel 4.15.2, on an RX 460 / Polaris 11.


To reproduce:

Take a normal 1080p H.264 input file. I tried a few different ones and it made no difference; if you want exactly the same input, the usual Big Buck Bunny video was among those tested.

Use the GPU to play back the video with mpv in a normal X session running on the AMD card (I'm running this via ssh in an otherwise-empty X instance):

mpv --fs --loop --no-audio --vo gpu --gpu-context=x11egl --hwdec=vaapi bbb_1080_264.mp4

Then transcode it to H.265 on the same device at the same time:

ffmpeg -y -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -hwaccel_output_format vaapi -i bbb_1080_264.mp4 -an -c:v hevc_vaapi -bf 0 out.mp4

and the GPU locks up completely very quickly (within a few seconds / a few hundred frames of starting).
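The two steps above can be driven from one script. This is only a rough sketch: run_alongside and STARTUP_DELAY are made-up names for illustration, and the five-second default delay is an arbitrary assumption, not something I measured. On an affected setup it is expected to hang the GPU, so only run it on a machine you can reboot.

```shell
# Start a long-running "player" command in the background, give it a moment
# to start, run an "encoder" command in the foreground, then stop the player.
# Returns the encoder's exit status.
# run_alongside and STARTUP_DELAY are hypothetical names for this sketch.
run_alongside() {
    player_cmd=$1
    encoder_cmd=$2

    # exec so that the background PID is the player itself, not a wrapper shell
    sh -c "exec $player_cmd" &
    player_pid=$!

    sleep "${STARTUP_DELAY:-5}"

    status=0
    sh -c "$encoder_cmd" || status=$?

    # Stop the player; ignore errors if it has already exited
    kill "$player_pid" 2>/dev/null || true
    wait "$player_pid" 2>/dev/null || true
    return "$status"
}

# Usage with the commands from this message:
# run_alongside \
#   'mpv --fs --loop --no-audio --vo gpu --gpu-context=x11egl --hwdec=vaapi bbb_1080_264.mp4' \
#   'ffmpeg -y -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -hwaccel_output_format vaapi -i bbb_1080_264.mp4 -an -c:v hevc_vaapi -bf 0 out.mp4'
```

If the GPU does lock up, the kill at the end will most likely not work, which matches the unkillable processes described in this message.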

That leaves unkillable zombie processes for everything that was touching the GPU at the time it died:

$ ps aux | grep [d]efunct
root      6994  0.4  0.0      0     0 ?        Zsl  20:43   0:22 [Xorg] <defunct>
mrt      20601  0.3  0.0      0     0 ?        Zl   21:50   0:02 [mpv] <defunct>
mrt      20630  0.0  0.0      0     0 ?        Zl   21:51   0:00 [ffmpeg_g] <defunct>
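
As an alternative to grepping for "defunct", the process state column can be filtered directly. A small sketch, assuming procps ps; list_zombies is a made-up helper name:

```shell
# Print every row whose state (second column) starts with Z, i.e. zombies.
# Reads "PID STAT COMMAND" rows on stdin, so it works on live ps output
# or on a saved listing. list_zombies is a hypothetical name.
list_zombies() {
    # $1 = $1 makes awk rebuild the line with single spaces between fields
    awk '$2 ~ /^Z/ { $1 = $1; print }'
}

# Typical use on a live system (the trailing "=" suppresses the header):
# ps -eo pid=,stat=,comm= | list_zombies
```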


To compare, encoding H.264 instead of H.265 at the same time with:

ffmpeg -y -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -hwaccel_output_format vaapi -i bbb_1080_264.mp4 -an -c:v h264_vaapi -profile constrained_baseline -bf 0 out.mp4

does not fail.


Thanks,

- Mark



Kernel messages:

[279612.955929] INFO: task kworker/u24:3:20617 blocked for more than 120 seconds.
[279612.955936]       Not tainted 4.15.2 #2
[279612.955939] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[279612.955943] kworker/u24:3   D    0 20617      2 0x80000000
[279612.955957] Workqueue: events_unbound commit_work
[279612.955961] Call Trace:
[279612.955975]  ? __schedule+0x26b/0x840
[279612.955982]  schedule+0x28/0x80
[279612.955987]  schedule_timeout+0x1de/0x360
[279612.956123]  ? dce110_timing_generator_get_position+0x51/0x60 [amdgpu]
[279612.956246]  ? dce110_timing_generator_get_crtc_scanoutpos+0x6b/0xa0 [amdgpu]
[279612.956254]  dma_fence_default_wait+0x1f6/0x280
[279612.956261]  ? dma_fence_release+0x90/0x90
[279612.956267]  dma_fence_wait_timeout+0x33/0xe0
[279612.956274]  reservation_object_wait_timeout_rcu+0x198/0x340
[279612.956396]  amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[279612.956514]  amdgpu_dm_atomic_commit_tail+0x8a4/0x9a0 [amdgpu]
[279612.956521]  ? pick_next_task_fair+0x14f/0x5f0
[279612.956528]  commit_tail+0x3a/0x70
[279612.956534]  process_one_work+0x17c/0x370
[279612.956540]  worker_thread+0x2e/0x370
[279612.956545]  ? process_one_work+0x370/0x370
[279612.956551]  kthread+0x111/0x130
[279612.956558]  ? kthread_create_worker_on_cpu+0x70/0x70
[279612.956564]  ret_from_fork+0x1f/0x30
[279733.790840] INFO: task amdgpu_cs:0:20607 blocked for more than 120 seconds.
[279733.790848]       Not tainted 4.15.2 #2
[279733.790850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[279733.790854] amdgpu_cs:0     D    0 20607  20087 0x80000002
[279733.790861] Call Trace:
[279733.790876]  ? __schedule+0x26b/0x840
[279733.790883]  schedule+0x28/0x80
[279733.790890]  schedule_preempt_disabled+0xa/0x10
[279733.790898]  __mutex_lock.isra.1+0x18e/0x4c0
[279733.790906]  ? __slab_free+0x14b/0x300
[279733.790915]  ? drm_release+0x36/0x3b0
[279733.790920]  drm_release+0x36/0x3b0
[279733.790929]  __fput+0xcd/0x1d0
[279733.790937]  task_work_run+0x7b/0xa0
[279733.790943]  do_exit+0x2d0/0xb10
[279733.790948]  ? __check_object_size+0xaf/0x1b0
[279733.790954]  do_group_exit+0x3a/0xa0
[279733.790960]  get_signal+0x260/0x560
[279733.790968]  do_signal+0x36/0x690
[279733.791053]  ? amdgpu_drm_ioctl+0x6c/0x80 [amdgpu]
[279733.791060]  ? do_vfs_ioctl+0xa1/0x610
[279733.791066]  ? SyS_futex+0x12d/0x180
[279733.791072]  exit_to_usermode_loop+0x58/0x90
[279733.791077]  do_syscall_64+0xe8/0xf0
[279733.791082]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[279733.791088] RIP: 0033:0x7f769b8f27dd
[279733.791091] RSP: 002b:00007f768b6bbd70 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[279733.791097] RAX: fffffffffffffe00 RBX: 00007f76902db2f0 RCX: 00007f769b8f27dd
[279733.791100] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f76902db318
[279733.791103] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[279733.791106] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000019f0
[279733.791109] R13: 00007f76902db2c8 R14: 0000000000000000 R15: 00007f76902db318
[279733.791115] INFO: task kworker/u24:3:20617 blocked for more than 120 seconds.
[279733.791119]       Not tainted 4.15.2 #2
[279733.791121] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[279733.791124] kworker/u24:3   D    0 20617      2 0x80000000
[279733.791134] Workqueue: events_unbound commit_work
[279733.791138] Call Trace:
[279733.791145]  ? __schedule+0x26b/0x840
[279733.791152]  schedule+0x28/0x80
[279733.791156]  schedule_timeout+0x1de/0x360
[279733.791282]  ? dce110_timing_generator_get_position+0x51/0x60 [amdgpu]
[279733.791403]  ? dce110_timing_generator_get_crtc_scanoutpos+0x6b/0xa0 [amdgpu]
[279733.791411]  dma_fence_default_wait+0x1f6/0x280
[279733.791417]  ? dma_fence_release+0x90/0x90
[279733.791423]  dma_fence_wait_timeout+0x33/0xe0
[279733.791430]  reservation_object_wait_timeout_rcu+0x198/0x340
[279733.791552]  amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[279733.791673]  amdgpu_dm_atomic_commit_tail+0x8a4/0x9a0 [amdgpu]
[279733.791680]  ? pick_next_task_fair+0x14f/0x5f0
[279733.791686]  commit_tail+0x3a/0x70
[279733.791692]  process_one_work+0x17c/0x370
[279733.791697]  worker_thread+0x2e/0x370
[279733.791702]  ? process_one_work+0x370/0x370
[279733.791709]  kthread+0x111/0x130
[279733.791715]  ? kthread_create_worker_on_cpu+0x70/0x70
[279733.791721]  ret_from_fork+0x1f/0x30
[279733.791728] INFO: task ffmpeg_g:20642 blocked for more than 120 seconds.
[279733.791731]       Not tainted 4.15.2 #2
[279733.791733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[279733.791736] ffmpeg_g        D    0 20642   7139 0x80000006
[279733.791741] Call Trace:
[279733.791748]  ? __schedule+0x26b/0x840
[279733.791755]  schedule+0x28/0x80
[279733.791864]  amd_sched_entity_push_job+0xa3/0xf0 [amdgpu]
[279733.791873]  ? finish_wait+0x80/0x80
[279733.791977]  amdgpu_job_submit+0x9c/0xc0 [amdgpu]
[279733.792062]  amdgpu_vm_bo_update_mapping+0x383/0x3f0 [amdgpu]
[279733.792145]  ? amdgpu_vm_free_mapping.isra.20+0x20/0x20 [amdgpu]
[279733.792225]  amdgpu_vm_clear_freed+0xbb/0x190 [amdgpu]
[279733.792301]  amdgpu_gem_object_close+0x19c/0x210 [amdgpu]
[279733.792313]  ? drm_gem_object_release_handle+0x2c/0x90
[279733.792320]  drm_gem_object_release_handle+0x2c/0x90
[279733.792327]  ? drm_gem_object_handle_put_unlocked+0xb0/0xb0
[279733.792332]  idr_for_each+0x48/0xe0
[279733.792340]  drm_gem_release+0x1c/0x30
[279733.792346]  drm_release+0x342/0x3b0
[279733.792353]  __fput+0xcd/0x1d0
[279733.792360]  task_work_run+0x7b/0xa0
[279733.792365]  do_exit+0x2d0/0xb10
[279733.792371]  do_group_exit+0x3a/0xa0
[279733.792376]  get_signal+0x260/0x560
[279733.792384]  do_signal+0x36/0x690
[279733.792392]  ? __vma_rb_erase+0x1f6/0x270
[279733.792398]  ? SyS_futex+0x12d/0x180
[279733.792403]  exit_to_usermode_loop+0x58/0x90
[279733.792408]  do_syscall_64+0xe8/0xf0
[279733.792413]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[279733.792417] RIP: 0033:0x7f77a8ab37dd
[279733.792421] RSP: 002b:00007f778affcdd0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[279733.792426] RAX: fffffffffffffe00 RBX: 0000556b00c44718 RCX: 00007f77a8ab37dd
[279733.792429] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000556b00c44740
[279733.792432] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000556b00ce59b8
[279733.792435] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
[279733.792437] R13: 0000556b00c447a8 R14: 0000000000000000 R15: 0000556b00c44740
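
To see at a glance which tasks the hung-task detector reported, the log above can be summarised with a short filter. A sketch only, assuming the message format shown here ("INFO: task NAME:PID blocked for more than ..."); hung_tasks is a made-up name:

```shell
# Summarise hung-task reports from kernel log text on stdin.
# Prints one line per distinct task: "<count> <name> pid=<pid>".
# hung_tasks is a hypothetical helper name.
hung_tasks() {
    # The task name may itself contain ':' (e.g. kworker/u24:3), so the PID
    # is taken as the digits after the last ':' before " blocked".
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than.*/\1 pid=\2/p' |
        awk '{ n[$0]++ } END { for (t in n) print n[t], t }' |
        LC_ALL=C sort -k2
}

# Typical use:
# dmesg | hung_tasks
```

For the log in this message that should report kworker/u24:3 twice, and amdgpu_cs:0 and ffmpeg_g once each.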

