<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <p>Hi Mark,</p>
    <p>I couldn't reproduce the issue on my Polaris 11 after running mpv
      and ffmpeg together for about 1.5 hours.</p>
    <p>In one terminal I ran:<br>
    </p>
    <p>ffmpeg -y -hwaccel vaapi -hwaccel_device /dev/dri/renderD128
      -hwaccel_output_format vaapi -i video/Mr.Right.mp4 -an -c:v
      hevc_vaapi -bf 0 out.mp4<br>
    </p>
    <p>In the other terminal I ran:</p>
    <p>mpv --fs --loop --no-audio --vo gpu --gpu-context=x11egl
      --hwdec=vaapi video/Mr.Right.mp4<br>
      This reports a failure in vaDeriveImage. I am not sure whether this
      failure matters, since the video still plays without any other
      error. I also ran:</p>
    <p>mpv --fs --loop --no-audio --vo vaapi  --hwdec=vaapi
      video/Mr.Right.mp4 <br>
    </p>
    <p>No errors were reported with this command line.</p>
    James Zhu<br>
    <div class="moz-cite-prefix">On 2018-02-13 05:03 PM, Mark Thompson
      wrote:<br>
    </div>
    <blockquote type="cite"
      cite="mid:939f4c7e-b22f-032d-bdb3-ad2e4bd3a53c@jkqxz.net">
      <pre wrap="">On 13/02/18 16:38, James Zhu wrote:
</pre>
      <blockquote type="cite">
        <pre wrap="">Hi Mark,

Do you still encounter the hang?

If yes, could you share your playback and transcode streams and command lines with me,
so that I can try to reproduce it on my side?

Thanks & Best Regards!

James Zhu
</pre>
      </blockquote>
      <pre wrap="">
Yes, it does still happen with the latest patches and vanilla kernel 4.15.2, on an RX 460 / Polaris 11.


To reproduce:

Take a normal 1080p H.264 input file (I tried a few different ones and it made no difference; if you want exactly the same input, the usual Big Buck Bunny video was among those tested).

Use the GPU to play back the video with mpv in a normal X session running on the AMD card (I'm running this via ssh in an otherwise-empty X instance):

mpv --fs --loop --no-audio --vo gpu --gpu-context=x11egl --hwdec=vaapi bbb_1080_264.mp4

Then transcode it to H.265 on the same device at the same time:

ffmpeg -y -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -hwaccel_output_format vaapi -i bbb_1080_264.mp4 -an -c:v hevc_vaapi -bf 0 out.mp4

and the GPU locks up completely very quickly (within a few seconds / a few hundred frames of starting).

That leaves unkillable zombie processes of everything which was touching the GPU at the time it died:

$ ps aux | grep [d]efunct
root      6994  0.4  0.0      0     0 ?        Zsl  20:43   0:22 [Xorg] &lt;defunct&gt;
mrt      20601  0.3  0.0      0     0 ?        Zl   21:50   0:02 [mpv] &lt;defunct&gt;
mrt      20630  0.0  0.0      0     0 ?        Zl   21:51   0:00 [ffmpeg_g] &lt;defunct&gt;
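
A rough way to check for this state after a run, as a sketch only: the two function names below are made up for illustration, and it assumes a procps-style ps whose STAT column starts with Z for defunct processes, plus a kernel log you are allowed to read.

```shell
#!/bin/sh

# Hypothetical helper: reads "ps -eo pid,stat,comm" style output on stdin
# and prints the PID and command name of every process whose state
# starts with Z (zombie / defunct).
list_defunct() {
    awk '$2 ~ /^Z/ { print $1, $3 }'
}

# Hypothetical helper: reads kernel log text on stdin and prints the
# number of hung-task reports it contains. grep -c exits non-zero when
# the count is 0, so the status is deliberately ignored.
count_hung_tasks() {
    grep -c 'blocked for more than' || true
}

# Example use on a live system (not run here):
#   ps -eo pid,stat,comm | list_defunct
#   dmesg | count_hung_tasks
```

Any output from the first helper, or a non-zero count from the second, suggests the GPU has locked up as described above.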


To compare, encoding H.264 instead of H.265 at the same time with:

ffmpeg -y -hwaccel vaapi -hwaccel_device /dev/dri/renderD129 -hwaccel_output_format vaapi -i bbb_1080_264.mp4 -an -c:v h264_vaapi -profile constrained_baseline -bf 0 out.mp4

does not fail.


Thanks,

- Mark



Kernel messages:

[279612.955929] INFO: task kworker/u24:3:20617 blocked for more than 120 seconds.
[279612.955936]       Not tainted 4.15.2 #2
[279612.955939] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[279612.955943] kworker/u24:3   D    0 20617      2 0x80000000
[279612.955957] Workqueue: events_unbound commit_work
[279612.955961] Call Trace:
[279612.955975]  ? __schedule+0x26b/0x840
[279612.955982]  schedule+0x28/0x80
[279612.955987]  schedule_timeout+0x1de/0x360
[279612.956123]  ? dce110_timing_generator_get_position+0x51/0x60 [amdgpu]
[279612.956246]  ? dce110_timing_generator_get_crtc_scanoutpos+0x6b/0xa0 [amdgpu]
[279612.956254]  dma_fence_default_wait+0x1f6/0x280
[279612.956261]  ? dma_fence_release+0x90/0x90
[279612.956267]  dma_fence_wait_timeout+0x33/0xe0
[279612.956274]  reservation_object_wait_timeout_rcu+0x198/0x340
[279612.956396]  amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[279612.956514]  amdgpu_dm_atomic_commit_tail+0x8a4/0x9a0 [amdgpu]
[279612.956521]  ? pick_next_task_fair+0x14f/0x5f0
[279612.956528]  commit_tail+0x3a/0x70
[279612.956534]  process_one_work+0x17c/0x370
[279612.956540]  worker_thread+0x2e/0x370
[279612.956545]  ? process_one_work+0x370/0x370
[279612.956551]  kthread+0x111/0x130
[279612.956558]  ? kthread_create_worker_on_cpu+0x70/0x70
[279612.956564]  ret_from_fork+0x1f/0x30
[279733.790840] INFO: task amdgpu_cs:0:20607 blocked for more than 120 seconds.
[279733.790848]       Not tainted 4.15.2 #2
[279733.790850] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[279733.790854] amdgpu_cs:0     D    0 20607  20087 0x80000002
[279733.790861] Call Trace:
[279733.790876]  ? __schedule+0x26b/0x840
[279733.790883]  schedule+0x28/0x80
[279733.790890]  schedule_preempt_disabled+0xa/0x10
[279733.790898]  __mutex_lock.isra.1+0x18e/0x4c0
[279733.790906]  ? __slab_free+0x14b/0x300
[279733.790915]  ? drm_release+0x36/0x3b0
[279733.790920]  drm_release+0x36/0x3b0
[279733.790929]  __fput+0xcd/0x1d0
[279733.790937]  task_work_run+0x7b/0xa0
[279733.790943]  do_exit+0x2d0/0xb10
[279733.790948]  ? __check_object_size+0xaf/0x1b0
[279733.790954]  do_group_exit+0x3a/0xa0
[279733.790960]  get_signal+0x260/0x560
[279733.790968]  do_signal+0x36/0x690
[279733.791053]  ? amdgpu_drm_ioctl+0x6c/0x80 [amdgpu]
[279733.791060]  ? do_vfs_ioctl+0xa1/0x610
[279733.791066]  ? SyS_futex+0x12d/0x180
[279733.791072]  exit_to_usermode_loop+0x58/0x90
[279733.791077]  do_syscall_64+0xe8/0xf0
[279733.791082]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[279733.791088] RIP: 0033:0x7f769b8f27dd
[279733.791091] RSP: 002b:00007f768b6bbd70 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[279733.791097] RAX: fffffffffffffe00 RBX: 00007f76902db2f0 RCX: 00007f769b8f27dd
[279733.791100] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007f76902db318
[279733.791103] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[279733.791106] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000000019f0
[279733.791109] R13: 00007f76902db2c8 R14: 0000000000000000 R15: 00007f76902db318
[279733.791115] INFO: task kworker/u24:3:20617 blocked for more than 120 seconds.
[279733.791119]       Not tainted 4.15.2 #2
[279733.791121] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[279733.791124] kworker/u24:3   D    0 20617      2 0x80000000
[279733.791134] Workqueue: events_unbound commit_work
[279733.791138] Call Trace:
[279733.791145]  ? __schedule+0x26b/0x840
[279733.791152]  schedule+0x28/0x80
[279733.791156]  schedule_timeout+0x1de/0x360
[279733.791282]  ? dce110_timing_generator_get_position+0x51/0x60 [amdgpu]
[279733.791403]  ? dce110_timing_generator_get_crtc_scanoutpos+0x6b/0xa0 [amdgpu]
[279733.791411]  dma_fence_default_wait+0x1f6/0x280
[279733.791417]  ? dma_fence_release+0x90/0x90
[279733.791423]  dma_fence_wait_timeout+0x33/0xe0
[279733.791430]  reservation_object_wait_timeout_rcu+0x198/0x340
[279733.791552]  amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[279733.791673]  amdgpu_dm_atomic_commit_tail+0x8a4/0x9a0 [amdgpu]
[279733.791680]  ? pick_next_task_fair+0x14f/0x5f0
[279733.791686]  commit_tail+0x3a/0x70
[279733.791692]  process_one_work+0x17c/0x370
[279733.791697]  worker_thread+0x2e/0x370
[279733.791702]  ? process_one_work+0x370/0x370
[279733.791709]  kthread+0x111/0x130
[279733.791715]  ? kthread_create_worker_on_cpu+0x70/0x70
[279733.791721]  ret_from_fork+0x1f/0x30
[279733.791728] INFO: task ffmpeg_g:20642 blocked for more than 120 seconds.
[279733.791731]       Not tainted 4.15.2 #2
[279733.791733] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[279733.791736] ffmpeg_g        D    0 20642   7139 0x80000006
[279733.791741] Call Trace:
[279733.791748]  ? __schedule+0x26b/0x840
[279733.791755]  schedule+0x28/0x80
[279733.791864]  amd_sched_entity_push_job+0xa3/0xf0 [amdgpu]
[279733.791873]  ? finish_wait+0x80/0x80
[279733.791977]  amdgpu_job_submit+0x9c/0xc0 [amdgpu]
[279733.792062]  amdgpu_vm_bo_update_mapping+0x383/0x3f0 [amdgpu]
[279733.792145]  ? amdgpu_vm_free_mapping.isra.20+0x20/0x20 [amdgpu]
[279733.792225]  amdgpu_vm_clear_freed+0xbb/0x190 [amdgpu]
[279733.792301]  amdgpu_gem_object_close+0x19c/0x210 [amdgpu]
[279733.792313]  ? drm_gem_object_release_handle+0x2c/0x90
[279733.792320]  drm_gem_object_release_handle+0x2c/0x90
[279733.792327]  ? drm_gem_object_handle_put_unlocked+0xb0/0xb0
[279733.792332]  idr_for_each+0x48/0xe0
[279733.792340]  drm_gem_release+0x1c/0x30
[279733.792346]  drm_release+0x342/0x3b0
[279733.792353]  __fput+0xcd/0x1d0
[279733.792360]  task_work_run+0x7b/0xa0
[279733.792365]  do_exit+0x2d0/0xb10
[279733.792371]  do_group_exit+0x3a/0xa0
[279733.792376]  get_signal+0x260/0x560
[279733.792384]  do_signal+0x36/0x690
[279733.792392]  ? __vma_rb_erase+0x1f6/0x270
[279733.792398]  ? SyS_futex+0x12d/0x180
[279733.792403]  exit_to_usermode_loop+0x58/0x90
[279733.792408]  do_syscall_64+0xe8/0xf0
[279733.792413]  entry_SYSCALL_64_after_hwframe+0x21/0x86
[279733.792417] RIP: 0033:0x7f77a8ab37dd
[279733.792421] RSP: 002b:00007f778affcdd0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
[279733.792426] RAX: fffffffffffffe00 RBX: 0000556b00c44718 RCX: 00007f77a8ab37dd
[279733.792429] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 0000556b00c44740
[279733.792432] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000556b00ce59b8
[279733.792435] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000014
[279733.792437] R13: 0000556b00c447a8 R14: 0000000000000000 R15: 0000556b00c44740
</pre>
    </blockquote>
    <br>
  </body>
</html>