<html>
    <head>
      <base href="https://bugs.freedesktop.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - HEVC GPU hang"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=100098">100098</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>HEVC GPU hang
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>DRI
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>XOrg git
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>Other
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>medium
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>DRM/Intel
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>intel-gfx-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>SZavialov@luxoft.com
          </td>
        </tr>

        <tr>
          <th>QA Contact</th>
          <td>intel-gfx-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>intel-gfx-bugs@lists.freedesktop.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=130111" name="attach_130111" title="ZIP Arch of dmidecode.txt, modinfo_i915.txt and error files">attachment 130111</a></span>
ZIP Arch of dmidecode.txt, modinfo_i915.txt and error files

We see a GPU hang that occurs during HW HEVC encoding (with
MediaServerStudio 2017 R2). The issue happens randomly, usually 3-6
minutes after the beginning of HEVC encoding (or transcoding to HEVC). The issue
isn’t seen with the AVC encoder.

CPU: Intel(R) Core(TM) i7-6822EQ CPU @ 2.00GHz
OS: CentOS 7 
uname -a: 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64
x86_64 x86_64 GNU/Linux
i915 is from the MSS package.

Command line:
./sample_multi_transcode -par smt.par  -timeout 18000
...
...............................................................................................................................................................................
[ERROR], sts=MFX_ERR_GPU_HANG(-21), PutBS, m_pmfxSession->SyncOperation failed
at
/home/lab_msdk/buildAgentDir/buildAgent_MediaSDK3/git/mdp_msdk-samples/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1575

[ERROR], sts=MFX_ERR_GPU_HANG(-21), Transcode, PutBS failed at
/home/lab_msdk/buildAgentDir/buildAgent_MediaSDK3/git/mdp_msdk-samples/samples/sample_multi_transcode/src/pipeline_transcode.cpp:1540
...
Common transcoding time is 551.229 sec
-------------------------------------------------------------------------------
*** session 0 FAILED (MFX_ERR_GPU_HANG) 217.337 sec, 3485 frames
-i::h264 out_1.h264 -o::h265 out_1.h265 -b 10000 

*** session 1 FAILED (MFX_ERR_GPU_HANG) 210.426 sec, 3486 frames
-i::h264 out_1.h264 -o::h265 out_1.h265 -b 10000 

*** session 2 FAILED (MFX_ERR_GPU_HANG) 551.228 sec, 17314 frames
-i::h264 out_2.h264 -o::h265 out_2.h265 -b 10000 

*** session 3 FAILED (MFX_ERR_GPU_HANG) 217.329 sec, 3485 frames
-i::h264 out_2.h264 -o::h265 out_2.h265 -b 10000 

*** session 4 FAILED (MFX_ERR_GPU_HANG) 551.228 sec, 17314 frames
-i::h264 out_3.h264 -o::h265 out_3.h265 -b 10000 

dmesg | grep drm
[    1.035529] drm_ukmd_compat: module verification failed: signature and/or
required key missing - tainting kernel
[    1.036455] Initialized drm/i915 compat module
20161215-16.5.1-59511-k75a71d9
[    1.039089] [drm] Initialized drm 1.1.0 20060810
[    1.045545] [drm_ukmd] Initialized drm_ukmd module
[    1.227096] [drm_ukmd] Memory usable by graphics device = 4096M
[    1.227102] fb: conflicting fb hw usage inteldrmfb vs EFI VGA - removing
generic driver
[    1.227230] [drm_ukmd] Replacing VGA console driver
[    1.234477] [drm_ukmd] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    1.234478] [drm_ukmd] Driver supports precise vblank timestamp query.
[    1.234551] [drm_ukmd:i915_gem_init_stolen [i915]] *ERROR* conflict detected
with stolen region: [0x8e000000 - 0x90000000]
[    1.961398] [drm_ukmd] RC6 disabled, disabling runtime PM support
[    1.961402] [drm_ukmd] Initialized i915 1.6.0 20161215-16.5.1-59511-k686851e
for 0000:00:02.0 on minor 0
[    2.175155] fbcon: inteldrmfb (fb0) is primary device
[    2.419227] WARNING: at
../../../../qb/workspace/17023/p4gen/gfx_Development/builds/centos/_rpmbuild_tmp/BUILD/ukmd-16.5.1/drivers/gpu/drm/i915/intel_pm.c:3597
skl_update_other_pipe_wm+0x217/0x230 [i915]()
[    2.419241] Modules linked in: sd_mod crc_t10dif crct10dif_generic i915(OE)
i2c_algo_bit drm_ukmd_kms_helper(OE) syscopyarea sysfillrect sysimgblt ahci
fb_sys_fops e1000e libahci crct10dif_pclmul crct10dif_common crc32c_intel
libata ptp drm_ukmd(OE) serio_raw pps_core drm(OE) drm_ukmd_compat(OE) video
i2c_hid i2c_core dm_mirror dm_region_hash dm_log dm_mod
[    2.419399]  [<ffffffffa00f6be7>] _ukmd_drm_atomic_commit+0x37/0x60
[drm_ukmd]
[    2.419404]  [<ffffffffa0235f98>] restore_fbdev_mode+0x248/0x280
[drm_ukmd_kms_helper]
[    2.419409]  [<ffffffffa02381f3>]
_ukmd_drm_fb_helper_restore_fbdev_mode_unlocked+0x33/0x80 [drm_ukmd_kms_helper]
[    2.419413]  [<ffffffffa023826c>] _ukmd_drm_fb_helper_set_par+0x2c/0x60
[drm_ukmd_kms_helper]
[    2.419474]  [<ffffffffa023853c>]
_ukmd_drm_fb_helper_initial_config+0x29c/0x3f0 [drm_ukmd_kms_helper]
[    2.696982] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    3.730385] [drm_ukmd] RC6 off
[    3.732473] [drm_ukmd] The Ring/GT multiplier is 2
[    4.060722] SELinux: initialized (dev drm, type drm), not configured for
labeling
[  601.278623] WARNING: at
../../../../qb/workspace/17023/p4gen/gfx_Development/builds/centos/_rpmbuild_tmp/BUILD/ukmd-16.5.1/drivers/gpu/drm/i915/intel_pm.c:3597
skl_update_other_pipe_wm+0x217/0x230 [i915]()
[  601.278676] Modules linked in: snd_hda_codec_hdmi vfat fat
snd_hda_codec_realtek snd_hda_codec_generic intel_powerclamp coretemp
intel_rapl snd_hda_intel kvm_intel kvm snd_hda_codec snd_hda_core snd_hwdep
crc32_pclmul snd_seq ghash_clmulni_intel snd_seq_device snd_pcm aesni_intel lrw
gf128mul glue_helper ppdev ablk_helper snd_timer cryptd snd soundcore sg
cdc_acm pcspkr i2c_i801 parport_pc parport shpchp acpi_pad acpi_cpufreq
ip_tables xfs libcrc32c hid_multitouch sd_mod crc_t10dif crct10dif_generic
i915(OE) i2c_algo_bit drm_ukmd_kms_helper(OE) syscopyarea sysfillrect sysimgblt
ahci fb_sys_fops e1000e libahci crct10dif_pclmul crct10dif_common crc32c_intel
libata ptp drm_ukmd(OE) serio_raw pps_core drm(OE) drm_ukmd_compat(OE) video
i2c_hid i2c_core dm_mirror dm_region_hash dm_log dm_mod
[  601.278907]  [<ffffffffa00f6be7>] _ukmd_drm_atomic_commit+0x37/0x60
[drm_ukmd]
[  601.278922]  [<ffffffffa0234c2c>]
_ukmd_drm_atomic_helper_connector_dpms+0xfc/0x1b0 [drm_ukmd_kms_helper]
[  601.278935]  [<ffffffffa0237020>] drm_fb_helper_dpms.isra.9+0xa0/0xe0
[drm_ukmd_kms_helper]
[  601.278946]  [<ffffffffa0237099>] _ukmd_drm_fb_helper_blank+0x39/0xa0
[drm_ukmd_kms_helper]

GPU hang:
[ 2579.435226] [drm_ukmd] stuck on bsd ring
[ 2579.435671] [drm_ukmd] GPU HANG: ecode 9:1:0xc85efffe, in sample_multi_tr
[2500], reason: Ring hung, action: reset
[ 2579.435674] [drm_ukmd] GPU hangs can indicate a bug anywhere in the entire
gfx stack, including userspace.
[ 2579.435676] [drm_ukmd] Please file a _new_ bug report on
bugs.freedesktop.org against DRI -> DRM/Intel
[ 2579.435679] [drm_ukmd] drm/i915 developers can then reassign to the right
component if it's not a kernel issue.
[ 2579.435680] [drm_ukmd] The gpu crash dump is required to analyze gpu hangs,
so please always attach it.
[ 2579.435683] [drm_ukmd] GPU crash dump saved to /sys/class/drm/card0/error
[ 2579.437919] drm/i915: Resetting chip after gpu hang
[ 2581.435005] [drm_ukmd] RC6 off
[ 2581.435047] [drm_ukmd] The Ring/GT multiplier is 2
[ 4187.041260] SELinux: initialized (dev tmpfs, type tmpfs), uses transition
SIDs
[ 4269.242216] [drm_ukmd] stuck on bsd ring
[ 4269.242657] [drm_ukmd] GPU HANG: ecode 9:1:0xc85efffe, in sample_multi_tr
[2516], reason: Ring hung, action: reset
[ 4269.244830] drm/i915: Resetting chip after gpu hang
[ 4271.241998] [drm_ukmd] RC6 off
[ 4271.242037] [drm_ukmd] The Ring/GT multiplier is 2
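
For triage, the hang lines above can be pulled out of a saved dmesg capture
with a short script. This is only a minimal sketch: parse_hangs and the
regular expression are illustrative names/assumptions based on the line format
seen in this report, not part of the i915 driver or MediaServerStudio tooling.

```python
import re

# Pattern for the i915 "GPU HANG" line as it appears in this report, e.g.
#   [ 2579.435671] [drm_ukmd] GPU HANG: ecode 9:1:0xc85efffe, in sample_multi_tr
#   [2500], reason: Ring hung, action: reset
# Groups: timestamp, ecode, process name, pid, reason, action.
HANG_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+\[\w+\]\s+GPU HANG: ecode (\S+), in (\S+)\s+\[(\d+)\], "
    r"reason: (.+?), action: (\w+)"
)


def parse_hangs(dmesg_text):
    """Return one dict per GPU HANG line found in dmesg_text (hypothetical helper)."""
    # Pasted dmesg output is often hard-wrapped, so collapse all whitespace
    # (including newlines) to single spaces before matching.
    flat = " ".join(dmesg_text.split())
    hangs = []
    for ts, ecode, comm, pid, reason, action in HANG_RE.findall(flat):
        hangs.append({
            "time": float(ts),
            "ecode": ecode,
            "comm": comm,
            "pid": int(pid),
            "reason": reason,
            "action": action,
        })
    return hangs
```

Usage, assuming the dmesg output was saved to a file named dmesg.txt:
parse_hangs(open("dmesg.txt").read()) returns a list with one entry per hang,
which makes it easy to compare ecodes and timestamps across runs.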

Dmidecode is attached.
/sys/class/drm/card0/error file is attached.
modinfo i915 is attached.</pre>
        </div>
      </p>


    </body>
</html>