<html>
    <head>
      <base href="https://bugs.freedesktop.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Invalid data in error state"
   href="https://bugs.freedesktop.org/show_bug.cgi?id=107691">107691</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Invalid data in error state
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>DRI
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>XOrg git
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>Other
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>medium
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>DRM/Intel
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>intel-gfx-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>lionel.g.landwerlin@linux.intel.com
          </td>
        </tr>

        <tr>
          <th>QA Contact</th>
          <td>intel-gfx-bugs@lists.freedesktop.org
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>intel-gfx-bugs@lists.freedesktop.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>As part of debugging hangs we're starting to look more at the error states
generated by i915.
Together with Jason we've noticed that the data produced is incorrect.
Chunks of 32 bytes appear to just go missing (replaced by 0s).

For example, this bug report:
<a class="bz_bug_link 
          bz_status_NEEDINFO "
   title="NEEDINFO - [kbl] GPU HANG: ecode 9:0:0x85dffffb"
   href="show_bug.cgi?id=107586">https://bugs.freedesktop.org/show_bug.cgi?id=107586</a>
has 2 error states from which you can see that the context image has its first
32 bytes zeroed.
What we should be finding at that location looks more like this:

<a href="https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/intel/tools/gen8_context.h#L27">https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/intel/tools/gen8_context.h#L27</a>

MI_NOOP followed by MI_LRI.

We've noticed this issue on both Skylake and Kabylake (I believe this affects at
least all big cores).

To work around this issue I came up with this patch:
<a href="https://github.com/djdeath/linux/commit/c18d4e1ee66cf587c484a60bba64f3dc4f35fc2e">https://github.com/djdeath/linux/commit/c18d4e1ee66cf587c484a60bba64f3dc4f35fc2e</a>
This is probably wrong, as pointed out by Chris on IRC, but it gets us correct
data in the error state.

This issue can be easily reproduced by running the IGT drv_hangman test on the
render ring and checking the content of the "rcs0 --- HW context" BO in the
error state. At offset 8092 we should find the MI_NOOP followed by the MI_LRI I
pointed to above, but instead we find 32 bytes of 0s.</pre>
        </div>
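        <div>
        <pre>A minimal sketch of how the zeroed chunks could be spotted, assuming the BO has
already been decoded from the error state into raw bytes (the decoding step is
not shown). The helper name find_zero_chunks and the sample buffer are made up
for illustration and are not part of i915 or IGT. A valid context image can
contain legitimate runs of zeros, so a hit is only a hint, not proof of
corruption.

```python
CHUNK_SIZE = 32  # size in bytes of the chunks that appear to go missing


def find_zero_chunks(data, chunk_size=CHUNK_SIZE):
    """Return offsets of aligned chunks that contain only zero bytes."""
    zero = bytes(chunk_size)
    return [
        offset
        for offset in range(0, len(data) - chunk_size + 1, chunk_size)
        if data[offset:offset + chunk_size] == zero
    ]


# Hypothetical buffer standing in for a decoded "rcs0 --- HW context" BO:
# 32 bytes of made-up non-zero data, one zeroed chunk, then more data.
sample = bytes([0x11] * 32) + bytes(32) + bytes([0x22] * 32)
print(find_zero_chunks(sample))  # prints [32]
```

Comparing the reported offsets against the expected layout of the context
image (MI_NOOP followed by MI_LRI at its start) shows whether a zeroed chunk
covers data that should not be zero.</pre>
        </div>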
      </p>


    </body>
</html>