<html data-lt-installed="true"><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="padding-bottom: 1px;">
<p><font face="monospace"><br>
</font></p>
<div class="moz-cite-prefix"><font face="monospace">On 12/15/22
10:17, Harry Wentland wrote:<br>
</font></div>
<blockquote type="cite" cite="mid:636f4287-803f-4cb9-dec0-2ffcc0f072d4@amd.com">
<pre class="moz-quote-pre" wrap="">
On 12/15/22 05:29, Michel Dänzer wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On 12/15/22 09:09, Christian König wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">Am 15.12.22 um 00:33 schrieb Alex Hung:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On 2022-12-14 16:06, Alex Deucher wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Wed, Dec 14, 2022 at 5:56 PM Alex Hung <a class="moz-txt-link-rfc2396E" href="mailto:alex.hung@amd.com">&lt;alex.hung@amd.com&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On 2022-12-14 15:35, Alex Deucher wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Wed, Dec 14, 2022 at 5:25 PM Alex Hung <a class="moz-txt-link-rfc2396E" href="mailto:alex.hung@amd.com">&lt;alex.hung@amd.com&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On 2022-12-14 14:54, Alex Deucher wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Wed, Dec 14, 2022 at 4:50 PM Alex Hung <a class="moz-txt-link-rfc2396E" href="mailto:alex.hung@amd.com">&lt;alex.hung@amd.com&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On 2022-12-14 13:48, Alex Deucher wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">On Wed, Dec 14, 2022 at 3:22 PM Aurabindo Pillai
<a class="moz-txt-link-rfc2396E" href="mailto:aurabindo.pillai@amd.com">&lt;aurabindo.pillai@amd.com&gt;</a> wrote:
</pre>
<blockquote type="cite">
<pre class="moz-quote-pre" wrap="">
From: Alex Hung <a class="moz-txt-link-rfc2396E" href="mailto:alex.hung@amd.com">&lt;alex.hung@amd.com&gt;</a>
[Why]
When running IGT kms_bw test with DP monitor, some systems crash from
msleep no matter how long or short the time is.
[How]
Replace msleep with mdelay.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Can you provide a bit more info about the crash? A lot of platforms
don't support delays larger than 2-4 ms, so this change will generate
errors on ARM and possibly other platforms.
Alex
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
The msleep was introduced in eec3303de3378 for non-compliant DisplayPort
monitors, but IGT's kms_bw test can cause recent hardware to hang at
msleep(60) when calling "igt_remove_fb" in IGT
(<a class="moz-txt-link-freetext" href="https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/kms_bw.c#L197">https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/tests/kms_bw.c#L197</a>).
It is possible to work around this by reversing the order of
igt_remove_fb(&buffer[i]), as in the following example:
igt_create_color_fb with the order buffer[0], buffer[1], buffer[2]
Hangs:
igt_remove_fb with the order buffer[0], buffer[1], buffer[2]
No hangs:
igt_remove_fb with the reversed order buffer[2], buffer[1], buffer[0]
However, IGT simply exposes the problem and it makes more sense to stop
the hang from occurring.
I also tried removing the msleep completely and that also works, but I
didn't want to break the fix for the original problematic hardware
configuration.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Why does sleep vs delay make a difference? Is there some race that we
are not locking against?
Alex
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
That was my original thought, but I did not find one previously. I will
investigate it again.
If mdelay(>4) isn't usable on other platforms, is it an option to use
mdelay on x86_64 only and keep msleep on other platforms or just remove
the msleep for other platforms, something like
- msleep(60);
+#ifdef CONFIG_X86_64
+ mdelay(60);
+#endif
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
That's pretty ugly. I'd rather try and resolve the root cause. How
important is the IGT test? What does it do? Is the test itself
correct?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
Agreed, and I didn't want to add conditions around the mdelay for the
same reason. I will assume this is not an option now.
As in the previous comment, IGT can be modified to avoid the crash by
reversing the order in which the fbs are removed - though I suspect I will
receive questions about why this is not fixed in the kernel.
I wanted to fix this in the kernel because nothing stops other user-space
applications from crashing the kernel in the same way, so fixing IGT is the
second option.
Apparently causing problems on other platforms isn't an option at all, so
I will try to figure out a non-mdelay solution, and then maybe an IGT
solution instead.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
What hangs? The test or the kernel or the hardware?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
The system becomes completely unresponsive - no keyboard, mouse, or remote access.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
I agree with Alex that changing this is extremely questionable and not justified at all.
My educated guess is that by using mdelay() instead of msleep() we keep the CPU core busy and prevent something from happening at the same time as something else.
This clearly points to missing locking or something similar to protect against concurrent execution.
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">Might another possibility be that this code gets called from an atomic context which can't sleep?
</pre>
</blockquote>
<pre class="moz-quote-pre" wrap="">
It can come through handle_hpd_rx_irq but we're using a workqueue
to queue interrupt handling so this shouldn't come from an atomic
context. I currently don't see where else it might be used in an
atomic context. Alex Hung, can you do a dump_stack() in this function
to see where the problematic call is coming from?
Fixing IGT will only mask the issue. Userspace should never be able
to put the system in a state where it stops responding entirely. This
will need some sort of fix in the kernel.
Harry
</pre>
</blockquote>
<font face="monospace">I will drop this patch from the series.<br>
</font>
</body>
</html>