[Nouveau] Tracking down severe regression in 5.3-rc4/5.4 for TU116 - assistance needed

Marcin Zajączkowski mszpak at wp.pl
Thu Dec 19 20:27:28 UTC 2019


On 2019-12-16 19:45, Ilia Mirkin wrote:
> The obvious candidate based on a quick scan is
> 0acf5676dc0ffe0683543a20d5ecbd112af5b8ee -- it merges a fix that
> messes with PCI stuff, and there lie dragons. You could try building
> that commit, and if things still work, then I have no idea (and you've

Nice shot Ilia!

I managed to build a kernel from the suspected commit 0acf5676dc0f and it
fails miserably (as previously described). The build from the previous
commit, 86a04561920b, works fine.
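
For anyone who wants to repeat the narrowing on another range, here is a
minimal sketch of the bisect approach Ilia suggested. The function names
bisect_steps and start_bisect are my own (hypothetical helpers, not part of
git), and the step count is only a rough estimate of how many
build/boot/test rounds a range needs.

```shell
# Rough number of build/boot/test rounds a bisection needs for N candidate
# commits: halve the range until a single commit is left.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

# Start a bisection between a known-good and a known-bad revision.
# Must be run from inside a kernel git checkout.
start_bisect() {
    good=$1
    bad=$2
    git bisect start "$bad" "$good"
}

# Example with the range from the original report:
#   bisect_steps 250                      # estimate for ~250 commits
#   start_bisect 33920f1ec5bf v5.3-rc4
# After building and booting each candidate, mark it with
#   git bisect good    or    git bisect bad
# and finish with
#   git bisect reset
```

For ~250 commits the estimate comes out at 8 rounds, which is why a good
guess at a single suspect commit saved a lot of rebuilds here.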

> narrowed the range). Also I'd recommend ensuring that the good kernel
> is really good and the bad kernel is really bad -- boot them a few
> times.

Well, this problem is 100% reproducible with newer kernels. I see the
errors in the boot logs, and after logging in to GNOME Shell the first
execution of xrandr (or opening the lid) hangs the system (the graphics
card). On the other hand, I haven't seen that problem with any earlier
kernel, so the situation is rather clear in my case. Nevertheless, I will
stay with the self-built good kernel (5.3.0-0.rc3 + git) to check it
further.
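
To check a given boot quickly, the error signatures from my original report
can be filtered out of the kernel log. This is a minimal sketch, assuming a
systemd machine where journalctl is available. The function name
nouveau_errors and the pattern list are my own, built only from the
messages quoted in the original report, so other failure modes will not be
caught.

```shell
# Print kernel log lines (read from stdin) that match the nouveau failure
# signatures quoted in the original report.
nouveau_errors() {
    grep -E 'nouveau [0-9a-f:.]+: (fifo: SCHED_ERROR|DRM: failed to idle channel|i2c: aux [0-9a-f]+: begin idle timeout|tmr: stalled|timeout)'
}

# Usage (kernel messages of the current boot):
#   journalctl -k -b | nouveau_errors
```

No output on a freshly booted kernel suggests that boot is good, at least
until the first xrandr call or lid event.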


How do you see it, Ilia? Is there anything in nouveau that needs to be
adjusted to those changes, or do those changes break something in nouveau,
so that it would be best to fix or revert them (and to let the committer
know about the problem)?

Marcin



> On Mon, Dec 16, 2019 at 12:42 PM Marcin Zajączkowski <mszpak at wp.pl> wrote:
>>
>> On 2019-12-16 18:08, Ilia Mirkin wrote:
>>> Hi Marcin,
>>>
>>> You should do a git bisect rather than guessing about commits. I
>>> suspect that searching for "kernel git bisect fedora" should prove
>>> instructive if you're not sure how to do this.
>>
>> Thanks for your suggestion. I realize that I can do it at the Git level
>> and that it is the ultimate way to go. However, building a kernel from
>> sources takes some time (in addition to the usual time needed to
>> install/restart/verify, which I already experienced while narrowing it
>> down to "just" ~250 commits).
>>
>> Therefore, I would be really thankful for a suggestion about which commits
>> would be good to check first - having 2-4 to check is better than 8-10
>> (assuming someone is right :) ).
>>
>> Marcin
>>
>>
>>
>>> On Mon, Dec 16, 2019 at 11:42 AM Marcin Zajączkowski <mszpak at wp.pl> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I've encountered a severe regression on TU116 (probably also TU117)
>>>> introduced in 5.3-rc4 (also present in the recent 5.4.2) [1]. The system
>>>> usually hangs on the subsequent graphics-mode-related operation (calling
>>>> xrandr after login is enough) with the following error:
>>>>
>>>>> kernel: nouveau 0000:01:00.0: fifo: SCHED_ERROR 08 []
>>>> ...
>>>>> kernel: nouveau 0000:01:00.0: DRM: failed to idle channel 0 [DRM]
>>>>> kernel: nouveau 0000:01:00.0: i2c: aux 0007: begin idle timeout ffffffff
>>>>> kernel: nouveau 0000:01:00.0: tmr: stalled at ffffffffffffffff
>>>>> kernel: ------------[ cut here ]------------
>>>>> kernel: nouveau 0000:01:00.0: timeout
>>>>> kernel: WARNING: CPU: 10 PID: 384 at drivers/gpu/drm/nouveau/nvkm/subdev/bar/g84.c:35 g84_bar_flush+0xcf/0xe0 [nouveau]
>>>>
>>>> (detailed log in a corresponding issue - [1])
>>>>
>>>> With earlier kernels there was no hardware acceleration for NVidia GTX
>>>> 1660 Ti, but at least I could use nouveau to disable it (to save
>>>> battery, trees and lower temperature) or even have an external output
>>>> (with Wayland). Now, the system is unusable with nouveau :(.
>>>>
>>>> I spent some time trying to narrow the scope using the existing
>>>> kernel builds for Fedora. I was able to determine that the problem was
>>>> introduced between 5.3.0-0.rc3.git1.1 (commit 33920f1ec5bf - works fine)
>>>> and 5.3.0-0.rc4.git0.1 (tag v5.3-rc4 - fails with errors).
>>>>
>>>> It's just a few days (7-11 Aug) and "only" around 250 commits. I went
>>>> through them, but (based on the commit titles) I haven't seen any
>>>> nouveau-related changes and, in general, no very suspicious drm-related
>>>> changes.
>>>>
>>>>> git log 33920f1ec5bf..v5.3-rc4 --stat
>>>>
>>>>
>>>> Maybe some of the more nouveau/drm-experienced developers could take a
>>>> look at that range to determine which commit could have broken it (to
>>>> make it easier to find out what should be fixed to resolve that
>>>> regression)?
>>>>
>>>>
>>>> [1] -
>>>> https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/516
>>>>
>>>> Thanks in advance
>>>> Marcin

