[REGRESSION] QXL display malfunction

Linux regression tracking (Thorsten Leemhuis) regressions at leemhuis.info
Mon Jul 1 10:02:20 UTC 2024


Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Thomas, was there some progress wrt fixing the regression below? I might
have missed something, but from here it looks like this fell through the
cracks.

Makes me wonder if we should revert this for now to fix this for rc7
and ensure things get at least one week of testing before the final.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

On 14.06.24 15:45, Kaplan, David wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
>> -----Original Message-----
>> From: Thomas Zimmermann <tzimmermann at suse.de>
>> Sent: Wednesday, June 12, 2024 9:26 AM
>> To: Linux regressions mailing list <regressions at lists.linux.dev>
>> Cc: Petkov, Borislav <Borislav.Petkov at amd.com>;
>> zack.rusin at broadcom.com; dmitry.osipenko at collabora.com; Kaplan, David
>> <David.Kaplan at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>;
>> Dave Airlie <airlied at redhat.com>; Maarten Lankhorst
>> <maarten.lankhorst at linux.intel.com>; Maxime Ripard
>> <mripard at kernel.org>; LKML <linux-kernel at vger.kernel.org>; ML dri-devel
>> <dri-devel at lists.freedesktop.org>; spice-devel at lists.freedesktop.org;
>> virtualization at lists.linux.dev
>> Subject: Re: [REGRESSION] QXL display malfunction
>>
>> Caution: This message originated from an External Source. Use proper
>> caution when opening attachments, clicking links, or responding.
>>
>>
>> Hi
>>
>> Am 12.06.24 um 14:41 schrieb Linux regression tracking (Thorsten Leemhuis):
>>> [CCing a few more people and lists that get_maintainers pointed out
>>> for qxl]
>>>
>>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>>> for once, to make this easily accessible to everyone.
>>>
>>> Thomas, from here it looks like this report that apparently is caused
>>> by a change of yours that went into 6.10-rc1 (b33651a5c98dbd
>>> ("drm/qxl: Do not pin buffer objects for vmap")) fell through the
>>> cracks. Or was progress made to resolve this and I just missed this?
>>>
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
>>> hat)
>>> --
>>> Everything you wanna know about Linux kernel regression tracking:
>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>> If I did something stupid, please tell me, as explained on that page.
>>>
>>> #regzbot poke
>>>
>>>
>>> On 03.06.24 04:29, Kaplan, David wrote:
>>>>> -----Original Message-----
>>>>> From: Kaplan, David
>>>>> Sent: Sunday, June 2, 2024 9:25 PM
>>>>> To: tzimmermann at suse.de; dmitry.osipenko at collabora.com; Koenig,
>>>>> Christian <Christian.Koenig at amd.com>; zach.rusin at broadcom.com
>>>>> Cc: Petkov, Borislav <Borislav.Petkov at amd.com>;
>>>>> regressions at list.linux.dev
>>>>> Subject: [REGRESSION] QXL display malfunction
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am running an Ubuntu 19.10 VM with a tip kernel using QXL video
>>>>> and I've observed the VM graphics often malfunction after boot,
>>>>> sometimes failing to load the Ubuntu desktop or even immediately
>>>>> shutting the guest down.
>>>>> When it does load, the guest dmesg log often contains errors like
>>>>>
>>>>> [    4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
>>>>> [    4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
>>>>> [    4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65335296x16777216+0+0
>>
>> I don't see how these messages are related. Did they already appear before
>> the broken commit was there?
> 
> No, I did not observe them prior to the broken commit.
> 
>>
>>>>> [    5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in release_idr
>>
>> Is there only one such message in the log? Or multiple/frequent ones?
> 
> I would usually only see one.
> 
>>
>> Could you provide a stack trace of what happens before?
> 
> Here's the top of a backtrace when the error occurs:
> #0  qxl_release_from_id_locked (qdev=qdev@entry=0xffff88810126e000, id=id@entry=262151)
>     at drivers/gpu/drm/qxl/qxl_release.c:373
> #1  0xffffffff819f5b6a in qxl_garbage_collect (qdev=0xffff88810126e000)
>     at drivers/gpu/drm/qxl/qxl_cmd.c:222
> #2  0xffffffff810e3aa8 in process_one_work (worker=worker@entry=0xffff888101680300,
>     work=0xffff88810126f340) at kernel/workqueue.c:3231
> #3  0xffffffff810e6281 in process_scheduled_works (worker=<optimized out>)
>     at kernel/workqueue.c:3312
> #4  worker_thread (__worker=0xffff888101680300) at kernel/workqueue.c:3393
> 
>>
>> We sometimes draw into the buffer object from the CPU. For accessing the
>> buffer object's pages from the CPU, only a vmap operation should be
>> necessary. It appears as if qxl also requires a pin. My guess is that the pin
>> inserts the buffer-object's host-side pages and the code around
>> qxl_release_from_id_locked() appears to be garbage-collecting them.
>> Hence without the pin, the GC complains about inconsistent state.
>>>>>
>>>>> I bisected the issue down to "drm/qxl: Do not pin buffer objects for vmap"
>>>>> (b33651a5c98dbd5a919219d8c129d0674ef74299).
>>
>> Thanks for bisecting. Does it work if you revert that commit?
> 
> Yes
> 
> Thanks --David Kaplan

