[REGRESSION] QXL display malfunction

Kaplan, David David.Kaplan at amd.com
Fri Jun 14 13:45:21 UTC 2024


> -----Original Message-----
> From: Thomas Zimmermann <tzimmermann at suse.de>
> Sent: Wednesday, June 12, 2024 9:26 AM
> To: Linux regressions mailing list <regressions at lists.linux.dev>
> Cc: Petkov, Borislav <Borislav.Petkov at amd.com>;
> zack.rusin at broadcom.com; dmitry.osipenko at collabora.com; Kaplan, David
> <David.Kaplan at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>;
> Dave Airlie <airlied at redhat.com>; Maarten Lankhorst
> <maarten.lankhorst at linux.intel.com>; Maxime Ripard
> <mripard at kernel.org>; LKML <linux-kernel at vger.kernel.org>; ML dri-devel
> <dri-devel at lists.freedesktop.org>; spice-devel at lists.freedesktop.org;
> virtualization at lists.linux.dev
> Subject: Re: [REGRESSION] QXL display malfunction
>
> Hi
>
> On 12.06.24 at 14:41, Linux regression tracking (Thorsten Leemhuis) wrote:
> > [CCing a few more people and lists that get_maintainers pointed out
> > for qxl]
> >
> > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> > for once, to make this easily accessible to everyone.
> >
> > Thomas, from here it looks like this report that apparently is caused
> > by a change of yours that went into 6.10-rc1 (b33651a5c98dbd
> > ("drm/qxl: Do not pin buffer objects for vmap")) fell through the
> > cracks. Or was progress made to resolve this and I just missed this?
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
> > hat)
> > --
> > Everything you wanna know about Linux kernel regression tracking:
> > https://linux-regtracking.leemhuis.info/about/#tldr
> > If I did something stupid, please tell me, as explained on that page.
> >
> > #regzbot poke
> >
> >
> > On 03.06.24 04:29, Kaplan, David wrote:
> >>> -----Original Message-----
> >>> From: Kaplan, David
> >>> Sent: Sunday, June 2, 2024 9:25 PM
> >>> To: tzimmermann at suse.de; dmitry.osipenko at collabora.com; Koenig,
> >>> Christian <Christian.Koenig at amd.com>; zach.rusin at broadcom.com
> >>> Cc: Petkov, Borislav <Borislav.Petkov at amd.com>;
> >>> regressions at list.linux.dev
> >>> Subject: [REGRESSION] QXL display malfunction
> >>>
> >>> Hi,
> >>>
> >>> I am running an Ubuntu 19.10 VM with a tip kernel using QXL video
> >>> and I've observed the VM graphics often malfunction after boot,
> >>> sometimes failing to load the Ubuntu desktop or even immediately
> >>> shutting the guest down.
> >>> When it does load, the guest dmesg log often contains errors like
> >>>
> >>> [    4.303586] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
> >>> [    4.586883] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65376256x16777216+0+0
> >>> [    4.904036] [drm:drm_atomic_helper_commit_planes] *ERROR* head 1 wrong: 65335296x16777216+0+0
>
> I don't see how these messages are related. Did they already appear before
> the broken commit was there?

No, I did not observe them prior to the broken commit.

>
> >>> [    5.374347] [drm:qxl_release_from_id_locked] *ERROR* failed to find id in release_idr
>
> Is there only one such message in the log, or multiple/frequent ones?

I would usually only see one.

>
> Could you provide a stack trace of what happens before?

Here's the top of a backtrace when the error occurs:
#0  qxl_release_from_id_locked (qdev=qdev at entry=0xffff88810126e000, id=id at entry=262151)
    at drivers/gpu/drm/qxl/qxl_release.c:373
#1  0xffffffff819f5b6a in qxl_garbage_collect (qdev=0xffff88810126e000)
    at drivers/gpu/drm/qxl/qxl_cmd.c:222
#2  0xffffffff810e3aa8 in process_one_work (worker=worker at entry=0xffff888101680300,
    work=0xffff88810126f340) at kernel/workqueue.c:3231
#3  0xffffffff810e6281 in process_scheduled_works (worker=<optimized out>)
    at kernel/workqueue.c:3312
#4  worker_thread (__worker=0xffff888101680300) at kernel/workqueue.c:3393

>
> We sometimes draw into the buffer object from the CPU. For accessing the
> buffer object's pages from the CPU, only a vmap operation should be
> necessary. It appears as if qxl also requires a pin. My guess is that the pin
> inserts the buffer-object's host-side pages and the code around
> qxl_release_from_id_locked() appears to be garbage-collecting them.
> Hence without the pin, the GC complains about inconsistent state.
> >>>
> >>> I bisected the issue down to "drm/qxl: Do not pin buffer objects for vmap"
> >>> (b33651a5c98dbd5a919219d8c129d0674ef74299).
>
> Thanks for bisecting. Does it work if you revert that commit?

Yes, it works with that commit reverted.

Thanks --David Kaplan

