nouveau PUSHBUFFER_ERR on 5.9.0-rc2-next-20200824

Alexander Kapshuk alexander.kapshuk at gmail.com
Mon Aug 24 19:08:25 UTC 2020


Since upgrading to linux-next kernels based on 5.9.0-rc1 and 5.9.0-rc2,
my mouse pointer has been disappearing soon after I log in, and the
system freezes temporarily when I click on objects or type text.
I have also found push buffer errors in the dmesg output:
[ 6625.450394] nouveau 0000:01:00.0: disp: ERROR 1 [PUSHBUFFER_ERR] 02 [] chid 0 mthd 0000 data 00000400

I tried setting CONFIG_NOUVEAU_DEBUG=5 (tracing) to collect further
debug info, but nothing in the output stood out.

The error message comes from nv50_disp_intr_error() in
drivers/gpu/drm/nouveau/nvkm/engine/disp/nv50.c:613,645.
nv50_disp_intr_error() is in turn called from nv50_disp_intr(), in the
following while loop:
drivers/gpu/drm/nouveau/nvkm/engine/disp/nv50.c:647,658
void
nv50_disp_intr(struct nv50_disp *disp)
{
        struct nvkm_device *device = disp->base.engine.subdev.device;
        u32 intr0 = nvkm_rd32(device, 0x610020);
        u32 intr1 = nvkm_rd32(device, 0x610024);

        while (intr0 & 0x001f0000) {
                u32 chid = __ffs(intr0 & 0x001f0000) - 16;
                nv50_disp_intr_error(disp, chid);
                intr0 &= ~(0x00010000 << chid);
        }
...
}
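
To make the loop above easier to follow, here is a minimal userspace sketch of
the same bit decoding. It is not kernel code: pending_error_chids() and
lowest_set_bit() are hypothetical names of mine, and the GCC/Clang builtin
__builtin_ctz() stands in for the kernel's __ffs(). Only the mask 0x001f0000
and the "- 16" offset are taken from the quoted function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bits 16..20 of intr0, the mask tested by the quoted while loop. */
#define DISP_INTR0_ERROR_MASK UINT32_C(0x001f0000)

/*
 * Userspace stand-in for the kernel's __ffs(): index of the lowest set bit.
 * Assumption: built with GCC or Clang. Undefined for word == 0, which the
 * caller below never passes.
 */
unsigned lowest_set_bit(uint32_t word)
{
        return (unsigned)__builtin_ctz(word);
}

/*
 * Hypothetical helper mirroring the loop in nv50_disp_intr(): store the
 * channel ID of every pending error bit, lowest first, and return how many
 * were found. The caller provides room for 5 entries (one per mask bit).
 */
unsigned pending_error_chids(uint32_t intr0, unsigned chids[5])
{
        unsigned count = 0;

        assert(chids != NULL);

        while (intr0 & DISP_INTR0_ERROR_MASK) {
                unsigned chid = lowest_set_bit(intr0 & DISP_INTR0_ERROR_MASK) - 16;

                chids[count++] = chid;
                /* Clear the bit just handled so the loop terminates. */
                intr0 &= ~(UINT32_C(0x00010000) << chid);
        }
        return count;
}
```

With intr0 = 0x00010000 (only bit 16 set) the helper reports a single channel,
chid 0, which is consistent with the "chid 0" in the dmesg line above.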

Could this be in any way related to this series of commits?
commit 0a96099691c8cd1ac0744ef30b6846869dc2b566
Author: Ben Skeggs <bskeggs at redhat.com>
Date:   Tue Jul 21 11:34:07 2020 +1000

    drm/nouveau/kms/nv50-: implement proper push buffer control logic

    We had a, what was supposed to be temporary, hack in the KMS code where we'd
    completely drain an EVO/NVD channel's push buffer when wrapping to the start
    again, instead of treating it as a ring buffer.

    Let's fix that, finally.

    Signed-off-by: Ben Skeggs <bskeggs at redhat.com>

Here are my GPU details:
01:00.0 VGA compatible controller: NVIDIA Corporation GT216 [GeForce 210] (rev a1)
        Subsystem: Micro-Star International Co., Ltd. [MSI] Device 8a93
        Kernel driver in use: nouveau

The last linux-next kernel I built that does not show the problem is
5.8.0-rc6-next-20200720.

I would appreciate any pointers on how to debug this further.
Or is git bisect the only way to proceed?

Thanks.

