Drm: mgag200. Video adapter issue with 5.4.0-rc3 ; no graphics

Daniel Vetter daniel at ffwll.ch
Fri Nov 8 19:10:37 UTC 2019


On Fri, Nov 8, 2019 at 7:07 PM John Donnelly <john.p.donnelly at oracle.com> wrote:
>
>
>
> > On Nov 8, 2019, at 9:06 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
> >
> > Hi
> >
> > Am 08.11.19 um 13:55 schrieb John Donnelly:
> >>
> >>
> >>> On Nov 8, 2019, at 1:46 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
> >>>
> >>> Hi John
> >>>
> >>> Am 07.11.19 um 23:14 schrieb John Donnelly:
> >>>>
> >>>>
> >>>>> On Nov 7, 2019, at 10:13 AM, John Donnelly <john.p.donnelly at oracle.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>> On Nov 7, 2019, at 7:42 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
> >>>>>>
> >>>>>> Hi John
> >>>>>>
> >>>>>> Am 07.11.19 um 14:12 schrieb John Donnelly:
> >>>>>>> Hi  Thomas ;  Thank you for reaching out.
> >>>>>>>
> >>>>>>> See inline:
> >>>>>>>
> >>>>>>>> On Nov 7, 2019, at 1:54 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
> >>>>>>>>
> >>>>>>>> Hi John,
> >>>>>>>>
> >>>>>>>> apparently the vgaarb was not the problem.
> >>>>>>>>
> >>>>>>>> Am 07.11.19 um 03:29 schrieb John Donnelly:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I am investigating an issue where we lose video activity when the display is switched from “text mode” to “graphic mode”
> >>>>>>>>> on a number of  servers using this driver.    Specifically  starting the GNOME desktop.
> >>>>>>>>
> >>>>>>>> When you say "text mode", do you mean VGA text mode or the graphical
> >>>>>>>> console that emulates text mode?
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> By “text mode” I mean the 24x80 ASCII mode, NOT GRAPHICS, i.e. runlevel 3. So I guess your term for it is VGA.
> >>>>>>
> >>>>>> Yes.
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> When you enable graphics mode, does it set the correct resolution? A lot
> >>>>>>>> of work went into memory management recently. I could imagine that the
> >>>>>>>> driver sets the correct resolution, but then fails to display the
> >>>>>>>> correct framebuffer.
> >>>>>>>
> >>>>>>> There is no display at all ;  so there is no resolution  to mention.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> If possible, could you try to update to the latest drm-tip and attach
> >>>>>>>> the output of
> >>>>>>>>
> >>>>>>>> /sys/kernel/debug/dri/0/vram-mm
> >>>>>>>
> >>>>>>> I don’t see that file ;   Is there something else I need to do ?
> >>>>>>
> >>>>>> That file is fairly new and maybe it's not in the mainline kernel yet.
> >>>>>> See below for how to get it.
> >>>>>
> >>>>> I  built your “tip” ;  Still no graphics displayed .
> >>>>>
> >>>>>
> >>>>> mount -t debugfs none /sys/kernel
> >>>>>
> >>>>> cat /proc/cmdline
> >>>>> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.4.0-rc6.drm.+ root=/dev/mapper/ol_ca--dev55-root ro crashkernel=auto resume=/dev/mapper/ol_ca--dev55-swap rd.lvm.lv=ol_ca-dev55/root rd.lvm.lv=ol_ca-dev55/swap console=ttyS0,9600,8,n,1 drm.debug=0xff
> >>>>>
> >>>>>
> >>>>> cat  /sys/kernel/dri/0/vram-mm
> >>>>>
> >>>>> In VGA mode :
> >>>>>
> >>>>>
> >>>>> cat  /sys/kernel/dri/0/vram-mm
> >>>>> 0x0000000000000000-0x0000000000000300: 768: used
> >>>>> 0x0000000000000300-0x0000000000000600: 768: used
> >>>>> 0x0000000000000600-0x00000000000007ee: 494: free
> >>>>> 0x00000000000007ee-0x00000000000007ef: 1: used
> >>>>> 0x00000000000007ef-0x00000000000007f0: 1: used
> >>>>>
> >>>>>
> >>>>> In GRAPHICS mode ( if it matters )
> >>>>>
> >>>>>
> >>>>> cat  /sys/kernel/dri/0/vram-mm
> >>>>> 0x0000000000000000-0x0000000000000300: 768: used
> >>>>> 0x0000000000000300-0x0000000000000600: 768: used
> >>>>> 0x0000000000000600-0x00000000000007ee: 494: free
> >>>>> 0x00000000000007ee-0x00000000000007ef: 1: used
> >>>>> 0x00000000000007ef-0x00000000000007f0: 1: used
> >>>>> total: 2032, used 1538 free 494
> >>>>>
> >>>
> >>> This is interesting. In the graphics mode, you see two buffers of 768
> >>> pages each. That's the main framebuffers as used by X (it's double
> >>> buffered). Then there's a free area and finally two pages for cursor
> >>> images (also double buffered). That looks as expected.
> >>>
> >>> The thing is that in text mode, the areas are allocated. But the driver
> >>> shouldn't be active, so the file shouldn't exist or only show a single
> >>> free area.
> >>>
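The page accounting in the vram-mm dump quoted above can be cross-checked with a short script. This is only a sketch: the line format is taken from the pasted output, while the 4 KiB page size and the 1024x768 at 32 bpp mode used in the final check are assumptions, not something stated in this thread.

```python
import re

PAGE_SIZE = 4096  # assumption: 4 KiB pages; not stated in the dump itself

# Matches lines such as "0x0000000000000000-0x0000000000000300: 768: used"
LINE_RE = re.compile(r"0x([0-9a-f]+)-0x([0-9a-f]+):\s*(\d+):\s*(used|free)")

# Ranges copied from the graphics-mode output quoted above.
DUMP = """\
0x0000000000000000-0x0000000000000300: 768: used
0x0000000000000300-0x0000000000000600: 768: used
0x0000000000000600-0x00000000000007ee: 494: free
0x00000000000007ee-0x00000000000007ef: 1: used
0x00000000000007ef-0x00000000000007f0: 1: used
total: 2032, used 1538 free 494
"""


def summarize(dump):
    """Sum used and free pages, checking each range against its page count."""
    totals = {"used": 0, "free": 0}
    for line in dump.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue  # skips the trailing "total:" summary line
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        pages = int(match.group(3))
        if end - start != pages:
            raise ValueError("range and page count disagree: " + line)
        totals[match.group(4)] += pages
    return totals


if __name__ == "__main__":
    totals = summarize(DUMP)
    print(totals, "total:", sum(totals.values()))
    # Under the assumed page size, one 768-page buffer holds exactly one
    # 1024x768 frame at 4 bytes per pixel (assumed mode).
    print(768 * PAGE_SIZE == 1024 * 768 * 4)
```

The sums agree with the "total:" line in the dump, which is consistent with the reading of two framebuffers plus two single-page cursor buffers.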
> >>
> >>      If you want me to double-check this, I will. I have GNOME installed, but the machine boots to runlevel 3; I then start the desktop using init 5. I am pretty sure I took that output when the machine was in graphics mode at runlevel 5.
> >>
> >>
> >>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>> I’ve attached /var/lib/gdm/.local/share/xorg/Xorg.0.log instead.
> >>>>>>
> >>>>>> Good! Looking through that log file, the card is found at line 79 and
> >>>>>> the generic X modesetting driver initializes below. That works as expected.
> >>>>>>
> >>>>>> I noticed that several operations are not permitted (lines 78 and 87). I
> >>>>>> guess you're starting X from a regular user account? IIRC special
> >>>>>> permission is required to acquire control of the display. What happens
> >>>>>> if you start X as root user?
> >>>>>
> >>>>>
> >>>>>  I am starting GNOME  as  root by doing  “init 5” from either the console  session or from ssh .
> >>>>>
> >>>>> The default runlevel is 3  on boot .
> >>>>>
> >>>>> On failing session  running  your 5.4.0.rc6.
> >>>>>
> >>>>> 78 [   237.712] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
> >>>>>
> >>>>> 87 [   237.712] (EE) open /dev/fb0: Permission denied
> >>>>>
> >>>>> Booting 4.18 kernel yields the same error results in: /var/lib/gdm/.local/share/xorg/Xorg.0.log
> >>>>>
> >>>>> 78 [   101.334] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
> >>>>>
> >>>>> 87 [   101.334] (EE) open /dev/fb0: Permission denied
> >>>>>
> >>>>>
> >>>>> What is strange is that both X log files (bad and Ok) essentially appear as if GNOME started!
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> <Xorg.0.log.bad><Xorg.0.log.Ok>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Here is my cmdline  -  I just tested 5.3.0 and it fails too  ( my last test was 5.3.8 and it failed also ) .
> >>>>>>>
> >>>>>>> # cat /proc/cmdline
> >>>>>>> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.3.0+ root=/dev/mapper/ol_ca--dev55-root ro crashkernel=auto resume=/dev/mapper/ol_ca--dev55-swap rd.lvm.lv=ol_ca-dev55/root rd.lvm.lv=ol_ca-dev55/swap console=ttyS0,9600,8,n,1 drm.debug=0xff
> >>>>>>>
> >>>>>>> When you say “tip”, are you referring to a specific kernel? I can build a 5.4.0-rc6. The problem appears to have been introduced around the 5.3 time frame.
> >>>>>>
> >>>>>> The latest and greatest DRM code is in the drm-tip branch at
> >>>>>>
> >>>>>> git://anongit.freedesktop.org/drm/drm-tip
> >>>>>>
> >>>>>> If you build this version you should find
> >>>>>>
> >>>>>> /sys/kernel/debug/dri/0/vram-mm
> >>>>>>
> >>>>>> on the device. You have to build with debugfs enabled and
> >>>>>> maybe have to mount debugfs at /sys/kernel/debug.
> >>>>>>
> >>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>> before and after switching to graphics mode. The file lists the
> >>>>>>>> allocated regions of the VRAM.
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> This adapter is  Server Engines  Integrated Remote Video Acceleration Subsystem (RVAS)  and is used as remote console in iLO/DRAC environments.
> >>>>>>>>>
> >>>>>>>>> I don’t see any specific errors in the gdm logs or message file other than this:
> >>>>>>>>
> >>>>>>>> You can boot with drm.debug=0xff on the kernel command line to enable
> >>>>>>>> more warnings.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Could you please attach the output of lspci -v for the VGA adapter?
> >>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Here is the output from the current machine; The previous addresses were from another model using the same SE device:
> >>>>>>>
> >>>>>>>
> >>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xc5000000 -> 0xc5ffffff
> >>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 1: 0xc6810000 -> 0xc6813fff
> >>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xc6000000 -> 0xc67fffff
> >>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: vgaarb: deactivate vga console
> >>>>>>>
> >>>>>>>
> >>>>>>> lspci -s 3d:00.0 -vvv -k
> >>>>>>> 3d:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
> >>>>>>>         Subsystem: Oracle/SUN Device 4852
> >>>>>>>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
> >>>>>>>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>         Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>         Interrupt: pin A routed to IRQ 16
> >>>>>>>         NUMA node: 0
> >>>>>>>         Region 0: Memory at c5000000 (32-bit, non-prefetchable) [size=16M]
> >>>>>>>         Region 1: Memory at c6810000 (32-bit, non-prefetchable) [size=16K]
> >>>>>>>         Region 2: Memory at c6000000 (32-bit, non-prefetchable) [size=8M]
> >>>>>>>         Expansion ROM at 000c0000 [disabled] [size=128K]
> >>>>>>>         Capabilities: [dc] Power Management version 2
> >>>>>>>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >>>>>>>                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>         Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
> >>>>>>>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> >>>>>>>                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
> >>>>>>>                 DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
> >>>>>>>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>                         MaxPayload 128 bytes, MaxReadReq 128 bytes
> >>>>>>>                 DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
> >>>>>>>                 LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
> >>>>>>>                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
> >>>>>>>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >>>>>>>                 LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>         Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
> >>>>>>>                 Address: 00000000  Data: 0000
> >>>>>>>         Kernel driver in use: mgag200
> >>>>>>>         Kernel modules: mgag200
> >>>>>>
> >>>>>> Looks all normal.
> >>>>>>
> >>>>>> Best regards
> >>>>>> Thomas
> >>>>>>
> >>>>
> >>>> ==============  Snip  ===========
> >>>>
> >>>>
> >>>> Hi Thomas,
> >>>>
> >>>> I have hopefully narrowed down the breakage to the range between these upstream commits, which are v5.2 and v5.3-rc1:
> >>>>
> >>>>
> >>>> between :  0ecfebd2b524 2019-07-07 | Linux 5.2      to :   5f9e832c1370 2019-07-21 | Linus 5.3-rc1
> >>>>
> >>>>
> >>>> I started to bisect this range by date, one day at a time, based on the changes made in:
> >>>>
> >>>> drivers/gpu/drm/
> >>>>
> >>>> fec88ab0af97 2019-07-14 | Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  ;  works
> >>>>
> >>>> Hopefully something in drivers/gpu/drm/ between the date range of 2019-07-14 to 2019-07-21 will surface tomorrow.
> >>>
> >>> Great, thanks for bisecting.
> >>>
> >>> Could you attach your kernel config file? I'd like to compare with my
> >>> config and try to reproduce the issue.
> >>>
> >>> Best regards
> >>> Thomas
> >>
> >>  Hi.
> >>
> >>  Here are config files generated after a “make oldconfig” that started with an original .config file from a master file we use for 5.4.0-rc4:
> >>
> >>     config.5.2.21 - works with that flavor
> >>     config.5.3 - fails with 5.3 and later
> >>
> >>  Do you have access to mgag200 style adapter ?
> >
> > I do.
> >
> > I think I've been able to reproduce the issue. Buffers seem to remain in
> > video ram after they have been pinned there. I'll investigate next week.
> > I hope your bisecting session can point to the cause.
> >
> > Best regards
> > Thomas
>
> Hi Thomas,
>
>
>  Wonderful!
>
>  I think I have narrowed down the merge to this build which is : vmlinuz-5.2.0-rc5+ :
>
>
> be8454afc50f 2019-07-15 | Merge tag 'drm-next-2019-07-16' of git://anongit.freedesktop.org/drm/drm

Are you bisecting by hand, or is git bisect somehow giving you all
these merge commits by chance? IME, always use git bisect; it's a lot
better at accurately splitting the history down the middle. Also, I
never bother with filtering for only "relevant" commits, since drm is
so big nowadays that all you save is 2-3 commits at most. And worst
case the regression is outside of drm, and then you wasted booting
into a _lot_ of kernels for not much gain.
-Daniel
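
To illustrate the point about splitting history down the middle, here is a minimal sketch of the halving search that bisection is based on. It assumes a simplified linear history with a single good-to-bad transition (git itself works on the full commit graph), and the commit counts are purely hypothetical stand-ins, not measured from the kernel tree.

```python
import math


def first_bad(commits, is_bad):
    """Return (first bad commit, number of kernels tested).

    Assumes commits are ordered oldest to newest, the newest one is bad,
    and every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1
    steps = 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1  # one build-and-boot cycle
        if is_bad(commits[mid]):
            hi = mid  # culprit is at mid or earlier
        else:
            lo = mid + 1  # culprit is after mid
    return commits[lo], steps


if __name__ == "__main__":
    # Hypothetical sizes: the whole range versus only the drm commits in it.
    for label, count, culprit in (("all", 14000, 9000), ("drm only", 2000, 1234)):
        found, steps = first_bad(list(range(count)), lambda c: c >= culprit)
        bound = math.ceil(math.log2(count))
        print(label, "found:", found, "steps:", steps, "upper bound:", bound)
```

Because the step count grows with the logarithm of the range, shrinking the range by a factor of seven saves only about three test boots. With git, the equivalent workflow is `git bisect start`, then marking the known endpoints with `git bisect bad` and `git bisect good` and testing whatever commit git checks out next.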

>
>   Specifically this merge included these two changes :
>
>   94dc57b10399 2019-06-13 | drm/mgag200: Rewrite cursor handling
>   f4ce5af71bc2 2019-06-13 | drm/mgag200: Pin framebuffer BO during dirty update
>
>
> I  tried reverting them and the resultant driver  doesn’t build afterwards due to drm calls.
>
> If I build a kernel from :
>
> fec88ab0af97 2019-07-14 | Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
>
> That was posted a day prior to be8454afc50f - the GNOME desktop works.
>
>
>
>
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

