drm: mgag200: Video adapter issue with 5.4.0-rc3; no graphics

Daniel Vetter daniel at ffwll.ch
Tue Nov 12 20:14:45 UTC 2019


On Tue, Nov 12, 2019 at 8:13 PM John Donnelly
<john.p.donnelly at oracle.com> wrote:
>
>
>
> > On Nov 11, 2019, at 9:57 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
> >
> > Hi John
> >
> > Am 08.11.19 um 19:07 schrieb John Donnelly:
> >>
> >>
> >>> On Nov 8, 2019, at 9:06 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
> >>>
> >>> Hi
> >>>
> >>> Am 08.11.19 um 13:55 schrieb John Donnelly:
> >>>>
> >>>>
> >>>>> On Nov 8, 2019, at 1:46 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
> >>>>>
> >>>>> Hi John
> >>>>>
> >>>>> Am 07.11.19 um 23:14 schrieb John Donnelly:
> >>>>>>
> >>>>>>
> >>>>>>> On Nov 7, 2019, at 10:13 AM, John Donnelly <john.p.donnelly at oracle.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>> On Nov 7, 2019, at 7:42 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
> >>>>>>>>
> >>>>>>>> Hi John
> >>>>>>>>
> >>>>>>>> Am 07.11.19 um 14:12 schrieb John Donnelly:
> >>>>>>>>> Hi  Thomas ;  Thank you for reaching out.
> >>>>>>>>>
> >>>>>>>>> See inline:
> >>>>>>>>>
> >>>>>>>>>> On Nov 7, 2019, at 1:54 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi John,
> >>>>>>>>>>
> >>>>>>>>>> apparently the vgaarb was not the problem.
> >>>>>>>>>>
> >>>>>>>>>> Am 07.11.19 um 03:29 schrieb John Donnelly:
> >>>>>>>>>>> Hi,
> >>>>>>>>>>>
> >>>>>>>>>>> I am investigating an issue where we lose video activity when the display is switched from “text mode” to “graphic mode”
> >>>>>>>>>>> on a number of servers using this driver, specifically when starting the GNOME desktop.
> >>>>>>>>>>
> >>>>>>>>>> When you say "text mode", do you mean VGA text mode or the graphical
> >>>>>>>>>> console that emulates text mode?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> By “text mode” I mean the 24x80 ASCII mode, NOT graphics, i.e. runlevel 3. So I guess your term for it is VGA.
> >>>>>>>>
> >>>>>>>> Yes.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>> When you enable graphics mode, does it set the correct resolution? A lot
> >>>>>>>>>> of work went into memory management recently. I could imagine that the
> >>>>>>>>>> driver sets the correct resolution, but then fails to display the
> >>>>>>>>>> correct framebuffer.
> >>>>>>>>>
> >>>>>>>>> There is no display at all, so there is no resolution to mention.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> If possible, could you try to update to the latest drm-tip and attach
> >>>>>>>>>> the output of
> >>>>>>>>>>
> >>>>>>>>>> /sys/kernel/debug/dri/0/vram-mm
> >>>>>>>>>
> >>>>>>>>> I don’t see that file. Is there something else I need to do?
> >>>>>>>>
> >>>>>>>> That file is fairly new and maybe it's not in the mainline kernel yet.
> >>>>>>>> See below for how to get it.
> >>>>>>>
> >>>>>>> I built your “tip”; still no graphics are displayed.
> >>>>>>>
> >>>>>>>
> >>>>>>> mount -t debugfs none /sys/kernel
> >>>>>>>
> >>>>>>> cat /proc/cmdline
> >>>>>>> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.4.0-rc6.drm.+ root=/dev/mapper/ol_ca--dev55-root ro crashkernel=auto resume=/dev/mapper/ol_ca--dev55-swap rd.lvm.lv=ol_ca-dev55/root rd.lvm.lv=ol_ca-dev55/swap console=ttyS0,9600,8,n,1 drm.debug=0xff
> >>>>>>>
> >>>>>>>
> >>>>>>> cat  /sys/kernel/dri/0/vram-mm
> >>>>>>>
> >>>>>>> In VGA mode:
> >>>>>>>
> >>>>>>>
> >>>>>>> cat  /sys/kernel/dri/0/vram-mm
> >>>>>>> 0x0000000000000000-0x0000000000000300: 768: used
> >>>>>>> 0x0000000000000300-0x0000000000000600: 768: used
> >>>>>>> 0x0000000000000600-0x00000000000007ee: 494: free
> >>>>>>> 0x00000000000007ee-0x00000000000007ef: 1: used
> >>>>>>> 0x00000000000007ef-0x00000000000007f0: 1: used
> >>>>>>>
> >>>>>>>
> >>>>>>> In GRAPHICS mode (if it matters):
> >>>>>>>
> >>>>>>>
> >>>>>>> cat  /sys/kernel/dri/0/vram-mm
> >>>>>>> 0x0000000000000000-0x0000000000000300: 768: used
> >>>>>>> 0x0000000000000300-0x0000000000000600: 768: used
> >>>>>>> 0x0000000000000600-0x00000000000007ee: 494: free
> >>>>>>> 0x00000000000007ee-0x00000000000007ef: 1: used
> >>>>>>> 0x00000000000007ef-0x00000000000007f0: 1: used
> >>>>>>> total: 2032, used 1538 free 494
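The page counts in the GRAPHICS-mode dump add up to its "total:" line (768 + 768 + 1 + 1 used, 494 free). A small POSIX shell/awk sketch that recomputes that line from the range lines can help when comparing dumps taken before and after "init 5". The function name summarize_vram_mm is hypothetical, and the line format is inferred from the dumps above rather than taken from the kernel source.

```shell
#!/bin/sh
# Sketch: total the "used" and "free" page counts in a DRM vram-mm dump.
# Assumes each range line looks like "<start>-<end>: <pages>: <used|free>",
# as in the dumps above; any other line (such as "total: ...") is ignored.
# Reads the files given as arguments, or stdin if there are none.
summarize_vram_mm() {
    awk -F': ' '
        $3 == "used" { used += $2 }
        $3 == "free" { free += $2 }
        END { printf "total: %d, used %d free %d\n", used + free, used, free }
    ' "$@"
}

# Example (assuming debugfs is mounted at /sys/kernel/debug):
#   summarize_vram_mm /sys/kernel/debug/dri/0/vram-mm
```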
> >>>>>>>
> >>>>>
> >>>>> This is interesting. In graphics mode, you see two buffers of 768
> >>>>> pages each. Those are the main framebuffers as used by X (it's double
> >>>>> buffered). Then there's a free area and finally two pages for cursor
> >>>>> images (also double buffered). That looks as expected.
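The reading of the 768-page buffers as framebuffers can be sanity-checked with a little arithmetic. The thread does not say which mode X picked; assuming 4 KiB pages and no pitch padding, a 1024x768 framebuffer at 32 bpp (4 bytes per pixel) is one mode that fills exactly 768 pages. fb_pages is a hypothetical helper for this check.

```shell
#!/bin/sh
# Sketch: pages needed by a linear framebuffer, rounded up to whole pages.
# Usage: fb_pages WIDTH HEIGHT BYTES_PER_PIXEL [PAGE_SIZE]
# PAGE_SIZE defaults to 4096 (an assumption; the usual x86-64 page size).
# Any pitch alignment the driver may apply is ignored.
fb_pages() {
    width=$1 height=$2 cpp=$3 page_size=${4:-4096}
    bytes=$((width * height * cpp))
    echo $(( (bytes + page_size - 1) / page_size ))
}

# 1024 x 768 x 4 bytes = 3145728 bytes = 768 pages of 4096 bytes:
#   fb_pages 1024 768 4
```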
> >>>>>
> >>>>> The thing is that in text mode, the areas are allocated. But the driver
> >>>>> shouldn't be active, so the file shouldn't exist or only show a single
> >>>>> free area.
> >>>>>
> >>>>
> >>>> If you want me to double-check this, I will. I have GNOME installed, but the machine boots to runlevel 3; then I start the desktop using init 5. I am pretty sure I took that output when the machine was in graphics mode at runlevel 5.
> >>>>
> >>>>
> >>>>>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>> I’ve attached /var/lib/gdm/.local/share/xorg/Xorg.0.log instead.
> >>>>>>>>
> >>>>>>>> Good! Looking through that log file, the card is found at line 79 and
> >>>>>>>> the generic X modesetting driver initializes below. That works as expected.
> >>>>>>>>
> >>>>>>>> I noticed that several operations are not permitted (lines 78 and 87). I
> >>>>>>>> guess you're starting X from a regular user account? IIRC special
> >>>>>>>> permission is required to acquire control of the display. What happens
> >>>>>>>> if you start X as the root user?
> >>>>>>>
> >>>>>>>
> >>>>>>> I am starting GNOME as root by doing “init 5” from either the console session or from ssh.
> >>>>>>>
> >>>>>>> The default runlevel on boot is 3.
> >>>>>>>
> >>>>>>> On the failing session, running your 5.4.0-rc6:
> >>>>>>>
> >>>>>>> 78 [   237.712] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
> >>>>>>>
> >>>>>>> 87 [   237.712] (EE) open /dev/fb0: Permission denied
> >>>>>>>
> >>>>>>> Booting the 4.18 kernel yields the same errors in /var/lib/gdm/.local/share/xorg/Xorg.0.log:
> >>>>>>>
> >>>>>>> 78 [   101.334] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
> >>>>>>>
> >>>>>>> 87 [   101.334] (EE) open /dev/fb0: Permission denied
> >>>>>>>
> >>>>>>>
> >>>>>>> What is strange is that both X log files (bad and OK) essentially look as if GNOME started!
> >>>>>>>
> >>>>>>> <Xorg.0.log.bad><Xorg.0.log.Ok>
> >>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Here is my cmdline. I just tested 5.3.0 and it fails too (my last test was 5.3.8, and it failed also).
> >>>>>>>>>
> >>>>>>>>> # cat /proc/cmdline
> >>>>>>>>> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.3.0+ root=/dev/mapper/ol_ca--dev55-root ro crashkernel=auto resume=/dev/mapper/ol_ca--dev55-swap rd.lvm.lv=ol_ca-dev55/root rd.lvm.lv=ol_ca-dev55/swap console=ttyS0,9600,8,n,1 drm.debug=0xff
> >>>>>>>>>
> >>>>>>>>> When you say “tip”, are you referring to a specific kernel? I can build a 5.4.0-rc6. The problem appears to have been introduced around the 5.3 time frame.
> >>>>>>>>
> >>>>>>>> The latest and greatest DRM code is in the drm-tip branch at
> >>>>>>>>
> >>>>>>>> git://anongit.freedesktop.org/drm/drm-tip
> >>>>>>>>
> >>>>>>>> If you build this version you should find
> >>>>>>>>
> >>>>>>>> /sys/kernel/debug/dri/0/vram-mm
> >>>>>>>>
> >>>>>>>> on the device. You have to build with debugfs enabled and
> >>>>>>>> maybe have to mount debugfs at /sys/kernel/debug.
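A sketch for the debugfs step: it reports where debugfs is currently mounted by reading /proc/mounts. A mount made with "mount -t debugfs none /sys/kernel", as done later in the thread, would show up as /sys/kernel, which is why vram-mm then appears under /sys/kernel/dri/0 instead of /sys/kernel/debug/dri/0. The helper name is hypothetical; mounting requires root and a kernel built with CONFIG_DEBUG_FS=y.

```shell
#!/bin/sh
# Sketch: print the first mount point whose filesystem type is debugfs.
# Reads a table in /proc/mounts format (device, mount point, type, options, ...);
# the file to read can be passed as $1 and defaults to /proc/mounts.
debugfs_mountpoint() {
    awk '$3 == "debugfs" { print $2; exit }' "${1:-/proc/mounts}"
}

# Typical use (as root):
#   mp=$(debugfs_mountpoint)
#   [ -n "$mp" ] || { mount -t debugfs none /sys/kernel/debug && mp=/sys/kernel/debug; }
#   cat "$mp/dri/0/vram-mm"
```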
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> before and after switching to graphics mode. The file lists the
> >>>>>>>>>> allocated regions of the VRAM.
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> This adapter is the ServerEngines Integrated Remote Video Acceleration Subsystem (RVAS) and is used as a remote console in iLO/DRAC environments.
> >>>>>>>>>>>
> >>>>>>>>>>> I don’t see any specific errors in the gdm logs or message file other than this:
> >>>>>>>>>>
> >>>>>>>>>> You can boot with drm.debug=0xff on the kernel command line to enable
> >>>>>>>>>> more warnings.
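drm.debug is a bitmask of debug categories. A sketch that decodes a mask into category names, assuming the bit assignments of the DRM_UT_* constants in include/drm/drm_print.h (CORE = 0x01 through LEASE = 0x80); later kernels define further bits above 0x80, which this sketch does not decode. By that table, the 0xff used in this thread enables all eight categories listed. The function name is hypothetical.

```shell
#!/bin/sh
# Sketch: list the DRM debug categories enabled by a drm.debug mask.
# Bit order follows the DRM_UT_* values in include/drm/drm_print.h:
#   0x01 CORE, 0x02 DRIVER, 0x04 KMS, 0x08 PRIME,
#   0x10 ATOMIC, 0x20 VBL, 0x40 STATE, 0x80 LEASE.
# Bits above 0x80 are ignored here.
drm_debug_categories() {
    mask=$(($1))
    bit=1
    out=
    for name in CORE DRIVER KMS PRIME ATOMIC VBL STATE LEASE; do
        if [ $((mask & bit)) -ne 0 ]; then
            out="$out $name"
        fi
        bit=$((bit << 1))
    done
    echo "${out# }"
}

# drm_debug_categories 0xff
# drm_debug_categories "$(cat /sys/module/drm/parameters/debug)"   # as root
```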
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Could you please attach the output of lspci -v for the VGA adapter?
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Here is the output from the current machine; the previous addresses were from another model using the same SE device:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xc5000000 -> 0xc5ffffff
> >>>>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 1: 0xc6810000 -> 0xc6813fff
> >>>>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xc6000000 -> 0xc67fffff
> >>>>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: vgaarb: deactivate vga console
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> lspci -s 3d:00.0 -vvv -k
> >>>>>>>>> 3d:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
> >>>>>>>>>       Subsystem: Oracle/SUN Device 4852
> >>>>>>>>>       Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
> >>>>>>>>>       Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
> >>>>>>>>>       Latency: 0, Cache Line Size: 64 bytes
> >>>>>>>>>       Interrupt: pin A routed to IRQ 16
> >>>>>>>>>       NUMA node: 0
> >>>>>>>>>       Region 0: Memory at c5000000 (32-bit, non-prefetchable) [size=16M]
> >>>>>>>>>       Region 1: Memory at c6810000 (32-bit, non-prefetchable) [size=16K]
> >>>>>>>>>       Region 2: Memory at c6000000 (32-bit, non-prefetchable) [size=8M]
> >>>>>>>>>       Expansion ROM at 000c0000 [disabled] [size=128K]
> >>>>>>>>>       Capabilities: [dc] Power Management version 2
> >>>>>>>>>               Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
> >>>>>>>>>               Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
> >>>>>>>>>       Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
> >>>>>>>>>               DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
> >>>>>>>>>                       ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
> >>>>>>>>>               DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
> >>>>>>>>>                       RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
> >>>>>>>>>                       MaxPayload 128 bytes, MaxReadReq 128 bytes
> >>>>>>>>>               DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
> >>>>>>>>>               LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
> >>>>>>>>>                       ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
> >>>>>>>>>               LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
> >>>>>>>>>                       ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >>>>>>>>>               LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> >>>>>>>>>       Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
> >>>>>>>>>               Address: 00000000  Data: 0000
> >>>>>>>>>       Kernel driver in use: mgag200
> >>>>>>>>>       Kernel modules: mgag200
> >>>>>>>>
> >>>>>>>> Looks all normal.
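One way to cross-check the lspci output is to compute the size of each BAR range printed by remove_conflicting_pci_framebuffers and compare it with the "Region N ... [size=...]" entries. A small sketch; the helper name is hypothetical and the output only approximates lspci's size notation (M and K suffixes only).

```shell
#!/bin/sh
# Sketch: size of an inclusive address range START..END, printed as NM, NK or N bytes.
# Usage: bar_size START END   (hex constants such as 0xc5000000 are accepted)
# Needs a shell with 64-bit arithmetic for addresses above 2^31.
bar_size() {
    bytes=$(( $2 - $1 + 1 ))
    if [ $((bytes % 1048576)) -eq 0 ]; then
        echo "$((bytes / 1048576))M"
    elif [ $((bytes % 1024)) -eq 0 ]; then
        echo "$((bytes / 1024))K"
    else
        echo "$bytes"
    fi
}

# bar 0: bar_size 0xc5000000 0xc5ffffff  ->  16M  (lspci Region 0: size=16M)
# bar 1: bar_size 0xc6810000 0xc6813fff  ->  16K  (lspci Region 1: size=16K)
# bar 2: bar_size 0xc6000000 0xc67fffff  ->  8M   (lspci Region 2: size=8M)
```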
> >>>>>>>>
> >>>>>>>> Best regards
> >>>>>>>> Thomas
> >>>>>>>>
> >>>>>>
> >>>>>> ==============  Snip  ===========
> >>>>>>
> >>>>>>
> >>>>>> Hi Thomas,
> >>>>>>
> >>>>>> I have hopefully narrowed down the breakage to the range between these upstream commits, i.e. v5.2 and v5.3-rc1:
> >>>>>>
> >>>>>>
> >>>>>> from: 0ecfebd2b524 2019-07-07 | Linux 5.2   to: 5f9e832c1370 2019-07-21 | Linux 5.3-rc1
> >>>>>>
> >>>>>>
> >>>>>> I started to bisect this range by date, one day at a time, based on the changes in:
> >>>>>>
> >>>>>> drivers/gpu/drm/
> >>>>>>
> >>>>>> fec88ab0af97 2019-07-14 | Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  ;  works
> >>>>>>
> >>>>>> Hopefully something in drivers/gpu/drm/ in the date range 2019-07-14 to 2019-07-21 will surface tomorrow.
> >>>>>
> >>>>> Great, thanks for bisecting.
> >>>>>
> >>>>> Could you attach your kernel config file? I'd like to compare with my
> >>>>> config and try to reproduce the issue.
> >>>>>
> >>>>> Best regards
> >>>>> Thomas
> >>>>
> >>>> Hi.
> >>>>
> >>>> Here are config files generated after a “make oldconfig”, starting from an original .config file taken from a master file we use for 5.4.0-rc4:
> >>>>
> >>>>    config.5.2.21 - works with that flavor
> >>>>    config.5.3    - fails with 5.3 and later
> >>>>
> >>>> Do you have access to an mgag200-style adapter?
> >>>
> >>> I do.
> >>>
> >>> I think I've been able to reproduce the issue. Buffers seem to remain in
> >>> video ram after they have been pinned there. I'll investigate next week.
> >>> I hope your bisecting session can point to the cause.
> >>>
> >>> Best regards
> >>> Thomas
> >>
> >> Hi Thomas,
> >>
> >>
> >> Wonderful!
> >>
> >> I think I have narrowed it down to this merge (my build vmlinuz-5.2.0-rc5+):
> >>
> >>
> >> be8454afc50f 2019-07-15 | Merge tag 'drm-next-2019-07-16' of git://anongit.freedesktop.org/drm/drm
> >>
> >> Specifically, this merge included these two changes:
> >>
> >>  94dc57b10399 2019-06-13 | drm/mgag200: Rewrite cursor handling
> >>  f4ce5af71bc2 2019-06-13 | drm/mgag200: Pin framebuffer BO during dirty update
> >>
> >>
> >> I tried reverting them, but the resulting driver doesn’t build afterwards due to drm calls.
> >>
> >> If I build a kernel from:
> >>
> >> fec88ab0af97 2019-07-14 | Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
> >>
> >> which was posted a day prior to be8454afc50f, the GNOME desktop works.
> >
> > I thought I could reproduce the problem, but I'm not so sure now.
> >
> > Please bisect the range between the two merges as described by Daniel to
> > find the broken commit. Doing
> >
> >  git bisect start
> >  git bisect bad be8454afc50f
> >  git bisect good fec88ab0af97
> >
> > should start the session.
>
>
> In short, I started with:
>
> git bisect start
> git bisect bad be8454afc50f
> git bisect good fec88ab0af97
>
> At the end, the bisect showed this was the offending commit:
>
> c0a74c732568
>
> commit c0a74c732568ad347f7b3de281922808dab30504 (refs/bisect/bad)
> Author: Jani Nikula <jani.nikula at intel.com>
> Date:   Fri May 24 20:35:22 2019 +0300
>
>     drm/i915: Update DRIVER_DATE to 20190524
>
>     Signed-off-by: Jani Nikula <jani.nikula at intel.com>
>
> That commit does not have any real relevance.
>
>
> I am not sure if I did the bisects correctly. After each test I did:
>
>
> #1  git bisect bad 827440a90146
>
> #2  git bisect bad f5b07b04e5f0
>
> #3  git bisect bad c0a74c732568
>
> #4  git bisect good 818f5cb3e8fb
>
> #5  git bisect good 6cfe7ec02e85
>
> #6 git bisect good f71e01a78bee
>
> #7  git bisect good 09a93ef3d60f
>
> #8  git bisect good f1e6b336bafa
>
> #9 git bisect good eaf20e6933dc
>
> #10  git bisect good 63e8dcdb4f8e
>
> #11  git bisect good 397049a03022
>
> I’ve restarted the bisect without appending the <commit-id> after “bad|good”, and so far git is showing the same selections.

Well you're saying that c0a74c732568 is bad and that
397049a03022702defa65694c23 (its immediate ancestor) is good, so
clearly c0a74 is the bad commit per your test results. If this is
correct (please retest to make sure), then git bisect is pointing at
the right commit. If not, then you made a mix-up somewhere in your
testing. It could also be that there's a timing change, but given that
the bisected commit has no real code change, that should be impossible.
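The argument above rests on how bisection works: the result is fully determined by the good/bad marks, so a single wrong mark can steer it to an unrelated commit whose parent tested good. A minimal model of that, assuming a linear history (real git bisect also handles merges and may pick different commits); first_bad and the commit names c1 ... c8 are hypothetical.

```shell
#!/bin/sh
# Sketch: binary search for the first bad commit on a linear history.
# Usage: first_bad IS_BAD C1 C2 ... CN
#   C1..CN are the candidate commits, oldest first: all newer than the known-good
#   commit, with CN the known-bad one. IS_BAD is a command that succeeds when
#   the given commit tests bad.
first_bad() {
    is_bad=$1
    shift
    lo=1
    hi=$#
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "commit=\${$mid}"
        if "$is_bad" "$commit"; then
            hi=$mid            # first bad commit is mid or older
        else
            lo=$((mid + 1))    # first bad commit is newer than mid
        fi
    done
    eval "commit=\${$lo}"
    echo "$commit"
}

# Suppose the breakage really starts at c6, but c4 is mistakenly marked bad:
truly_bad() { case $1 in c6|c7|c8) return 0 ;; *) return 1 ;; esac; }
mismarked() { case $1 in c4|c6|c7|c8) return 0 ;; *) return 1 ;; esac; }
# first_bad truly_bad c1 c2 c3 c4 c5 c6 c7 c8   -> c6
# first_bad mismarked c1 c2 c3 c4 c5 c6 c7 c8   -> c4 (wrong, but consistent with the marks)
```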

Btw, for testing it's good to enable CONFIG_LOCALVERSION_AUTO; then you
can double-check the sha1 of the commit you're testing before running
git bisect good/bad.
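With CONFIG_LOCALVERSION_AUTO=y, scripts/setlocalversion appends a git description to the kernel release, typically of the form -<commits since tag>-g<abbreviated sha1>, plus -dirty when there are uncommitted changes, so "uname -r" on the booted test kernel identifies the commit. (Without that option an untagged tree only gets a "+" suffix, which matches build names like 5.3.0+ earlier in the thread.) A sketch of the check; the helper names and example release strings are hypothetical.

```shell
#!/bin/sh
# Sketch: check that a kernel release string names the commit being tested.
# Assumes the "-g<sha1>" suffix added by scripts/setlocalversion when
# CONFIG_LOCALVERSION_AUTO=y, optionally followed by "-dirty".

# Print the abbreviated sha1 embedded in a release string (empty if none).
release_sha() {
    printf '%s\n' "$1" | sed -n 's/.*-g\([0-9a-f][0-9a-f]*\)\(-dirty\)\{0,1\}$/\1/p'
}

# Succeed if release $2 (default: the running kernel) was built from commit $1.
# Either hash may be abbreviated; the shorter must be a prefix of the longer.
built_from() {
    want=$1
    have=$(release_sha "${2:-$(uname -r)}")
    [ -n "$want" ] && [ -n "$have" ] || return 1
    case $want in "$have"*) return 0 ;; esac
    case $have in "$want"*) return 0 ;; esac
    return 1
}

# Before "git bisect good|bad", on the booted test kernel:
#   built_from "$(git rev-parse HEAD)" || echo "not running the commit under test"
```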
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

