Drm: mgag200. Video adapter issue with 5.4.0-rc3 ; no graphics

Thomas Zimmermann tzimmermann at suse.de
Tue Nov 12 10:02:33 UTC 2019


Hi

just a few more comments.

On 11.11.19 at 18:40, John Donnelly wrote:
> On 11/11/19 9:57 AM, Thomas Zimmermann wrote:
>> Hi John
>>
>> On 08.11.19 at 19:07, John Donnelly wrote:
>>>
>>>
>>>> On Nov 8, 2019, at 9:06 AM, Thomas Zimmermann <tzimmermann at suse.de>
>>>> wrote:
>>>>
>>>> Hi
>>>>
>>>> On 08.11.19 at 13:55, John Donnelly wrote:
>>>>>
>>>>>
>>>>>> On Nov 8, 2019, at 1:46 AM, Thomas Zimmermann
>>>>>> <tzimmermann at suse.de> wrote:
>>>>>>
>>>>>> Hi John
>>>>>>
>>>>>> On 07.11.19 at 23:14, John Donnelly wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On Nov 7, 2019, at 10:13 AM, John Donnelly
>>>>>>>> <john.p.donnelly at oracle.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Nov 7, 2019, at 7:42 AM, Thomas Zimmermann
>>>>>>>>> <tzimmermann at suse.de> wrote:
>>>>>>>>>
>>>>>>>>> Hi John
>>>>>>>>>
>>>>>>>>> On 07.11.19 at 14:12, John Donnelly wrote:
>>>>>>>>>> Hi Thomas, thank you for reaching out.
>>>>>>>>>>
>>>>>>>>>> See inline:
>>>>>>>>>>
>>>>>>>>>>> On Nov 7, 2019, at 1:54 AM, Thomas Zimmermann
>>>>>>>>>>> <tzimmermann at suse.de> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi John,
>>>>>>>>>>>
>>>>>>>>>>> apparently the vgaarb was not the problem.
>>>>>>>>>>>
>>>>>>>>>>> On 07.11.19 at 03:29, John Donnelly wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I am investigating an issue where we lose video activity
>>>>>>>>>>>> when the display is switched from “text mode” to
>>>>>>>>>>>> “graphic mode” on a number of servers using this driver,
>>>>>>>>>>>> specifically when starting the GNOME desktop.
>>>>>>>>>>>
>>>>>>>>>>> When you say "text mode", do you mean VGA text mode or the
>>>>>>>>>>> graphical
>>>>>>>>>>> console that emulates text mode?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I call “text mode” the 24x80 ASCII mode, NOT GRAPHICS,
>>>>>>>>>> i.e. runlevel 3. So I guess your term for it is VGA.
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> When you enable graphics mode, does it set the correct
>>>>>>>>>>> resolution? A lot
>>>>>>>>>>> of work went into memory management recently. I could imagine
>>>>>>>>>>> that the
>>>>>>>>>>> driver sets the correct resolution, but then fails to display
>>>>>>>>>>> the
>>>>>>>>>>> correct framebuffer.
>>>>>>>>>>
>>>>>>>>>> There is no display at all, so there is no resolution to
>>>>>>>>>> mention.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> If possible, could you try to update to the latest drm-tip
>>>>>>>>>>> and attach
>>>>>>>>>>> the output of
>>>>>>>>>>>
>>>>>>>>>>> /sys/kernel/debug/dri/0/vram-mm
>>>>>>>>>>
>>>>>>>>>> I don’t see that file. Is there something else I need to do?
>>>>>>>>>
>>>>>>>>> That file is fairly new and maybe it's not in the mainline
>>>>>>>>> kernel yet.
>>>>>>>>> See below for how to get it.
>>>>>>>>
>>>>>>>> I built your “tip”. Still no graphics displayed.
>>>>>>>>
>>>>>>>>
>>>>>>>> mount -t debugfs none /sys/kernel
>>>>>>>>
>>>>>>>> cat /proc/cmdline
>>>>>>>> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.4.0-rc6.drm.+
>>>>>>>> root=/dev/mapper/ol_ca--dev55-root ro crashkernel=auto
>>>>>>>> resume=/dev/mapper/ol_ca--dev55-swap rd.lvm.lv=ol_ca-dev55/root
>>>>>>>> rd.lvm.lv=ol_ca-dev55/swap console=ttyS0,9600,8,n,1 drm.debug=0xff
>>>>>>>>
>>>>>>>>
>>>>>>>> cat  /sys/kernel/dri/0/vram-mm
>>>>>>>>
>>>>>>>> In VGA mode :
>>>>>>>>
>>>>>>>>
>>>>>>>> cat  /sys/kernel/dri/0/vram-mm
>>>>>>>> 0x0000000000000000-0x0000000000000300: 768: used
>>>>>>>> 0x0000000000000300-0x0000000000000600: 768: used
>>>>>>>> 0x0000000000000600-0x00000000000007ee: 494: free
>>>>>>>> 0x00000000000007ee-0x00000000000007ef: 1: used
>>>>>>>> 0x00000000000007ef-0x00000000000007f0: 1: used
>>>>>>>>
>>>>>>>>
>>>>>>>> In GRAPHICS mode ( if it matters )
>>>>>>>>
>>>>>>>>
>>>>>>>> cat  /sys/kernel/dri/0/vram-mm
>>>>>>>> 0x0000000000000000-0x0000000000000300: 768: used
>>>>>>>> 0x0000000000000300-0x0000000000000600: 768: used
>>>>>>>> 0x0000000000000600-0x00000000000007ee: 494: free
>>>>>>>> 0x00000000000007ee-0x00000000000007ef: 1: used
>>>>>>>> 0x00000000000007ef-0x00000000000007f0: 1: used
>>>>>>>> total: 2032, used 1538 free 494

Reconsidering this output, it actually makes sense. X11 only allocates a
single framebuffer and uses an additional shadow buffer for its
rendering. So the memory map is OK.
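
A minimal sketch for checking the arithmetic in that listing: it sums
the pages per state and estimates how many pages one framebuffer needs.
This is only an illustration in Python, not part of the driver or of
any DRM tooling. The function names are hypothetical, and the 4 KiB
page size and the 1024x768 mode at 32 bits per pixel are assumptions;
the thread states neither.

```python
import re

# Graphics-mode listing copied from the vram-mm output quoted in this thread.
SAMPLE = """\
0x0000000000000000-0x0000000000000300: 768: used
0x0000000000000300-0x0000000000000600: 768: used
0x0000000000000600-0x00000000000007ee: 494: free
0x00000000000007ee-0x00000000000007ef: 1: used
0x00000000000007ef-0x00000000000007f0: 1: used
total: 2032, used 1538 free 494
"""

# One region per line: "<start>-<end>: <pages>: <used|free>"
REGION_RE = re.compile(
    r"^(0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+): (\d+): (used|free)$"
)


def summarize(text):
    """Sum the page counts per state (hypothetical helper)."""
    totals = {"used": 0, "free": 0}
    for line in text.splitlines():
        match = REGION_RE.match(line.strip())
        if match is None:
            continue  # skips the "total:" line and any blank lines
        start = int(match.group(1), 16)
        end = int(match.group(2), 16)
        pages = int(match.group(3))
        if end - start != pages:
            raise ValueError("range and page count disagree: " + line)
        totals[match.group(4)] += pages
    return totals


def pages_for_mode(width, height, bytes_per_pixel, page_size=4096):
    """Pages needed for one framebuffer; page_size is an assumption."""
    size = width * height * bytes_per_pixel
    return (size + page_size - 1) // page_size  # round up to whole pages


totals = summarize(SAMPLE)
print(totals)
print(pages_for_mode(1024, 768, 4))
```

With the listing above, the sums come out as 1538 used and 494 free
pages, which agrees with the "total:" line. Under the stated
assumptions a single 1024x768 buffer takes 768 pages, the same size as
each of the two large regions.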

I'm having some problems with running Gnome 3.34 (3.32 is fine), which
makes it hard to distinguish Gnome errors from driver errors. I guess
I'm back to step 1. :(

Best regards
Thomas

>>>>>>>>
>>>>>>
>>>>>> This is interesting. In the graphics mode, you see two buffers of 768
>>>>>> pages each. Those are the main framebuffers as used by X (it's double
>>>>>> buffered). Then there's a free area and finally two pages for cursor
>>>>>> images (also double buffered). That looks as expected.
>>>>>>
>>>>>> The thing is that in text mode, the areas are allocated. But the
>>>>>> driver
>>>>>> shouldn't be active, so the file shouldn't exist or only show a
>>>>>> single
>>>>>> free area.
>>>>>>
>>>>>
>>>>>       If you want me to double check this, I will. I have GNOME
>>>>> installed, but the machine boots to runlevel 3; then I start the
>>>>> desktop using init 5. I am pretty sure I took that output when the
>>>>> machine was in graphics mode at runlevel 5.
>>>>>
>>>>>
>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I’ve attached var/lib/gdm/.local/share/xorg/Xorg.0.log
>>>>>>>>>> instead.
>>>>>>>>>
>>>>>>>>> Good! Looking through that log file, the card is found at line
>>>>>>>>> 79 and
>>>>>>>>> the generic X modesetting driver initializes below. That works
>>>>>>>>> as expected.
>>>>>>>>>
>>>>>>>>> I noticed that several operations are not permitted (lines 78
>>>>>>>>> and 87). I
>>>>>>>>> guess you're starting X from a regular user account? IIRC special
>>>>>>>>> permission is required to acquire control of the display. What
>>>>>>>>> happens
>>>>>>>>> if you start X as root user?
>>>>>>>>
>>>>>>>>
>>>>>>>>   I am starting GNOME as root by doing “init 5” from either
>>>>>>>> the console session or from ssh.
>>>>>>>>
>>>>>>>> The default runlevel is 3 on boot.
>>>>>>>>
>>>>>>>> On a failing session running your 5.4.0-rc6:
>>>>>>>>
>>>>>>>> 78 [   237.712] xf86EnableIOPorts: failed to set IOPL for I/O
>>>>>>>> (Operation not permitted)
>>>>>>>>
>>>>>>>> 87 [   237.712] (EE) open /dev/fb0: Permission denied
>>>>>>>>
>>>>>>>> Booting a 4.18 kernel yields the same errors in:
>>>>>>>> /var/lib/gdm/.local/share/xorg/Xorg.0.log
>>>>>>>>
>>>>>>>> 78 [   101.334] xf86EnableIOPorts: failed to set IOPL for I/O
>>>>>>>> (Operation not permitted)
>>>>>>>>
>>>>>>>> 87 [   101.334] (EE) open /dev/fb0: Permission denied
>>>>>>>>
>>>>>>>>
>>>>>>>> What is strange is that both X log files (bad and OK) look
>>>>>>>> essentially as if GNOME had started!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> <Xorg.0.log.bad><Xorg.0.log.Ok>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is my cmdline. I just tested 5.3.0 and it fails too
>>>>>>>>>> (my last test was 5.3.8 and it also failed).
>>>>>>>>>>
>>>>>>>>>> # cat /proc/cmdline
>>>>>>>>>> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.3.0+
>>>>>>>>>> root=/dev/mapper/ol_ca--dev55-root ro crashkernel=auto
>>>>>>>>>> resume=/dev/mapper/ol_ca--dev55-swap
>>>>>>>>>> rd.lvm.lv=ol_ca-dev55/root rd.lvm.lv=ol_ca-dev55/swap
>>>>>>>>>> console=ttyS0,9600,8,n,1 drm.debug=0xff
>>>>>>>>>>
>>>>>>>>>> When you say “tip”, are you referring to a specific kernel?
>>>>>>>>>> I can build a 5.4.0-rc6. The problem appears to have
>>>>>>>>>> been introduced around the 5.3 time frame.
>>>>>>>>>
>>>>>>>>> The latest and greatest DRM code is in the drm-tip branch at
>>>>>>>>>
>>>>>>>>> git://anongit.freedesktop.org/drm/drm-tip
>>>>>>>>>
>>>>>>>>> If you build this version you should find
>>>>>>>>>
>>>>>>>>> /sys/kernel/debug/dri/0/vram-mm
>>>>>>>>>
>>>>>>>>> on the device. You have to build with debugfs enabled and
>>>>>>>>> maybe have to mount debugfs at /sys/kernel/debug.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> before and after switching to graphics mode. The file lists the
>>>>>>>>>>> allocated regions of the VRAM.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> This adapter is  Server Engines  Integrated Remote Video
>>>>>>>>>>>> Acceleration Subsystem (RVAS)  and is used as remote console
>>>>>>>>>>>> in iLO/DRAC environments.
>>>>>>>>>>>>
>>>>>>>>>>>> I don’t see any specific errors in the gdm logs or message
>>>>>>>>>>>> file other than this:
>>>>>>>>>>>
>>>>>>>>>>> You can boot with drm.debug=0xff on the kernel command line
>>>>>>>>>>> to enable
>>>>>>>>>>> more warnings.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Could you please attach the output of lspci -v for the VGA
>>>>>>>>>>> adapter?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Here is the output from the current machine; The previous
>>>>>>>>>> addresses were from another model using the same SE device:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0:
>>>>>>>>>> remove_conflicting_pci_framebuffers: bar 0: 0xc5000000 ->
>>>>>>>>>> 0xc5ffffff
>>>>>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0:
>>>>>>>>>> remove_conflicting_pci_framebuffers: bar 1: 0xc6810000 ->
>>>>>>>>>> 0xc6813fff
>>>>>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0:
>>>>>>>>>> remove_conflicting_pci_framebuffers: bar 2: 0xc6000000 ->
>>>>>>>>>> 0xc67fffff
>>>>>>>>>> Nov  7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: vgaarb:
>>>>>>>>>> deactivate vga console
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> lspci -s 3d:00.0 -vvv -k
>>>>>>>>>> 3d:00.0 VGA compatible controller: Matrox Electronics Systems
>>>>>>>>>> Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if
>>>>>>>>>> 00 [VGA controller])
>>>>>>>>>>     Subsystem: Oracle/SUN Device 4852
>>>>>>>>>>     Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV-
>>>>>>>>>> VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
>>>>>>>>>>     Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast
>>>>>>>>>> >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>>>>>>     Latency: 0, Cache Line Size: 64 bytes
>>>>>>>>>>     Interrupt: pin A routed to IRQ 16
>>>>>>>>>>     NUMA node: 0
>>>>>>>>>>     Region 0: Memory at c5000000 (32-bit, non-prefetchable)
>>>>>>>>>> [size=16M]
>>>>>>>>>>     Region 1: Memory at c6810000 (32-bit, non-prefetchable)
>>>>>>>>>> [size=16K]
>>>>>>>>>>     Region 2: Memory at c6000000 (32-bit, non-prefetchable)
>>>>>>>>>> [size=8M]
>>>>>>>>>>     Expansion ROM at 000c0000 [disabled] [size=128K]
>>>>>>>>>>     Capabilities: [dc] Power Management version 2
>>>>>>>>>>         Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
>>>>>>>>>> PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>>>>>>>>>         Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>>>>>>>>>>     Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
>>>>>>>>>>         DevCap:    MaxPayload 256 bytes, PhantFunc 0, Latency
>>>>>>>>>> L0s <64ns, L1 <1us
>>>>>>>>>>             ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>>>>>>>>>>         DevCtl:    Report errors: Correctable+ Non-Fatal+
>>>>>>>>>> Fatal+ Unsupported-
>>>>>>>>>>             RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>>>>>>             MaxPayload 128 bytes, MaxReadReq 128 bytes
>>>>>>>>>>         DevSta:    CorrErr+ UncorrErr- FatalErr- UnsuppReq+
>>>>>>>>>> AuxPwr- TransPend-
>>>>>>>>>>         LnkCap:    Port #0, Speed 2.5GT/s, Width x1, ASPM L0s,
>>>>>>>>>> Exit Latency L0s <64ns
>>>>>>>>>>             ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>>>>>>>>>>         LnkCtl:    ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>>>>>>>>>>             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>>>>>>>         LnkSta:    Speed 2.5GT/s, Width x1, TrErr- Train-
>>>>>>>>>> SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>>>>>>     Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
>>>>>>>>>>         Address: 00000000  Data: 0000
>>>>>>>>>>     Kernel driver in use: mgag200
>>>>>>>>>>     Kernel modules: mgag200
>>>>>>>>>
>>>>>>>>> Looks all normal.
>>>>>>>>>
>>>>>>>>> Best regards
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>
>>>>>>> ==============  Snip  ===========
>>>>>>>
>>>>>>>
>>>>>>> Hi Thomas,
>>>>>>>
>>>>>>> I have hopefully narrowed down the breakage to the range between
>>>>>>> these upstream commits, which are v5.2 and v5.3-rc1:
>>>>>>>
>>>>>>>
>>>>>>> between :  0ecfebd2b524 2019-07-07 | Linux 5.2      to :  
>>>>>>> 5f9e832c1370 2019-07-21 | Linus 5.3-rc1
>>>>>>>
>>>>>>>
>>>>>>> I started to bisect this range by date, day by day, based on the
>>>>>>> changes made in:
>>>>>>>
>>>>>>> drivers/gpu/drm/
>>>>>>>
>>>>>>> fec88ab0af97 2019-07-14 | Merge tag 'for-linus-hmm' of
>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  ;  works
>>>>>>>
>>>>>>> Hopefully something in drivers/gpu/drm/ between the date range of
>>>>>>> 2019-07-14 to 2019-07-21 will surface tomorrow.
>>>>>>
>>>>>> Great, thanks for bisecting.
>>>>>>
>>>>>> Could you attach your kernel config file? I'd like to compare with my
>>>>>> config and try to reproduce the issue.
>>>>>>
>>>>>> Best regards
>>>>>> Thomas
>>>>>
>>>>>   Hi.
>>>>>
>>>>>   Here are the config files generated by “make oldconfig”,
>>>>> starting from an original .config file taken from a master file we
>>>>> use for 5.4.0-rc4:
>>>>>
>>>>>     config.5.2.21 - works with that flavor
>>>>>     config.5.3.   - fails with 5.3 and later
>>>>>
>>>>>   Do you have access to an mgag200-style adapter?
>>>>
>>>> I do.
>>>>
>>>> I think I've been able to reproduce the issue. Buffers seem to
>>>> remain in
>>>> video ram after they have been pinned there. I'll investigate next
>>>> week.
>>>> I hope your bisecting session can point to the cause.
>>>>
>>>> Best regards
>>>> Thomas
>>>
>>> Hi Thomas,
>>>
>>>
>>>   Wonderful!
>>>
>>>   I think I have narrowed down the merge to this build which is :
>>> vmlinuz-5.2.0-rc5+ :
>>>
>>>
>>> be8454afc50f 2019-07-15 | Merge tag 'drm-next-2019-07-16' of
>>> git://anongit.freedesktop.org/drm/drm
>>>
>>>    Specifically this merge included these two changes :
>>>
>>>    94dc57b10399 2019-06-13 | drm/mgag200: Rewrite cursor handling
>>>    f4ce5af71bc2 2019-06-13 | drm/mgag200: Pin framebuffer BO during
>>> dirty update
>>>
>>>
>>> I tried reverting them, but the resulting driver doesn’t build
>>> afterwards due to drm calls.
>>>
>>> If I build a kernel from :
>>>
>>> fec88ab0af97 2019-07-14 | Merge tag 'for-linus-hmm' of
>>> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
>>>
>>> which was posted the day prior to be8454afc50f, the GNOME desktop works.
>>
>> I thought I could reproduce the problem, but I'm not so sure now.
>>
>> Please bisect the range between the two merges as described by Daniel to
>> find the broken commit. Doing
>>
>>    git bisect start
>>    git bisect bad be8454afc50f
>>    git bisect good fec88ab0af97
>>
>> should start the session.
>>
> Hi,
> 
> I am OoO today. I will start this exercise tomorrow.
> 
> 
> 
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
