drm: mgag200: Video adapter issue with 5.4.0-rc3; no graphics
Thomas Zimmermann
tzimmermann at suse.de
Fri Nov 8 15:06:54 UTC 2019
Hi
On 08.11.19 at 13:55, John Donnelly wrote:
>
>
>> On Nov 8, 2019, at 1:46 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
>>
>> Hi John
>>
>> On 07.11.19 at 23:14, John Donnelly wrote:
>>>
>>>
>>>> On Nov 7, 2019, at 10:13 AM, John Donnelly <john.p.donnelly at oracle.com> wrote:
>>>>
>>>>
>>>>
>>>>> On Nov 7, 2019, at 7:42 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
>>>>>
>>>>> Hi John
>>>>>
>>>>> On 07.11.19 at 14:12, John Donnelly wrote:
>>>>>> Hi Thomas; thank you for reaching out.
>>>>>>
>>>>>> See inline:
>>>>>>
>>>>>>> On Nov 7, 2019, at 1:54 AM, Thomas Zimmermann <tzimmermann at suse.de> wrote:
>>>>>>>
>>>>>>> Hi John,
>>>>>>>
>>>>>>> apparently the vgaarb was not the problem.
>>>>>>>
>>>>>>> On 07.11.19 at 03:29, John Donnelly wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I am investigating an issue where we lose video activity when the display is switched from “text mode” to “graphics mode”
>>>>>>>> on a number of servers using this driver, specifically when starting the GNOME desktop.
>>>>>>>
>>>>>>> When you say "text mode", do you mean VGA text mode or the graphical
>>>>>>> console that emulates text mode?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> By “text mode” I mean the 24x80 ASCII mode, NOT graphics, i.e. runlevel 3. So I guess your term for it is VGA.
>>>>>
>>>>> Yes.
>>>>>
>>>>>
>>>>>>
>>>>>>
>>>>>>> When you enable graphics mode, does it set the correct resolution? A lot
>>>>>>> of work went into memory management recently. I could imagine that the
>>>>>>> driver sets the correct resolution, but then fails to display the
>>>>>>> correct framebuffer.
>>>>>>
>>>>>> There is no display at all, so there is no resolution to report.
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> If possible, could you try to update to the latest drm-tip and attach
>>>>>>> the output of
>>>>>>>
>>>>>>> /sys/kernel/debug/dri/0/vram-mm
>>>>>>
>>>>>> I don’t see that file. Is there something else I need to do?
>>>>>
>>>>> That file is fairly new and maybe it's not in the mainline kernel yet.
>>>>> See below for how to get it.
>>>>
>>>> I built your “tip”; still no graphics are displayed.
>>>>
>>>>
>>>> mount -t debugfs none /sys/kernel
>>>>
>>>> cat /proc/cmdline
>>>> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.4.0-rc6.drm.+ root=/dev/mapper/ol_ca--dev55-root ro crashkernel=auto resume=/dev/mapper/ol_ca--dev55-swap rd.lvm.lv=ol_ca-dev55/root rd.lvm.lv=ol_ca-dev55/swap console=ttyS0,9600,8,n,1 drm.debug=0xff
>>>>
>>>>
>>>> cat /sys/kernel/dri/0/vram-mm
>>>>
>>>> In VGA mode:
>>>>
>>>>
>>>> cat /sys/kernel/dri/0/vram-mm
>>>> 0x0000000000000000-0x0000000000000300: 768: used
>>>> 0x0000000000000300-0x0000000000000600: 768: used
>>>> 0x0000000000000600-0x00000000000007ee: 494: free
>>>> 0x00000000000007ee-0x00000000000007ef: 1: used
>>>> 0x00000000000007ef-0x00000000000007f0: 1: used
>>>>
>>>>
>>>> In GRAPHICS mode (if it matters):
>>>>
>>>>
>>>> cat /sys/kernel/dri/0/vram-mm
>>>> 0x0000000000000000-0x0000000000000300: 768: used
>>>> 0x0000000000000300-0x0000000000000600: 768: used
>>>> 0x0000000000000600-0x00000000000007ee: 494: free
>>>> 0x00000000000007ee-0x00000000000007ef: 1: used
>>>> 0x00000000000007ef-0x00000000000007f0: 1: used
>>>> total: 2032, used 1538 free 494
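A minimal Python sketch for summarizing vram-mm dumps like the one above. It assumes the line format shown here (`start-end: size: used|free`), with addresses and sizes counted in pages as Thomas describes below, and it cross-checks each printed size against the width of its address range. The function name `summarize_vram_mm` is hypothetical and not part of any kernel tool.

```python
import re

# One allocation per line: "<start>-<end>: <pages>: used|free"
LINE_RE = re.compile(
    r"^(0x[0-9a-fA-F]+)-(0x[0-9a-fA-F]+):\s*(\d+):\s*(used|free)$"
)


def summarize_vram_mm(text):
    """Return (total, used, free) page counts parsed from vram-mm output."""
    used = free = 0
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skips the "total:" line and anything unrecognized
        start, end = int(m.group(1), 16), int(m.group(2), 16)
        pages = int(m.group(3))
        # The printed size should equal the width of the address range.
        if end - start != pages:
            raise ValueError(f"size does not match range: {line!r}")
        if m.group(4) == "used":
            used += pages
        else:
            free += pages
    return used + free, used, free


SAMPLE = """\
0x0000000000000000-0x0000000000000300: 768: used
0x0000000000000300-0x0000000000000600: 768: used
0x0000000000000600-0x00000000000007ee: 494: free
0x00000000000007ee-0x00000000000007ef: 1: used
0x00000000000007ef-0x00000000000007f0: 1: used
"""

if __name__ == "__main__":
    total, used, free = summarize_vram_mm(SAMPLE)
    print(f"total: {total}, used {used} free {free}")
```

Run against the graphics-mode dump, it prints `total: 2032, used 1538 free 494`, which agrees with the summary line printed by the kernel.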
>>>>
>>
>> This is interesting. In graphics mode, you see two buffers of 768
>> pages each. Those are the main framebuffers as used by X (it's double
>> buffered). Then there's a free area and finally two pages for cursor
>> images (also double buffered). That looks as expected.
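A quick sanity check of the 768-page figure, as a sketch. It assumes 4 KiB pages and a framebuffer at 32 bits per pixel with no pitch padding; the mode X actually used is not stated in this thread. Under those assumptions, a 1024x768 mode fills exactly 768 pages, so two such buffers plus two one-page cursor buffers add up to the 1538 used pages reported above. The helper names are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def framebuffer_pages(width, height, bytes_per_pixel=4, page_size=PAGE_SIZE):
    """Whole pages needed for one framebuffer (ignores any pitch padding)."""
    size = width * height * bytes_per_pixel
    return -(-size // page_size)  # ceiling division


def expected_used_pages(width, height, cursor_pages=1):
    """Double-buffered framebuffer plus double-buffered cursor, in pages."""
    return 2 * framebuffer_pages(width, height) + 2 * cursor_pages


if __name__ == "__main__":
    print(framebuffer_pages(1024, 768))
    print(expected_used_pages(1024, 768))
```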
>>
>> The thing is that in text mode, the areas are allocated. But the driver
>> shouldn't be active there, so the file should either not exist or show
>> only a single free area.
>>
>
> If you want me to double-check this, I will. I have GNOME installed, but the machine boots to runlevel 3 and then I start the desktop using init 5. I am pretty sure I took that output when the machine was in graphics mode at runlevel 5.
>
>>>>>>
>>>>>> I’ve attached /var/lib/gdm/.local/share/xorg/Xorg.0.log instead.
>>>>>
>>>>> Good! Looking through that log file, the card is found at line 79 and
>>>>> the generic X modesetting driver initializes below. That works as expected.
>>>>>
>>>>> I noticed that several operations are not permitted (lines 78 and 87). I
>>>>> guess you're starting X from a regular user account? IIRC special
>>>>> permission is required to acquire control of the display. What happens
>>>>> if you start X as root user?
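A small sketch for checking the file-permission side of this from Python, assuming the device nodes `/dev/fb0` and `/dev/dri/card0` exist on the machine. `os.access()` tests against the real user and group IDs, so running it as the same user that starts X shows whether that user can open the nodes. It does not check the IOPL failure, which depends on a process privilege (CAP_SYS_RAWIO) rather than on file permissions. The helper name `can_open_rw` is hypothetical.

```python
import os


def can_open_rw(path):
    """True if the real UID/GID of this process may read and write `path`."""
    # os.access() also returns False when the path does not exist.
    return os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("real uid:", os.getuid())
    for dev in ("/dev/fb0", "/dev/dri/card0"):
        print(dev, "read/write ok" if can_open_rw(dev) else "not accessible")
```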
>>>>
>>>>
>>>> I am starting GNOME as root by doing “init 5” from either the console session or from ssh.
>>>>
>>>> The default runlevel on boot is 3.
>>>>
>>>> In the failing session running your 5.4.0-rc6 build:
>>>>
>>>> 78 [ 237.712] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
>>>>
>>>> 87 [ 237.712] (EE) open /dev/fb0: Permission denied
>>>>
>>>> Booting the 4.18 kernel yields the same errors in /var/lib/gdm/.local/share/xorg/Xorg.0.log:
>>>>
>>>> 78 [ 101.334] xf86EnableIOPorts: failed to set IOPL for I/O (Operation not permitted)
>>>>
>>>> 87 [ 101.334] (EE) open /dev/fb0: Permission denied
>>>>
>>>>
>>>> What is strange is that both X log files (bad and OK) essentially look as if GNOME started!
>>>>
>>>> <Xorg.0.log.bad><Xorg.0.log.Ok>
>>>>
>>>>>>
>>>>>> Here is my cmdline. I just tested 5.3.0 and it fails too (my last test was 5.3.8, and it also failed).
>>>>>>
>>>>>> # cat /proc/cmdline
>>>>>> BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.3.0+ root=/dev/mapper/ol_ca--dev55-root ro crashkernel=auto resume=/dev/mapper/ol_ca--dev55-swap rd.lvm.lv=ol_ca-dev55/root rd.lvm.lv=ol_ca-dev55/swap console=ttyS0,9600,8,n,1 drm.debug=0xff
>>>>>>
>>>>>> When you say “tip”, are you referring to a specific kernel? I can build 5.4.0-rc6. The problem appears to have been introduced around the 5.3 time frame.
>>>>>
>>>>> The latest and greatest DRM code is in the drm-tip branch at
>>>>>
>>>>> git://anongit.freedesktop.org/drm/drm-tip
>>>>>
>>>>> If you build this version you should find
>>>>>
>>>>> /sys/kernel/debug/dri/0/vram-mm
>>>>>
>>>>> on the device. You have to build with debugfs enabled (CONFIG_DEBUG_FS=y) and
>>>>> may have to mount debugfs at /sys/kernel/debug.
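Where the vram-mm file appears depends on where debugfs is mounted. A minimal Python sketch, assuming the usual `/proc/mounts` layout (device, mount point, fs type, options, dump, pass), finds the debugfs mount point and builds the `dri/0/vram-mm` path under it. For example, mounting debugfs at `/sys/kernel` instead of `/sys/kernel/debug` puts the file at `/sys/kernel/dri/0/vram-mm`. The function names are hypothetical.

```python
import posixpath


def debugfs_mountpoints(mounts_text):
    """Mount points of every debugfs entry in /proc/mounts-style text."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        # Spaces in mount points appear octal-escaped (\040); not decoded here.
        if len(fields) >= 3 and fields[2] == "debugfs":
            points.append(fields[1])
    return points


def vram_mm_path(mounts_text, minor=0):
    """Path of dri/<minor>/vram-mm under the first debugfs mount, or None."""
    points = debugfs_mountpoints(mounts_text)
    if not points:
        return None
    return posixpath.join(points[0], "dri", str(minor), "vram-mm")


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            print(vram_mm_path(f.read()) or "debugfs is not mounted")
    except FileNotFoundError:
        print("/proc/mounts is not available (not Linux?)")
```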
>>>>>
>>>>>>>
>>>>>>> before and after switching to graphics mode. The file lists the
>>>>>>> allocated regions of the VRAM.
>>>>>>>
>>>>>>>>
>>>>>>>> This adapter is the ServerEngines Integrated Remote Video Acceleration Subsystem (RVAS) and is used as the remote console in iLO/DRAC environments.
>>>>>>>>
>>>>>>>> I don’t see any specific errors in the gdm logs or message file other than this:
>>>>>>>
>>>>>>> You can boot with drm.debug=0xff on the kernel command line to enable
>>>>>>> more warnings.
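`drm.debug` is a bitmask in which each bit enables one debug category, so `0xff` turns on all eight low categories. The Python sketch below decodes such a value from a kernel command line. The bit-to-name table reflects the `DRM_UT_*` definitions in `include/drm/drm_print.h` for the low eight bits only (later kernels add further categories above `0x80`, which this sketch reports as leftover bits); the function names are hypothetical.

```python
import re

# drm.debug bits from include/drm/drm_print.h (low eight bits only).
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def drm_debug_from_cmdline(cmdline):
    """Return the drm.debug value on a kernel command line, or None."""
    m = re.search(r"(?:^|\s)drm\.debug=(\S+)", cmdline)
    return int(m.group(1), 0) if m else None  # base 0 accepts 0xff and 255


def decode_drm_debug(mask):
    """Split a drm.debug mask into known category names and leftover bits."""
    names = [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]
    leftover = mask & ~sum(DRM_DEBUG_CATEGORIES)
    return names, leftover


# Excerpt of the command line quoted in this thread.
CMDLINE = "BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.3.0+ ro console=ttyS0,9600,8,n,1 drm.debug=0xff"

if __name__ == "__main__":
    print(decode_drm_debug(drm_debug_from_cmdline(CMDLINE)))
```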
>>>>>>>
>>>>>>>
>>>>>>> Could you please attach the output of lspci -v for the VGA adapter?
>>>>>>>
>>>>>>
>>>>>>
>>>>>> Here is the output from the current machine; the previous addresses were from another model using the same SE device:
>>>>>>
>>>>>>
>>>>>> Nov 7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 0: 0xc5000000 -> 0xc5ffffff
>>>>>> Nov 7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 1: 0xc6810000 -> 0xc6813fff
>>>>>> Nov 7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: remove_conflicting_pci_framebuffers: bar 2: 0xc6000000 -> 0xc67fffff
>>>>>> Nov 7 04:42:50 ca-dev55 kernel: mgag200 0000:3d:00.0: vgaarb: deactivate vga console
>>>>>>
>>>>>>
>>>>>> lspci -s 3d:00.0 -vvv -k
>>>>>> 3d:00.0 VGA compatible controller: Matrox Electronics Systems Ltd. MGA G200e [Pilot] ServerEngines (SEP1) (rev 05) (prog-if 00 [VGA controller])
>>>>>> 	Subsystem: Oracle/SUN Device 4852
>>>>>> 	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx-
>>>>>> 	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>>>>> 	Latency: 0, Cache Line Size: 64 bytes
>>>>>> 	Interrupt: pin A routed to IRQ 16
>>>>>> 	NUMA node: 0
>>>>>> 	Region 0: Memory at c5000000 (32-bit, non-prefetchable) [size=16M]
>>>>>> 	Region 1: Memory at c6810000 (32-bit, non-prefetchable) [size=16K]
>>>>>> 	Region 2: Memory at c6000000 (32-bit, non-prefetchable) [size=8M]
>>>>>> 	Expansion ROM at 000c0000 [disabled] [size=128K]
>>>>>> 	Capabilities: [dc] Power Management version 2
>>>>>> 		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
>>>>>> 		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
>>>>>> 	Capabilities: [e4] Express (v1) Legacy Endpoint, MSI 00
>>>>>> 		DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
>>>>>> 			ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
>>>>>> 		DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported-
>>>>>> 			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
>>>>>> 			MaxPayload 128 bytes, MaxReadReq 128 bytes
>>>>>> 		DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>>>>>> 		LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s, Exit Latency L0s <64ns
>>>>>> 			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
>>>>>> 		LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>>>>>> 			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>>>>>> 		LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>>>>>> 	Capabilities: [54] MSI: Enable- Count=1/1 Maskable- 64bit-
>>>>>> 		Address: 00000000 Data: 0000
>>>>>> 	Kernel driver in use: mgag200
>>>>>> 	Kernel modules: mgag200
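The BAR ranges in the kernel log and the region sizes reported by lspci can be cross-checked with a little arithmetic, since the end address in the log is inclusive (size = end - start + 1). A minimal Python sketch, assuming the `bar N: 0xSTART -> 0xEND` format shown in the log lines above; the size formatting only approximates lspci's `[size=...]` style, and the function names are hypothetical.

```python
import re

# "bar N: 0xSTART -> 0xEND" as printed in the kernel log above.
BAR_RE = re.compile(r"bar (\d+): (0x[0-9a-fA-F]+) -> (0x[0-9a-fA-F]+)")


def bar_sizes(log_text):
    """Map BAR index to size in bytes; the end address is inclusive."""
    sizes = {}
    for m in BAR_RE.finditer(log_text):
        start, end = int(m.group(2), 16), int(m.group(3), 16)
        sizes[int(m.group(1))] = end - start + 1
    return sizes


def lspci_size(size):
    """Format a byte count roughly the way lspci prints [size=...]."""
    for suffix, shift in (("G", 30), ("M", 20), ("K", 10)):
        if size >= (1 << shift) and size % (1 << shift) == 0:
            return f"{size >> shift}{suffix}"
    return str(size)


LOG = """\
remove_conflicting_pci_framebuffers: bar 0: 0xc5000000 -> 0xc5ffffff
remove_conflicting_pci_framebuffers: bar 1: 0xc6810000 -> 0xc6813fff
remove_conflicting_pci_framebuffers: bar 2: 0xc6000000 -> 0xc67fffff
"""

if __name__ == "__main__":
    for bar, size in sorted(bar_sizes(LOG).items()):
        print(f"Region {bar}: size={lspci_size(size)}")
```

The computed sizes (16M, 16K and 8M) match Regions 0 to 2 in the lspci output.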
>>>>>
>>>>> It all looks normal.
>>>>>
>>>>> Best regards
>>>>> Thomas
>>>>>
>>>
>>> ============== Snip ===========
>>>
>>>
>>> Hi Thomas,
>>> I have hopefully narrowed down the breakage to the range between these upstream commits, i.e. v5.2 and v5.3-rc1:
>>>
>>>
>>> from: 0ecfebd2b524 2019-07-07 | Linux 5.2
>>> to:   5f9e832c1370 2019-07-21 | Linux 5.3-rc1
>>>
>>>
>>> I started to bisect this range by date, one day at a time, based on the changes made in:
>>>
>>> drivers/gpu/drm/
>>>
>>> fec88ab0af97 2019-07-14 | Merge tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma ; works
>>>
>>> Hopefully something in drivers/gpu/drm/ in the date range 2019-07-14 to 2019-07-21 will surface tomorrow.
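Bisecting day by day is effectively a linear scan. As an alternative, `git bisect` can restrict the search to commits that touch a path, e.g. `git bisect start 5f9e832c1370 fec88ab0af97 -- drivers/gpu/drm/` (bad revision first, then good), and halves the remaining candidates at each step. The Python sketch below only illustrates that binary search over a hypothetical linear list of commits with a monotonic good/bad test; real kernel history contains merges, which `git bisect` handles and this sketch does not.

```python
def first_bad(commits, is_good):
    """Binary search for the first bad commit in a linear history.

    Assumes commits[0] is good, commits[-1] is bad, and that once a commit
    is bad every later one is bad too. Returns (first_bad_commit, tests_run).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi], tests


if __name__ == "__main__":
    history = [f"commit{i:03d}" for i in range(100)]  # hypothetical history
    culprit = "commit037"                             # hypothetical culprit
    print(first_bad(history, lambda c: c < culprit))
```

With 100 candidates the search needs at most 7 build-and-test rounds, compared with one round per day (or per commit) when scanning linearly.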
>>
>> Great, thanks for bisecting.
>>
>> Could you attach your kernel config file? I'd like to compare with my
>> config and try to reproduce the issue.
>>
>> Best regards
>> Thomas
>
> Hi.
>
> Here are config files generated with “make oldconfig”, starting from the master .config file we use for 5.4.0-rc4:
>
> config.5.2.21 - works with that kernel
> config.5.3 - fails with 5.3 and later
>
> Do you have access to an mgag200-style adapter?
I do.
I think I've been able to reproduce the issue. Buffers seem to remain in
video RAM after they have been pinned there. I'll investigate next week.
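To illustrate what "remaining in video RAM after being pinned" means, here is a toy Python model of pin counting. It is loosely inspired by the pin/unpin pattern of the kernel's GEM VRAM helpers (`drm_gem_vram_pin()` and `drm_gem_vram_unpin()`), but the class and its methods are hypothetical and make no claim about where the actual bug is. The point it shows: a buffer can only be evicted once every pin has been released, so a single missing unpin keeps it resident.

```python
class PinnedBuffer:
    """Hypothetical pin-counted buffer: evictable only when unpinned."""

    def __init__(self, name):
        self.name = name
        self.pin_count = 0
        self.in_vram = False

    def pin(self):
        self.pin_count += 1
        self.in_vram = True  # pinning places the buffer in VRAM

    def unpin(self):
        if self.pin_count == 0:
            raise RuntimeError(f"unbalanced unpin of {self.name}")
        self.pin_count -= 1

    def try_evict(self):
        """Move out of VRAM if nobody holds a pin; return True on success."""
        if self.pin_count == 0:
            self.in_vram = False
            return True
        return False


if __name__ == "__main__":
    fb = PinnedBuffer("console framebuffer")
    fb.pin()    # e.g. pinned for scanout
    fb.pin()    # pinned a second time elsewhere
    fb.unpin()  # only one of the two pins is released
    print(fb.try_evict(), fb.in_vram)  # eviction refused; buffer stays in VRAM
```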
I hope your bisecting session can point to the cause.
Best regards
Thomas
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
More information about the dri-devel mailing list