[Nouveau] [PATCH] nouveau: nv46: Change mc subdev oclass from nv44 to nv4c

Hans de Goede hdegoede at redhat.com
Wed Jul 29 08:36:22 PDT 2015


Hi,

On 28-07-15 09:26, Ben Skeggs wrote:
> On 28 July 2015 at 01:52, Hans de Goede <hdegoede at redhat.com> wrote:
>> Hi,
>>
>> On 24-07-15 04:32, Ben Skeggs wrote:
>>>
>>> On 24 July 2015 at 01:20, Hans de Goede <hdegoede at redhat.com> wrote:
>>>>
>>>> MSI interrupts appear to not work for nv46 based cards. Change the mc
>>>> subdev oclass for these cards from nv44 to nv4c, the nv4c mc code is
>>>> identical to the nv44 mc code except that it does not use msi
>>>> (it does not define a msi_rearm callback).
>>>
>>> I'm fine with this, but it'd be nice to check that the binary driver
>>> doesn't/can't use MSI on these too (there might be an alternate method
>>> we need to use).
>>>
>>> Would you be able to grab the latest proprietary driver that works on
>>> nv4x, and do a mmiotrace of it?
>>
>>
>> I've grabbed 304.125
>>
>>> You *might* need to use "modprobe
>>> nvidia NVreg_EnableMSI=1", because at some point NVIDIA didn't use it
>>> by default anywhere.
>>
>>
>> You're right I needed to specify NVreg_EnableMSI=1, with that set
>> /proc/interrupts shows that MSI is used.
>>
>> Here is an of running glxgears with the binary driver using msi interrupts
>> mmiotrace:
>>
>> https://fedorapeople.org/~jwrdegoede/nvidia-bin-nv46-msi-on-glxgears.mmiotrace.gz
>>
>> AFAIK there are some nouveau tools to parse this a bit, right ? I'm going
>> to call it a day for today, if you can give me some pointers what to do with
>> the
>> mmiotrace to find a potential fix for the msi issues, that would be
>> appreciated.

I've run demmio on this as suggested by Ilia, I've checked all the writes
to the pmc pbus and pci ranges, and I've been unable to find anything which
helps I'm afraid. I've also checked the interrupt regs of the crt block, and
those are correct, and the interrupt flag for vblank is set.

So I'm all out of clues I'm afraid. One thing which does stand out is that
lspci -vvv shows the following differences between nouveau vs nvidea output:

@@ -361,23 +361,23 @@
         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Ste
         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
         Latency: 0
-       Interrupt: pin A routed to IRQ 28
+       Interrupt: pin A routed to IRQ 29
         Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
         Region 1: Memory at d0000000 (64-bit, prefetchable) [size=256M]
         Region 3: Memory at fc000000 (64-bit, non-prefetchable) [size=16M]
-       Expansion ROM at fe9e0000 [disabled] [size=128K]
+       [virtual] Expansion ROM at fe9e0000 [disabled] [size=128K]
         Capabilities: [60] Power Management version 2
                 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3ho
                 Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
         Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
-               Address: 00000000fee0300c  Data: 41a2
+               Address: 00000000fee0300c  Data: 41c2
         Capabilities: [78] Express (v1) Endpoint, MSI 00
                 DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <256ns,
                         ExtTag- AttnBtn- AttnInd- PwrInd- RBE- FLReset-
                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupport
                         RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                         MaxPayload 128 bytes, MaxReadReq 512 bytes
-               DevSta: CorrErr- UncorrErr+ FatalErr- UnsuppReq+ AuxPwr- TransP
+               DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransP
                 LnkCap: Port #0, Speed 2.5GT/s, Width x16, ASPM L0s L1, Exit La
                         ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
                 LnkCtl: ASPM Disabled; RCB 128 bytes Disabled- CommClk+
@@ -393,7 +393,7 @@
                         Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=01
                         Status: NegoPending- InProgress-
         Capabilities: [128 v1] Power Budgeting <?>
-       Kernel driver in use: nouveau
+       Kernel driver in use: nvidia
         Kernel modules: nouveau, nvidia

The DevSta shows that we are sending some commands the device does not like.

At first I thought this would be the culprit, but as discussed before on some
boots things just work and on others they do not (when using nouveau). I've
checked a boot with nouveau where things just happen to work, and there
UncorrErr+ and UnsuppReq+ are still set when things just work.

So I'm officially giving up on this, and I'm going to continue to work on
the nv46 with msi disabled.

Note that when things do not work, we do get some interrupts, they just stop
coming at one point shortly after boot.

>> BTW I had to build my own kernel with mmiotrace enabled in Kconfig, as this
>> is disabled in the Fedora kernels by default. Do you know if there is a good
>> reason to have this disabled by default, or should I ask the Fedora
>> kernel maintainers to enable it by default ?
> The -debug kernel has it enabled already.  However, it's also
> problematic in that it enables various lockdep debugging stuff that
> causes the binary driver kernel module to end up depending on GPL-only
> symbols, which you have to hack around by changing the
> MODULE_LICENSE() for the binary driver to "GPL"... Which is clearly a
> pain :)  So, I guess if you want a slightly more straight-forward
> approach, it'd be good to enable in the non-debug kernels too.

Ok, before I submit a patch to the Fedora kernel devs for this, mmiotrace
uses live patching like the other ftrace stuff, so no performance impact
unless actually used, right ?

>
>>
>>
>> Slightly offtopic:
>>
>> I decided to be bold and try gnome-shell on the nv46 with msi disabled,
>> which sofar was a guaranteed way to freeze the system, and it now works
>> somewhat (latest kernel, ddx and mesa). I see something which shows
>> horizontal lines which are small parts from my desktop background, and
>> things change significantly when I switch to the overview mode.
>>
>> But other then that the display is completely wrong, it looks a bit
>> like a framebuffer pitch problem, but then different. I think it
>> is likely some tiling problem or some such.
>>
>> Note that metacity + glxgears works, this only shows with
>> gnome-shell, any hints where to start looking wrt debugging this?

> These are the main issues that I'd like to see resolved :)

Agreed getting gnome-shell running is really the minimum level we should
support cards at.

>> Or should I first try to run piglet and see if some tests there
>> point out the culprit?
> I think this is a good place to start.

Ok, will do.

Regards,

Hans


More information about the Nouveau mailing list