[Nouveau] Second copy engine on GF116

Ilia Mirkin imirkin at alum.mit.edu
Tue Nov 25 13:12:06 PST 2014


On Tue, Nov 25, 2014 at 4:05 PM, Andy Ritger <aritger at nvidia.com> wrote:
> On Tue, Nov 25, 2014 at 10:57:44AM -0500, Ilia Mirkin wrote:
>> On Mon, Nov 24, 2014 at 8:33 PM, Andy Ritger <aritger at nvidia.com> wrote:
>> > On Fri, Nov 21, 2014 at 01:39:55AM -0500, Ilia Mirkin wrote:
>> >> On Fri, Nov 21, 2014 at 1:16 AM, Andy Ritger <aritger at nvidia.com> wrote:
>> >> > Hi Ilia,
>> >> >
>> >> > Actually 0x90b8 is different than copy engine.  I'm not very familiar
>> >> > with it, but 0x90b8 is an engine for performing LZO decompression as
>> >> > part of performing the copy.  It has a variety of limitations (e.g.,
>> >> > cannot handle blocklinear format), and was only in a few Fermi chips,
>> >> > as I understand it.
>> >>
>> >> According to our driver source, GF100, GF104, GF110, GF114, and GF116
>> >> all have it. [So GF106, GF108, GF117, GF119 don't have it.] We've only
>> >> had problems reported against GF116... and only for some people.
>> >
>> > Hmm, some of our internal documentation is inconsistent about whether it
>> > applies to GF100, but otherwise what I see matches your list.  I guess
>> > "few" was not entirely accurate.
>> >
>> >> > It is probably easiest to just ignore it.  You can distinguish this
>> >> > decompress engine from normal copy engine by looking at the CE capability
>> >> > register on falcon (0x00000650).  If bit 2 is '1', then the falcon is
>> >> > a decompress engine.
>> >>
>> >> I presume you mean a +0x650 register on the pcopy engines (0x104000
>> >> and 0x105000). I only have access to the GF108 right now, which
>> >> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
>> >> 0x104000 for copy on the GF108...
>> >
>> > Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell.
>> >
>> > FWIW, the other capability bits are:
>> > bit 0: "DMACOPY_SUPPORTED"
>> > bit 1: "PIXREMAP_SUPPORTED"
>> >
>> > (I think PIXREMAP_SUPPORTED is in reference to the component remapping
>> > controlled by methods 0x00000700, 0x00000704, and 0x00000708 in the
>> > copy engine class).
>>
>> Neat. We went around and grabbed that 0x650 register on a bunch of
>> GPUs, see the CE* columns at:
>>
>> http://envytools.readthedocs.org/en/latest/hw/gpu.html#fermi-kepler-maxwell-family
>
> I don't see the 0x650 register values on that page.  Maybe I'm not
> looking at the right place?

No, you're looking in the right place. Someone who shall remain
nameless killed something in the formatting... hopefully it'll get
fixed shortly, but in the meanwhile:

https://github.com/envytools/envytools/commit/5344c92108227ab7138d5130afc0203fa79b4f3c

Look at the CE0/CE1 columns.

>
>> It looks like it's actually returning 0 on both "copy" engines for a
>> bunch of those cards -- GF100, GF104, GF114, probably GF110. But other
>> cards have them as either 3 or 4. I'm guessing that '0' should be
>> treated as if it were a '3' (or a '7')?
>
> That's curious.  If I can get the table of where that reads zero, I can
> try to investigate how to interpret that.
>
>> Curiously, a GF116 card that I thought was working fine on nouveau
>> actually has 3 for the first engine and 4 for the second. Perhaps it
>> just had enough VRAM that I never triggered the conditions required
>> for nouveau to use that second copy engine (we use it, when available,
>> for drm-initiated buffer moves).
>
> Interesting.  Would that explain why this hasn't manifested on configs
> other than the GF116 user reports?

Well, all the other GPUs where we try to use the secondary copy
engine report 0 for both +0x650 registers.
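
For reference, here is a rough sketch of how the capability bits discussed
in this thread could be decoded. It is not nouveau code. The names
CE_CAP_DECOMPRESS, ce_cap_reg(), ce_is_decompress() and ce_usable_for_copy()
are made up for illustration; only the bit positions, the two bit names for
bits 0 and 1, and the register addresses come from the mails above. Treating
a value of 0 as a normal copy engine is just my guess from earlier in the
thread and has not been confirmed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 0 and 1 are named as in Andy's mail; the name for bit 2 is made up. */
#define CE_CAP_DMACOPY    (1u << 0)  /* "DMACOPY_SUPPORTED" */
#define CE_CAP_PIXREMAP   (1u << 1)  /* "PIXREMAP_SUPPORTED" */
#define CE_CAP_DECOMPRESS (1u << 2)  /* set => falcon is a decompress engine */

/* PCOPY engine bases and the capability register offset from the thread. */
#define PCOPY0_BASE   0x104000u
#define PCOPY1_BASE   0x105000u
#define CE_CAP_OFFSET 0x650u

/* Address of the capability register for a given engine base. */
static inline uint32_t ce_cap_reg(uint32_t engine_base)
{
    return engine_base + CE_CAP_OFFSET;
}

/* True if the capability value marks the engine as a decompress engine. */
static inline bool ce_is_decompress(uint32_t caps)
{
    return (caps & CE_CAP_DECOMPRESS) != 0;
}

/*
 * True if the engine looks usable for plain copies.
 * ASSUMPTION: a value of 0 (seen on GF100/GF104/GF114) is treated like a
 * normal copy engine. This is unconfirmed.
 */
static inline bool ce_usable_for_copy(uint32_t caps)
{
    if (caps == 0)
        return true;
    return (caps & CE_CAP_DMACOPY) != 0 && !ce_is_decompress(caps);
}
```

With the values reported above, 3 (first engine) would count as usable for
copies, while 4 (second engine on GF108 and GF116) would be flagged as a
decompress engine and skipped.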

  -ilia

