[Nouveau] Second copy engine on GF116

Wed Nov 26 17:05:12 PST 2014

On Wed, Nov 26, 2014 at 02:18:25AM +0100, Marcin Kościelnicki wrote:
[...]
> >>http://envytools.readthedocs.org/en/latest/hw/gpu.html#fermi-kepler-maxwell-family
> >
> >I don't see the 0x650 register values on that page.  Maybe I'm not
> >looking at the right place?
> 
> The table at the bottom, CE0-CE2 columns.

Thanks.

> >>It looks like it's actually returning 0 on both "copy" engines for a
> >>bunch of those cards -- GF100, GF104, GF114, probably GF110. But other
> >>cards have them as either 3 or 4. I'm guessing that '0' should be
> >>treated as if it were a '3' (or a '7')?
> >
> >That's curious.  If I can get the table of where that reads zero, I can
> >try to investigate how to interpret that.
> 
> GF100, GF110, GF104, GF114.
> 
> Sounds obvious to me - the caps register wasn't needed before GF106
> and thus didn't exist.
> 
> I don't think there's any more need for information here - we know
> how to tell apart a decompression engine by the caps register, *and*
> we know which cards have it (GF106, GF116, GF108 - unless someone
> resurrected it on GKsomething or GMsomething).

I cannot find any information to suggest that the decompress engine
exists on anything >= Kepler.  I think your list of GPUs that had the
decompress engine is accurate, and the capability register wasn't added
until GF106 when the decompress engine was added.
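
As a rough sketch of the rule discussed above (not taken from any driver), the
caps value could be classified as follows. Only the values 0, 3 and 4 and the
0x650 offset come from this thread. The names ce_kind, ce_classify and
ce_caps_addr are hypothetical, and reading 0x650 as an offset from the engine's
register base is an assumption.

```c
#include <stdint.h>

/* Hypothetical names; only the values 0, 3 and 4 come from the thread. */
enum ce_kind {
    CE_KIND_COPY,        /* normal copy engine */
    CE_KIND_DECOMPRESS,  /* decompress engine (GF106, GF108, GF116) */
    CE_KIND_UNKNOWN      /* a value the thread does not discuss */
};

/* Assumption: 0x650 is an offset from the engine's register base. */
#define CE_CAPS_OFFSET 0x650u

/* Address of the caps register for an engine based at `base`. */
uint32_t ce_caps_addr(uint32_t base)
{
    return base + CE_CAPS_OFFSET;
}

/* Classify an engine from the value read back from its caps register. */
enum ce_kind ce_classify(uint32_t caps)
{
    switch (caps) {
    case 0: /* GF100/GF104/GF110/GF114: register absent, assume normal copy */
    case 3: /* seen on the first engine of a GF116 */
        return CE_KIND_COPY;
    case 4: /* seen on the second engine of a GF116 */
        return CE_KIND_DECOMPRESS;
    default:
        return CE_KIND_UNKNOWN;
    }
}
```

Whether the value is a bitmask (the '7' suggestion) is not settled in this
thread, so the sketch matches the observed values exactly instead of testing
individual bits.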

> We also know the
> difference between a normal copy engine and a decompression engine
> (basically: all dedicated copy hw is missing and replaced by
> dedicated decompression hw - effectively a completely different
> engine). In fact, given the decomp engine's simplicity, it shouldn't
> be hard at all to write firmware for it.

Enjoy :)

> We are, however, quite curious about the purpose of an LZO1X
> decompression engine on a GPU...

From what I can tell, the motivation was to better utilize bandwidth
across limited PCIe buses (e.g., PCIe 1x configurations, like on
some notebooks).  I don't believe we ever attempted to use it in the
OpenGL driver, but the DX driver tried it.  I think the intent was
for the driver to compress content in sysmem using the CPU, then use
the decompress engine to transfer to vidmem and decompress in flight.
I don't know LZO compression performance characteristics, but I'm a
little suspicious of that CPU/bandwidth tradeoff.
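
To make that tradeoff concrete, here is a back-of-the-envelope sketch under a
simple serial model: the CPU compresses the whole buffer first, then the
compressed data crosses the bus, and the decompress engine is assumed never to
be the bottleneck. The function name and the model itself are illustrative
assumptions, not a description of what any driver did.

```c
/*
 * Serial model (an assumption for illustration):
 *   raw time        = size / bus
 *   compressed time = size / cpu + size / (ratio * bus)
 * Compression wins when cpu > bus * ratio / (ratio - 1).
 *
 * bus_mb_s: bus bandwidth in MB/s
 * ratio:    compression ratio (uncompressed size / compressed size)
 * Returns the minimum CPU compression throughput in MB/s needed to break
 * even, or -1.0 if the inputs cannot yield a gain (ratio <= 1 or a
 * non-positive bandwidth).
 */
double breakeven_compress_rate(double bus_mb_s, double ratio)
{
    if (bus_mb_s <= 0.0 || ratio <= 1.0)
        return -1.0;
    return bus_mb_s * ratio / (ratio - 1.0);
}
```

With purely illustrative numbers, a 250 MB/s link and a 2:1 ratio give a
break-even of 500 MB/s of CPU compression throughput. Overlapping compression
with the transfer would relax this bound, so treat it as a pessimistic
estimate.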

Anyway, it seems like it was somewhat of a failed experiment and we
eventually gave up on the decompress engine.

> Fun fact, I knew of the existence of decompression engines for some
> time, but never managed to locate them - I guess I didn't consider
> copy engines to warrant a second look on all possible GPUs...

:)

> Which brings me to ask: are there any more FIFO engines we somehow
> missed on Fermi+? There's apparently a new VIC class (0xa0b6), but
> I've never seen a VIC other than the MCP89 one (0x86b6).

VIC is supposed to be pretty good for reducing power usage.  I don't
have first-hand experience programming it.  I don't know why it wasn't
included in any subsequent GPU, but it was revived for Tegra
(I'm pretty sure it is in Tegra K1).  Offhand, I'm not sure if it saw
any method interface changes between MCP and Tegra.

We haven't implemented anything to take advantage of it in the proprietary
X driver (yet? -- we'll probably need to eventually), but I'm pretty
sure it is used somewhere in the Android stack.

> AFAICS there's also one unknown enum value in NVRM's FIFO engine
> enum... (I know of GRAPH, CE0, CE1, CE2, VP1/VP2/MSPDEC, MSRCH/ME,
> MSPPP, BSP/MSVLD/MSDEC, MPEG, SOFTWARE, CIPHER/SEC, VIC, MSENC).

I'll see what information I can dig up.

Thanks,
- Andy

> >>Curiously, a GF116 card that I thought was working fine on nouveau
> >>actually has 3 for the first engine and 4 for the second. Perhaps it
> >>just had enough VRAM that I never triggered the conditions required
> >>for nouveau to use that second copy engine (we use it, when available,
> >>for drm-initiated buffer moves).
> >
> >Interesting.  Would that explain why this hasn't manifested on configs
> >other than the GF116 user reports?
> >
> >Thanks,
> >- Andy
> >
> >>>> From my admittedly limited understanding, both 0x104000 and 0x105000
> >>>>appear to be falcon engines, where the fuc is presumably able to drive
> >>>>some underlying hardware. The actual fifo methods are implemented in
> >>>>the fuc, which in turn does iowr/etc commands.
> >>>>
> >>>>Are you saying that the "decompress" engine (at 0x105000 right?) has a
> >>>>different piece of hardware behind it than the copy engine at
> >>>>0x104000, or does NVIDIA simply provide different fuc for it that
> >>>>exposes somewhat different functionality via FIFO methods?
> >>>
> >>>There is definitely a falcon at the frontend, and there is different
> >>>falcon ucode for "normal" copy engine versus the "decompress" engine.
> >>>But, I don't know off hand what dedicated hardware, if any, is behind it.
> >>
> >>Seems likely that the HW is different, since it'd be madness to try to
> >>do decompression in the falcon code itself. (Not to say that the ISA
> >>isn't suited to it, just they have relatively slow clocks.) mwk is in
> >>the process of working it all out.
> >>
> >>   -ilia