[Nouveau] Second copy engine on GF116

Marcin Kościelnicki koriakin at 0x04.net
Tue Nov 25 10:28:59 PST 2014


For what it's worth, I managed to get the engine to work in the simplest
mode (i.e. decompressing an LZO1X bytestream). Triggering the operation is
dead simple, and the whole thing is done in hw:

1. Destination and source have to be 0x100-byte aligned
2. Destination buffer length is in bytes, but it's rounded up to a
multiple of 0x100
3. Poke source address >> 8 to base+0xa00
4. Poke source length, in bytes, to base+0xa04
5. Poke destination address >> 8 to base+0xa20
6. Poke destination buffer length, in bytes, to base+0xa24
7. Poke 1 to base+0xa1c
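
The steps above can be sketched in C roughly as follows. This is only an
illustration, not tested driver code: the register names, decomp_start(),
decomp_dst_footprint(), the wr32 callback and the mmio_log recorder are all
made up for the example, and the 40-bit address check simply assumes the
registers are 32 bits wide. Only the offsets, the alignment rule and the
write order come from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets relative to the engine base, taken from the list above.
 * The names are hypothetical. */
enum {
    DECOMP_SRC_ADDR = 0xa00, /* source address >> 8 */
    DECOMP_SRC_LEN  = 0xa04, /* source length, in bytes */
    DECOMP_TRIGGER  = 0xa1c, /* write 1 to start the operation */
    DECOMP_DST_ADDR = 0xa20, /* destination address >> 8 */
    DECOMP_DST_LEN  = 0xa24, /* destination buffer length, in bytes */
};

/* Hypothetical 32-bit MMIO write accessor supplied by the caller. */
typedef void (*mmio_wr32_fn)(void *ctx, uint32_t reg, uint32_t val);

/* Bytes the hw may treat as destination: length rounded up to 0x100. */
static uint64_t decomp_dst_footprint(uint32_t dst_len)
{
    return ((uint64_t)dst_len + 0xff) & ~(uint64_t)0xff;
}

/* Returns 0 after issuing the writes, -1 if an address is unusable. */
static int decomp_start(mmio_wr32_fn wr32, void *ctx, uint32_t base,
                        uint64_t src, uint32_t src_len,
                        uint64_t dst, uint32_t dst_len)
{
    /* Both addresses have to be 0x100-byte aligned. */
    if ((src & 0xff) || (dst & 0xff))
        return -1;
    /* Assumption: address >> 8 must fit in a 32-bit register. */
    if ((src >> 40) || (dst >> 40))
        return -1;

    wr32(ctx, base + DECOMP_SRC_ADDR, (uint32_t)(src >> 8));
    wr32(ctx, base + DECOMP_SRC_LEN, src_len);
    wr32(ctx, base + DECOMP_DST_ADDR, (uint32_t)(dst >> 8));
    wr32(ctx, base + DECOMP_DST_LEN, dst_len);
    wr32(ctx, base + DECOMP_TRIGGER, 1);
    return 0;
}

/* Stand-in for real MMIO: records writes so the order can be inspected. */
struct mmio_log {
    uint32_t reg[8];
    uint32_t val[8];
    unsigned count;
};

static void log_wr32(void *ctx, uint32_t reg, uint32_t val)
{
    struct mmio_log *log = ctx;

    assert(log->count < 8);
    log->reg[log->count] = reg;
    log->val[log->count] = val;
    log->count++;
}
```

With base set to the engine's MMIO base (presumably 0x105000, going by the
quoted thread), decomp_start() issues the five writes in order. Completion
and error reporting are left out, since those are still unknown.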

However, I haven't figured out error handling, or other operation modes 
(there is at least one, judging by nv hardware - raw copy without 
decompression, perhaps?).

The whole thing has a grand total of 17 MMIO registers, 9 of them 
writable. Shouldn't be that hard to figure it out...
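
As for telling the two engines apart, the capability bits Andy describes in
the quoted mail below can be decoded like this. Again just a sketch: the
macro and function names are invented here (the thread names bits 0 and 1
but gives no name for bit 2), and reading the register itself is left to
the caller.

```c
#include <stdint.h>

/* Capability register offset within each engine, per the quoted thread. */
#define CE_CAP_OFFSET      0x650u

#define CE_CAP_DMACOPY     (1u << 0) /* "DMACOPY_SUPPORTED" */
#define CE_CAP_PIXREMAP    (1u << 1) /* "PIXREMAP_SUPPORTED" */
#define CE_CAP_DECOMPRESS  (1u << 2) /* hypothetical name for bit 2 */

/* Nonzero if the capability value marks a decompress engine. */
static int ce_is_decompress(uint32_t caps)
{
    return (caps & CE_CAP_DECOMPRESS) != 0;
}

/* Nonzero if the capability value advertises plain DMA copy. */
static int ce_supports_dmacopy(uint32_t caps)
{
    return (caps & CE_CAP_DMACOPY) != 0;
}
```

With the GF108 values Ilia reports, 3 (from 0x104650) decodes as a normal
copy engine and 4 (from 0x105650) as a decompress engine.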

Marcin Kościelnicki

On 25/11/14 02:33, Andy Ritger wrote:
> On Fri, Nov 21, 2014 at 01:39:55AM -0500, Ilia Mirkin wrote:
>> On Fri, Nov 21, 2014 at 1:16 AM, Andy Ritger <aritger at nvidia.com> wrote:
>>> Hi Ilia,
>>>
>>> Actually 0x90b8 is different than copy engine.  I'm not very familiar
>>> with it, but 0x90b8 is an engine for performing LZO decompression as
>>> part of performing the copy.  It has a variety of limitations (e.g.,
>>> cannot handle blocklinear format), and was only in a few Fermi chips,
>>> as I understand it.
>>
>> According to our driver source, GF100, GF104, GF110, GF114, and GF116
>> all have it. [So GF106, GF108, GF117, GF119 don't have it.] We've only
>> had problems reported against GF116... and only for some people.
>
> Hmm, some of our internal documentation is inconsistent about whether it
> applies to GF100, but otherwise what I see matches your list.  I guess
> "few" was not entirely accurate.
>
>>> It is probably easiest to just ignore it.  You can distinguish this
>>> decompress engine from normal copy engine by looking at the CE capability
>>> register on falcon (0x00000650).  If bit 2 is '1', then the falcon is
>>> a decompress engine.
>>
>> I presume you mean a +0x650 register on the pcopy engines (0x104000
>> and 0x105000). I only have access to the GF108 right now, which
>> returns 3 for 0x104650 and 4 for 0x105650. We're using the engine at
>> 0x104000 for copy on the GF108...
>
> Yes, 0x104650 and 0x105650 are the right addresses, from what I can tell.
>
> FWIW, the other capability bits are:
> bit 0: "DMACOPY_SUPPORTED"
> bit 1: "PIXREMAP_SUPPORTED"
>
> (I think PIXREMAP_SUPPORTED is in reference to the component remapping
> controlled by methods 0x00000700, 0x00000704, and 0x00000708 in the
> copy engine class).
>
>> From my admittedly limited understanding, both 0x104000 and 0x105000
>> appear to be falcon engines, where the fuc is presumably able to drive
>> some underlying hardware. The actual fifo methods are implemented in
>> the fuc, which in turn does iowr/etc commands.
>>
>> Are you saying that the "decompress" engine (at 0x105000 right?) has a
>> different piece of hardware behind it than the copy engine at
>> 0x104000, or does NVIDIA simply provide different fuc for it that
>> exposes somewhat different functionality via FIFO methods?
>
> There is definitely a falcon at the frontend, and there is different
> falcon ucode for "normal" copy engine versus the "decompress" engine.
> But, I don't know off hand what dedicated hardware, if any, is behind it.
>
> - Andy
>
>
>>>
>>> I hope that helps,
>>> - Andy
>>>
>>>
>>> On Thu, Nov 20, 2014 at 02:18:02PM -0500, Ilia Mirkin wrote:
>>>> Hello,
>>>>
>>>> There's a long-standing bug on nouveau (this is a sample bug, but the
>>>> issue has been around for a while:
>>>> https://bugs.freedesktop.org/show_bug.cgi?id=85465) whereby we attempt
>>>> to use the second PCOPY engine on GF116, and it sometimes does
>>>> nothing, despite mmio register 22500 saying that it's not disabled
>>>> (0x22500 == 0 for this user). In the bug you can see a dump from
>>>> 22400..22600, and all values after 22440 are read as 0. The issue
>>>> appears to be more common on mobile GF116's, but I don't know that the
>>>> correlation is 100%. No errors are reported by the FIFO or invalid
>>>> mmio reads, but the data transfer just does not happen. Switching to
>>>> using the first copy engine resolves things, so it's unlikely to be a
>>>> more systemic issue in nouveau's usage of the copy engine.
>>>>
>>>> To be clear, when I'm talking about the second PCOPY engine, I'm
>>>> talking about the engine at mmio 0x105000, and whose fifo class id is
>>>> 0x90b8.
>>>>
>>>> Any information on properly detecting that the engine is, in fact,
>>>> missing, would be greatly appreciated. Or, conversely, an assurance
>>>> that the engine _is_ there on all GF116's and we're just not
>>>> initializing something properly, along with perhaps some suggestions
>>>> as to what we might be missing.
>>>>
>>>> Thanks,
>>>>
>>>> Ilia Mirkin
>>>> imirkin at alum.mit.edu
>>>> _______________________________________________
>>>> Nouveau mailing list
>>>> Nouveau at lists.freedesktop.org
>>>> http://lists.freedesktop.org/mailman/listinfo/nouveau


