[Nouveau] data error enum documentation

Ilia Mirkin imirkin at alum.mit.edu
Wed Apr 30 10:29:29 PDT 2014


On Wed, Apr 30, 2014 at 11:54 AM, Andy Ritger <aritger at nvidia.com> wrote:
> Sorry for the very slow response to this, Ilia.
>
> For the specific error you mentioned: the error code
> 0x51 is "ErrorSrcLineExceedsPitch", and error code 0x53 is
> "ErrorDstLineExceedsPitch". It looks like class 0x9039 will generate
> those errors under the following conditions:
>
>        if ((NV9039_LAUNCH_DMA_SRC_MEMORY_LAYOUT == PITCH) &&
>            (NV9039_LAUNCH_DMA_SRC_INLINE == FALSE) &&
>            (NV9039_LINE_COUNT_VALUE > 1) &&
>            (NV9039_PITCH_IN_VALUE >= 0) &&
>            (NV9039_LINE_LENGTH_IN_VALUE > NV9039_PITCH_IN_VALUE)) {
>            return ErrorSrcLineExceedsPitch;
>        }
>
>        if ((NV9039_LAUNCH_DMA_DST_MEMORY_LAYOUT == PITCH) &&
>            (NV9039_LINE_COUNT_VALUE > 1) &&
>            (NV9039_PITCH_OUT_VALUE >= 0) &&
>            (NV9039_LINE_LENGTH_IN_VALUE > NV9039_PITCH_OUT_VALUE)) {
>            return ErrorDstLineExceedsPitch;
>        }
>
> Where those NV9039_* method values are defined as:
>
> #define NV9039_LAUNCH_DMA                                                                                 0x0300
> #define NV9039_LAUNCH_DMA_SRC_INLINE                                                                         0:0
> #define NV9039_LAUNCH_DMA_SRC_INLINE_FALSE                                                            0x00000000
> #define NV9039_LAUNCH_DMA_SRC_INLINE_TRUE                                                             0x00000001
> #define NV9039_LAUNCH_DMA_SRC_MEMORY_LAYOUT                                                                  4:4
> #define NV9039_LAUNCH_DMA_SRC_MEMORY_LAYOUT_BLOCKLINEAR                                               0x00000000
> #define NV9039_LAUNCH_DMA_SRC_MEMORY_LAYOUT_PITCH                                                     0x00000001
> #define NV9039_LAUNCH_DMA_DST_MEMORY_LAYOUT                                                                  8:8
> #define NV9039_LAUNCH_DMA_DST_MEMORY_LAYOUT_BLOCKLINEAR                                               0x00000000
> #define NV9039_LAUNCH_DMA_DST_MEMORY_LAYOUT_PITCH                                                     0x00000001
>
> #define NV9039_PITCH_IN                                                                                   0x0314
> #define NV9039_PITCH_IN_VALUE                                                                               31:0
>
> #define NV9039_PITCH_OUT                                                                                  0x0318
> #define NV9039_PITCH_OUT_VALUE                                                                              31:0
>
> #define NV9039_LINE_LENGTH_IN                                                                             0x031c
> #define NV9039_LINE_LENGTH_IN_VALUE                                                                         31:0
>
> #define NV9039_LINE_COUNT                                                                                 0x0320
> #define NV9039_LINE_COUNT_VALUE                                                                             31:0

Very helpful info, thanks! That should help narrow the source of the problem.

>
> As far as I can tell, these checks are not GF106-specific, so I'm not
> sure why the problem is only showing up there.  Maybe there is something
> else unique about the GF106 user's configuration that causes this to
> be triggered?

Perhaps. I've also observed that different GPUs vary in how sensitive
they are to invalid values. For example, we had a bug that manifested
itself as G80-G94 yelling at us about out-of-bounds X/Y coordinates,
while G96+ happily took the illegal values (and probably did nasty
things with them, like overwriting memory it wasn't supposed to touch).
It is odd that _only_ GF106 would have that logic, but... whatever.
I'm also missing GF104, GF110, and GF117 results, so who knows, perhaps
they would have reported the issue too. Another possibility I hadn't
previously considered is that this user's GF106 could just be somehow
busted; his is the only one I know of, so I couldn't cross-check with
a different one. But the problem is sufficiently restricted that it
seems unlikely to be a bad part and more likely to be a driver bug.

Anyway, now that we know what to look for, it should be much easier
to identify in a command stream dump.

Thanks again,

  -ilia

>
> Thanks,
> - Andy
>
>
> On Tue, Mar 18, 2014 at 06:44:30AM -0700, Ilia Mirkin wrote:
>> Hello,
>>
>> A user on an NVC3 card (GF106) is running into data errors on m2mf
>> (class 0x9039) that we haven't seen before:
>>
>> http://people.freedesktop.org/~imirkin/nvc0-comparison/nvc3-2014-03-17-agashlin/glean/fbo.html
>> http://people.freedesktop.org/~imirkin/nvc0-comparison/nvc3-2014-03-17-agashlin/spec/!OpenGL%201.1/copyteximage%201D.html
>>
>> Specifically the data errors 0x51 and 0x53, when running method 0x300
>> ("EXEC"). Any chance you could let us know what those errors are? (Or,
>> even better, provide the full table so that we'll have a better idea
>> in future cases as well.)
>>
>> Here are a few that we know about, so you know exactly what table I'm
>> talking about (our full list at
>> https://github.com/envytools/envytools/blob/master/rnndb/nv50_defs.xml#L192):
>>
>> 0x04: INVALID_VALUE
>> 0x05: INVALID_ENUM
>> 0x08: INVALID_OBJECT
>> 0x0c: INVALID_BITFIELD
>> 0x3f: PRIMITIVE_ID_NEEDS_GP
>>
>> We read this data error value from mmio reg 0x400110.
>>
>> Furthermore, if you could provide any insight as to why we would see
>> those errors on GF106 but not any other Fermi/Kepler that we've tested
>> (which should all run exactly the same code paths), that would be
>> extremely helpful as well. You can see the Fermi piglit runs we have
>> on file at http://people.freedesktop.org/~imirkin/nvc0-comparison/problems.html
>>
>> Thanks,
>>
>>   -ilia

