[Nouveau] Documentation request for MP warp error 0x10

Robert Morell rmorell at nvidia.com
Fri Oct 2 15:14:56 PDT 2015


Hi Ilia,

On Fri, Oct 02, 2015 at 06:05:21PM -0400, Ilia Mirkin wrote:
> Hi Robert,
> 
> Thanks for the quick response! That is in line with my observation
> that these errors happen when using an ATOM/RED instruction.
> However, I've checked and rechecked that I'm generating ops with the
> same bits as the proprietary driver (and nvdisasm prints identical
> output). Could you advise on the proper way to indicate to the op
> that the memory is "global"? I'm sure I'm just missing something
> simple. If you show me what to look for on SM35, I can probably find
> it on my own for SM20/SM30/SM50.

Unfortunately this isn't something I know a lot about, so I'm going to
have to do some research and get back to you, hopefully within a few days.

> In case you're interested in looking at the Mesa code, it's available
> on my atomic2 branch at:
> https://github.com/imirkin/mesa/commits/atomic2 . However, I hardly
> expect you to debug my buggy code :) The SUREDP stuff is about surface
> RED ops; the existing code uses it, but I'm going to leave it for image
> support and turn direct buffer accesses into OP_ATOM instead (in
> NVIDIA terminology, RED is just ATOM without a destination).

Neat.  I'll take a look at it.

Thanks,
Robert

> Thanks,
> 
>   -ilia
> 
> On Fri, Oct 2, 2015 at 5:48 PM, Robert Morell <rmorell at nvidia.com> wrote:
> > Hi Ilia,
> >
> > Error 0x10 is INVALID_ADDR_SPACE. It is triggered when an ATOM or RED [1]
> > instruction accesses local or shared memory. Global memory accesses are the
> > only allowed accesses for ATOM and RED instructions.
> >
> >
> > Note that SM30 has the same restriction (ATOM and RED should only be used
> > on global memory), but the hardware does not check it until SM35.
> >
> >
> > [1] What is documented as RED internally appears to be called SUREDP in Mesa.
> >
> >
> > - Robert
> >
> > On Wed, Sep 30, 2015 at 03:14:47PM -0400, Ilia Mirkin wrote:
> >> Hello,
> >>
> >> I've recently come across an error reported by the GPU and would like
> >> to know what it means and especially what causes it to be triggered.
> >> Any information would be very useful:
> >>
> >> I'm seeing MP warp error 0x10 (appears in MP register 0x48). This is
> >> what we currently have in nouveau:
> >>
> >> <reg32 offset="0x048" name="TRAP_WARP_ERROR"> <!-- ctx-switched -->
> >>   <bitfield high="15" low="0" name="ID">
> >>     <value value="1" name="STACK_MISMATCH"/>
> >>     <value value="5" name="MISALIGNED_PC"/>
> >>     <value value="8" name="MISALIGNED_GPR"/>
> >>     <value value="9" name="INVALID_OPCODE"/>
> >>     <value value="13" name="GPR_OUT_OF_BOUNDS"/>
> >>     <value value="14" name="MEM_OUT_OF_BOUNDS"/>
> >>     <value value="17" name="INVALID_PARAM"/>
> >>   </bitfield>
> >> </reg32>
> >>
> >> [Additionally, it seems that 15 = UNALIGNED_MEM_ACCESS.]
> >>
> >> It seems to happen whenever I try to access global memory on Kepler
> >> (potentially only with atomics; I'm not sure at this point). Knowing
> >> precisely what triggers the error (and especially what we need to do
> >> to avoid triggering it) would be most useful.
> >>
> >> For reference, my shader looks something like this (for SM35):
> >>
> >>         /*0018*/                   LDC.64 R0, c[0xf][0x1c0];     /* 0x7ca80780e01ffc02 */
> >>         /*0020*/                   ATOM.E.ADD R2, [R0], R2;      /* 0x68080000011c000a */
> >>         /*0028*/                   LD.E.CG R0, [R0];             /* 0xcc800000001c0000 */
> >>
> >> I know that the proprietary drivers are a lot more sophisticated and
> >> only do the atomic add from a single lane, but I was assuming that was
> >> not required.
> >>
> >> Thanks,
> >>
> >>   -ilia
> >>
> 
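
For anyone decoding this trap by hand, the TRAP_WARP_ERROR values quoted above and NVIDIA's answer can be collected into a small lookup. This is a hypothetical Python sketch, not nouveau or envytools code: `decode_warp_error`, `atom_red_outcome` and `WARP_ERROR_IDS` are illustrative names, 15 = UNALIGNED_MEM_ACCESS is only the tentative guess from the thread, and treating every pre-SM35 target like SM30 (restriction present, no hardware check) is an assumption.

```python
# Hypothetical decoder for the MP TRAP_WARP_ERROR register (offset 0x48).
# Names come from the nouveau rnndb excerpt in this thread; 15 is only a
# guess, and 16 (0x10) is INVALID_ADDR_SPACE per NVIDIA's reply.
WARP_ERROR_IDS = {
    1: "STACK_MISMATCH",
    5: "MISALIGNED_PC",
    8: "MISALIGNED_GPR",
    9: "INVALID_OPCODE",
    13: "GPR_OUT_OF_BOUNDS",
    14: "MEM_OUT_OF_BOUNDS",
    15: "UNALIGNED_MEM_ACCESS",  # tentative, not confirmed in the thread
    16: "INVALID_ADDR_SPACE",    # ATOM/RED on local or shared memory
    17: "INVALID_PARAM",
}


def decode_warp_error(reg: int) -> str:
    """Return the name of the warp error ID held in bits 15:0 of reg."""
    error_id = reg & 0xFFFF
    return WARP_ERROR_IDS.get(error_id, f"UNKNOWN_{error_id:#x}")


def atom_red_outcome(space: str, sm: int) -> str:
    """Sketch of the rule from the reply: ATOM/RED must target global memory.

    SM35 and later trap with error 0x10; SM30 has the same restriction but
    no hardware check. Applying the SM30 behaviour to all older targets is
    an assumption, not something stated in the thread.
    """
    if space == "global":
        return "ok"
    if sm >= 35:
        return decode_warp_error(0x10)
    return "unchecked (invalid, but not trapped)"


print(decode_warp_error(0x10))  # INVALID_ADDR_SPACE
```

Masking with 0xFFFF follows the `bitfield high="15" low="0"` definition, so any upper bits in the raw register value are ignored when looking up the ID.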

