[Nouveau] Question about nv40_draw_array

Christoph Bumiller e0425955 at student.tuwien.ac.at
Thu Dec 17 13:26:43 PST 2009


On 17.12.2009 19:11, Krzysztof Smiechowicz wrote:
> Christoph Bumiller pisze:
>   
>> Hi.
>>     
> Hi, thanks for the quick feedback. :)
>
>   
>> Most probably the state tracker calls pipe_buffer_map on the vertex
>> buffer which (if it was not created as a user buffer) causes an mmap
>> of it to the user's address space (so either GART system memory pages or
>> VRAM pages through the FB aperture get mapped, whatever was selected in
>> drivers/nouveau/nouveau_screen.c), then just writes the data and
>> subsequently unmaps again.
>>     
> This is what I found "inside" nv40_draw_elements_swtnl, but I can't 
> find the equivalent in the case of the hardware path.
>
>   
If the state tracker uses a buffer with PIPE_BUFFER_USAGE_VERTEX,
the data will already be in GPU-accessible memory, i.e. there will be
a nouveau_bo_kalloc'd / TTM buffer object.
If it uses a user buffer, a kernel-side bo is allocated in bo_emit_buffer
and the data is memcpy'd there.
>>> I went through the software path "nv40_draw_elements_swtnl" and found a 
>>> place in the draw module where the buffer storage address is obtained and 
>>> the data from the buffer is used directly by software rendering. I cannot
>>> however find a similar place for the hardware path. I would like to learn 
>>> where the code is that copies this data to the gfx card or, if this is done 
>>> by the card reading from the computer's memory, what code triggers the 
>>> read, how the gfx card knows from which address in RAM to copy the data 
>>> and what code indicates that the read has finished.
>>>       
>> The vertex buffers are set up in nv40_vbo_validate, which records a
>> state object to be emitted on validation.
>> The address is set with method NV40TCL_VTXBUF_ADDRESS(i). We output a
>> relocation, 
>>     
> I assume by relocation you mean this code:
>
> nv40_vbo.c, 536
>
> 		so_reloc(vtxbuf, nouveau_bo(vb->buffer),
> 				 vb->buffer_offset + ve->src_offset,
> 				 vb_flags | NOUVEAU_BO_LOW | NOUVEAU_BO_OR,
> 				 0, NV40TCL_VTXBUF_ADDRESS_DMA1);
>
>
>   
> so the kernel side fills in the appropriate address for us,
>
> Can you tell me where this filling happens? Where does the kernel put 
> this address (some buffer, card registers?) - maybe I can read it and 
> validate. I assume it should put the actual memory address at which 
> the read should start? Or maybe the address of the beginning of the 
> buffer, with the offset in another "argument"?
>
>   
The address is put into the FIFO (the command buffer), which is
submitted to the kernel for emission on FIRE_RING.
The address is the start address of the buffer plus the data
argument of nouveau_pushbuf_emit_reloc (in the above case
vb->buffer_offset + ve->src_offset).
The drm_nouveau_gem_pushbuf_bo struct contains that data and an
index into the command buffer where the address is to be placed;
this is also handed to the kernel.
Look at nouveau_gem_ioctl_pushbuf_* in the kernel's
nouveau_gem.c; nouveau_gem_pushbuf_reloc_apply and friends
handle the relocation.
I don't know their exact workings.

The reloc above contains the actual start address of the
vertex element, so there's no other method/reg used to
set an additional offset as far as I can see.
>> The read is triggered by NV40TCL_BEGIN_END + NV40TCL_VB_VERTEX_BATCH,
>> and it will probably be done reading when the GET pointer of the FIFO
>> has moved past the command.
>>     
> I assume the read will happen after pipe->flush() and not immediately 
> after:
>
> 			BEGIN_RING(curie, NV40TCL_VB_VERTEX_BATCH, 1);
> 			OUT_RING  (((nr - 1) << 24) | start);
>
>
>
>   
Right.
> Let me describe the bug I'm facing - my suspicion is that this is caused 
> by a bug in my porting of TTM and not a bug in nouveau itself. I have some 
> parts of the code still commented out, as they are Linux-specific and no 
> immediate mapping to AROS structures could be made. Also, I didn't have 
> these problems with the "old" drm port.
>
> I see this problem on morph3d demo. What it does is: for each frame 
> create a call list and then call it 4 times.
>
> ADDR	VRAM OFFSET
> A	X
> B	Y
> C	X
>
> A, B, C are the memory offsets of the 32 kB buffers created for the vertex 
> buffer when the call lists are compiled. X, Y are the VRAM offsets 
> (bo.mem.mm_node.start).
>
> The first buffer is created (X, A). When it gets full (after around 3 
> frames) a second buffer is created (Y, B). Then the first one is freed. 
> When the second buffer is full, a third is created (X, C) - here the 
> problem starts: according to my observations, the card seems to read 
> vertices not from address C but from address A, as if it somehow 
> remembered the initial address binding.
>
>   
So the actual VRAM (or GART) addresses are X + A, Y + B, X + C.
If they're all different and you still get the vertices from X + A,
either you have a bug, or, if X + A == X + C, the new data is not
being written, or maybe the GPU has cached that area somehow (even
if the CPU has flushed).
I know that the kernel should take care of all cache flushing, but
I made the same observation on nv50: this somehow doesn't seem
to always work.
But then, maybe I was too quick to blame caches and should
try to find out if it isn't actually some other bug ...


