[Nouveau] Memory corruption on Gallium window resize, diagnosed?

Tue Nov 11 17:43:35 PST 2008

Am Mittwoch, den 12.11.2008, 00:26 +0200 schrieb Pekka Paalanen:
> Hi,
> 
> I've been playing with nouveau/mesa branch gallium-0.1, trying to get
> trivial/tri working on nv20 (with nv10 code). When ever I resize the window,
> it ends up in an assert failure:
Hey,

I fixed nv4x recently, gallium-0.2 branch...

Ben.
> 
> #0  0xf790744f in _debug_assert_fail (expr=0xf791908f "0", file=0xf7919050 "nv20_state_emit.c", line=139, 
>     function=0xf7919034 "nv20_state_emit_framebuffer") at p_debug.c:335
> #1  0xf76fe723 in nv20_state_emit_framebuffer (nv20=0x805ef68) at nv20_state_emit.c:139
> #2  0xf76fef2e in nv20_emit_hw_state (nv20=0x805ef68) at nv20_state_emit.c:255
> #3  0xf76ff70b in nv20_draw_elements (pipe=0x805ef68, indexBuffer=0x0, indexSize=0, prim=4, start=0, count=3) at nv20_vbo.c:20
> #4  0xf76ff922 in nv20_draw_arrays (pipe=0x805ef68, prim=4, start=0, count=3) at nv20_vbo.c:73
> #5  0xf7743ac8 in st_draw_vbo (ctx=0x8074d18, arrays=0x80ade38, prims=0x80ac994, nr_prims=1, ib=0x0, min_index=0, max_index=2)
>     at state_tracker/st_draw.c:634
> #6  0xf782c140 in vbo_exec_vtx_flush (exec=0x80ac870) at vbo/vbo_exec_draw.c:248
> #7  0xf782a858 in vbo_exec_FlushVertices (ctx=0x8074d18, flags=1) at vbo/vbo_exec_api.c:752
> #8  0xf7761f39 in _mesa_Flush () at main/context.c:1815
> #9  0xf7e00ae6 in glFlush () at ../../../src/mesa/glapi/glapitemp.h:1170
> #10 0x08048dba in Draw () at tri.c:83
> #11 0xf7ecdc54 in processWindowWorkList (window=0x8051200) at glut_event.c:1302
> #12 0xf7ecdd3a in __glutProcessWindowWorkLists () at glut_event.c:1354
> #13 0xf7ecddb1 in glutMainLoop () at glut_event.c:1375
> #14 0x08048fd8 in main (argc=1, argv=0xfff760e4) at tri.c:132
> 
> The struct pipe_surface used there contains garbage:
> 
> (gdb) print *fb->cbufs[0]
> $4 = {buffer = 0xf7d901a0, format = -136773216, status = 1, clear_value = 0, width = 189, height = 181, block = {size = 4, width = 1, 
>     height = 1}, nblocksx = 189, nblocksy = 181, stride = 768, layout = 0, offset = 0, refcount = 0, usage = 12, winsys = 0x0, 
>   texture = 0x0, face = 0, level = 0, zslice = 88}
> 
> I tried to track what writes the insane value to the format field,
> and always came up with malloc and free, which didn't make any sense.
> 
> Then I ran it with valgrind:
> 
> ==5322== Invalid read of size 4
> ==5322==    at 0x7B6887C: pipe_surface_reference (p_inlines.h:93)
> ==5322==    by 0x7B6881F: copy_framebuffer_state (cso_context.c:781)
> ==5322==    by 0x7B68962: cso_set_framebuffer (cso_context.c:791)
> ==5322==    by 0x7AB5441: update_framebuffer_state (st_atom_framebuffer.c:147)
> ==5322==    by 0x7AB3E41: st_validate_state (st_atom.c:188)
> ==5322==    by 0x7ABDA2E: st_clear (st_cb_clear.c:517)
> ==5322==    by 0x7B0BED5: _mesa_Clear (clear.c:177)
> ==5322==    by 0x47F25FA: glClear (glapitemp.h:1100)
> ==5322==    by 0x8048CE9: Draw (tri.c:72)
> ==5322==    by 0x46F2C53: processWindowWorkList (glut_event.c:1302)
> ==5322==    by 0x46F2D39: __glutProcessWindowWorkLists (glut_event.c:1354)
> ==5322==    by 0x46F2DB0: glutMainLoop (glut_event.c:1375)
> ==5322==  Address 0x7638a0c is 68 bytes inside a block of size 84 free'd
> ==5322==    at 0x46D99EC: free (vg_replace_malloc.c:323)
> ==5322==    by 0x797E17D: nv20_miptree_surface_release (nv20_miptree.c:144)
> ==5322==    by 0x7AC0C69: pipe_surface_reference (p_inlines.h:95)
> ==5322==    by 0x7AC07D3: st_renderbuffer_alloc_storage (st_cb_fbo.c:97)
> ==5322==    by 0x7A07844: _mesa_resize_framebuffer (framebuffer.c:292)
> ==5322==    by 0x79C2F65: st_resize_framebuffer (st_framebuffer.c:147)
> ==5322==    by 0x7950CB3: nouveau_context_bind (nouveau_context.c:324)
> ==5322==    by 0x794A6E4: DoBindContext (dri_util.c:380)
> ==5322==    by 0x794A78E: driBindContext (dri_util.c:413)
> ==5322==    by 0x47C14B2: BindContextWrapper (glxext.c:1621)
> ==5322==    by 0x47C15F2: MakeContextCurrent (glxext.c:1675)
> ==5322==    by 0x47C1927: glXMakeCurrent (glxext.c:1797)
> ==5322== 
> 
> and a couple more invalid reads, and a segfault.
> My theory is: when the window resize occurs, somewhere along the path
> 
> st_renderbuffer_alloc_storage(GLcontext * ctx, struct gl_renderbuffer *rb,
>                               GLenum internalFormat,
>                               GLuint width, GLuint height)
> {
>    struct pipe_context *pipe = ctx->st->pipe;
>    struct st_renderbuffer *strb = st_renderbuffer(rb);
>    struct pipe_texture template;
>    unsigned surface_usage;
> 
>    /* Free the old surface and texture
>     */
>    pipe_surface_reference( &strb->surface, NULL );
>    pipe_texture_reference( &strb->texture, NULL );
> 
> is executed, which AFAICT frees the pipe_surface. My backtraces
> show it really is freed, since the pipe_surface::refcount = 1
> and decreases to zero. Then the new pipe_surface is allocated
> and things are looking good so far. This is seen in the
> "freed" part of the valgrind stack trace.
> 
> Then we get to the first glClear after resize, and hit the CSO
> cache. The cache still has a pointer to the already freed pipe_surface
> (well, the memory address is the same), and frees it again. See the
> valgring stack trace, the invalid read happens in three places
> 
> pipe_surface_reference(struct pipe_surface **ptr, struct pipe_surface *surf)
> {
>    /* bump the refcount first */
>    if (surf) 
>       surf->refcount++;
> 
>    if (*ptr) {
> 
>       /* There are currently two sorts of surfaces... This needs to be
>        * fixed so that all surfaces are views into a texture.
>        */
> #->   if ((*ptr)->texture) {
>          struct pipe_screen *screen = (*ptr)->texture->screen;
>          screen->tex_surface_release( screen, ptr );
>       }
>       else {
> #->      struct pipe_winsys *winsys = (*ptr)->winsys;
> #->      winsys->surface_release(winsys, ptr);
>       }
> 
> and then segfaults. So there's a free too much, or something
> forgets to increment the reference count. Or something else.
> 
> What do you say, how do we fix it, is this the bug?
> 
> Note, that nouveau/mesa gallium-0.1 has been last synced to mesa/mesa
> Sept 18th, so maybe you have already fixed it? Or not?
> 
> I looked at the diffs and logs, but didn't see a fix.
> 
> 
> Thanks.
>