[Mesa-dev] [Mesa3d-dev] Softpipe and OpenMP

Mon Jun 8 19:00:29 PDT 2015

On 08.06.2015 at 09:33, Chih-Sheng Lin wrote:
> Hi,
> 
> I am working on parallelizing softpipe in Mesa-9.1 using OpenMP.
> 
> My idea is to run sp_setup_tri in parallel on multiple threads.
> 
> So first, in the function sp_vbuf_draw_elements, I duplicate
> setup_context to avoid race conditions.
> 
> Then the sp_setup_tri calls are placed in OpenMP parallel sections, each
> with its own arguments.
> 
> The code is as follows:
> 
> sp_vbuf_draw_elements(struct vbuf_render *vbr, const ushort *indices,
>                       uint nr)
> {
>    struct softpipe_vbuf_render *cvbr = softpipe_vbuf_render(vbr);
>    struct softpipe_context *softpipe = cvbr->softpipe;
>    const unsigned stride = softpipe->vertex_info_vbuf.size * sizeof(float);
>    const void *vertex_buffer = cvbr->vertex_buffer;
>    struct setup_context *setup = cvbr->setup;
> 
> /*duplicate setup context*/
> struct setup_context *setup_0, *setup_1;
> if(cvbr->prim == PIPE_PRIM_TRIANGLES)
> {
>     setup_0 = sp_setup_create_context(cvbr->setup);
>     setup_1 = sp_setup_create_context(cvbr->setup);
> }
> 
> ...
> 
>    case PIPE_PRIM_TRIANGLES:
> /*
>       for (i = 2; i < nr; i += 3) {
>          sp_setup_tri( setup,
>                        get_vert(vertex_buffer, indices[i-2], stride),
>                        get_vert(vertex_buffer, indices[i-1], stride),
>                        get_vert(vertex_buffer, indices[i-0], stride) );
>       }
> */
> 
> /*parallelizing sp_setup_tri by OpenMP parallel sections*/
> if (nr == 6)
> {
>     #pragma omp parallel sections
>     {
>         #pragma omp section
>         sp_setup_tri(setup_0,
>                      get_vert(vertex_buffer, indices[0], stride),
>                      get_vert(vertex_buffer, indices[1], stride),
>                      get_vert(vertex_buffer, indices[2], stride));
>         #pragma omp section
>         sp_setup_tri(setup_1,
>                      get_vert(vertex_buffer, indices[3], stride),
>                      get_vert(vertex_buffer, indices[4], stride),
>                      get_vert(vertex_buffer, indices[5], stride));
>     }
> }
>     sp_setup_destroy_context(setup_0);
>     sp_setup_destroy_context(setup_1);
> 
>       break;
> 
> Then I rebuilt Mesa and ran a test called CPUOverheadTest_onscreen from
> GFXBench 2.7.5.
> 
> With OMP_NUM_THREADS=2 at a resolution of 1280x900, performance improves
> by 10% compared to OMP_NUM_THREADS=1.
> 
> However, the test result is not correct, although it works very well
> compared to my first attempt at parallelization.
> 
> Could someone give me some guidance or suggestions for this work?
> 
> Thanks in advance!
> 
> 
> Patrick Lin

I'm not sure whether this is what went wrong in your test, but note that
tris can (and often will) overlap, so you need to guarantee strict ordering.
It would probably be easier to parallelize the work inside sp_setup_tri,
that is, the pixel shader load (each shader is ultimately run for 2x2
pixels, so you could for instance have 4 threads and run four 2x2 pixel
shaders in parallel).
That said, trying to make softpipe faster by using OpenMP might be a nice
theoretical exercise but is probably futile. If you want somewhat useful
performance from a software renderer, try llvmpipe, which already has
parallel pixel shader threads (though in practice the scaling is not quite
optimal once you go past a couple of threads; one probable reason is that
only the work after triangle setup can run in parallel, and there are also
some "unnecessary" waits).
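
To illustrate the two points above, here is a minimal sketch. It is not
softpipe code. It assumes a toy 16x16 framebuffer and uses flat-colored
rectangles as stand-ins for triangles. All names in it (struct rect,
shade_quad, draw_prim, draw_all, FB_W, FB_H) are hypothetical. Primitives are
drawn one after another in submission order, so the later one wins where
they overlap. Only the 2x2 quads inside a single primitive are handed to
OpenMP threads, which is safe in this sketch because each quad writes a
disjoint set of pixels.

```c
#include <assert.h>
#include <stdint.h>

#define FB_W 16   /* toy framebuffer size, assumed for this sketch */
#define FB_H 16

/* Hypothetical stand-in for a primitive: a flat-colored rectangle
 * covering pixels [x0, x1) x [y0, y1). */
struct rect { int x0, y0, x1, y1; uint32_t color; };

/* Hypothetical stand-in for running the pixel shader on one 2x2 quad.
 * Quad (qx, qy) owns pixels (2*qx .. 2*qx+1, 2*qy .. 2*qy+1). */
static void shade_quad(uint32_t *fb, const struct rect *r, int qx, int qy)
{
   for (int dy = 0; dy < 2; dy++) {
      for (int dx = 0; dx < 2; dx++) {
         const int x = qx * 2 + dx;
         const int y = qy * 2 + dy;
         /* Write only the pixels of the quad that the primitive covers. */
         if (x >= r->x0 && x < r->x1 && y >= r->y0 && y < r->y1)
            fb[y * FB_W + x] = r->color;
      }
   }
}

/* Draw one primitive. Its quads touch disjoint pixels, so they can be
 * shaded by different threads without any ordering between them. */
static void draw_prim(uint32_t *fb, const struct rect *r)
{
   assert(r->x0 >= 0 && r->y0 >= 0 && r->x1 <= FB_W && r->y1 <= FB_H);
   assert(r->x0 <= r->x1 && r->y0 <= r->y1);

   const int qx0 = r->x0 / 2, qx1 = (r->x1 + 1) / 2;
   const int qy0 = r->y0 / 2, qy1 = (r->y1 + 1) / 2;
   const int nqx = qx1 - qx0;
   const int nquads = nqx * (qy1 - qy0);

   #pragma omp parallel for schedule(static)
   for (int q = 0; q < nquads; q++)
      shade_quad(fb, r, qx0 + q % nqx, qy0 + q / nqx);
}

/* Draw all primitives strictly in submission order. This loop stays
 * serial because overlapping primitives must not be reordered. */
static void draw_all(uint32_t *fb, const struct rect *prims, int n)
{
   for (int i = 0; i < n; i++)
      draw_prim(fb, &prims[i]);
}
```

With GCC this can be built with -fopenmp. Without that flag the pragma is
ignored and the same code runs serially, which makes it easy to compare the
two results.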

Roland