[Mesa-dev] i965 CS state tracking

Wed Sep 3 12:20:49 PDT 2014

On Wednesday, September 03, 2014 11:34:33 AM Jordan Justen wrote:
> On 2014-09-03 10:14:34, Kenneth Graunke wrote:
> > On Wednesday, September 03, 2014 09:22:12 AM Jordan Justen wrote:
> > > > 15:22:24 Kayden | jljusten: so how does the compute shader state upload work?
> > > > 15:22:35 Kayden | I know you have to issue PIPELINE_SELECT to pick the
> > > >                 | GPGPU pipe instead of 3D
> > > > 15:22:53 Kayden | but does the GPGPU pipe use a lot of the same commands?
> > > > 15:22:57 Kayden | do you have to reprogram them when switching?
> > > > 15:30:19 Kayden | I don't understand why we have to have separate
> > > >                 | dirty bit sets for compute vs. 3D
> > > > 15:30:52 Kayden | It looks like if you're called by glDispatchCompute,
> > > >                 | we should PIPELINE_SELECT to GPGPU, and do whatever
> > > >                 | state upload you need for compute
> > > > 15:31:13 Kayden | and if you're called by glDrawWhatever(), we should
> > > >                 | PIPELINE_SELECT to 3D and do the existing 3D state
> > > >                 | upload
> > > > 15:31:52 Kayden | unless things like 3DSTATE_BINDING_TABLE_POINTERS
> > > >                 | only takes effect on one pipe or the other, I
> > > >                 | guess...
> > > > 15:32:54 Kayden | but if most of the commands are different between
> > > >                 | the two pipes...
> > > > 15:34:20 Kayden | based on reading Paul's notes I don't see why he did
> > > >                 | that.
> > > > 15:55:08    krh | Kayden: there's about 3-4 commands you need to set up
> > > >                 | gpgpu
> > > > 15:55:18    krh | none of which are shared with the 3d pipeline
> > > > 15:55:36    krh | aside from the pipeline select, it's completely
> > > >                 | orthogonal
> > > > 15:56:29    krh | I guess I don't understand why we need to track all the
> > > >                 | state for all the things either
> > > > 15:57:05    krh | but we do need to make sure update_state doesn't emit
> > > >                 | dirty 3d state when we're trying to do cs and vice verse
> > > > 15:57:08    krh | versa
> > > > 15:57:41    krh | it seems like a pipeline mask should be enough
> > > > 15:58:11    krh | ie, mask the dirty flags against a
> > > >                 | pipeline_3d_flag_mask when we're emitting state for 3d
> > > 
> > > I'm not sure of Paul's thoughts.
> > > 
> > > One thought I have is, what if we decide to upload some state
> > > differently depending on the pipeline? Then when we switch, we'd need
> > > to make sure we re-emit it again for the other pipeline.
> > 
> > That's easy to handle.  In brw_upload_state, we can check
> > brw->state.current_pipeline != pipeline to know when we've changed
> > pipelines.  We can even detect if we're going from 3D -> GPGPU or
> > vice versa by checking the existing value of
> > brw->state.current_pipeline.  When the pipeline changes, we just
> > need to flag the dirty bits for any state that gets implicitly
> > clobbered.
> 
> Maintaining the list of 'any state that gets implicitly clobbered'
> separate from the atoms seems like a hassle.
>
> What about 'flagging all overlapping' state bits when changing
> pipelines? (We could scan the two state atom lists to find the
> overlap.)

Honestly, it sounds like the 3D and GPGPU pipes don't actually share any 3DSTATE commands at all.  3D requires SURFACE_STATE, SAMPLER_STATE, and a binding table to be emitted for enabled shader stages (VS, GS, HS, DS, FS).  Compute requires SURFACE_STATE, maybe SAMPLER_STATE, and a binding table to be emitted for the compute shader.  They don't actually clobber each other or overlap at all AFAICS.

So the crux of the problem boils down to this:

If we include brw_cs_abo_surfaces only in the compute atoms list, and brw_{vs,gs,fs}_abo_surfaces only in the 3D atoms list, then...when BRW_NEW_ATOMIC_BUFFER happens, either CS _or_ VS/GS/FS will get emitted, but not both...yet we'll clear the flag, since we've supposedly "handled" it.

We could solve this pretty easily: just include surface setup for all stages in both atoms lists.  We'd have to update the VS/FS atoms to do nothing if there's no currently bound vertex/fragment shader, since they'd be optional - just like we've already done with GS.

I know that you can link together VS/GS/FS/HS/DS and CS together in a single GLSL linked program, and it just uses the 3D shaders or the compute shader depending whether you use glDraw* or glDispatchCompute.  So, if you did that, we'd additionally emit the CS stage's binding table when doing 3D drawing, and the 3D binding tables when doing compute.

I might be wrong, but presumably if you link together 3D and compute shaders into a single program like that, you intend to use them together - i.e., do compute, then do 3D drawing using the results, or vice versa.  If I'm right about that, then the extra upload is not wasted - it's just done a little earlier.

> But, I think this might cause excess re-emittion in some
> scenarios. For example:
> * Change atomics from GL, mark atomics dirty
> * Render, atomics state is emitted and clean
> * Pipeline=>Compute, mark atomics dirty and re-emit
> * Pipeline=>Render, mark atomics dirty and re-emit
>   - Did we need to re-emit here?
> 
> And, how about this:
> * atomics are not used
> * Render
> * Pipeline=>Compute, mark atomics dirty and emit
>   - Did we need to emit here?
> 
> I guess these scenarios seem to indicate the 2 sets of dirty bits
> might instead be used to prevent excess state emission.
> 
> > > Does this seem explanation seem plausible?
> > 
> > No, and I think we should revert the patch.  Having two state atom
> > lists seems sensible, and passing an "enum brw_pipeline" parameter
> > to brw_upload_state also seems sensible.  But having two sets of
> > dirty bits doesn't yet make sense to me.
> 
> Can we discuss and implement an alternative approach rather than
> reverting it? I don't think it is wrong, but maybe could (perhaps) be
> done better.

There's clearly debate about whether this is the right approach, and we don't normally land patches when reviewers have strong objections until those get sorted out.  As is, it complicates the 3D state upload code, and does not add any additional benefit until ARB_compute_shader lands.  That is still several months away, as we can't enable it until ARB_image_load_store or ARB_shader_storage_buffer_object lands, which is a huge amount of complicated prerequisite work.  So I think it's premature to land this.

--Ken
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part.
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20140903/6d6a04e3/attachment.sig>