[Mesa-dev] [PATCH 3/8] i965: Store the atoms directly in the context

Ian Romanick idr at freedesktop.org
Mon Dec 22 13:45:27 PST 2014


On 12/21/2014 09:11 PM, Kenneth Graunke wrote:
> On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote:
>> From: Ian Romanick <ian.d.romanick at intel.com>
>>
>> Instead of having an extra pointer indirection in one of the hottest
>> loops in the driver.
>>
>> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
>> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
>> Gl32Batch7:
>>
>> 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
>> 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)
>>
>> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
>> ---
>>  src/mesa/drivers/dri/i965/brw_context.h      |  2 +-
>>  src/mesa/drivers/dri/i965/brw_state_upload.c | 19 ++++++++++++++++---
>>  2 files changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
>> index 92eb022..5f5f807 100644
>> --- a/src/mesa/drivers/dri/i965/brw_context.h
>> +++ b/src/mesa/drivers/dri/i965/brw_context.h
>> @@ -1384,7 +1384,7 @@ struct brw_context
>>     } perfmon;
>>  
>>     int num_atoms;
>> -   const struct brw_tracked_state **atoms;
>> +   const struct brw_tracked_state atoms[64];
> 
> Making this atoms[57] would save memory, and the static asserts make it easy
> to remember to increase the size if we ever add additional atoms.

I can change that easily enough.

> Honestly, it seems like there should be a better way to do this.  Currently,
> the atoms list only depends on the generation number, and thus is identical
> for all GL contexts.  Storing an additional 1.5 kB of redundant data in each
> GL context seems a bit wasteful.

That's an interesting idea that hadn't occurred to me. :)

> Would it help to make brw_upload_state cache the array and size?
> 
> void
> brw_upload_state(struct brw_context *brw)
> {
>    static const struct brw_tracked_state *atoms = brw->atoms;
>    static const int num_atoms = brw->num_atoms;
> 
>    ...
> 
>    for (i = 0; i < num_atoms; i++) {
>       ...use atoms[i]...
>    }
> 
>    ...
> }
> 
> It seems like that would achieve the same effect - one indirection.
> 
> One downside to that approach is that we couldn't have separate atom
> lists for API_CORE vs. API_COMPAT (while we could using your approach).
> (We could skip stippling for core contexts, and skip GS/HS/DS for legacy
> contexts, which might help a tiny bit...)
> 
>>     /* If (INTEL_DEBUG & DEBUG_BATCH) */
>>     struct {
>> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> index a579781..9a5d49a 100644
>> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
>> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> @@ -357,6 +357,11 @@ void brw_init_state( struct brw_context *brw )
>>     const struct brw_tracked_state **atoms;
>>     int num_atoms;
>>  
>> +   STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->atoms));
>> +   STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->atoms));
>> +   STATIC_ASSERT(ARRAY_SIZE(gen7_atoms) <= ARRAY_SIZE(brw->atoms));
>> +   STATIC_ASSERT(ARRAY_SIZE(gen8_atoms) <= ARRAY_SIZE(brw->atoms));
> 
> I do like the static asserts.
> 
>> +
>>     brw_init_caches(brw);
>>  
>>     if (brw->gen >= 8) {
>> @@ -373,9 +378,17 @@ void brw_init_state( struct brw_context *brw )
>>        num_atoms = ARRAY_SIZE(gen4_atoms);
>>     }
>>  
>> -   brw->atoms = atoms;
>>     brw->num_atoms = num_atoms;
>>  
>> +   /* This is to work around brw_context::atoms being declared const.  We want
>> +    * it to be const, but it needs to be initialized somehow!
>> +    */
>> +   struct brw_tracked_state *context_atoms =
>> +      (struct brw_tracked_state *) &brw->atoms[0];
> 
> Ugly, but... not sure what else to suggest.

I was going to s/context_atoms/brw->atoms[i]/, but I wanted the patch to
be easier to review.  There was also a follow-on patch that uses some
SSE3 instructions in the loop, but it was giving mixed results.  I'm
holding that back to send out later.

> I suppose this patch is fine.  It does offer a small CPU overhead reduction in
> the hottest path, for a small memory usage penalty.  I think we can do better,
> but I don't know that it makes sense to block your patch over nebulous ideas
> for future improvements.  We can do those in the future...
> 
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
> 
>> +
>> +   for (int i = 0; i < num_atoms; i++)
>> +      context_atoms[i] = *atoms[i];
>> +
>>     while (num_atoms--) {
>>        assert((*atoms)->dirty.mesa | (*atoms)->dirty.brw);
>>        assert((*atoms)->emit);
>> @@ -607,7 +620,7 @@ void brw_upload_state(struct brw_context *brw)
>>        prev = *state;
>>  
>>        for (i = 0; i < brw->num_atoms; i++) {
>> -	 const struct brw_tracked_state *atom = brw->atoms[i];
>> +	 const struct brw_tracked_state *atom = &brw->atoms[i];
>>  	 struct brw_state_flags generated;
>>  
>>  	 if (check_state(state, &atom->dirty)) {
>> @@ -627,7 +640,7 @@ void brw_upload_state(struct brw_context *brw)
>>     }
>>     else {
>>        for (i = 0; i < brw->num_atoms; i++) {
>> -	 const struct brw_tracked_state *atom = brw->atoms[i];
>> +	 const struct brw_tracked_state *atom = &brw->atoms[i];
>>  
>>  	 if (check_state(state, &atom->dirty)) {
>>  	    atom->emit(brw);


-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20141222/d364f0f2/attachment.sig>


More information about the mesa-dev mailing list