[Mesa-dev] [PATCH 3/8] i965: Store the atoms directly in the context
Ian Romanick
idr at freedesktop.org
Mon Dec 22 13:45:27 PST 2014
On 12/21/2014 09:11 PM, Kenneth Graunke wrote:
> On Friday, December 19, 2014 02:20:54 PM Ian Romanick wrote:
>> From: Ian Romanick <ian.d.romanick at intel.com>
>>
>> Instead of having an extra pointer indirection in one of the hottest
>> loops in the driver.
>>
>> On Bay Trail-D using Fedora 20 compile flags (-m64 -O2 -mtune=generic
>> for 64-bit and -m32 -march=i686 -mtune=atom for 32-bit), affects
>> Gl32Batch7:
>>
>> 32-bit: Difference at 95.0% confidence 1.98515% +/- 0.20814% (n=40)
>> 64-bit: Difference at 95.0% confidence 1.5163% +/- 0.811016% (n=60)
>>
>> Signed-off-by: Ian Romanick <ian.d.romanick at intel.com>
>> ---
>> src/mesa/drivers/dri/i965/brw_context.h | 2 +-
>> src/mesa/drivers/dri/i965/brw_state_upload.c | 19 ++++++++++++++++---
>> 2 files changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_context.h b/src/mesa/drivers/dri/i965/brw_context.h
>> index 92eb022..5f5f807 100644
>> --- a/src/mesa/drivers/dri/i965/brw_context.h
>> +++ b/src/mesa/drivers/dri/i965/brw_context.h
>> @@ -1384,7 +1384,7 @@ struct brw_context
>> } perfmon;
>>
>> int num_atoms;
>> - const struct brw_tracked_state **atoms;
>> + const struct brw_tracked_state atoms[64];
>
> Making this atoms[57] would save memory, and the static asserts make it easy
> to remember to increase the size if we ever add additional atoms.
I can change that easily enough.
> Honestly, it seems like there should be a better way to do this. Currently,
> the atoms list only depends on the generation number, and thus is identical
> for all GL contexts. Storing an additional 1.5 kB of redundant data in each
> GL context seems a bit wasteful.
That's an interesting idea that hadn't occurred to me. :)
> Would it help to make brw_upload_state cache the array and size?
>
> void
> brw_upload_state(struct brw_context *brw)
> {
> static const struct brw_tracked_state *atoms = brw->atoms;
> static const int num_atoms = brw->num_atoms;
>
> ...
>
> for (i = 0; i < num_atoms; i++) {
> ...use atoms[i]...
> }
>
> ...
> }
>
> It seems like that would achieve the same effect - one indirection.
>
> One downside to that approach is that we couldn't have separate atom
> lists for API_CORE vs. API_COMPAT (while we could using your approach).
> (We could skip stippling for core contexts, and skip GS/HS/DS for legacy
> contexts, which might help a tiny bit...)
>
>> /* If (INTEL_DEBUG & DEBUG_BATCH) */
>> struct {
>> diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> index a579781..9a5d49a 100644
>> --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
>> +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
>> @@ -357,6 +357,11 @@ void brw_init_state( struct brw_context *brw )
>> const struct brw_tracked_state **atoms;
>> int num_atoms;
>>
>> + STATIC_ASSERT(ARRAY_SIZE(gen4_atoms) <= ARRAY_SIZE(brw->atoms));
>> + STATIC_ASSERT(ARRAY_SIZE(gen6_atoms) <= ARRAY_SIZE(brw->atoms));
>> + STATIC_ASSERT(ARRAY_SIZE(gen7_atoms) <= ARRAY_SIZE(brw->atoms));
>> + STATIC_ASSERT(ARRAY_SIZE(gen8_atoms) <= ARRAY_SIZE(brw->atoms));
>
> I do like the static asserts.
>
>> +
>> brw_init_caches(brw);
>>
>> if (brw->gen >= 8) {
>> @@ -373,9 +378,17 @@ void brw_init_state( struct brw_context *brw )
>> num_atoms = ARRAY_SIZE(gen4_atoms);
>> }
>>
>> - brw->atoms = atoms;
>> brw->num_atoms = num_atoms;
>>
>> + /* This is to work around brw_context::atoms being declared const. We want
>> + * it to be const, but it needs to be initialized somehow!
>> + */
>> + struct brw_tracked_state *context_atoms =
>> + (struct brw_tracked_state *) &brw->atoms[0];
>
> Ugly, but... not sure what else to suggest.
I was going to s/context_atoms/brw->atoms[i]/, but I wanted the patch to
be easier to review. There was also a follow-on patch that uses some
SSE3 instructions in the loop, but it was giving mixed results. I'm
holding that back to send out later.
> I suppose this patch is fine. It does offer a small CPU overhead reduction in
> the hottest path, for a small memory usage penalty. I think we can do better,
> but I don't know that it makes sense to block your patch over nebulous ideas
> for future improvements. We can do those in the future...
>
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
>> +
>> + for (int i = 0; i < num_atoms; i++)
>> + context_atoms[i] = *atoms[i];
>> +
>> while (num_atoms--) {
>> assert((*atoms)->dirty.mesa | (*atoms)->dirty.brw);
>> assert((*atoms)->emit);
>> @@ -607,7 +620,7 @@ void brw_upload_state(struct brw_context *brw)
>> prev = *state;
>>
>> for (i = 0; i < brw->num_atoms; i++) {
>> - const struct brw_tracked_state *atom = brw->atoms[i];
>> + const struct brw_tracked_state *atom = &brw->atoms[i];
>> struct brw_state_flags generated;
>>
>> if (check_state(state, &atom->dirty)) {
>> @@ -627,7 +640,7 @@ void brw_upload_state(struct brw_context *brw)
>> }
>> else {
>> for (i = 0; i < brw->num_atoms; i++) {
>> - const struct brw_tracked_state *atom = brw->atoms[i];
>> + const struct brw_tracked_state *atom = &brw->atoms[i];
>>
>> if (check_state(state, &atom->dirty)) {
>> atom->emit(brw);
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: OpenPGP digital signature
URL: <http://lists.freedesktop.org/archives/mesa-dev/attachments/20141222/d364f0f2/attachment.sig>
More information about the mesa-dev
mailing list