[Intel-gfx] [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains

Tue Apr 5 09:02:28 UTC 2016

On 04/04/16 20:07, Chris Wilson wrote:
> On Mon, Apr 04, 2016 at 05:51:11PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> On platforms with multiple forcewake domains it seems more efficient
>> to request all desired ones and then to wait for acks to avoid
>> needlessly serializing on each domain.
>
> Not convinced since we have more machines with one domain than two. What
> I did was to compact the domains array so that we only iterated over the
> known set - but that feels overkill when we only have two domains today.
>
> For the same reason (only one machine with two domains), I didn't think
> separate functions to iterate over one domain and another to iterate
> over all was worth it.
>
> What you can do though is remove an excess posting read from
> fw_domains_put.
>
> Compared to the cost of a register access (the spinlock irq mostly) the
> iterator doesn't strike me as being that worthwhile an optimisation
> target.

Correct, I thought we agreed that the majority of the CPU time 
attributed to fw_domains_get is from the busy spinning while waiting on 
the ack from the GPU.

This patch is not optimising the iterator; it requests all domains to 
be woken up and only then waits for the acks. It changes the time spent 
busy spinning from Td1 + ... + Tdn to max(Td1...Tdn).
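
To illustrate the difference, here is a minimal sketch in plain C. It 
is not the i915 code: ack_us[], wait_serialized() and wait_batched() 
are hypothetical names, and the model assumes each domain's ack latency 
is fixed and independent of the others.

```c
#include <stddef.h>

/*
 * Hypothetical model, not the driver implementation: ack_us[i] is the
 * time domain i needs to ack once its wake request has been written.
 */

/* Request one domain, spin for its ack, then move on: waits add up. */
unsigned int wait_serialized(const unsigned int *ack_us, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += ack_us[i];

	return total;
}

/*
 * Request every domain first, then spin for the acks: the domains wake
 * concurrently, so the elapsed time is that of the slowest one.
 */
unsigned int wait_batched(const unsigned int *ack_us, size_t n)
{
	unsigned int elapsed = 0;

	for (size_t i = 0; i < n; i++)
		if (ack_us[i] > elapsed)
			elapsed = ack_us[i];

	return elapsed;
}
```

With made-up latencies of 10, 30 and 20 us, the serialized model spins 
for 60 us in total and the batched one for 30 us. With a single domain 
both give the same result.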

Yes, it is only interesting for platforms with more than one fw domain. 
But since we agreed the iterator is not significant, the fact that it 
adds two loops* over the array should not be noticeable vs. the gain for 
multi-fw domain machines (of which there will be more and more as time 
goes by).

Regards,

Tvrtko

* Also because 2/3 from this series has shrunk the iterator 
considerably, fw_domains_get with two loops remains pretty much the 
same size as it was with one loop before.