[Intel-gfx] [PATCH 3/3] drm/i915: Do not serialize forcewake acquire across domains
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Tue Apr 5 09:02:28 UTC 2016
On 04/04/16 20:07, Chris Wilson wrote:
> On Mon, Apr 04, 2016 at 05:51:11PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>>
>> On platforms with multiple forcewake domains it seems more efficient
>> to request all desired ones and then to wait for acks to avoid
>> needlessly serializing on each domain.
>
> Not convinced since we have more machines with one domain than two. What
> I did was to compact the domains array so that we only iterated over the
> known set - but that feels overkill when we only have two domains today.
>
> For the same reason (only one machine with two domains), I didn't think
> separate functions to iterate over one domain and another to iterate
> over all was worth it.
>
> What you can do though is remove an excess posting read from
> fw_domains_put.
>
> Compared to the cost of a register access (the spinlock irq mostly) the
> iterator doesn't strike me as being that worthwhile an optimisation
> target.
Correct, I thought we agreed that the majority of the CPU time
attributed to fw_domains_get is from the busy spinning while waiting on
the ack from the GPU.
This patch is not optimising the iterator; it requests that all domains
be woken up and then waits for the acks. That changes the time spent busy
spinning from Td1 + ... + Tdn to max(Td1, ..., Tdn).
Yes, it is only interesting for platforms with more than one fw domain.
But since we agreed the iterator is not significant, the fact that the
patch adds two loops* over the array should not be noticeable compared to
the gain on multi-fw-domain machines (of which there will be more and
more as time goes by).
Regards,
Tvrtko
* Also, because patch 2/3 from this series has shrunk the iterator
considerably, fw_domains_get with two loops now remains pretty much the
same size as it was with one loop before.