[Intel-gfx] Question regarding forcewake in i915
Dave Gordon
david.s.gordon at intel.com
Mon Jan 12 09:47:10 PST 2015
On 07/01/15 08:13, Jaeyong Yoo wrote:
> Thanks a lot. It is very helpful. Couple of follow-up questions below.
>
>> -----Original Message-----
>> From: Dave Gordon [mailto:david.s.gordon at intel.com]
>> Sent: Wednesday, January 07, 2015 12:19 AM
>> To: jaeyong.yoo at samsung.com
>> Cc: intel-gfx at lists.freedesktop.org
>> Subject: Re: [Intel-gfx] Question regarding forcewake in i915
>>
>> On 22/12/14 12:26, 유재용 wrote:
>>> Hello intel-gfx,
>>>
>>> I'm reading the i915 GPU driver and finding the forcewake concept quite
>>> hard to understand.
>>>
>>> I understand that it has something to do with energy efficiency, and so is
>>> related to ACPI. It also looks like forcewake works as a pair (get and
>>> put).
>>> In the "get" part, the first thing it does is wait on the FORCEWAKE_ACK_HSW
>>> register (in the case of Haswell).
>>> Then it writes something to the FORCEWAKE_MT register and reads from ECOBUS.
>>> And then it waits on FORCEWAKE_ACK_HSW again!
>>> It becomes more confusing when it comes to put.
>>> In the "put" part, what it does is write to the FORCEWAKE_MT register
>>> and read from ECOBUS.
>>>
>>> I tried to find some good reading material about forcewake, but all I
>>> found was a series of patches on this mailing list (which are quite hard
>>> to follow from the beginning). Could you explain the concept of FORCEWAKE
>>> and possibly the magic tricks in these get and put functions?
>>>
>>> Thanks,
>>> Jaeyong
>>
>> Hi Jaeyong,
>>
>> FORCEWAKE details vary a little from one chip to another, so this is only
>> a general description, but essentially setting one or more bits in the
>> FORCEWAKE register(s) prevents some or all of the power domains from going
>> into the deeper idle (sleep) states (and forces them out of the sleep
>> state if they're already asleep). Clearing the bit(s) allows the affected
>> parts to go to sleep again.
>>
>> The FORCEWAKE_ACK register(s) contain one or more bits which reflect the
>> internal state, and so acknowledge that the most recent write to the
>> corresponding FORCEWAKE register has been accepted and acted upon. It can
>> take a while for a portion of the chip to wake up, so after setting a
>> FORCEWAKE bit we have to spin-wait until it's taken effect.
>>
>> So, the general algorithm for accessing some part of the chip that may be
>> asleep is:
>> 1) set the relevant bit of (a) FORCEWAKE register
>> 2) poll (matching) FORCEWAKE_ACK until the write is acknowledged
>> 3) access the chip (this can encompass several reads and writes)
>> 4) clear the FORCEWAKE bit that we set earlier
>> 5) poll FORCEWAKE_ACK again until this write is acknowledged
>>
>> Now for extra confusion, there are a few more details:
>> * because reads and writes can in some cases be reordered, we
>> need to force the write to FORCEWAKE to complete before the
>> busy-polling of FORCEWAKE_ACK. This is the sole purpose of the
>> read of the ECOBUS register, which is used just because it
>> happens to lie in the same cacheline as FORCEWAKE.
>>
>> * we can choose not to poll for FORCEWAKE_ACK clear in step (5).
>> Instead, we can just leave the chip to go back to sleep while
>> we get on with other things. But in that case, we might come
>> back and try to wake the chip again before it's finished
>> responding to the write in step (4). So if we don't poll at
>> the end of the sequence, we have to poll at the beginning
>> instead; in other words, move step (5) to before step (1).
>
> I see we can move step (5) before step (1). But, I don't understand why
> we have to do this. For instance, if we put step (5) right after step (4),
> does the chip have to wake up for processing the polling (5)?
No, polling these registers doesn't affect the wake state; they're in a
power domain that's not itself controlled by FORCEWAKE.

Either sequence (1-2-3-4-5 or 5-1-2-3-4) is valid. But if we use the
former sequence, the CPU will be busy polling in step 5 for however long
it takes the GPU to finish whatever else it's doing internally and then
acknowledge the write to FORCEWAKE, which could take a while.

By moving step 5 to the beginning of the sequence, the CPU can get on
with unrelated tasks during this time, so in general by the time it gets
round to needing to access FORCEWAKE again the previous write will have
completed and the CPU will see the ACK on the first read. So the
modified sequence allows greater parallelism between GPU and CPU.

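To make the difference between the two orderings concrete, here is a
minimal sketch in C against a made-up software model of a single power
domain. Nothing in it is real i915 code: struct fake_gt, FW_BIT,
FW_LATENCY, gt_tick() and all the helper names are assumptions chosen for
illustration, and the ECOBUS posting read is left out because the model
has no write reordering.

```c
#include <assert.h>
#include <stdint.h>

#define FW_BIT     0x1u  /* hypothetical wake-request bit */
#define FW_LATENCY 3u    /* made-up: ticks until the hardware reacts */

/* Hypothetical model of one power domain; not real i915 code. */
struct fake_gt {
    uint32_t forcewake; /* last value written to FORCEWAKE */
    uint32_t ack;       /* value FORCEWAKE_ACK currently reports */
    unsigned pending;   /* ticks left before ack catches up */
};

/* Let simulated time pass, e.g. while the CPU does unrelated work. */
static void gt_tick(struct fake_gt *gt, unsigned ticks)
{
    for (; ticks > 0 && gt->pending > 0; ticks--) {
        if (--gt->pending == 0)
            gt->ack = gt->forcewake;
    }
}

static void fw_write(struct fake_gt *gt, uint32_t val)
{
    gt->forcewake = val;
    gt->pending = FW_LATENCY;
}

/* Spin until ACK matches; each read costs one tick. Returns reads used. */
static unsigned poll_ack(struct fake_gt *gt, uint32_t want)
{
    unsigned reads = 0;

    do {
        gt_tick(gt, 1);
        reads++;
        assert(reads < 1000); /* stands in for a real timeout */
    } while ((gt->ack & FW_BIT) != want);
    return reads;
}

/* Sequence 1-2-3-4-5: the CPU waits for the release to be acknowledged. */
static unsigned access_poll_at_end(struct fake_gt *gt)
{
    unsigned reads = 0;

    fw_write(gt, FW_BIT);            /* step 1 */
    reads += poll_ack(gt, FW_BIT);   /* step 2 */
    /* step 3: register accesses would go here */
    fw_write(gt, 0);                 /* step 4 */
    reads += poll_ack(gt, 0);        /* step 5 */
    return reads;
}

/* Sequence 5-1-2-3-4: the release is only checked on the next entry. */
static unsigned access_poll_at_start(struct fake_gt *gt)
{
    unsigned reads = 0;

    reads += poll_ack(gt, 0);        /* step 5, moved to the front */
    fw_write(gt, FW_BIT);            /* step 1 */
    reads += poll_ack(gt, FW_BIT);   /* step 2 */
    /* step 3: register accesses would go here */
    fw_write(gt, 0);                 /* step 4, no wait afterwards */
    return reads;
}
```

In this model access_poll_at_end() always spins through both
acknowledgements, whereas access_poll_at_start() finds the release
already acknowledged on its first read whenever enough simulated time
(gt_tick) has passed since the previous call. Called back-to-back with no
time in between, the two orderings cost the same.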
[Aside]: if the driver needs to make a whole sequence of accesses, then
it's better to turn on FORCEWAKE once and hold it across the whole
sequence and then release it at the end, rather than setting it around
each access individually. See execlists_elsp_write() in intel_lrc.c for
an example.
[/Aside]
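As a rough illustration of the aside, the sketch below uses a reference
count so that nested get/put pairs only wake the hardware once. This is
one common way to arrange it, not a description of what the driver
actually does: struct fake_uncore, fw_get(), fw_put(), reg_write() and
the wake_cycles counter are all hypothetical names, and the "hardware"
is just an array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical driver state; not real i915 code. */
struct fake_uncore {
    unsigned fw_refcount;  /* number of current holders of the wake ref */
    unsigned wake_cycles;  /* how often we had to wake the hardware */
    uint32_t regs[4];      /* stand-in for registers behind FORCEWAKE */
};

static void fw_get(struct fake_uncore *u)
{
    /* Only the first holder pays for "write FORCEWAKE + poll ACK". */
    if (u->fw_refcount++ == 0)
        u->wake_cycles++;
}

static void fw_put(struct fake_uncore *u)
{
    assert(u->fw_refcount > 0);
    /* When this reaches zero, real code would clear the FORCEWAKE bit. */
    u->fw_refcount--;
}

/* A single register write that protects itself with get/put. */
static void reg_write(struct fake_uncore *u, unsigned idx, uint32_t val)
{
    fw_get(u);
    u->regs[idx] = val;
    fw_put(u);
}

/* Four writes, each waking and releasing the hardware separately. */
static void write_all_per_access(struct fake_uncore *u)
{
    for (unsigned i = 0; i < 4; i++)
        reg_write(u, i, i + 1);
}

/* The same four writes with the wake reference held across all of them. */
static void write_all_held(struct fake_uncore *u)
{
    fw_get(u);
    for (unsigned i = 0; i < 4; i++)
        reg_write(u, i, i + 1);
    fw_put(u);
}
```

Both functions leave the same values in regs[]; the difference is only
in how many wake cycles they cost (one per access versus one for the
whole sequence).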
> And, additionally, I saw a call to "__gen6_gt_wait_for_thread_c0" after step (2).
> Does it mean that after FORCEWAKE_ACK is acknowledged, the hardware (possibly ACPI)
> sets the thread to the C0 state?
> And is that noticeable via GEN6_GT_THREAD_STATUS_REG (0x13805c)?
>
> Thanks,
> Jaeyong
I'm not the expert on that, but it /looks/ like the intent is for the
driver to wait when waking up the GPU not only for the write to the
FORCEWAKE register to be acknowledged, but also for the GT unit to be
fully active (which might take longer).

It polls the register you mention, and the comment suggests that if we
don't wait here, other registers read /via/ the GT unit might appear to
be zero when they aren't really.

I note that it's described as a workaround on SNB/IVB/HSW only, so it
may be that the original expectation was that seeing FORCEWAKE_ACK set
meant that the chip was ready for access, but it turned out that in some
circumstances the GT unit took longer than expected to become ready and
so it had to be polled separately.
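
The extra wait described above amounts to a second, bounded poll on a
status register after the FORCEWAKE_ACK poll has succeeded. The sketch
below shows only that shape. It is not the body of
__gen6_gt_wait_for_thread_c0: the mask value, the retry budget, and the
fake_gt model with its reads_until_c0 field are all assumptions made up
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define THREAD_STATUS_CORE_MASK 0x7u /* assumed mask, for illustration */

/* Hypothetical model of the GT thread status; not real i915 code. */
struct fake_gt {
    uint32_t thread_status;   /* non-zero core bits: not yet in C0 */
    unsigned reads_until_c0;  /* made-up delay before the bits clear */
};

/* Each read advances the model; the core bits clear after the delay. */
static uint32_t thread_status_read(struct fake_gt *gt)
{
    if (gt->reads_until_c0 > 0 && --gt->reads_until_c0 == 0)
        gt->thread_status &= ~THREAD_STATUS_CORE_MASK;
    return gt->thread_status;
}

/*
 * Poll until the masked status bits read as zero, giving up after
 * max_reads attempts. Returns true if the GT became ready in time.
 */
static bool wait_for_thread_c0(struct fake_gt *gt, unsigned max_reads)
{
    while (max_reads-- > 0) {
        if ((thread_status_read(gt) & THREAD_STATUS_CORE_MASK) == 0)
            return true;
    }
    return false;
}
```

A caller would run this straight after the FORCEWAKE_ACK poll and treat a
false return as a timeout worth logging rather than spinning forever.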
.Dave.