[Intel-gfx] [RFC] drm/i915: Temporarily go realtime when polling PCODE

Thu Feb 23 13:00:32 UTC 2017

On 23/02/2017 12:01, Imre Deak wrote:
> On Thu, Feb 23, 2017 at 09:37:29AM +0000, Tvrtko Ursulin wrote:
>> [...]
>> Having read the spec I think I see both sides now.
>>
>> Spec is actually suggesting we should busy-retry the pcode request for 3ms
>> in this case.
>
> Well, retry for 3ms without setting any minimum for the number of
> requests. That couldn't be guaranteed anyway due to scheduling etc, and
> would be a strange ABI. Later Art Runyan clarified this in the way it's
> described in the code comment: What is required is two requests at
> least 3ms apart. The first request is queued by the firmware and the
> second request signals completion.

Why is our loop then spamming the hardware every 10us with requests? 
Perhaps it could be counter-productive? A single sleeping loop with a 
long timeout and a 3ms period wouldn't work? Like:

	ret = _wait_for(COND, 50 * 1000, timeout_base_ms * 1000)

?

>>
>> It doesn't say how many retries we are supposed to do and how it internally
>> operates, which makes me unsure if our first more relaxed polling is perhaps
>> causing or contributing to the issue.
>>
>> One thing where we don't follow the spec is the timeout for the
>> GEN6_PCODE_READY poll which spec says should be 150us and not 500ms. I don't
>> know if this timeout was trigger in the bug reports?
>
> No this PCODE_READY poll always succeeds, it's the reply/reply_mask
> response which doesn't get set in time.

Yes I know, I was just thinking if it takes more than 2us it then falls 
back to scheduling & usleep_range. That was at the time I was thinking 
it is really important to poll rapidly. Since you explained above it is 
just the opposite I agree this part is not a problem. It still may make 
sense to wait for that bit for a shorter period as per bspec.

[snip]

>> But regardless, the fact that the fallback busy loop needs up to 34ms as
>> well makes the last bit from the above a bit uncertain. Only if the
>> non-compliant polling we do somehow confuses the hardware and then we end up
>> having to busy poll longer than we normally would. Probably unlikely.
>
> I'm trying to get more info based on all this (in particular the KBL
> problem) from Art. Until that I'd suggest increasing the WA timeout to
> 50ms, since that solved the problem for the bug reporter. We could fix
> things/add more scaffolding if more evidence comes up, or there is a new
> bug report.

Yes sure I think I replied before that it is fine by me to push a 50ms 
fix for stable.

Regards,

Tvrtko