[Intel-gfx] [RFC] drm/i915: Temporarily go realtime when polling PCODE

Thu Feb 23 12:01:40 UTC 2017

On Thu, Feb 23, 2017 at 09:37:29AM +0000, Tvrtko Ursulin wrote:
> [...]
> Having read the spec I think I see both sides now.
> 
> Spec is actually suggesting we should busy-retry the pcode request for 3ms
> in this case.

Well, retry for 3ms without setting any minimum for the number of
requests. That couldn't be guaranteed anyway due to scheduling etc, and
would be a strange ABI. Later Art Runyan clarified this in the way it's
described in the code comment: What is required is two requests at
least 3ms apart. The first request is queued by the firmware and the
second request signals completion.

> 
> It doesn't say how many retries we are supposed to do and how it internally
> operates, which makes me unsure if our first more relaxed polling is perhaps
> causing or contributing to the issue.
> 
> One thing where we don't follow the spec is the timeout for the
> GEN6_PCODE_READY poll which spec says should be 150us and not 500ms. I don't
> know if this timeout was trigger in the bug reports?

No this PCODE_READY poll always succeeds, it's the reply/reply_mask
response which doesn't get set in time.

> If not then it is not
> the direct issue. But could be a contributing one, so the question is why we
> decided to do it and shouldn't we change this one to the 150us busy wait
> instead (add wait_for_register_fw_us)?

Haven't noticed this, but I doubt this is a problem, see below.

> 
> Another thing is the 10-20us retry for the top level PCODE retry - spec does
> not mention we should wait before retrying so is this our decision to be
> nicer to the system?
> 
> In either case, if the poll for GEN6_PCODE_READY is >2us (busy spin limit
> before going to sleeping poll), and the higher level PCODE retry ends up
> much longer than the 10-20us written in the code, first due hardware taking
> longer than 2us to respond, and both due overall CPU load and scheduling
> latencies, we would be drifting away from what is prescribed in the spec.

IIUC all of the above boils down to whether or not the ABI requires us
some exact timing on when we are sending the requests and exactly how
long we retry. I think the kernel can't provide such guarantees (at
least not in an obvious way) and the only - sane - requirement is the
above two requests (queue+completion check) with a _minimum_ amount of
retry time.

So, I suspect the problem is in the firmware where it is either occupied
with something for more than the specified 3ms timeout, or that it goes
idle taking a long time to wake to service our request unless we send a
burst of requests to keep it awake. That's what our WA is trying to do.
There could be some other trick that would prevent this issue, like
disabling CPU C states for the duration of the poll, I haven't tried
this yet.

> But regardless, the fact that the fallback busy loop needs up to 34ms as
> well makes the last bit from the above a bit uncertain. Only if the
> non-compliant polling we do somehow confuses the hardware and then we end up
> having to busy poll longer than we normally would. Probably unlikely.

I'm trying to get more info based on all this (in particular the KBL
problem) from Art. Until that I'd suggest increasing the WA timeout to
50ms, since that solved the problem for the bug reporter. We could fix
things/add more scaffolding if more evidence comes up, or there is a new
bug report.

--Imre