[PATCH] drm/amdgpu: Adding wait time before reading upll control register

Tue Jun 30 07:01:10 UTC 2020

Am 30.06.20 um 00:46 schrieb Luben Tuikov:
> On 2020-06-26 1:04 p.m., Christian König wrote:
>> Am 26.06.20 um 18:12 schrieb Alex Jivin:
>>> Adding a delay between writing to UVD control register and reading from it.
>>> This is to allow the HW to process the write command.
>>>
>>> Signed-off-by: Alex Jivin <alex.jivin at amd.com>
>>> Suggested-By: Luben Tukov <luben.tuikov at amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/si.c | 6 ++++++
>>>    1 file changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
>>> index 9d7b4ccd17b8..42cdc14fb79d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/si.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/si.c
>>> @@ -1435,6 +1435,12 @@ static int si_uvd_send_upll_ctlreq(struct amdgpu_device *adev,
>>>    	/* Assert UPLL_CTLREQ */
>>>    	WREG32_P(cg_upll_func_cntl, UPLL_CTLREQ_MASK, ~UPLL_CTLREQ_MASK);
>>>    
>>> +	/* Waiting for HW to process the previous write.
>>> +	 * This is to give a chance to HW to act before
>>> +	 * the first read is done.
>>> +	 */
>>> +	mdelay(1);
>>> +
>> Mhm, that is most likely not a good idea.
>>
>> We need to issue a read after the write to make sure that the stuff is
>> send out to the hardware.
> Tracing down WREG32_P(), it seems to be writing and then reading the register,

Why do you think so?

What the macro should do is to read, apply mask and value and then write 
it back to the register:

#define WREG32_P(reg, val, mask)                \
     do {                            \
         uint32_t tmp_ = RREG32(reg);            \
         tmp_ &= (mask);                    \
         tmp_ |= ((val) & ~(mask));            \
         WREG32(reg, tmp_);                \
     } while (0)

> back to back, twice over, when deferring to PCIe space, and just writel() when in mmio
> space. (There is separate thread on this as it doesn't seem correct.)

That indeed sounds fishy, but all those registers should be in mmio space.

>
>> Adding a delay here is probably just postponing that. Do we have some
>> note in the documentation that this is necessary?
> Flushing the write buffer (by issuing a read if necessary) is different
> than waiting for the hardware to process the request.

It's not flushing the write buffer. The hardware is often build in a way 
that the next read is triggering the action instead of the write.

That's done because writes should be accepted by the hardware block 
immediately while reads have a timeout associated with it.

> The current code does flush the write buffer by reading back,
> and then delaying (both in the loop), which does achieve this,
> but it leaves a corner case as I wrote in my review.
> The corner case is that if the status
> register changes to "done" while in the last "delay()"
> we then check the loop invariant and exit, as opposed to reading
> the register and determining that it is done successfully.
>
> Current, all in pseudo-code:
>
> write()
>
> for (count = 0; count < MAX_COUNT; count++) {
> 	res = read()
> 	if (res is success) break
> 	mdelay(10)              <-- if it changes here, we miss it on the last iteration
> }
>
> Optimal:
>
> write()   ; assume write buffer flush
> mdelay(9)
> do {
> 	mdelay(1)
> 	res = read()
> } while (res != success && ++count < MAX_COUNT)
>
> This solves the corner case, and ensures a time delay of 10 for
> the hardware to process its job, but a delay of 1 for polling
> the status, as it could be done "anytime now."

That seems to assume that the ACK bits are not immediately cleared on 
the write. How do you got to that conclusion?

Processing the request can take even longer than 10ms depending on what 
is done, which reference clock is selected, the temperature of the 
hardware, sleep mode etc etc...

For example switching to bypass mode usually comes immediately, e.g. you 
write the CNTL register and you immediately get an ack back.

But as far as I know switching from bypass back to normal means that we 
need to wait for both the reference, feedback and output signal to be 
logical low and then high again at the same time.

Depending on the frequencies we got here that can take a bit of time.

> Assuming that WREG32_P() flushes the write buffer, as it seems
> that it does, the idea here is to give the hardware some time to process
> the request (from writing a value to it), but when polling to poll
> a shorter amount of time.

Not a bad idea, but this is based on the diagnostic code and already 
used for nearly a decade.

In other words this code is what the hardware engineers recommended, is 
used *much* more often outside the driver and known to be working well.

> I would've also liked to see the mutex business fixed
> as well from my original patch review, as it is easy to prove
> that it is always taken, so no need to embed it inside the if().

That is indeed a software problem which should be fixed.

Regards,
Christian.

>
> Regards,
> Luben
>
>> Christian.
>>
>>>    	/* Wait for CTLACK and CTLACK2 to get asserted */
>>>    	for (i = 0; i < SI_MAX_CTLACKS_ASSERTION_WAIT; ++i) {
>>>    		uint32_t mask = UPLL_CTLACK_MASK | UPLL_CTLACK2_MASK;