[PATCH 1/1] drm/amdgpu: Fix per-IB secure flag GFX hang

Thu Feb 27 21:15:28 UTC 2020

On 2020-02-27 6:56 a.m., Huang Rui wrote:
> On Thu, Feb 27, 2020 at 06:39:03AM +0800, Tuikov, Luben wrote:
>> Since commit "Move to a per-IB secure flag (TMZ)",
>> we've been seeing hangs in GFX. Ray H. pointed out
>> by sending a patch that we need to send FRAME
>> CONTROL stop/start back-to-back, every time we
>> flip the TMZ flag as per each IB we submit. That
>> is, when we transition from TMZ to non-TMZ we have
>> to send a stop with TMZ followed by a start with
>> non-TMZ, and similarly for transitioning from
>> non-TMZ into TMZ.
>>
>> This patch implements this, thus fixing the GFX
>> hang.
>>
>> Signed-off-by: Luben Tuikov <luben.tuikov at amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   | 87 +++++++++++++++++-------
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  5 +-
>>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 15 ++--
>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 13 ++--
>>  4 files changed, 79 insertions(+), 41 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> index 4b2342d11520..16d6df3304d3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> @@ -216,40 +216,75 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>>  		amdgpu_ring_emit_cntxcntl(ring, status);
>>  	}
>>  
>> -	secure = false;
>> +	/* Find the first non-preamble IB.
>> +	 */
>>  	for (i = 0; i < num_ibs; ++i) {
>>  		ib = &ibs[i];
>>  
>>  		/* drop preamble IBs if we don't have a context switch */
>> -		if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
>> -		    skip_preamble &&
>> -		    !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
>> -		    !amdgpu_mcbp &&
>> -		    !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
>> -			continue;
>> -
>> -		/* If this IB is TMZ, add frame TMZ start packet,
>> -		 * else, turn off TMZ.
>> -		 */
>> -		if (ib->flags & AMDGPU_IB_FLAGS_SECURE && ring->funcs->emit_tmz) {
>> -			if (!secure) {
>> -				secure = true;
>> -				amdgpu_ring_emit_tmz(ring, true);
>> -			}
>> -		} else if (secure) {
>> +		if (!(ib->flags & AMDGPU_IB_FLAG_PREAMBLE) ||
>> +		    !skip_preamble ||
>> +		    (status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) ||
>> +		    amdgpu_mcbp ||
>> +		    amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
>> +			break;
>> +	}
>> +	if (i >= num_ibs)
>> +		goto Done;
>> +	/* Setup initial TMZiness and send it off.
>> +	 */
>> +	secure = false;
>> +	if (job && ring->funcs->emit_frame_cntl) {
>> +		if (ib->flags & AMDGPU_IB_FLAGS_SECURE)
>> +			secure = true;
>> +		else
>>  			secure = false;
>> -			amdgpu_ring_emit_tmz(ring, false);
>> -		}
>> -
>> -		amdgpu_ring_emit_ib(ring, job, ib, status);
>> -		status &= ~AMDGPU_HAVE_CTX_SWITCH;
>> +		amdgpu_ring_emit_frame_cntl(ring, true, secure);
>>  	}
>> +	amdgpu_ring_emit_ib(ring, job, ib, status);
>> +	status &= ~AMDGPU_HAVE_CTX_SWITCH;
>> +	i += 1;
>> +	/* Send the rest of the IBs.
>> +	 */
>> +	if (job && ring->funcs->emit_frame_cntl) {
>> +		for ( ; i < num_ibs; ++i) {
>> +			ib = &ibs[i];
>> +
>> +			/* drop preamble IBs if we don't have a context switch */
>> +			if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
>> +			    skip_preamble &&
>> +			    !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
>> +			    !amdgpu_mcbp &&
>> +			    !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
>> +				continue;
> 
> Snip.
> 
>> +
>> +			if (!!secure ^ !!(ib->flags & AMDGPU_IB_FLAGS_SECURE)) {
>> +				amdgpu_ring_emit_frame_cntl(ring, false, secure);
>> +				secure = !secure;
>> +				amdgpu_ring_emit_frame_cntl(ring, true, secure);
>> +			}
> 
> That's pretty good optimization! I spend quit a few time to understand this.

I know. I know you did. It's called experience.

When I saw you v1, it was a cringe. Seriously?

> 
>>  
>> -	if (secure) {
>> -		secure = false;
>> -		amdgpu_ring_emit_tmz(ring, false);
>> +			amdgpu_ring_emit_ib(ring, job, ib, status);
>> +			status &= ~AMDGPU_HAVE_CTX_SWITCH;
>> +		}
>> +		amdgpu_ring_emit_frame_cntl(ring, false, secure);
>> +	} else {
>> +		for ( ; i < num_ibs; ++i) {
>> +			ib = &ibs[i];
>> +
>> +			/* drop preamble IBs if we don't have a context switch */
>> +			if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
>> +			    skip_preamble &&
>> +			    !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
>> +			    !amdgpu_mcbp &&
>> +			    !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
>> +				continue;
>> +
>> +			amdgpu_ring_emit_ib(ring, job, ib, status);
>> +			status &= ~AMDGPU_HAVE_CTX_SWITCH;
> 
> To pull the checking "job && ring->funcs->emit_frame_cntl" out of the loop,
> that make the implemenation more duplicated like below, we have to write

Yes, that is exactly what we want.

As I explained before and will explain again, and again, and again,
"job && ring->funcs->emit_frame_cntl" is static to the body of the loop,
and as such can be pulled OUT of the loop and it should.

This is a *formulaic* optimization exercise. Compilers and optimizers do it first.

To extrapolate, consider that what you'd eventually have is a HUGE long long loop
and everything inside it. Not good.

Regards,
Luben

> the same codes multiple times. I hope the implementation is more simple and
> readable. Please see my V2 patch that I just send out. We can try to use
> minimum changes to fix the issue.
> 
> 		for ( ; i < num_ibs; ++i) {
> 			ib = &ibs[i];
> 
> 			/* drop preamble IBs if we don't have a context switch */
> 			if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
> 			    skip_preamble &&
> 			    !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
> 			    !amdgpu_mcbp &&
> 			    !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> 				continue;
> 
>                         ...
> 			amdgpu_ring_emit_ib(ring, job, ib, status);
> 
> Thanks,
> Ray
>