[PATCH] drm/msm/adreno: De-spaghettify the use of memory barriers

Tue Jun 4 17:35:04 UTC 2024

On 5/14/24 20:38, Akhil P Oommen wrote:
> On Wed, May 08, 2024 at 07:46:31PM +0200, Konrad Dybcio wrote:
>> Memory barriers help ensure instruction ordering, NOT time and order
>> of actual write arrival at other observers (e.g. memory-mapped IP).
>> On architectures employing weak memory ordering, the latter can be a
>> giant pain point, and it has been as part of this driver.
>>
>> Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of
>> readl/writel, which include r/w (respectively) barriers.
>>
>> Replace the barriers with a readback that ensures the previous writes
>> have exited the write buffer (as the CPU must flush the write to the
>> register it's trying to read back) and subsequently remove the hack
>> introduced in commit b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt
>> status in hw_init").
>>
>> Fixes: b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt status in hw_init")
>> Signed-off-by: Konrad Dybcio <konrad.dybcio at linaro.org>
>> ---
>>   drivers/gpu/drm/msm/adreno/a6xx_gmu.c |  5 ++---
>>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 ++++----------
>>   2 files changed, 6 insertions(+), 13 deletions(-)
> 
> I prefer this version compared to the v2. A helper routine is
> unnecessary here because:
> 1. there are very few scenarios where we have to read back the same
> register.
> 2. we may accidently readback a write only register.

Which would still trigger an address dependency on the CPU, no?

> 
>>
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> index 0e3dfd4c2bc8..4135a53b55a7 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c
>> @@ -466,9 +466,8 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu)
>>   	int ret;
>>   	u32 val;
>>   
>> -	gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1);
>> -	/* Wait for the register to finish posting */
>> -	wmb();
>> +	gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1));
>> +	gmu_read(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ);
> 
> This is unnecessary because we are polling on a register on the same port below. But I think we
> can replace "wmb()" above with "mb()" to avoid reordering between read
> and write IO instructions.

Ok on the dropping readback part

+ AFAIU from Will's response, we can drop the barrier as well

> 
>>   
>>   	ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val,
>>   		val & (1 << 1), 100, 10000);
>> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> index 973872ad0474..0acbc38b8e70 100644
>> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
>> @@ -1713,22 +1713,16 @@ static int hw_init(struct msm_gpu *gpu)
>>   	}
>>   
>>   	/* Clear GBIF halt in case GX domain was not collapsed */
>> +	gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
> 
> We need a full barrier here to avoid reordering. Also, lets add a
> comment about why we are doing this odd looking sequence.
> 
>> +	gpu_read(gpu, REG_A6XX_GBIF_HALT);
>>   	if (adreno_is_a619_holi(adreno_gpu)) {
>> -		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
>>   		gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0);
>> -		/* Let's make extra sure that the GPU can access the memory.. */
>> -		mb();
> 
> We need a full barrier here.
> 
>> +		gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL);
>>   	} else if (a6xx_has_gbif(adreno_gpu)) {
>> -		gpu_write(gpu, REG_A6XX_GBIF_HALT, 0);
>>   		gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0);
>> -		/* Let's make extra sure that the GPU can access the memory.. */
>> -		mb();
> 
> We need a full barrier here.

Not sure we do between REG_A6XX_GBIF_HALT & REG_A6XX_RBBM_(GBIF_HALT/GPR0_CNTL),
but I suppose keeping the one after REG_A6XX_RBBM_(GBIF_HALT/GPR0_CNTL) makes
sense to avoid the possibility of configuring the GPU before it can access DRAM..

> 
>> +		gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT);
>>   	}
>>   
>> -	/* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */
>> -	if (adreno_is_a7xx(adreno_gpu) && a6xx_has_gbif(adreno_gpu))
>> -		spin_until(!gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK));
>> -
> 
> Why is this removed?

Because it was a hack in the first place and the enforcement of GBIF
unhalt requests coming through before proceeding further removes the
necessity to check this (unless there's some hw-mandated delay we should
keep in mind, but kgsl doesn't have that and there doesn't seem to be
any from testing on 8[456]50).

Konrad