[PATCH v5 01/10] drm/v3d: Fix the MMU flush order

Wed Sep 18 11:54:40 UTC 2024

Hi Iago,

On 9/4/24 05:24, Iago Toral wrote:
> Hi Maira,
> 
> El jue, 29-08-2024 a las 10:05 -0300, Maíra Canal escribió:
>> We must first flush the MMU cache and then, flush the TLB, not the
>> other
>> way around. Currently, we can see a race condition between the MMU
>> cache
>> and the TLB when running multiple rendering processes at the same
>> time.
>> This is evidenced by MMU errors triggered by the IRQ.
>>
>> Fix the MMU flush order by flushing the MMU cache and then the TLB.
> 
> the patch is making 2 changes, it is changing the ordering of the
> flushes but also the fact that now we wait for the first flush to
> commplete before starting the second while the previous version would
> start both flushes and then wait for both to complete. The commit log
> seems to suggest that the first change is the one that fixes the issue
> but I wonder if that is really what is happening.
> 
> Also, have you tested keeping the original order of operations but with
> interleaved waits like we do here? Either way, I think we probably

I just tested keeping the order of the operations but using interleaved
waits and I end up having MMU errors that crash the GPU. Changing the
order of the operations was a deliberated choice due the GPU design and
and waiting for each of those to finish is needed to avoid race
conditions.

> should emphasized more in the committ log the fact that we are now
> waiting for each flush to complete before starting the other flush.
> 

But you are absolutely right. I need to mention it on the commit log.
I'll add this information in the next version.

> 
>>
>> Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for
>> Broadcom V3D V3.x+")
>> Signed-off-by: Maíra Canal <mcanal at igalia.com>
>> ---
>>   drivers/gpu/drm/v3d/v3d_mmu.c | 29 ++++++++++-------------------
>>   1 file changed, 10 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c
>> b/drivers/gpu/drm/v3d/v3d_mmu.c
>> index 14f3af40d6f6..06bb44c7f35d 100644
>> --- a/drivers/gpu/drm/v3d/v3d_mmu.c
>> +++ b/drivers/gpu/drm/v3d/v3d_mmu.c
>> @@ -32,32 +32,23 @@ static int v3d_mmu_flush_all(struct v3d_dev *v3d)
>>   {
>>   	int ret;
>>   
>> -	/* Make sure that another flush isn't already running when
>> we
>> -	 * start this one.
>> -	 */
>> -	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
>> -			 V3D_MMU_CTL_TLB_CLEARING), 100);
>> -	if (ret)
>> -		dev_err(v3d->drm.dev, "TLB clear wait idle pre-wait
>> failed\n");
>> -
> 
> are we certain we can't have a flush in flux when a new one comes in?

We can, but it doesn't really matter, as by the end of both requests,
the TLB will be flushed.

Best Regards,
- Maíra

> 
>> -	V3D_WRITE(V3D_MMU_CTL, V3D_READ(V3D_MMU_CTL) |
>> -		  V3D_MMU_CTL_TLB_CLEAR);
>> -
>> -	V3D_WRITE(V3D_MMUC_CONTROL,
>> -		  V3D_MMUC_CONTROL_FLUSH |
>> +	V3D_WRITE(V3D_MMUC_CONTROL, V3D_MMUC_CONTROL_FLUSH |
>>   		  V3D_MMUC_CONTROL_ENABLE);
>>   
>> -	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
>> -			 V3D_MMU_CTL_TLB_CLEARING), 100);
>> +	ret = wait_for(!(V3D_READ(V3D_MMUC_CONTROL) &
>> +			 V3D_MMUC_CONTROL_FLUSHING), 100);
>>   	if (ret) {
>> -		dev_err(v3d->drm.dev, "TLB clear wait idle
>> failed\n");
>> +		dev_err(v3d->drm.dev, "MMUC flush wait idle
>> failed\n");
>>   		return ret;
>>   	}
>>   
>> -	ret = wait_for(!(V3D_READ(V3D_MMUC_CONTROL) &
>> -			 V3D_MMUC_CONTROL_FLUSHING), 100);
>> +	V3D_WRITE(V3D_MMU_CTL, V3D_READ(V3D_MMU_CTL) |
>> +		  V3D_MMU_CTL_TLB_CLEAR);
>> +
>> +	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
>> +			 V3D_MMU_CTL_TLB_CLEARING), 100);
>>   	if (ret)
>> -		dev_err(v3d->drm.dev, "MMUC flush wait idle
>> failed\n");
>> +		dev_err(v3d->drm.dev, "TLB clear wait idle
>> failed\n");
> 
> I'd maybe use "MMU TLB clear wait idle failed", so we can more easily
> identify this message comes from the MMU implementation.
> 
> Iago
> 
>>   
>>   	return ret;
>>   }
>