[PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Sun Jun 25 03:51:21 UTC 2023

Hi,

On 2023/6/22 01:45, Lucas Stach wrote:
> Am Donnerstag, dem 22.06.2023 um 01:21 +0800 schrieb Sui Jingfeng:
>> Hi,
>>
>> On 2023/6/21 23:58, Lucas Stach wrote:
>>> Am Mittwoch, dem 21.06.2023 um 23:30 +0800 schrieb Sui Jingfeng:
>>>> Hi,
>>>>
>>>> On 2023/6/21 18:00, Lucas Stach wrote:
>>>>>>     		dma_sync_sgtable_for_cpu(dev->dev, etnaviv_obj->sgt,
>>>>>>     					 etnaviv_op_to_dma_dir(op));
>>>>>>     		etnaviv_obj->last_cpu_prep_op = op;
>>>>>> @@ -408,8 +421,9 @@ int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>>>>>>     {
>>>>>>     	struct drm_device *dev = obj->dev;
>>>>>>     	struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>>>>>> +	struct etnaviv_drm_private *priv = dev->dev_private;
>>>>>>     
>>>>>> -	if (etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>>>> +	if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>>>>>>     		/* fini without a prep is almost certainly a userspace error */
>>>>>>     		WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>>>>>>     		dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>>>>>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>>>> index 3524b5811682..754126992264 100644
>>>>>> --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>>>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c
>>>>>> @@ -112,11 +112,16 @@ static const struct etnaviv_gem_ops etnaviv_gem_prime_ops = {
>>>>>>     struct drm_gem_object *etnaviv_gem_prime_import_sg_table(struct drm_device *dev,
>>>>>>     	struct dma_buf_attachment *attach, struct sg_table *sgt)
>>>>>>     {
>>>>>> +	struct etnaviv_drm_private *priv = dev->dev_private;
>>>>>>     	struct etnaviv_gem_object *etnaviv_obj;
>>>>>>     	size_t size = PAGE_ALIGN(attach->dmabuf->size);
>>>>>> +	u32 cache_flags = ETNA_BO_WC;
>>>>>>     	int ret, npages;
>>>>>>     
>>>>>> -	ret = etnaviv_gem_new_private(dev, size, ETNA_BO_WC,
>>>>>> +	if (priv->dma_coherent)
>>>>>> +		cache_flags = ETNA_BO_CACHED;
>>>>>> +
>>>>> Drop this change. Instead etnaviv_gem_new_impl() should do the upgrade
>>>>> from WC to CACHED as necessary by adding something like this:
>>>> I understand you are a profession person in vivante GPU driver domain.
>>>>
>>>> I respect you reviews and instruction.
>>>>
>>>> But, I'm really reluctant to agree with this, is there any space to
>>>> negotiate?
>>>>
>>>>> /*
>>>>>     * Upgrade WC to CACHED when the device is hardware coherent and the
>>>>>     * platform doesn't allow mixing cached and writecombined mappings to
>>>>>     * the same memory area.
>>>>>     */
>>>>> if ((flags & ETNA_BO_CACHE_MASK) == ETNA_BO_WC &&
>>>>>        dev_is_dma_coherent(dev) && !drm_arch_can_wc_memory())
>>>>>            flags = (flags & ~ETNA_BO_CACHE_MASK) & ETNA_BO_CACHED;
>>>> This is policy, not a mechanism.
>>>>
>>>> Using what cache property is a user-space program's choice.
>>>>
>>>> While you are override the WC with CACHED mapping. This is not correct
>>>> in the concept!
>>>>
>>> Please explain why you think that this isn't correct.
>> Again,
>>
>> this is user-space things!
>>
>> this is user-space things!
>>
>> this is user-space things!
>>
>> I have explained several times.
>>
>> made the decision for the user-space program is wrong.
>>
> This mode of communication isn't helpful. Please stop it.
>
> As I tried to explain to you multiple times: if userspace can break
> coherency by selecting the wrong mapping type then this is something
> the kernel should prevent.

You are right in overall.

This is the only one benefit which WC mapping is preferred over the 
cached mapping.

As you already told me,  the WC mapping don't *pollute CPU's cache*.

If we can make sure that a BO is *only* going to be used by the GPU,

then we still can choose WC mapping as the cache property of this BO.

As the cache property is CPU side thing.

>>> It allows
>>> userspace to use WC mappings that would potentially cause loss of
>>> coherency between CPU and GPU, which isn't acceptable.
>> Before made the WC works correctly,  you need the developing environment.
>>
>> userspace program can tune the BO cache mapping easily.
>>
>> Either environment or supply a conf file.
>>
>>
>> While with your implement, we don't have the opportunity to do debugging
>> and the development.
> You can always add a patch to your local kernel to re-allow WC mappings
> while you work on making them behave as expected on your platform.

We are doing the things about the *upstream*.

>   With
> the mainline kernel there is no way that the kernel driver will allow
> broken coherency.

A buffer is used by the GPU solely won't break the coherency.

> And as I also mentioned before, there is a clear upgrade path here:
> once WC mappings work as expected on your platform we can easily drop
> the upgrading from the kernel driver again. The userspace driver can
> already be changed to use CACHED BOs where beneficial on your platform
> in the meantime.

For our platform, I think the problem is that the GPU always write to L3 
share cache,

even you use the WC mapping.

As I already said, the WC mapping only influence the CPU side.

How can I control the GPU when a BO is WC mapping ?

Does the GPU know that whether a BO  is WC mapping or not ?

How can I let(or tell) the GPU that

please not write to shared L3 cache,  write to the system RAM directly?

On out platform, the coherency between the CPU and peripheral IO 
device's cache

is maintained by hardware. While at here.  the hardware is the shared L3 
cache.

I guess the current problem is that the GPU don't listen to me,

he still write to CPU cache's even you choose the WC mapping.

Then if you want to read the rendered image by CPU,

the CPU go the un-cached path. Then I think the CPU will get wrong data.

> Regards,
> Lucas

-- 
Jingfeng