[PATCH v10 07/11] drm/etnaviv: Add support for the dma coherent device

Sun Jun 25 04:04:13 UTC 2023

Hi,

On 2023/6/22 01:53, Lucas Stach wrote:
> Am Donnerstag, dem 22.06.2023 um 01:31 +0800 schrieb Sui Jingfeng:
>> Hi,
>>
>> On 2023/6/22 00:07, Lucas Stach wrote:
>>> And as the HW guarantees it on your platform, your platform
>>> implementation makes this function effectively a no-op. Skipping the
>>> call to this function is breaking the DMA API abstraction, as now the
>>> driver is second guessing the DMA API implementation. I really see no
>>> reason to do this.
>> It is the same reason you chose the word 'effectively', not 'definitely'.
>>
>> We don't want to waste the CPU's time running the
>> dma_sync_sg_for_cpu() function:
>>
>>
>> ```
>>
>> void dma_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
>>               int nelems, enum dma_data_direction dir)
>> {
>>       const struct dma_map_ops *ops = get_dma_ops(dev);
>>
>>       BUG_ON(!valid_dma_direction(dir));
>>       if (dma_map_direct(dev, ops))
>>           dma_direct_sync_sg_for_cpu(dev, sg, nelems, dir);
>>       else if (ops->sync_sg_for_cpu)
>>           ops->sync_sg_for_cpu(dev, sg, nelems, dir);
>>       debug_dma_sync_sg_for_cpu(dev, sg, nelems, dir);
>> }
>>
>> ```
>>
>>
>> instead of just running this:
>>
>>
>> ```
>>
>> int etnaviv_gem_cpu_fini(struct drm_gem_object *obj)
>> {
>>       struct drm_device *dev = obj->dev;
>>       struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj);
>>       struct etnaviv_drm_private *priv = dev->dev_private;
>>
>>       if (!priv->dma_coherent && etnaviv_obj->flags & ETNA_BO_CACHED) {
>>           /* fini without a prep is almost certainly a userspace error */
>>           WARN_ON(etnaviv_obj->last_cpu_prep_op == 0);
>>           dma_sync_sgtable_for_device(dev->dev, etnaviv_obj->sgt,
>>               etnaviv_op_to_dma_dir(etnaviv_obj->last_cpu_prep_op));
>>           etnaviv_obj->last_cpu_prep_op = 0;
>>       }
>>
>>       return 0;
>> }
>>
>> ```
>>
> My judgment as the maintainer of this driver is that the small CPU
> overhead of calling this function is very well worth it, if the
> alternative is breaking the DMA API abstractions.
>
>> But this is acceptable, because we can drop the GEM_CPU_PREP and
>> GEM_CPU_FINI ioctls entirely in userspace for cached buffers, as they
>> are not needed at all for cached mappings on our platform.
>>
> And that statement isn't true either.

Yes, you are right here, I admit.

I ran into exactly this problem in the past when developing
xf86-video-loongson.

The root cause, I think, is that the CPU doesn't know when the GPU has
finished rendering, or that some data still resides in the GPU's cache.

We have to call etna_bo_cpu_prep(etna_bo, DRM_ETNA_PREP_READ)
to make sure the data fetched by the CPU is up to date.
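
As a rough illustration of why the prep call matters, here is a toy
model in C. It is only a sketch: the mock_* names and MOCK_PREP_READ are
made up for this example and do not exist in libdrm or the kernel. The
model imitates the ordering "wait for the GPU, then read" and nothing
more; it does not reproduce real fence or cache behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: none of these mock_* names exist in libdrm or the kernel. */
#define MOCK_PREP_READ 0x01u /* hypothetical stand-in for DRM_ETNA_PREP_READ */

struct mock_bo {
	uint32_t cpu_view[4];   /* what the CPU sees through its mapping */
	uint32_t gpu_result[4]; /* what the GPU will have written when done */
	bool gpu_busy;          /* true until the render job's fence signals */
	uint32_t last_prep_op;  /* mirrors last_cpu_prep_op in the driver */
};

/* Queue a render job: its result is not yet visible to the CPU. */
static void mock_gpu_submit(struct mock_bo *bo, uint32_t color)
{
	for (int i = 0; i < 4; i++)
		bo->gpu_result[i] = color;
	bo->gpu_busy = true;
}

/* Stand-in for etna_bo_cpu_prep(): "wait" for the GPU, then expose the data. */
static int mock_bo_cpu_prep(struct mock_bo *bo, uint32_t op)
{
	if (bo->gpu_busy) {
		memcpy(bo->cpu_view, bo->gpu_result, sizeof(bo->cpu_view));
		bo->gpu_busy = false;
	}
	bo->last_prep_op = op;
	return 0;
}

/* Stand-in for etna_bo_cpu_fini(): a fini without a prep is a caller bug. */
static void mock_bo_cpu_fini(struct mock_bo *bo)
{
	assert(bo->last_prep_op != 0);
	bo->last_prep_op = 0;
}

/* Read without synchronization: may return stale data. */
static uint32_t read_pixel_unsynced(const struct mock_bo *bo)
{
	return bo->cpu_view[0];
}

/* Read bracketed by prep/fini: sees the finished GPU result. */
static uint32_t read_pixel_synced(struct mock_bo *bo)
{
	uint32_t value;

	mock_bo_cpu_prep(bo, MOCK_PREP_READ);
	value = bo->cpu_view[0];
	mock_bo_cpu_fini(bo);
	return value;
}
```

In this model, a read right after mock_gpu_submit() still returns the old
contents, while read_pixel_synced() returns what the GPU rendered.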

I learned about this issue five months ago this year, see [1]
for reference.

I simply forgot about it while debating with you.

[1] 
https://gitlab.freedesktop.org/longxin2019/xf86-video-loongson/-/commit/95f9596eb19223c3109ea1f32c3e086fd1d43bd8

>   The CPU_PREP/FINI ioctls also
> provide fence synchronization between CPU and GPU.

You are correct here.

> There are a few very
> specific cases where skipping those ioctls is acceptable (mostly when
> the userspace driver explicitly wants unsynchronized access), but in
> most cases they are required for correctness.

OK, you are absolutely correct.

> Regards,
> Lucas

-- 
Jingfeng