[Freedreno] [PATCH 04/16] drm: msm: Flush the cache immediately after allocating pages

Mon Nov 7 08:35:00 UTC 2016

On 11/06/2016 07:45 PM, Rob Clark wrote:
> On Fri, Nov 4, 2016 at 6:44 PM, Jordan Crouse <jcrouse at codeaurora.org> wrote:
>> For reasons that are not entirely understood using dma_map_sg()
>> for nocache/write combine buffers doesn't always successfully flush
>> the cache after the memory is zeroed somewhere deep in the bowels
>> of the shmem code.  My working theory is that the cache flush on
>> the swiotlb bounce buffer address work isn't always flushing what
>> we need.
>>
>> Instead of using dma_map_sg() directly kmap and flush each page
>> at allocate time.  We could use invalidate + clean or just invalidate
>> if we wanted to but on ARM64 using a flush is safer and not much
>> slower for what we are trying to do.
>>
>> Hopefully someday I'll more clearly understand the relationship between
>> shmem  kmap, vmap and the swiotlb bounce buffer and we can be smarter
>> about when and how we invalidate the caches.
>
> Like I mentioned on irc, we defn don't want to ever hit bounce
> buffers.  I think the problem here is dma-mapping assumes we only
> support 32b physical addresses, which is not true (at least as long as
> we have iommu)..
>
> Archit hit a similar problem on the display side of things.

Yeah, the shmem allocated pages ended up being 33 bit addresses some times
on db820c. The msm driver sets the dma mask to a default of 32 bits.
The dma mapping api gets unhappy whenever we get sg chunks with 33 bit
addresses, and tries to use switolb for them. We eventually end up
overflowing the swiotlb.

Setting the mask to 33 bits worked as a temporary hack.

>
> I think the proper solution is to do something like this somewhere:
>
>    dma_set_mask_and_coherent(drm->dev, DMA_BIT_MASK(size))
>
> where size is 32 or 64 (48?) depending on device..
>
> Note that this value should be perhaps something that the iommu driver
> knows, since really it is about the iommu page tables.
>
> Maybe it would even make sense for the iommu driver to set this when
> we attach?  Although since the GEM code is allocating/mapping for
> multiple subdev's that might not work out (ie. if we only attached
> subdev's and not the parent drm dev which is used by the GEM code for
> dma_map/unmap()).

By multiple subdevs, you mean MDP and GPU devs? As far as their dma
masks are concerned, it should be the same for both, right? It's only
the iommu driver's returned iova that we need to care about. Weren't we
thinking of managing that by having separate drm_mm's for MDP and GPU?

>
> (ofc it would be better if dma-mapping was structured more like
> helpers which could be bypassed in the few special cases where the
> abstraction doesn't work, rather than forcing us to do pretend
> dma_map/unmap() for cache operations.. but that is a topic for a
> different rant)

Yeah, that would be nice.

Archit

>
> BR,
> -R
>
>> Signed-off-by: Jordan Crouse <jcrouse at codeaurora.org>
>> ---
>>  drivers/gpu/drm/msm/msm_gem.c | 21 ++++++++-------------
>>  1 file changed, 8 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
>> index 85f3047..29f5a30 100644
>> --- a/drivers/gpu/drm/msm/msm_gem.c
>> +++ b/drivers/gpu/drm/msm/msm_gem.c
>> @@ -79,6 +79,7 @@ static struct page **get_pages(struct drm_gem_object *obj)
>>                 struct drm_device *dev = obj->dev;
>>                 struct page **p;
>>                 int npages = obj->size >> PAGE_SHIFT;
>> +               int i;
>>
>>                 if (use_pages(obj))
>>                         p = drm_gem_get_pages(obj);
>> @@ -91,6 +92,13 @@ static struct page **get_pages(struct drm_gem_object *obj)
>>                         return p;
>>                 }
>>
>> +               for (i = 0; i < npages; i++) {
>> +                       void *addr = kmap_atomic(p[i]);
>> +
>> +                       __dma_flush_range(addr, addr + PAGE_SIZE);
>> +                       kunmap_atomic(addr);
>> +               }
>> +
>>                 msm_obj->sgt = drm_prime_pages_to_sg(p, npages);
>>                 if (IS_ERR(msm_obj->sgt)) {
>>                         dev_err(dev->dev, "failed to allocate sgt\n");
>> @@ -98,13 +106,6 @@ static struct page **get_pages(struct drm_gem_object *obj)
>>                 }
>>
>>                 msm_obj->pages = p;
>> -
>> -               /* For non-cached buffers, ensure the new pages are clean
>> -                * because display controller, GPU, etc. are not coherent:
>> -                */
>> -               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
>> -                       dma_map_sg(dev->dev, msm_obj->sgt->sgl,
>> -                                       msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
>>         }
>>
>>         return msm_obj->pages;
>> @@ -115,12 +116,6 @@ static void put_pages(struct drm_gem_object *obj)
>>         struct msm_gem_object *msm_obj = to_msm_bo(obj);
>>
>>         if (msm_obj->pages) {
>> -               /* For non-cached buffers, ensure the new pages are clean
>> -                * because display controller, GPU, etc. are not coherent:
>> -                */
>> -               if (msm_obj->flags & (MSM_BO_WC|MSM_BO_UNCACHED))
>> -                       dma_unmap_sg(obj->dev->dev, msm_obj->sgt->sgl,
>> -                                       msm_obj->sgt->nents, DMA_BIDIRECTIONAL);
>>                 sg_free_table(msm_obj->sgt);
>>                 kfree(msm_obj->sgt);
>>
>> --
>> 1.9.1
>>
>> _______________________________________________
>> Freedreno mailing list
>> Freedreno at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/freedreno

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project