[PATCH 3/3] drm/panthor: Prevent potential overwrite of buffer objects

Boris Brezillon boris.brezillon at collabora.com
Mon Nov 4 13:41:50 UTC 2024


On Mon, 4 Nov 2024 12:49:56 +0000
Akash Goel <akash.goel at arm.com> wrote:

> On 11/4/24 11:16, Boris Brezillon wrote:
> > Hi Akash,
> > 
> > On Thu, 31 Oct 2024 21:42:27 +0000
> > Akash Goel <akash.goel at arm.com> wrote:
> >   
> >> I assume you also reckon that there is a potential problem here for arm64.  
> > 
> > It impacts any system that's not IO-coherent, I would say, and this
> > comment seems to confirm it's a known issue [3].
> >   
> 
> Thanks for confirming.
> 
> Actually, I had tried to check with Daniel Vetter about [3], as it was
> not clear to me how that code actually helped in the x86 case.
> As far as I understand, [3] updates the attributes of the kernel's
> direct mapping of the shmem pages to WC, so as to be consistent with
> the userspace mapping of the pages or their vmap() mapping inside the
> kernel. But I didn't get how that alignment actually helped in
> cleaning the dirty cache lines.

Yeah, I was not referring to the code but rather the fact that x86,
with its IO coherency model, is a special case here, and that other
archs probably need explicit flushes in a few places.
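For context, the x86-only hunk in drm_gem_shmem_helper.c that [3]
points at is, if I remember correctly, something along these lines
(quoted from memory, so treat it as a sketch rather than the exact
upstream code):

#ifdef CONFIG_X86
	if (shmem->map_wc)
		set_pages_array_wc(pages, obj->size >> PAGE_SHIFT);
#endif

That is, it only rewrites the direct-map attributes on x86, so the
non-coherent architectures are left without an explicit flush there.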

> >>
> >> shmem calls 'flush_dcache_folio()' after clearing the pages, but
> >> that just clears the 'PG_dcache_clean' bit, and the CPU cache is
> >> not cleaned immediately.
> >>
> >> I realize that this patch is not foolproof, as userspace can
> >> populate the BO from the CPU side before mapping it on the GPU side.
> >>
> >> Not sure if we also need to consider the case where shmem pages are
> >> swapped out. I don't know if there could be a similar situation with
> >> dirty cache lines after the swap-in.
> > 
> > I think we do. We basically need to flush CPU caches any time pages
> > are [re]allocated, because the shmem layer will either zero them out
> > (first allocation) or populate them (swap-in) in that path, and in
> > both cases that involves a CPU copy through a cached mapping.
> >   
> 
> Thanks for confirming.
> 
> I think we may have to do the cache flush page by page.
> Not all pages might get swapped out, and the initial allocation of all
> the pages may not happen at the same time.

If the pages are mapped GPU-side, it's always all pages at a time (at
least until we add support for lazy page allocation, AKA growing/heap
buffers). You're right that GPU buffers that have only been mapped
CPU-side with mmap() get their pages lazily allocated, though I'm not
really sure we care about optimizing that case just yet.
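If we do end up wanting a page-by-page flush for the lazily allocated
case, a minimal sketch could look like the following (the helper name
is hypothetical; dma_sync_single_for_device() is the generic DMA API
call, and I'm assuming the page already has a DMA address from the
BO's mapping):

#include <linux/dma-mapping.h>

/*
 * Hypothetical helper: clean the CPU cache lines covering a single
 * freshly (re)allocated page before the GPU is allowed to access it.
 * @dma_addr is assumed to come from the BO's existing DMA mapping.
 */
static void panthor_page_sync_for_gpu(struct device *dev,
				      dma_addr_t dma_addr)
{
	dma_sync_single_for_device(dev, dma_addr, PAGE_SIZE,
				   DMA_TO_DEVICE);
}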

> Please correct me if my understanding is wrong.

Eviction should be rare enough that we can probably pay the price of a
flush on the entire BO range.
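Something like the sketch below, where the whole sg_table backing the
BO is synced in one go (the function name is hypothetical;
dma_sync_sgtable_for_device() and drm_gem_shmem_get_pages_sgt() are
the real helpers I'd expect this to be built on):

#include <linux/dma-mapping.h>
#include <drm/drm_gem_shmem_helper.h>

/*
 * Hypothetical helper: clean CPU caches for the entire BO range after
 * its pages have been (re)populated by shmem, so dirty cache lines
 * can't later be evicted on top of data the GPU has written.
 * Assumes the shmem object has (or can get) a DMA-mapped sg_table.
 */
static int panthor_bo_sync_for_gpu(struct drm_gem_shmem_object *shmem)
{
	struct sg_table *sgt = drm_gem_shmem_get_pages_sgt(&shmem->base);

	if (IS_ERR(sgt))
		return PTR_ERR(sgt);

	dma_sync_sgtable_for_device(shmem->base.dev->dev, sgt,
				    DMA_TO_DEVICE);
	return 0;
}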

