[PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag

Sai Prakash Ranjan saiprakash.ranjan at codeaurora.org
Wed Jun 30 10:07:59 UTC 2021


Hi Will,

On 2021-03-25 23:03, Will Deacon wrote:
> On Tue, Mar 09, 2021 at 12:10:44PM +0530, Sai Prakash Ranjan wrote:
>> On 2021-02-05 17:38, Sai Prakash Ranjan wrote:
>> > On 2021-02-04 03:16, Will Deacon wrote:
>> > > On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
>> > > > On 2021-02-01 23:50, Jordan Crouse wrote:
>> > > > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
>> > > > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will at kernel.org> wrote:
>> > > > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
>> > > > > > > > On 2021-01-29 14:35, Will Deacon wrote:
>> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
>> > > > > > > > > > +#define IOMMU_LLC        (1 << 6)
>> > > > > > > > >
>> > > > > > > > > On reflection, I'm a bit worried about exposing this because I think it
>> > > > > > > > > will introduce a mismatched virtual alias with the CPU (we don't even have
>> > > > > > > > > a MAIR set up for this memory type). Now, we also have that issue for the
>> > > > > > > > > PTW, but since we always use cache maintenance (i.e. the streaming API) for
>> > > > > > > > > publishing the page-tables to a non-coherent walker, it works out. However,
>> > > > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
>> > > > > > > > > allocation, then they're potentially in for a nasty surprise due to the
>> > > > > > > > > mismatched outer-cacheability attributes.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Can't we add the syscached memory type similar to what is done on android?
>> > > > > > >
>> > > > > > > Maybe. How does the GPU driver map these things on the CPU side?
>> > > > > >
>> > > > > > Currently we use writecombine mappings for everything, although there
>> > > > > > are some cases that we'd like to use cached (but have not merged
>> > > > > > patches that would give userspace a way to flush/invalidate)
>> > > > > >
>> > > > >
>> > > > > LLC/system cache doesn't have a relationship with the CPU cache. It's just
>> > > > > a little accelerator that sits on the connection from the GPU to DDR and
>> > > > > caches accesses. The hint that Sai is suggesting is used to mark the
>> > > > > buffers as 'no-write-allocate' to prevent GPU write operations from being
>> > > > > cached in the LLC, which a) isn't interesting and b) takes up cache space
>> > > > > for read operations.
>> > > > >
>> > > > > It's easiest to think of the LLC as a bonus accelerator that has no cost
>> > > > > for us to use outside of the unfortunate per-buffer hint.
>> > > > >
>> > > > > We do have to worry about the CPU cache w.r.t. I/O coherency (which is a
>> > > > > different hint) and in that case we have all of the concerns that Will
>> > > > > identified.
>> > > > >
>> > > >
>> > > > For the mismatched outer-cacheability attributes which Will mentioned, I
>> > > > was referring to [1] in the Android kernel.
>> > >
>> > > I've lost track of the conversation here :/
>> > >
>> > > When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
>> > > into the CPU and with what attributes? Rob said "writecombine for
>> > > everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
>> > >
>> >
>> > Rob answered this.
>> >
>> > > Finally, we need to be careful when we use the word "hint" as "allocation
>> > > hint" has a specific meaning in the architecture, and if we only mismatch
>> > > on those then we're actually ok. But I think IOMMU_LLC is more than just a
>> > > hint, since it actually drives eviction policy (i.e. it enables writeback).
>> > >
>> > > Sorry for the pedantry, but I just want to make sure we're all talking
>> > > about the same things!
>> > >
>> >
>> > Sorry for the confusion, which was probably caused by my mention of
>> > Android. NWA (no-write-allocate) is an allocation hint which we can ignore
>> > for now, as it has not been introduced upstream yet.
>> >
>> 
>> Any chance of taking this forward? We do not want to miss out on a small
>> fps gain when the product gets released.
> 
> Do we have a solution to the mismatched virtual alias?
> 

Sorry for the long delay on this thread.

On the mismatched virtual alias question, wasn't this already discussed at
length when initial support for the system cache [1] (which you reverted)
was added?

An excerpt from that thread:

"As seen in downstream kernels there are few non-coherent devices which
would not want to allocate in system cache, and therefore would want
Inner/Outer non-cached memory. So, we may want to either override the
attributes per-device, or as you suggested we may want to introduce
another memory type 'sys-cached' that can be added with its separate
infra."

As for DMA API usage, we do not have any upstream users (video will be
one if they decide to upstream that).
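
To make the discussion concrete, here is a minimal, hypothetical sketch in
plain C (not the actual io-pgtable-arm code) of how a protection flag such
as IOMMU_LLC could select a separate memory-attribute index in a page-table
entry. The IOMMU_LLC value is the one from the patch quoted above; the
IOMMU_CACHE value, the index values, the shift and every sketch_* name are
assumptions made up for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* IOMMU_LLC is the value proposed in the quoted patch.
 * IOMMU_CACHE is assumed here for the sake of the example. */
#define IOMMU_CACHE (1 << 2)
#define IOMMU_LLC   (1 << 6)

/* Hypothetical attribute-index layout, for this sketch only. */
enum sketch_attr_idx {
	SKETCH_IDX_NC    = 0,	/* non-cacheable */
	SKETCH_IDX_CACHE = 1,	/* inner and outer cacheable */
	SKETCH_IDX_LLC   = 3,	/* inner non-cacheable, outer (system cache) write-back */
};

/* Hypothetical bit position of the attribute-index field in the PTE. */
#define SKETCH_ATTRINDX_SHIFT 2

/* Translate protection flags into the attribute-index bits of a PTE.
 * Letting IOMMU_CACHE win over IOMMU_LLC is a choice made by this sketch. */
static inline uint64_t sketch_prot_to_attr(int prot)
{
	unsigned int idx = SKETCH_IDX_NC;

	if (prot & IOMMU_CACHE)
		idx = SKETCH_IDX_CACHE;
	else if (prot & IOMMU_LLC)
		idx = SKETCH_IDX_LLC;

	return (uint64_t)idx << SKETCH_ATTRINDX_SHIFT;
}

/* Usage example: an LLC-only mapping gets its own attribute index. */
static inline void sketch_self_check(void)
{
	assert(sketch_prot_to_attr(IOMMU_LLC) ==
	       ((uint64_t)SKETCH_IDX_LLC << SKETCH_ATTRINDX_SHIFT));
	assert(sketch_prot_to_attr(0) == 0);
}
```

The point of the sketch is only that the device-side mapping ends up with
its own outer-cacheability attribute, which is where the mismatch with the
CPU-side mapping comes from.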

[1] https://patchwork.kernel.org/project/linux-arm-msm/patch/20180615105329.26800-1-vivek.gautam@codeaurora.org/

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

