[PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
Sai Prakash Ranjan
saiprakash.ranjan at codeaurora.org
Tue Feb 2 06:26:27 UTC 2021
On 2021-02-01 23:50, Jordan Crouse wrote:
> On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
>> On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will at kernel.org> wrote:
>> >
>> > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
>> > > On 2021-01-29 14:35, Will Deacon wrote:
>> > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
>> > > > > Add a new page protection flag IOMMU_LLC which can be used
>> > > > > by non-coherent masters to set cacheable memory attributes
>> > > > > for an outer level of cache called the last-level cache or
>> > > > > system cache. The initial user of this page protection flag
>> > > > > is the Adreno GPU; it can later be used by other clients
>> > > > > such as video, where it can be used for per-buffer
>> > > > > mapping.
>> > > > >
>> > > > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan at codeaurora.org>
>> > > > > ---
>> > > > > drivers/iommu/io-pgtable-arm.c | 3 +++
>> > > > > include/linux/iommu.h | 6 ++++++
>> > > > > 2 files changed, 9 insertions(+)
>> > > > >
>> > > > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> > > > > index 7439ee7fdcdb..ebe653ef601b 100644
>> > > > > --- a/drivers/iommu/io-pgtable-arm.c
>> > > > > +++ b/drivers/iommu/io-pgtable-arm.c
>> > > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>> > > > > else if (prot & IOMMU_CACHE)
>> > > > > pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>> > > > > << ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> > > > > + else if (prot & IOMMU_LLC)
>> > > > > + pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
>> > > > > + << ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> > > > > }
>> > > > >
>> > > > > if (prot & IOMMU_CACHE)
>> > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> > > > > index ffaa389ea128..1f82057df531 100644
>> > > > > --- a/include/linux/iommu.h
>> > > > > +++ b/include/linux/iommu.h
>> > > > > @@ -31,6 +31,12 @@
>> > > > > * if the IOMMU page table format is equivalent.
>> > > > > */
>> > > > > #define IOMMU_PRIV (1 << 5)
>> > > > > +/*
>> > > > > + * Non-coherent masters can use this page protection flag to set cacheable
>> > > > > + * memory attributes for only a transparent outer level of cache, also known as
>> > > > > + * the last-level or system cache.
>> > > > > + */
>> > > > > +#define IOMMU_LLC (1 << 6)
>> > > >
>> > > > On reflection, I'm a bit worried about exposing this because I think it will
>> > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
>> > > > set up for this memory type). Now, we also have that issue for the PTW, but
>> > > > since we always use cache maintenance (i.e. the streaming API) for
>> > > > publishing the page-tables to a non-coherent walker, it works out. However,
>> > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
>> > > > allocation, then they're potentially in for a nasty surprise due to the
>> > > > mismatched outer-cacheability attributes.
>> > > >
>> > >
>> > > Can't we add the syscached memory type similar to what is done on android?
>> >
>> > Maybe. How does the GPU driver map these things on the CPU side?
>>
>> Currently we use writecombine mappings for everything, although there
>> are some cases that we'd like to use cached (but have not merged
>> patches that would give userspace a way to flush/invalidate)
>>
>> BR,
>> -R
>
> LLC/system cache doesn't have a relationship with the CPU cache. It's just a
> little accelerator that sits on the connection from the GPU to DDR and caches
> accesses. The hint that Sai is suggesting is used to mark the buffers as
> 'no-write-allocate' to prevent GPU write operations from being cached in
> the LLC, which a) isn't interesting and b) takes up cache space for read
> operations.
>
> It's easiest to think of the LLC as a bonus accelerator that has no cost for
> us to use outside of the unfortunate per-buffer hint.
>
> We do have to worry about the CPU cache w.r.t. I/O coherency (which is a
> different hint) and in that case we have all of the concerns that Will
> identified.
>
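To make the 'no-write-allocate' hint and the attribute mismatch a bit more
concrete, here is a rough standalone sketch of how a MAIR attribute byte for
Normal memory is composed (outer policy in the upper nibble, inner policy in
the lower nibble, with the low two bits of a Write-Back nibble selecting
read-allocate and write-allocate). The helper names are made up for
illustration and are not kernel functions. The 0xf4 and 0x44 values are what
I believe io-pgtable-arm.c uses for inner non-cacheable/outer write-back and
for plain non-cacheable; treat them as assumptions.

```c
#include <stdint.h>

/* Nibble encodings for Normal memory in one MAIR_ELx attribute byte. */
#define MAIR_NIBBLE_NC       0x4u  /* Non-cacheable */
#define MAIR_NIBBLE_WB_BASE  0xCu  /* Write-Back non-transient; bit 1 = R, bit 0 = W */

/* Hypothetical helper: outer Write-Back nibble with chosen allocation hints. */
static uint8_t mair_outer_wb(int read_alloc, int write_alloc)
{
	return (uint8_t)(MAIR_NIBBLE_WB_BASE |
			 (read_alloc ? 0x2u : 0u) |
			 (write_alloc ? 0x1u : 0u));
}

/* Hypothetical helper: combine outer and inner nibbles into one attribute byte. */
static uint8_t mair_attr(uint8_t outer, uint8_t inner)
{
	return (uint8_t)(((outer & 0xFu) << 4) | (inner & 0xFu));
}

/* Hypothetical helper: non-zero when two aliases disagree on outer cacheability. */
static int mair_outer_mismatch(uint8_t a, uint8_t b)
{
	return (a >> 4) != (b >> 4);
}
```

With these, mair_attr(mair_outer_wb(1, 1), MAIR_NIBBLE_NC) gives 0xf4, a
read-allocate-only variant gives 0xe4, and comparing either against a
writecombine-style 0x44 CPU mapping reports an outer mismatch, which is the
aliasing case Will raised.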
For the mismatched outer-cacheability attributes that Will mentioned, I was
referring to [1] in the Android kernel.
[1] https://android-review.googlesource.com/c/kernel/common/+/1549097/3
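
For reference, the selection order in the hunk quoted above can be modelled
in a few lines of standalone C. This is only a sketch under assumptions: the
flag values and attribute indices are what I believe the kernel headers and
io-pgtable-arm.c use, the MMIO branch is assumed from the surrounding code
that the hunk does not show, and prot_to_attrindx() is a made-up name rather
than a kernel function.

```c
#include <stdint.h>

/* Protection flags; IOMMU_LLC is the value proposed in this patch. */
#define IOMMU_READ   (1 << 0)
#define IOMMU_WRITE  (1 << 1)
#define IOMMU_CACHE  (1 << 2)
#define IOMMU_MMIO   (1 << 4)
#define IOMMU_LLC    (1 << 6)

/* MAIR attribute indices and PTE field shift (assumed values). */
#define ATTR_IDX_NC          0
#define ATTR_IDX_CACHE       1
#define ATTR_IDX_DEV         2
#define ATTR_IDX_INC_OCACHE  3
#define ATTRINDX_SHIFT       2

/*
 * Hypothetical helper: return only the AttrIndx bits of a stage-1 PTE.
 * The else-if chain means IOMMU_MMIO and IOMMU_CACHE both take priority
 * over IOMMU_LLC when several flags are set together.
 */
static uint64_t prot_to_attrindx(int prot)
{
	uint64_t pte = 0;

	if (prot & IOMMU_MMIO)
		pte |= (uint64_t)ATTR_IDX_DEV << ATTRINDX_SHIFT;
	else if (prot & IOMMU_CACHE)
		pte |= (uint64_t)ATTR_IDX_CACHE << ATTRINDX_SHIFT;
	else if (prot & IOMMU_LLC)
		pte |= (uint64_t)ATTR_IDX_INC_OCACHE << ATTRINDX_SHIFT;
	/* Otherwise index 0 (non-cacheable) is left in place. */

	return pte;
}
```

One consequence worth noting: a caller passing IOMMU_CACHE | IOMMU_LLC gets
the fully cacheable index, so in this model the LLC-only attribute is
selected only when IOMMU_CACHE is absent.
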
Thanks,
Sai
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation