[PATCH 1/2] drm: add cache support for arm64

Mark Rutland mark.rutland at arm.com
Wed Aug 7 16:49:59 UTC 2019


On Wed, Aug 07, 2019 at 09:15:54AM -0700, Rob Clark wrote:
> On Wed, Aug 7, 2019 at 5:38 AM Mark Rutland <mark.rutland at arm.com> wrote:
> >
> > On Tue, Aug 06, 2019 at 09:31:55AM -0700, Rob Clark wrote:
> > > On Tue, Aug 6, 2019 at 7:35 AM Mark Rutland <mark.rutland at arm.com> wrote:
> > > >
> > > > On Tue, Aug 06, 2019 at 07:11:41AM -0700, Rob Clark wrote:
> > > > > On Tue, Aug 6, 2019 at 1:48 AM Christoph Hellwig <hch at lst.de> wrote:
> > > > > >
> > > > > > This goes in the wrong direction.  drm_cflush_* are a bad API we need to
> > > > > > get rid of, not add use of it.  The reason for that is two-fold:
> > > > > >
> > > > > >  a) it doesn't address how cache maintaince actually works in most
> > > > > >     platforms.  When talking about a cache we three fundamental operations:
> > > > > >
> > > > > >         1) write back - this writes the content of the cache back to the
> > > > > >            backing memory
> > > > > >         2) invalidate - this remove the content of the cache
> > > > > >         3) write back + invalidate - do both of the above
> > > > >
> > > > > Agreed that drm_cflush_* isn't a great API.  In this particular case
> > > > > (IIUC), I need wb+inv so that there aren't dirty cache lines that drop
> > > > > out to memory later, and so that I don't get a cache hit on
> > > > > uncached/wc mmap'ing.
> > > >
> > > > Is there a cacheable alias lying around (e.g. the linear map), or are
> > > > these addresses only mapped uncached/wc?
> > > >
> > > > If there's a cacheable alias, performing an invalidate isn't sufficient,
> > > > since a CPU can allocate a new (clean) entry at any point in time (e.g.
> > > > as a result of prefetching or arbitrary speculation).
> > >
> > > I *believe* that there are not alias mappings (that I don't control
> > > myself) for pages coming from
> > > shmem_file_setup()/shmem_read_mapping_page()..
> >
> > AFAICT, that's regular anonymous memory, so there will be a cacheable
> > alias in the linear/direct map.
> 
> tbh, I'm not 100% sure whether there is a cacheable alias, or whether
> any potential linear map is torn down.

I'm fairly confident that the linear/direct map cacheable alias is not
torn down when pages are allocated. The gneeric page allocation code
doesn't do so, and I see nothing the shmem code to do so.

For arm64, we can tear down portions of the linear map, but that has to
be done explicitly, and this is only possible when using rodata_full. If
not using rodata_full, it is not possible to dynamically tear down the
cacheable alias.

> My understanding is that a cacheable alias is "ok", with some
> caveats.. ie. that the cacheable alias is not accessed.  

Unfortunately, that is not true. You'll often get away with it in
practice, but that's a matter of probability rather than a guarantee.

You  cannot prevent a CPU from accessing a VA arbitrarily (e.g. as the
result of wild speculation). The ARM ARM (ARM DDI 0487E.a) points this
out explicitly:

| Entries for addresses that are Normal Cacheable can be allocated to
| the cache at any time, and so the cache invalidate instruction cannot
| ensure that the address is not present in a cache.

... along with:

| Caches introduce a number of potential problems, mainly because:
|
| * Memory accesses can occur at times other than when the programmer
|   would expect them.

;)

> I'm not entirely sure about pre-fetch from access to adjacent pages.
> We have been using shmem as a source for pages since the beginning,
> and I haven't seen it cause any problems in the last 6 years.  (This
> is limited to armv7 and armv8, I'm not really sure what would happen
> on armv6, but that is a combo I don't have to care about.)

Over time, CPUs get more aggressive w.r.t. prefetching and speculation,
so having not seen such issues in the past does not imply we won't in
future.

Anecdotally, we had an issue a few years ago that we were only able to
reproduce under heavy stress testing. A CPU would make speculative
instruction fetches from a read-destructive MMIO register, despite the
PC never going near the corresponding VA, and despite that code having
(apparently) caused no issue for years before that.

See commit:

  b6ccb9803e90c16b ("ARM: 7954/1: mm: remove remaining domain support from ARMv6")

... which unfortunately lacks the full war story.

Thanks,
Mark.


More information about the dri-devel mailing list