[RFC] Remove AGP support from Radeon/Nouveau/TTM

Wed May 13 10:25:58 UTC 2020

On Wed, May 13, 2020 at 9:55 AM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> On 13.05.20 at 09:19, Daniel Vetter wrote:
> > On Tue, May 12, 2020 at 8:22 PM Alex Deucher <alexdeucher at gmail.com> wrote:
> >> On Tue, May 12, 2020 at 12:38 PM Daniel Vetter <daniel at ffwll.ch> wrote:
> >>> On Tue, May 12, 2020 at 3:22 PM Alex Deucher <alexdeucher at gmail.com> wrote:
> >>>> On Tue, May 12, 2020 at 5:40 AM Karoly Balogh (Charlie/SGR)
> >>>> <charlie at scenergy.dfmk.hu> wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On Tue, 12 May 2020, Rui Salvaterra wrote:
> >>>>>
> >>>>>>> FWIW, on my last-generation PowerBook with RV350 (IIRC), there was a
> >>>>>>> big performance difference between AGP and PCI GART. The latter was
> >>>>>>> sort of usable for normal desktop operation, but not so much for
> >>>>>>> OpenGL apps (which were usable with AGP).
> >>>>>> I never really understood what were the issues with AGP on PowerPC
> >>>>>> (well, Apple, the only ones I've tested) machines. I mean, did OS X also
> >>>>>> disable AGP entirely, or did it have workarounds somewhere else on the
> >>>>>> stack nobody was able to figure out?
> >>>>> I don't know about OS X, but I doubt there is a major/blocker hardware
> >>>>> issue, at least not one which affects every AGP machine.
> >>>>>
> >>>>> MorphOS' own Radeon driver uses the AGP facilities to some degree on all
> >>>>> AGP PowerPC Macs supported by that OS, which is from PMac AGP Graphics
> >>>>> (3,1) all the way up to the AGP G5 (7,3), including the various portables
> >>>>> and the Mac mini G4. For example it can utilize it to stream video data
> >>>>> directly from mainboard RAM, so you don't have to copy it with the CPU,
> >>>>> allowing reasonably good 720p h264 video playback on most systems above
> >>>>> the 1 GHz mark with the native MPlayer port. I'm sure the 3D part of the
> >>>>> driver also uses it to some degree, given the performance improvement we
> >>>>> experienced when the AGP support was enabled (initially the system was
> >>>>> running without it), but to which extent I can't say.
> >>>> The problem is AGP doesn't support CPU cache snooping.  Technically
> >>>> PCI must support coherent device access to system memory.  Unsnooped
> >>>> access is an optional feature and some platforms may not support it at
> >>>> all.  Unfortunately, AGP required unsnooped access.  x86 generally
> >>>> provides a way to do this, but other platforms, not so much.  I don't
> >>>> recall to what extent PowerPC supported this.  The Linux DMA API
> >>>> doesn't really have a way to get uncached memory for DMA so there is
> >>>> that too.  Windows and Mac may provide a way to do this depending on
> >>>> the platforms.  What probably should have been done on AGP boards was
> >>>> to use both the AGP GART and the device GART.  The former for uncached
> >>>> memory (if the platform supported it) and the latter for cached
> >>>> memory.  That never happened.
> >>> Slight correction on the dma-api side of things: The dma-api very much
> >>> can give you uncached memory, but only on some platforms, and the
> >>> dma-api is very opinionated about which those are. And it refuses to
> >>> tell you whether your memory ends up being uncached or cached. That's
> >>> all done in the name of platform portability, which is good for most
> >>> drivers, but just too much pain for gpu drivers.
> >> Out of curiosity how do you do that without manually messing around
> >> with PAT or MTRRs?
> > i915 is even worse, we manually mess around with clflush. In
> > userspace. So really there are 2 axes for dma memory: coherent vs.
> > non-coherent (which is something the dma-api somewhat exposes), i.e.
> > do you need to clflush or not, and cached vs uncached, i.e. are the
> > PAT entries wc or wb.
> >
> > But yeah if you don't have PAT real uncached isn't possible, can't frob
> > MTRR for individual pages. That's also, to my understanding, why the dma
> > api doesn't want to expose this to drivers, but abstract it all away:
> > On many tiny soc platforms all you have for uncached is an mtrr (well
> > the equivalent on that platform), so anything you get from
> > dma_alloc_coherent needs to come from there.
>
> IIRC I once got it explained like this: On some platforms all you have is a
> register with a value, and if your address is above that value it is
> uncached and wc; if it is below, it is cached and wb.
>
> > But then no one is ever going to plug in a big gpu into such a system
> > and expect anything to work, so we really need an abstraction that
> > works on a bit more than just x86 (so we don't dig around in platform
> > stuff like updating PAT or issuing clflush anymore), but doesn't try
> > to work everywhere linux runs, just on the few platforms people expect
> > big gpus to work on. For all the kms-only drivers we have the dma api
> > seems actually perfectly fine (essentially the cma helpers we have
> > should be called dma helpers, since that's what they're using
> > underneath for all buffer management).
>
> That is unfortunately not true for AMD GPUs, people tend to put them
> into those embedded ARM or PowerPC boxes and just expect them to work.

I thought arm64 had proper PAT-style per-page caching attributes,
because servers don't like the hard split with the address limit or
whatever other inflexible trickery. But not sure.

And ppc I thought was de facto always coherent, at least on modern machines.

But yeah the older stuff is pure horrors. The problem is also that
afaik arm defaults to uncached (and on some arm platforms it's
impossible to get a wc/coherent mapping of a random system page) and
ppc defaults to cached&coherent. So we can't even have a reasonable
default for non-x86.
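
To make the two axes from earlier in the thread concrete, here is a
minimal sketch in C. It is only a toy model: the names cache_mode,
dma_mem_attrs, needs_cpu_flush and mode_for_address are made up for
illustration and are not kernel or dma-api interfaces. It assumes the
CPU only needs an explicit flush when pages are cached and the device
does not snoop, and it models the "single limit register" platform
Christian describes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model, not a kernel API. */

/* Axis 1: how the CPU maps the page (cached/wb vs. uncached/wc). */
enum cache_mode { CACHE_WB, CACHE_WC };

/* Axis 2 is whether device access snoops the CPU caches (coherent). */
struct dma_mem_attrs {
    bool coherent;
    enum cache_mode mode;
};

/*
 * Assumption: an explicit CPU cache flush (clflush on x86) is needed
 * only for cached pages that the device accesses without snooping.
 */
bool needs_cpu_flush(struct dma_mem_attrs a)
{
    return a.mode == CACHE_WB && !a.coherent;
}

/*
 * Toy version of a platform with one limit register: addresses above
 * the limit are uncached/wc, everything else is cached/wb.
 */
enum cache_mode mode_for_address(uint64_t addr, uint64_t limit)
{
    return addr > limit ? CACHE_WC : CACHE_WB;
}
```

On such a limit-register platform the caching mode is a property of
the address, not of the page, which is why uncached memory has to be
carved out of one fixed region.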

> On the other hand we have hardware/firmware engineers who assumed you
> always have USWC and we wonder for weeks why the loaded firmware doesn't
> work....

Fun.
-Daniel

>
> Regards,
> Christian.
>
> >
> > Cheers, Daniel
> >
> >> Alex
> >>
> >>> Otherwise all agree, agp is a mighty mess and essentially just a
> >>> crapshoot outside of x86. It kinda worked for the much more static
> >>> allocations for dri1, but with in-kernel memory managers all the cache
> >>> flushing issues showed up big time and it all fell to pieces. Plus a
> >>> lot of these host chipsets back then were designed for the rather
> >>> static Windows gpu managers, so even on x86 the coherency issues for
> >>> agp mode when used together with ttm or something else really dynamic
> >>> are pretty bad because the hw just doesn't really cope and has all
> >>> kinds of flushing troubles and races. I think the later agp chipsets
> >>> were better.
> >>> -Daniel
> >>> --
> >>> Daniel Vetter
> >>> Software Engineer, Intel Corporation
> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >
> >
>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch