[PATCH] drm/msm/a6xx+: Insert a fence wait before SMMU table update

Rob Clark robdclark at gmail.com
Wed Sep 18 01:30:27 UTC 2024


On Tue, Sep 17, 2024 at 4:37 PM Konrad Dybcio <konradybcio at kernel.org> wrote:
>
> On 17.09.2024 5:30 PM, Rob Clark wrote:
> > On Tue, Sep 17, 2024 at 6:47 AM Konrad Dybcio <konradybcio at kernel.org> wrote:
> >>
> >> On 13.09.2024 9:51 PM, Rob Clark wrote:
> >>> From: Rob Clark <robdclark at chromium.org>
> >>>
> >>> The CP_SMMU_TABLE_UPDATE _should_ be waiting for idle, but on some
> >>> devices (x1-85, possibly others), it seems to pass that barrier while
> >>> there are still things in the event completion FIFO waiting to be
> >>> written back to memory.
> >>
> >> Can we try to force-fault around here on other GPUs and perhaps
> >> limit this workaround?
> >
> > not sure what you mean by "force-fault"...
>
> I suppose 'reproduce' is what I meant

I haven't _noticed_ it on other devices yet.  If you want to try on
devices you have, glmark2 seems to be good at reproducing it.

I think the reason is the high fps (on x1-85 most scenes are north of
8k fps), which means you get a lot of context switches between the
compositor and glmark2.  Most scenes are just a clear plus a single
draw, and I guess the compositor is just doing a single draw/blit.
A6xx can be two draws/blits deep in its pipeline, a7xx can be four,
which maybe exacerbates this.

> > we could probably limit
> > this to certain GPUs, the only reason I didn't is (a) it should be
> > harmless when it is not needed,
>
> Do we have any realistic perf hits here?

I don't think so.  We can't switch ttbr0 while the gpu is still busy, so
what the sqe does for CP_SMMU_TABLE_UPDATE _should_ be equivalent.
Maybe it amounts to some extra CP cycles and a memory read, but I think
that should be negligible given that the expensive thing is that we
are stalling the gpu until it is idle.
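
To make the ordering concrete, here is a toy model in plain C.  It is
only a sketch of the idea, not the driver code and not what the SQE
firmware does: the struct, the function names, and the FIFO depth are
all hypothetical.  It models fence values sitting in an event
completion FIFO, and a wait that lets the table switch proceed only
once the last submitted fence is visible in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical depth; the real FIFO size is not stated in this thread. */
#define TOY_FIFO_DEPTH 4

struct toy_gpu {
	uint32_t fifo[TOY_FIFO_DEPTH]; /* fences queued, not yet written back */
	unsigned int head, count;
	uint32_t memptr_fence;         /* last fence value visible in memory */
	uint32_t last_submitted;       /* last fence value queued */
};

/* Queue a fence write; it is not visible in memory until retired. */
static void toy_submit(struct toy_gpu *g, uint32_t fence)
{
	assert(g->count < TOY_FIFO_DEPTH);
	g->fifo[(g->head + g->count) % TOY_FIFO_DEPTH] = fence;
	g->count++;
	g->last_submitted = fence;
}

/* Write back the oldest pending fence, if there is one. */
static void toy_retire_one(struct toy_gpu *g)
{
	if (!g->count)
		return;
	g->memptr_fence = g->fifo[g->head];
	g->head = (g->head + 1) % TOY_FIFO_DEPTH;
	g->count--;
}

/* Wraparound-safe comparison of the in-memory fence against a target. */
static bool toy_fence_passed(const struct toy_gpu *g, uint32_t fence)
{
	return (int32_t)(g->memptr_fence - fence) >= 0;
}

/*
 * Wait until the last submitted fence has landed in memory, and only
 * then treat the page table switch as safe.  Returns the number of
 * polls the wait took.
 */
static unsigned int toy_wait_then_switch(struct toy_gpu *g)
{
	unsigned int polls = 0;

	while (!toy_fence_passed(g, g->last_submitted) && g->count) {
		toy_retire_one(g);
		polls++;
	}
	return polls;
}
```

With two fences pending, the wait polls twice and the FIFO is empty by
the time the switch would happen.  If the GPU were already idle with
everything written back, the loop exits immediately, which is the
"harmless when it is not needed" case.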

> > and (b) I have no real good way to get
> > an exhaustive list of where it is needed.  Maybe/hopefully it is only
> > x1-85, but idk.
> >
> > It does bring up an interesting question about preemption, though
>
> Yeah..

The KMD does set up an xAMBLE to clear the perfcntrs on context switch.
We could maybe piggyback on that, but I guess we'd have to patch in
the fence value to wait for?

> Do we know what windows does here?

Not sure, maybe Akhil has some way to check.  Whether a similar
scenario comes up with Windows probably depends on how the winsys
works.  If it dropped frames when rendering faster than the vblank
rate, you'd get fewer context switches.

BR,
-R

> Konrad

