[PATCH 08/21] udl-kms: avoid prefetch
Alexey Brodkin
Alexey.Brodkin at synopsys.com
Wed Jun 6 12:04:24 UTC 2018
Hi Mikulas,
On Tue, 2018-06-05 at 11:30 -0400, Mikulas Patocka wrote:
>
> On Tue, 5 Jun 2018, Alexey Brodkin wrote:
>
> > Hi Mikulas,
> >
> > On Sun, 2018-06-03 at 16:41 +0200, Mikulas Patocka wrote:
> > > Modern processors can detect linear memory accesses and prefetch data
> > > automatically, so there's no need to use prefetch.
> >
> > Not each and every CPU that's capable of running Linux has prefetch
> > functionality :)
> >
> > Still read-on...
> >
> > > Signed-off-by: Mikulas Patocka <mpatocka at redhat.com>
> > >
> > > ---
> > > drivers/gpu/drm/udl/udl_transfer.c | 7 -------
> > > 1 file changed, 7 deletions(-)
> > >
> > > Index: linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c
> > > ===================================================================
> > > --- linux-4.16.12.orig/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200
> > > +++ linux-4.16.12/drivers/gpu/drm/udl/udl_transfer.c 2018-05-31 14:48:12.000000000 +0200
> > > @@ -13,7 +13,6 @@
> > > #include <linux/module.h>
> > > #include <linux/slab.h>
> > > #include <linux/fb.h>
> > > -#include <linux/prefetch.h>
> > > #include <asm/unaligned.h>
> > >
> > > #include <drm/drmP.h>
> > > @@ -51,9 +50,6 @@ static int udl_trim_hline(const u8 *bbac
> > > int start = width;
> > > int end = width;
> > >
> > > - prefetch((void *) front);
> > > - prefetch((void *) back);
> >
> > AFAIK the prefetcher fetches new data according to observed history, i.e. based on the
> > previously seen access pattern it tries to get the next batch of data.
> >
> > But the code above is in the very beginning of the data processing routine where
> > prefetcher doesn't yet have any history to know what and where to prefetch.
> >
> > So I'd say this particular usage is good.
> > At least those prefetches shouldn't hurt: typically they compile to
> > just one instruction each if the CPU supports prefetch, or to nothing
> > if the CPU/compiler doesn't.
>
> See this post https://lwn.net/Articles/444336/ where they measured that
> prefetch hurts performance. Prefetch shouldn't be used unless you have a
> proof that it improves performance.
>
> The problem is that the prefetch instruction causes stalls in the pipeline
> when it encounters TLB miss and the automatic prefetcher doesn't.
Wow, thanks for the link.
I didn't know about that subtle issue with prefetch instructions on ARM and x86.
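Interestingly, the case that article measured is pointer chasing, where the hardware prefetcher has no linear pattern to detect, so an explicit prefetch of node->next looked like the textbook use. A userspace sketch of that pattern (my illustration, following the article's description):

```c
#include <stddef.h>

/* Pointer-chasing loop of the kind the LWN article discusses: the
 * hardware prefetcher cannot guess node->next, so such loops were once
 * decorated with an explicit prefetch. The article's measurements
 * showed even this can lose: if 'next' lies in a page with no TLB
 * entry, the prefetch instruction stalls on the page-table walk,
 * whereas omitting the hint lets the core keep executing. */
struct node {
	struct node *next;
	int val;
};

static int list_sum(const struct node *n)
{
	int total = 0;

	while (n) {
		__builtin_prefetch(n->next);	/* the contested hint */
		total += n->val;
		n = n->next;
	}
	return total;
}
```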
So OK, in the case of UDL these prefetches don't make much sense anyway, I guess, and there's
something worse still: see what I've got from a WandBoard Quad running the kmscube [1] application
with the help of the perf utility:
--------------------------->8-------------------------
# Overhead Command Shared Object Symbol
# ........ ....... ....................... ........................................
#
92.93% kmscube [kernel.kallsyms] [k] udl_render_hline
2.51% kmscube [kernel.kallsyms] [k] __divsi3
0.33% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.22% kmscube [kernel.kallsyms] [k] lock_acquire
0.19% kmscube [kernel.kallsyms] [k] _raw_spin_unlock_irq
0.17% kmscube [kernel.kallsyms] [k] udl_handle_damage
0.12% kmscube [kernel.kallsyms] [k] v7_dma_clean_range
0.11% kmscube [kernel.kallsyms] [k] l2c210_clean_range
0.06% kmscube [kernel.kallsyms] [k] __memzero
--------------------------->8-------------------------
That said, it's not even USB 2.0 that is the bottleneck but the
computations in udl_render_hline().
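To give an idea of why that path is CPU-bound: the hline code repacks pixels and run-length-compresses them in software, touching every pixel. A simplified userspace sketch of that kind of work (my illustration of the general scheme only, not the driver's actual code or wire format):

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified sketch, NOT the driver's code: walk a line of 16bpp
 * pixels and run-length encode repeats, emitting each pixel value
 * big-endian. Every pixel is loaded, compared and re-packed by the
 * CPU, which is why a routine like this dominates a profile long
 * before the USB link saturates. */
static size_t rle_line16(const uint16_t *pix, size_t npix, uint8_t *out)
{
	size_t i = 0, o = 0;

	while (i < npix) {
		uint16_t p = pix[i];
		size_t run = 1;

		while (i + run < npix && pix[i + run] == p && run < 255)
			run++;
		out[o++] = (uint8_t)run;	/* repeat count */
		out[o++] = (uint8_t)(p >> 8);	/* pixel value, */
		out[o++] = (uint8_t)(p & 0xff);	/* big-endian */
		i += run;
	}
	return o;	/* bytes emitted */
}
```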
[1] https://cgit.freedesktop.org/mesa/kmscube/
-Alexey