[Intel-gfx] [linus:master] [file] 0ede61d858: will-it-scale.per_thread_ops -2.9% regression

Oliver Sang oliver.sang at intel.com
Mon Nov 27 06:58:52 UTC 2023


Hi Linus,

On Sun, Nov 26, 2023 at 03:20:58PM -0800, Linus Torvalds wrote:
> On Sun, 26 Nov 2023 at 12:23, Linus Torvalds
> <torvalds at linux-foundation.org> wrote:
> >
> > IOW, I might have messed up some "trivial cleanup" when prepping for
> > sending it out...
> 
> Bah. Famous last words. One of the "trivial cleanups" made the code
> more "obvious" by renaming the nospec mask as just "mask".
> 
> And that trivial rename broke that patch *entirely*, because now that
> name shadowed the "fmode_t" mask argument.
> 
> Don't even ask how long it took me to go from "I *tested* this,
> dammit, now it doesn't work at all" to "Oh God, I'm so stupid".
> 
> So that nobody else would waste any time on this, attached is a new
> attempt. This time actually tested *after* the changes.
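
For readers following along, the shadowing failure described above can be
reproduced in a few lines of standalone C. This is only an illustrative
sketch, not the actual patch: the function names, the 0x4000 flag value, the
fmode_t typedef and the way the inner mask is computed are all hypothetical
stand-ins. It shows how an inner-scope variable named "mask" silently takes
over from a function argument of the same name.

```c
#include <assert.h>

typedef unsigned int fmode_t;	/* stand-in for the kernel type (assumption) */

/* Intended check: the file is "blocked" if any bit of 'mask' is set in f_mode. */
static int mode_blocked(fmode_t f_mode, fmode_t mask)
{
	return (f_mode & mask) != 0;
}

/*
 * Same check after a hypothetical "trivial rename": a local declared in an
 * inner block is also called 'mask', so it shadows the fmode_t argument
 * until the end of that block.
 */
static int mode_blocked_shadowed(fmode_t f_mode, fmode_t mask,
				 unsigned long fd, unsigned long max_fds)
{
	{
		/* all-ones when fd is in range, zero otherwise (illustrative) */
		unsigned long mask = (fd < max_fds) ? ~0UL : 0UL;

		/* BUG: this tests the local mask, not the fmode_t argument */
		return (f_mode & mask) != 0;
	}
}

/* Usage example: identical inputs, different answers. */
void shadow_demo(void)
{
	const fmode_t f_mode = 0x1;	/* hypothetical mode bits */
	const fmode_t blocked = 0x4000;	/* hypothetical "reject" flag */

	assert(mode_blocked(f_mode, blocked) == 0);
	assert(mode_blocked_shadowed(f_mode, blocked, 3, 64) == 1);
}
```

Both GCC and Clang accept this without complaint under -Wall; the -Wshadow
option, which is not part of -Wall, is what reports the inner declaration.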

We applied the new patch on top of 0ede61d858 and confirmed that the
regression is gone: will-it-scale.per_thread_ops is now 3.4% better than at
93faf426e3.

Tested-by: kernel test robot <oliver.sang at intel.com>

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale

commit:
  93faf426e3 ("vfs: shave work on failed file open")
  0ede61d858 ("file: convert to SLAB_TYPESAFE_BY_RCU")
  c712b4365b ("Improve __fget_files_rcu() code generation (and thus __fget_light())")

93faf426e3cc000c 0ede61d8589cc2d93aa78230d74 c712b4365b5b4dbe1d1380edd37
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    228481 ±  4%      -4.6%     217900 ±  6%     -11.7%     201857 ±  5%  meminfo.DirectMap4k
     89056            -2.0%      87309            -1.6%      87606        proc-vmstat.nr_slab_unreclaimable
     16.28            -0.7%      16.16            -1.0%      16.12        turbostat.RAMWatt
      0.01 ±  9%  +58125.6%       4.17 ±175%  +23253.5%       1.67 ±222%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    781.67 ± 10%      +6.5%     832.50 ± 19%     -14.3%     670.17 ±  4%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     97958 ±  7%      -9.7%      88449 ±  4%      -0.6%      97399 ±  4%  sched_debug.cpu.avg_idle.stddev
      0.00 ± 12%     +24.2%       0.00 ± 17%      -5.2%       0.00 ±  7%  sched_debug.cpu.next_balance.stddev
   6391048            -2.9%    6208584            +3.4%    6605584        will-it-scale.16.threads
    399440            -2.9%     388036            +3.4%     412848        will-it-scale.per_thread_ops
   6391048            -2.9%    6208584            +3.4%    6605584        will-it-scale.workload
     19.99 ±  4%      -2.2       17.74            +1.2       21.18 ±  2%  perf-profile.calltrace.cycles-pp.fput.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.27 ±  5%      +0.8        2.11 ±  3%     +31.1       32.36 ±  2%  perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     32.69 ±  4%      +5.0       37.70           -32.7        0.00        perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.00           +27.9       27.85            +0.0        0.00        perf-profile.calltrace.cycles-pp.__get_file_rcu.__fget_light.do_poll.do_sys_poll.__x64_sys_poll
     20.00 ±  4%      -2.3       17.75            +0.4       20.43 ±  2%  perf-profile.children.cycles-pp.fput
      0.24 ± 10%      -0.1        0.18 ±  2%      -0.1        0.18 ± 10%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.48 ±  5%      +0.5        1.98 ±  3%     +30.8       32.32 ±  2%  perf-profile.children.cycles-pp.__fdget
     31.85 ±  4%      +6.0       37.86           -31.8        0.00        perf-profile.children.cycles-pp.__fget_light
      0.00           +27.7       27.67            +0.0        0.00        perf-profile.children.cycles-pp.__get_file_rcu
     30.90 ±  4%     -20.6       10.35 ±  2%     -30.9        0.00        perf-profile.self.cycles-pp.__fget_light
     19.94 ±  4%      -2.4       17.53            -0.3       19.62 ±  2%  perf-profile.self.cycles-pp.fput
      9.81 ±  4%      -2.4        7.42 ±  2%      +1.7       11.51 ±  4%  perf-profile.self.cycles-pp.do_poll
      0.23 ± 11%      -0.1        0.17 ±  4%      -0.1        0.18 ± 11%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.44 ±  7%      +0.0        0.45 ±  5%      +0.1        0.52 ±  4%  perf-profile.self.cycles-pp.__poll
      0.85 ±  4%      +0.1        0.92 ±  3%     +30.3       31.17 ±  2%  perf-profile.self.cycles-pp.__fdget
      0.00           +26.5       26.48            +0.0        0.00        perf-profile.self.cycles-pp.__get_file_rcu
 2.146e+10 ±  2%      +8.5%  2.329e+10 ±  2%      -2.1%  2.101e+10        perf-stat.i.branch-instructions
      0.22 ± 14%      -0.0        0.19 ± 14%      -0.0        0.20 ±  3%  perf-stat.i.branch-miss-rate%
 2.424e+10 ±  2%      +4.1%  2.524e+10 ±  2%      -4.7%  2.311e+10        perf-stat.i.dTLB-loads
 1.404e+10 ±  2%      +8.7%  1.526e+10 ±  2%      -6.2%  1.316e+10        perf-stat.i.dTLB-stores
     70.87            -2.3       68.59            -1.0       69.90        perf-stat.i.iTLB-load-miss-rate%
   5267608            -5.5%    4979133 ±  2%      -0.4%    5244253        perf-stat.i.iTLB-load-misses
   2102507            +5.4%    2215725            +5.7%    2222286        perf-stat.i.iTLB-loads
     18791 ±  3%     +10.5%      20757 ±  2%      -1.8%      18446        perf-stat.i.instructions-per-iTLB-miss
    266.67 ±  2%      +6.8%     284.75 ±  2%      -4.1%     255.70        perf-stat.i.metric.M/sec
      0.01 ± 10%     -10.5%       0.01 ±  5%      -1.8%       0.01 ±  6%  perf-stat.overall.MPKI
      0.19            -0.0        0.17            +0.0        0.20        perf-stat.overall.branch-miss-rate%
      0.65            -3.1%       0.63            +6.1%       0.69        perf-stat.overall.cpi
      0.00 ±  4%      -0.0        0.00 ±  4%      +0.0        0.00 ±  4%  perf-stat.overall.dTLB-store-miss-rate%
     71.48            -2.3       69.21            -1.2       70.24        perf-stat.overall.iTLB-load-miss-rate%
     18757           +10.0%      20629            -3.2%      18161        perf-stat.overall.instructions-per-iTLB-miss
      1.54            +3.2%       1.59            -5.8%       1.45        perf-stat.overall.ipc
   4795147            +6.4%    5100406            -9.0%    4365017        perf-stat.overall.path-length
  2.14e+10 ±  2%      +8.5%  2.322e+10 ±  2%      -2.1%  2.094e+10        perf-stat.ps.branch-instructions
 2.417e+10 ±  2%      +4.1%  2.516e+10 ±  2%      -4.7%  2.303e+10        perf-stat.ps.dTLB-loads
   1.4e+10 ±  2%      +8.7%  1.522e+10 ±  2%      -6.3%  1.312e+10        perf-stat.ps.dTLB-stores
   5253923            -5.5%    4966218 ±  2%      -0.5%    5228207        perf-stat.ps.iTLB-load-misses
   2095770            +5.4%    2208605            +5.7%    2214962        perf-stat.ps.iTLB-loads
 3.065e+13            +3.3%  3.167e+13            -5.9%  2.883e+13        perf-stat.total.instructions
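
The %change columns above are relative to the first commit (93faf426e3). As a
quick check of the arithmetic, the sketch below recomputes the
will-it-scale.per_thread_ops deltas from the three values in the table. The
helper names are made up for illustration, and this is not how the LKP tooling
itself computes the columns; it is simply (value - base) / base.

```c
#include <assert.h>

/* Relative change of 'value' against 'base', in percent. */
static double pct_change(double base, double value)
{
	return (value - base) / base * 100.0;
}

/* True if x lies within +/- tol of the expected value. */
static int close_to(double x, double expected, double tol)
{
	return x > expected - tol && x < expected + tol;
}

/* Usage example with the per_thread_ops values from the table. */
void pct_demo(void)
{
	const double base    = 399440.0;	/* 93faf426e3 */
	const double regress = 388036.0;	/* 0ede61d858 */
	const double patched = 412848.0;	/* c712b4365b */

	/* reported as -2.9% and +3.4% against the base commit */
	assert(close_to(pct_change(base, regress), -2.9, 0.05));
	assert(close_to(pct_change(base, patched), 3.4, 0.05));

	/* measured against the regressed commit, the gain is roughly 6.4% */
	assert(close_to(pct_change(regress, patched), 6.4, 0.05));
}
```

The last assertion is a derived figure, not one reported in the table: it
compares the patched kernel directly with 0ede61d858 rather than with the
base commit.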

> 
>                   Linus


