[PATCH net-next v25 00/13] Device Memory TCP

Yunsheng Lin linyunsheng at huawei.com
Mon Sep 9 11:21:05 UTC 2024


On 2024/9/9 13:43, Mina Almasry wrote:

> 
> Perf - page-pool benchmark:
> ---------------------------
> 
> bench_page_pool_simple.ko tests with and without these changes:
> https://pastebin.com/raw/ncHDwAbn
> 
> AFAIK the number that really matters in the perf tests is the
> 'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
> cycles without the changes but there is some 1 cycle noise in some
> results.
> 
> With the patches this regresses to 9 cycles with the changes but there
> is 1 cycle noise occasionally running this test repeatedly.
> 
> Lastly I tried disable the static_branch_unlikely() in
> netmem_is_net_iov() check. To my surprise disabling the
> static_branch_unlikely() check reduces the fast path back to 8 cycles,
> but the 1 cycle noise remains.

Sorry for the late report. I was adding a page_pool test ko based on [1]
to avoid introducing a performance regression when fixing the bug in [2].
I also used it to test the performance impact of the devmem patchset on
page_pool, and there seems to be a noticeable and quite stable performance
impact for the testcases below: about 5%~16% performance degradation on an
arm64 system, as shown below:

Before the devmem patchset:
 Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1' (100 runs):

         17.167561      task-clock (msec)         #    0.003 CPUs utilized            ( +-  0.40% )
                 8      context-switches          #    0.474 K/sec                    ( +-  0.65% )
                 0      cpu-migrations            #    0.001 K/sec                    ( +-100.00% )
                84      page-faults               #    0.005 M/sec                    ( +-  0.13% )
          44576552      cycles                    #    2.597 GHz                      ( +-  0.40% )
          59627412      instructions              #    1.34  insn per cycle           ( +-  0.03% )
          14370325      branches                  #  837.063 M/sec                    ( +-  0.02% )
             21902      branch-misses             #    0.15% of all branches          ( +-  0.27% )

       6.818873600 seconds time elapsed                                          ( +-  0.02% )

 Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1 test_direct=1' (100 runs):

         17.595423      task-clock (msec)         #    0.004 CPUs utilized            ( +-  0.01% )
                 8      context-switches          #    0.460 K/sec                    ( +-  0.50% )
                 0      cpu-migrations            #    0.000 K/sec
                84      page-faults               #    0.005 M/sec                    ( +-  0.15% )
          45693020      cycles                    #    2.597 GHz                      ( +-  0.01% )
          59676212      instructions              #    1.31  insn per cycle           ( +-  0.00% )
          14385384      branches                  #  817.564 M/sec                    ( +-  0.00% )
             21786      branch-misses             #    0.15% of all branches          ( +-  0.14% )

       4.098627802 seconds time elapsed                                          ( +-  0.11% )

After the devmem patchset:
 Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1' (100 runs):

         17.047973      task-clock (msec)         #    0.002 CPUs utilized            ( +-  0.39% )
                 8      context-switches          #    0.488 K/sec                    ( +-  0.82% )
                 0      cpu-migrations            #    0.001 K/sec                    ( +- 70.35% )
                84      page-faults               #    0.005 M/sec                    ( +-  0.12% )
          44269558      cycles                    #    2.597 GHz                      ( +-  0.39% )
          59594383      instructions              #    1.35  insn per cycle           ( +-  0.02% )
          14362599      branches                  #  842.481 M/sec                    ( +-  0.02% )
             21949      branch-misses             #    0.15% of all branches          ( +-  0.25% )

       7.964890303 seconds time elapsed                                          ( +-  0.16% )

 Performance counter stats for 'insmod ./page_pool_test.ko test_push_cpu=16 test_pop_cpu=16 nr_test=100000000 test_napi=1 test_direct=1' (100 runs):

         17.660975      task-clock (msec)         #    0.004 CPUs utilized            ( +-  0.02% )
                 8      context-switches          #    0.458 K/sec                    ( +-  0.57% )
                 0      cpu-migrations            #    0.003 K/sec                    ( +- 43.81% )
                84      page-faults               #    0.005 M/sec                    ( +-  0.17% )
          45862652      cycles                    #    2.597 GHz                      ( +-  0.02% )
          59764866      instructions              #    1.30  insn per cycle           ( +-  0.01% )
          14404323      branches                  #  815.602 M/sec                    ( +-  0.01% )
             21826      branch-misses             #    0.15% of all branches          ( +-  0.19% )

       4.304644609 seconds time elapsed                                          ( +-  0.75% )
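
As a rough cross-check of the degradation figure, the relative change in
'seconds time elapsed' can be computed from the four runs above. This is
only a sketch: the helper name pct_change() is made up for illustration,
and the numbers are copied by hand from the perf output above.

```python
# Elapsed times in seconds, copied from the perf stat output above,
# as (before devmem patchset, after devmem patchset).
elapsed = {
    "test_napi=1": (6.818873600, 7.964890303),
    "test_napi=1 test_direct=1": (4.098627802, 4.304644609),
}


def pct_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0


# Hypothetical helper output: percent increase in elapsed time per testcase.
slowdown = {name: pct_change(b, a) for name, (b, a) in elapsed.items()}

for name, pct in slowdown.items():
    print(f"{name}: {pct:+.1f}% elapsed time")
```

By this measure the elapsed time grows by about 16.8% for the test_napi=1
case and about 5.0% for the test_direct=1 case. The cycles and instructions
counters barely move between the two sets, presumably because perf stat only
counts the insmod task here (about 17 msec of task-clock) and not the context
where the test itself runs, so the elapsed time is the number to compare.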

1. https://lore.kernel.org/all/20240906073646.2930809-2-linyunsheng@huawei.com/
2. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
