[virglrenderer-devel] A bit of performance analysis
Marc-André Lureau
marcandre.lureau at gmail.com
Fri Sep 7 11:56:44 UTC 2018
Hi
On Fri, Sep 7, 2018 at 3:45 PM Gert Wollny <gert.wollny at collabora.com> wrote:
>
> Dear all,
>
> given that the deqp test suites are very close to passing without errors
> and a release is coming up, I think it is time to look a bit more
> closely at the performance numbers. To get a baseline I ran some
> benchmarks and compared the results obtained by running directly on the
> host, running within qemu, and running via vtest (see end of this email).
>
> For benchmarks that use many textures and buffers, like Unigine Heaven
> and Unigine Valley, running within qemu slows the application down by a
> factor of approximately six on r600 and 20 on the Intel Kabylake. On
> the other hand, the synthetic benchmarks from Gputest are less penalized
> on r600, and on Intel they actually run on par with the host system or
> even faster. My assumption is that on Intel, with these shader-heavy
> applications, the different shader optimizations running on the guest
> and the host actually improve the final code.
>
> Profiling with perf on the r600 host while Unigine Valley runs in
> the guest doesn't reveal any specific hot spot on the host within qemu
> or virglrenderer. memcpy accounts for 6% of the total run time, but
> only one third of that results from calls from qemu or virglrenderer.
> Another 6% of the total run time goes to libpixman, apparently to
> update some cursor. The only notable function directly in virglrenderer
> is vrend_draw_bind_const_shader with 1.4%, and the memcpy calls
> triggered by IOV transfers account for approximately 1.2%.
>
> On the Intel host another hot spot seems to be vmx_vcpu_run (ca. 9%);
> this might point to some qemu configuration problem.
>
> The vtest results on Intel/Ubuntu are between running directly on the
> host and running in qemu, as one would expect. On the r600/Gentoo system
> the picture is completely different, and my assumption is that my kernel
> configuration might be off here.
>
> On the guest side things look a bit different. Here, for the Valley
> benchmark, more than 33% of the time is spent in and below
> entry_SYSCALL_64, mostly initiated by mesa map_buffer_range
> (glMapBufferRange) / unmap_buffer:
>
> 32.12% entry_SYSCALL_64
>  - 31.96% do_syscall_64
>     - 23.46% __x64_sys_ioctl
>        - 23.35% ksys_ioctl
>           - 22.35% do_vfs_ioctl
>              - 21.89% drm_ioctl
>                 - 20.40% drm_ioctl_kernel
>                    + 7.47% virtio_gpu_wait_ioctl
>                    + 5.73% virtio_gpu_transfer_to_host_ioctl
>                    + 4.58% virtio_gpu_transfer_from_host_ioctl
>                      1.63% virtio_gpu_execbuffer_ioctl
>     + 5.06% __x64_sys_nanosleep
>     + 2.35% __x64_sys_futex
>
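As a quick cross-check of the call tree above, the virtio-gpu ioctl entries can be added up to see how much of the syscall time they explain. This is only a sketch: the percentages are copied from the quoted perf output, while the names SYSCALL_TOTAL, VIRTIO_GPU_IOCTLS and the two helper functions are made up for illustration.

```python
# Shares of total guest run time (in percent), copied from the perf tree.
SYSCALL_TOTAL = 32.12  # entry_SYSCALL_64

VIRTIO_GPU_IOCTLS = {
    "virtio_gpu_wait_ioctl": 7.47,
    "virtio_gpu_transfer_to_host_ioctl": 5.73,
    "virtio_gpu_transfer_from_host_ioctl": 4.58,
    "virtio_gpu_execbuffer_ioctl": 1.63,
}


def virtio_gpu_share(ioctls=VIRTIO_GPU_IOCTLS):
    """Sum of the listed virtio-gpu ioctl shares, in percent of run time."""
    return sum(ioctls.values())


def fraction_of_syscall_time(ioctls=VIRTIO_GPU_IOCTLS, total=SYSCALL_TOTAL):
    """Part of the time below entry_SYSCALL_64 spent in the listed ioctls."""
    return virtio_gpu_share(ioctls) / total


if __name__ == "__main__":
    print(f"virtio-gpu ioctls: {virtio_gpu_share():.2f}% of run time")
    print(f"share of syscall time: {fraction_of_syscall_time():.0%}")
```

Under these numbers the four ioctls cover almost all of the 20.40% attributed to drm_ioctl_kernel, with the wait and transfer ioctls dominating.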
> Profiling on Intel/Ubuntu reveals another hot spot in the guest
> kernel's iowrite16 (self ca. 25%) that is not as prominent on the
> AMD/Gentoo system (self ca. 3%). In both cases the VM runs Ubuntu
> bionic with the latest Ubuntu (cosmic) 4.17.0 kernel.
>
> Some of this will likely be alleviated by coherent memory support or
> udmabuf. However, given that these data-transfer-related hot spots
> take such a big chunk of the run time, it is difficult to
> directly identify other hot spots where performance could be
> significantly improved. IOV linearization will help to cut down on
> memcpy, but the profiling seems to indicate that for the tested
> benchmark this is not in a hot code path. Another improvement might be
> to do more asynchronous data transfer. I'm not sure whether sending
> the command stream always results in the guest waiting for an ACK; if
> this is so, then there is certainly room for improvement.
>
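To put a rough bound on what removing these transfer-related costs could buy, an Amdahl's-law style estimate can be sketched. It assumes that the perf percentages quoted above are fractions of guest CPU time and that everything else stays unchanged, which is a simplification, and frame rate need not scale the same way. max_speedup is a hypothetical helper, not part of any tool mentioned here.

```python
def max_speedup(fraction_removed):
    """Upper bound on speedup if `fraction_removed` of the run time vanishes.

    Amdahl's law with an infinitely fast replacement: 1 / (1 - f).
    """
    if not 0.0 <= fraction_removed < 1.0:
        raise ValueError("fraction_removed must be in [0, 1)")
    return 1.0 / (1.0 - fraction_removed)


# Fractions taken from the quoted guest profile (percent / 100).
ALL_SYSCALLS = 32.12 / 100                          # entry_SYSCALL_64
VIRTIO_IOCTLS = (7.47 + 5.73 + 4.58 + 1.63) / 100   # listed virtio-gpu ioctls

if __name__ == "__main__":
    print(f"no syscall time at all:  <= {max_speedup(ALL_SYSCALLS):.2f}x")
    print(f"no virtio-gpu ioctls:    <= {max_speedup(VIRTIO_IOCTLS):.2f}x")
```

Even under these optimistic assumptions the bound stays well below the measured gap to the host, which fits the observation that no single hot spot explains the slowdown.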
> It would be interesting to know what benchmarking tools others are
> using. From Google I heard about glbench, but I'm unable to actually
> find it. Maybe this benchmark now uses a new name?
>
> best regards,
> Gert
>
>
> [1] https://gitlab.freedesktop.org/virgl/virglrenderer/issues/1
>
> -- Benchmark results:
>
> Host: Ubuntu 18.04, linux 4.15.0-33-generic
>
> CPU/GPU Intel Kabylake
> Driver: i965
> Mesa host/guest: git-19dbc7dd0f
> Virglrenderer: git-2766ae7e97
>
> ## Unigine Valley (1024x768, Q:High, AA:2x)
>
> Driver      | FPS avrg (min, max) | Score | Score/host | Remark
> -------------------------------------------------------------------------
> Virgl/qemu  |  1.0 (1.0, 1.5)     |    42 | 0.04       | Some artifacts
> Virgl/vtest | 12.3 (8.4, 17.5)    |   515 | 0.40       | (Scenes 10-13)
> Host        | 31.4 (17.9, 47.9)   |  1314 | 1          |
>
>
> ## Unigine Heaven (1024x768, Q:High, Tess: Normal, AA:2x)
>
> Driver | FPS avrg (min, max) | Score |Score/host
> --------------------------------------------------------
> Virgl/qemu | 2.1 (1.5, 3.9) | 52 | 0.06
> Virgl/vtest | 13.4 (5.8, 24.9) | 337 | 0.36
> Host | 37.3 (8.3, 64.1) | 940 | 1
You might be interested in the qemu series "[PATCH v4 00/29] vhost-user
for input & GPU".

Unigine Heaven 4.0 on Intel® HD Graphics 530 (Skylake GT2):

host:                        fps: 31.1 / score: 784
qemu-gtk/egl+virtio-gpu:     fps:  2.6 / score:  64
qemu-gtk/egl+vhost-user-gpu: fps: 12.9 / score: 329
spice+virtio-gpu:            fps:  2.8 / score:  70
spice+vhost-user-gpu:        fps: 12.1 / score: 304

There is some work left to make it more acceptable (both in qemu &
libvirt), but hopefully this will happen some day.
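
The numbers above can be put in relation to each other with a few lines of Python. This is just a sketch: the scores are the ones listed above, and the names HOST_SCORE, SCORES, relative_to_host and gain are made up for illustration.

```python
# Unigine Heaven 4.0 scores as listed above.
HOST_SCORE = 784

SCORES = {
    "qemu-gtk/egl+virtio-gpu": 64,
    "qemu-gtk/egl+vhost-user-gpu": 329,
    "spice+virtio-gpu": 70,
    "spice+vhost-user-gpu": 304,
}


def relative_to_host(score, host=HOST_SCORE):
    """Score as a fraction of the host score."""
    return score / host


def gain(display):
    """Ratio of the vhost-user-gpu score to the virtio-gpu score."""
    return SCORES[f"{display}+vhost-user-gpu"] / SCORES[f"{display}+virtio-gpu"]


if __name__ == "__main__":
    for name, score in SCORES.items():
        print(f"{name:28s} {relative_to_host(score):.2f} of host")
    for display in ("qemu-gtk/egl", "spice"):
        print(f"{display}: vhost-user-gpu is {gain(display):.1f}x virtio-gpu")
```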
>
>
> ## Gputest Furmark Windowed: 1024x640
>
> | Driver | FPS | Points | Points/host
> --------------------------------------------
> | Virgl/qemu | 25 | 1554 | 1.12
> | Virgl/vtest | 24 | 1477 | 1.11
> | host | 22 | 1329 | 1
>
> ## Gputest Pixmark Piano Windowed: 1024x640
>
> | Driver | FPS | Points | Points/host
> ---------------------------------------------
> | Virgl/qemu | 6 | 416 | 0.96
> | Virgl/vtest | 6 | 418 | 0.96
> | Host | 7 | 434 | 1
>
> ----------------------------------------------------------------------------
>
> Host: Gentoo 4.14.52-gentoo
> CPU: AMD FX-6300
> GPU: AMD 6870 HD
> Driver: r600 (MESA_GL_VERSION_OVERRIDE=4.4)
> Mesa: git-52caee70a4
> virglrenderer: git-76670ade
>
> ## Unigine Heaven (1024x768, Q:High, Tess: Normal, AA:2x)
>
> Driver      | FPS avrg (min, max) | Score | Score/host | Remark
> Virgl/qemu  |  6.2 (3.4, 24.0)    |   156 | 0.40       |
> Virgl/vtest |  1.2 (1.0, 2.6)     |    30 | 0.08       | Makes the system
>             |                     |       |            | nearly unusable
> Host        | 15.2 (4.3, 74.0)    |   382 | 1          |
>
> Since tessellation is very heavy on the shaders on r600, I also ran this
> benchmark without it:
>
> ## Unigine Heaven (1024x768, Q:High, Tess: Disabled, AA:2x)
>
> Driver      | FPS avrg (min, max) | Score | Score/host
> Virgl/qemu  | 12.1 (7.3, 28.2)    |   304 | 0.18
> Host        | 67.5 (19.4, 118.9)  |  1701 | 1
>
> ## Unigine Valley (1024x768, Q:High, AA:2x)
>
> Driver      | FPS avrg (min, max) | Score | Score/host | Remark
> Virgl/qemu  |  8.4 (6.5, 11.6)    |   353 | 0.17       | Some artifacts
>             |                     |       |            | (Scenes 10-13)
> Virgl/vtest |  2.9 (2.3, 4.3)     |   123 | 0.07       | Slows the
>             |                     |       |            | system down
> Host        | 50.5 (22.7, 86.4)   |  2112 | 1          |
>
> ## Gputest Furmark Windowed: 1024x640
>
> | Driver | FPS | Points | Points/host
> | Virgl/qemu | 23 | 1399 | 0.45
> | Virgl/vtest | 2 | 150 | 0.05
> | Host | 52 | 3138 | 1
>
> ## Gputest Pixmark Piano Windowed: 1024x640
>
> | Driver | FPS | Points | Points/host
> | Virgl/qemu | 11 | 672 | 0.68
> | Virgl/vtest | 0-1 | 39 | 0.04
> | Host | 15 | 995 | 1
--
Marc-André Lureau