[virglrenderer-devel] A bit of performance analysis

Gert Wollny gert.wollny at collabora.com
Fri Sep 7 11:43:58 UTC 2018


Dear all, 

Given that the dEQP test suites are very close to passing without errors
and a release is coming up, I thought it was time to take a closer look
at the performance numbers. To get a baseline, I ran some benchmarks and
compared the results obtained by running directly on the host, within
qemu, and via vtest (see the end of this email).

For benchmarks that use many textures and buffers, like Unigine Heaven
and Unigine Valley, running within qemu slows the application down by a
factor of approximately six on r600 and 20 on Intel Kabylake. On the
other hand, the synthetic benchmarks from Gputest are penalized less on
r600, and on Intel they actually run on par with the host system or even
faster. My assumption is that on Intel, with these shader-heavy
applications, the shader optimizations running both in the guest and on
the host together improve the final code.
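
The slowdown factor quoted above is the host score divided by the guest
score, i.e. the reciprocal of the "Score/host" column. A minimal sketch
that recomputes it from the Unigine tables at the end of this email
(qemu rows only; for r600 Heaven the tessellation-disabled run is used):

```python
# Scores copied from the Unigine result tables at the end of this email.
SCORES = {
    ("Intel", "Valley"): {"host": 1314, "qemu": 42},
    ("Intel", "Heaven"): {"host": 940, "qemu": 52},
    ("r600", "Valley"): {"host": 2112, "qemu": 353},
    ("r600", "Heaven, tess off"): {"host": 1701, "qemu": 304},
}


def slowdown(host_score: float, guest_score: float) -> float:
    """How many times slower the guest run is than the host run."""
    return host_score / guest_score


for (gpu, bench), s in SCORES.items():
    print(f"{gpu:5} {bench:16} x{slowdown(s['host'], s['qemu']):.1f}")
```

This gives roughly 31 (Valley) and 18 (Heaven) on Intel and about 6 for
both runs on r600, so "approximately 20" on Intel is an average over
benchmarks that individually vary quite a bit.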

Instrumenting the r600 host with perf while Unigine Valley runs in the
guest doesn't reveal any specific hot spot within qemu or
virglrenderer. memcpy accounts for 6% of the total run time, but only
one third of that results from calls from qemu or virglrenderer;
another 6% of the total run time goes to libpixman, apparently to
update some cursor. The only notable function directly in virglrenderer
is vrend_draw_bind_const_shader with 1.4%, and the memcpy calls
triggered by IOV transfers account for approximately 1.2%.
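
Splitting a symbol's share of the run time by its caller, as done for
memcpy above, amounts to counting samples per calling component. A
minimal sketch, assuming the perf call chains have already been reduced
to hypothetical (leaf symbol, caller component) pairs; the samples below
are synthetic, constructed only to mirror the proportions described
(memcpy at 6%, one third of it from qemu/virglrenderer), not measured:

```python
from collections import Counter


def attribute(samples, symbol):
    """Percentage of all samples spent in `symbol`, split by calling component.

    samples: (leaf_symbol, caller_component) pairs, one per perf sample;
             a hypothetical, pre-digested form of the recorded call chains.
    """
    samples = list(samples)
    callers = Counter(caller for leaf, caller in samples if leaf == symbol)
    return {caller: 100.0 * n / len(samples) for caller, n in callers.items()}


# Synthetic samples: 60 of 1000 hit memcpy, 20 of those via qemu/virglrenderer.
samples = ([("memcpy", "qemu")] * 10 + [("memcpy", "virglrenderer")] * 10
           + [("memcpy", "other")] * 40 + [("other", "other")] * 940)
shares = attribute(samples, "memcpy")
print(shares)  # {'qemu': 1.0, 'virglrenderer': 1.0, 'other': 4.0}
```

So "6% in memcpy, one third of it from qemu or virglrenderer" means only
about 2% of the total run time is memcpy work that qemu or virglrenderer
could directly avoid.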

On the Intel host, another hot spot seems to be vmx_vcpu_run (ca. 9%);
this might point to a qemu configuration problem.

The vtest results on Intel/Ubuntu fall between running directly on the
host and running in qemu, as one would expect. On the r600/Gentoo system
the picture is completely different, and my assumption is that my kernel
configuration might be off there.

On the guest side things look a bit different. Here, for the Valley
benchmark, about a third of the time (32%) is spent in and below
entry_SYSCALL_64, mostly initiated by Mesa's map_buffer_range
(glMapBufferRange) / unmap_buffer:

 32.12% entry_SYSCALL_64
    - 31.96% do_syscall_64
      - 23.46% __x64_sys_ioctl
        - 23.35% ksys_ioctl
          - 22.35% do_vfs_ioctl
            - 21.89% drm_ioctl
              - 20.40% drm_ioctl_kernel
                + 7.47% virtio_gpu_wait_ioctl 
                + 5.73% virtio_gpu_transfer_to_host_ioctl
                + 4.58% virtio_gpu_transfer_from_host_ioctl
                  1.63% virtio_gpu_execbuffer_ioctl
      + 5.06% __x64_sys_nanosleep
      + 2.35% __x64_sys_futex
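
Summing the transfer-related entries of this call graph makes their
weight explicit. A small sketch that parses an excerpt of the perf output
above (percentages copied verbatim) and adds up the wait and the two
transfer ioctls; the regular expression is an assumption tailored to this
excerpt's layout, not a general perf-report parser:

```python
import re

# Excerpt of the guest-side perf call graph above (children percentages).
PERF_EXCERPT = """\
 32.12% entry_SYSCALL_64
                 + 7.47% virtio_gpu_wait_ioctl
                 + 5.73% virtio_gpu_transfer_to_host_ioctl
                 + 4.58% virtio_gpu_transfer_from_host_ioctl
                   1.63% virtio_gpu_execbuffer_ioctl
"""

# A percentage followed by a symbol name; the tree's +/- markers are skipped.
ENTRY = re.compile(r"([\d.]+)%\s+(\S+)")


def parse(tree: str) -> dict:
    """Map each symbol in the excerpt to its percentage of total run time."""
    pct = {}
    for line in tree.splitlines():
        m = ENTRY.search(line)
        if m:
            pct[m.group(2)] = float(m.group(1))
    return pct


pct = parse(PERF_EXCERPT)
transfer = sum(pct[name] for name in (
    "virtio_gpu_wait_ioctl",
    "virtio_gpu_transfer_to_host_ioctl",
    "virtio_gpu_transfer_from_host_ioctl",
))
share = transfer / pct["entry_SYSCALL_64"]
print(f"{transfer:.2f}% of run time, {share:.0%} of the syscall time")
```

Waiting and transferring thus account for about 17.8% of the total run
time, i.e. more than half of everything below entry_SYSCALL_64, while
command submission (execbuffer) is comparatively cheap.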
      
Instrumenting on Intel/Ubuntu reveals another hot spot in the guest
kernel's iowrite16 (self ca. 25%) that is not as prominent on the
AMD/Gentoo system (self ca. 3%). In both cases the VM runs Ubuntu bionic
with the latest Ubuntu (cosmic) 4.17.0 kernel.

Some of this will likely be alleviated by coherent memory support or
udmabuf. However, since these data-transfer-related hot spots take such
a big chunk of the run time, it is difficult to directly identify other
hot spots where performance could be significantly improved. IOV
linearization will help cut down on memcpy, but the instrumentation
seems to indicate that for the tested benchmark this is not in a hot
code path. Another improvement might be more asynchronous data
transfer: I'm not sure whether sending the command stream always
results in the guest waiting for an ACK; if so, there is certainly room
for improvement.
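
To illustrate what IOV linearization saves, here is a toy model, not
virglrenderer code: a resource is backed by scattered pages (an iovec),
so reading a range costs one copy per touched segment, while a
contiguous buffer needs a single copy. The page size and function names
are hypothetical, and linearization is modeled as a one-time join; a
real implementation would more likely map the pages contiguously
instead of copying them.

```python
PAGE = 4096  # hypothetical backing-page size


def copy_from_iov(iov, offset, length):
    """Gather bytes [offset, offset + length) from scattered segments.

    Returns the data and the number of separate copies (one per touched segment).
    """
    out = bytearray()
    copies = 0
    seg_start = 0  # offset of the current segment within the resource
    end = offset + length
    for seg in iov:
        seg_end = seg_start + len(seg)
        lo, hi = max(offset, seg_start), min(end, seg_end)
        if lo < hi:
            out += seg[lo - seg_start:hi - seg_start]
            copies += 1
        if seg_end >= end:
            break
        seg_start = seg_end
    return bytes(out), copies


def linearize(iov):
    """Build one contiguous buffer from the segments (done once, up front)."""
    return b"".join(iov)


def copy_from_linear(buf, offset, length):
    """With a contiguous buffer every transfer is a single copy."""
    return buf[offset:offset + length], 1


# A 64 KiB resource backed by 16 scattered pages, each filled with its index.
iov = [bytes([i]) * PAGE for i in range(16)]
data, n_scattered = copy_from_iov(iov, 100, 5 * PAGE)
same, n_linear = copy_from_linear(linearize(iov), 100, 5 * PAGE)
print(n_scattered, n_linear, data == same)
```

The number of bytes copied is the same in both cases; only the number of
copy calls drops (six to one here). That fits the observation that
linearization helps memcpy overhead but is not on the hot path for this
benchmark.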
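
Whether the guest really blocks on an ACK for every submission is the
open question above; the toy model below (all timing parameters are
hypothetical) only shows why it would matter. With a wait after each
command, every command pays a full guest/host round trip; with
fence-style completion, the guest submits everything and waits once.

```python
def completion_time(n_cmds, work, latency, wait_each):
    """Toy model: time until the guest knows all n_cmds commands have completed.

    work      -- host time to execute one command (hypothetical)
    latency   -- one-way guest<->host notification delay (hypothetical)
    wait_each -- True: the guest blocks for an ACK after every submission;
                 False: it submits everything and waits once, on the last fence.
    Submission itself is assumed to cost no guest time.
    """
    guest_t = 0.0    # guest clock at the moment of the next submission
    host_free = 0.0  # time at which the host has drained its queue
    for _ in range(n_cmds):
        arrive = guest_t + latency               # command reaches the host
        host_free = max(arrive, host_free) + work
        if wait_each:
            guest_t = host_free + latency        # ACK must travel back first
    # Final completion notification (ACK or fence signal) back to the guest.
    return host_free + latency


blocking = completion_time(100, work=1.0, latency=0.5, wait_each=True)
pipelined = completion_time(100, work=1.0, latency=0.5, wait_each=False)
print(blocking, pipelined)  # 200.0 101.0
```

In the blocking case the cost is n * (work + 2 * latency); in the
pipelined case it is roughly n * work + 2 * latency, so the benefit grows
with the number of small submissions per frame.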

It would be interesting to know what benchmarking tools others are
using. From Google I heard about glbench, but I'm unable to actually
find it. Maybe this benchmark now uses a new name?

best regards,
Gert 



-- Benchmark results:  

Host: Ubuntu 18.04, linux 4.15.0-33-generic

CPU/GPU Intel Kabylake 
Driver: i965
Mesa host/guest: git-19dbc7dd0f
Virglrenderer: git-2766ae7e97

## Unigine Valley (1024x768, Q:High, AA:2x)

 Driver      | FPS avrg (min, max) | Score | Score/host | Remark
 ----------------------------------------------------------------------
 Virgl/qemu  |   1.0 (1.0, 1.5)    |   42  |   0.04     | Some artifacts
 Virgl/vtest |  12.3 (8.4, 17.5)   |  515  |   0.40     | (Scenes 10-13)
 Host        |  31.4 (17.9, 47.9)  | 1314  |   1        |


## Unigine Heaven (1024x768, Q:High, Tess: Normal, AA:2x)

 Driver      | FPS avrg (min, max) | Score |Score/host 
--------------------------------------------------------
 Virgl/qemu  |   2.1 (1.5, 3.9)    | 52   |   0.06 
 Virgl/vtest |  13.4 (5.8, 24.9)   | 337  |   0.36 
 Host        |  37.3 (8.3, 64.1)   | 940  |   1         


## Gputest Furmark Windowed: 1024x640 

| Driver      | FPS  | Points | Points/host
--------------------------------------------
| Virgl/qemu  |  25  |  1554  |  1.12
| Virgl/vtest |  24  |  1477  |  1.11
| host        |  22  |  1329  |  1

## Gputest Pixmark Piano Windowed: 1024x640 

| Driver      | FPS  | Points | Points/host 
---------------------------------------------
| Virgl/qemu  |  6   |  416   |   0.96
| Virgl/vtest |  6   |  418   |   0.96
| Host        |  7   |  434   |   1

------------------------------------------------------------------------------

Host: Gentoo 4.14.52-gentoo
CPU: AMD FX-6300
GPU: AMD 6870 HD 
Driver: r600  (MESA_GL_VERSION_OVERRIDE=4.4) 
Mesa: git-52caee70a4
virglrenderer: git-76670ade
                       
## Unigine Heaven (1024x768, Q:High, Tess: Normal, AA:2x) 

 Driver      | FPS avrg (min, max) | Score | Score/host | Remark
 ------------------------------------------------------------------------------
 Virgl/qemu  |  6.2 (3.4, 24.0)    |  156  |   0.40     |
 Virgl/vtest |  1.2 (1.0, 2.6)     |   30  |   0.08     | Makes the system nearly unusable
 Host        | 15.2 (4.3, 74.0)    |  382  |   1        |

Since tessellation is very shader-heavy on r600, I also ran this
benchmark without it:

## Unigine Heaven (1024x768, Q:High, Tess: Disabled, AA:2x) 

 Driver      | FPS avrg (min, max)  | Score | Score/host
 -------------------------------------------------------
 Virgl/qemu  | 12.1 (7.3, 28.2)     |  304  |   0.18
 Host        | 67.5 (19.4, 118.9)   | 1701  |   1

## Unigine Valley (1024x768, Q:High, AA:2x)

 Driver      | FPS avrg (min, max) | Score | Score/host | Remark
 ------------------------------------------------------------------------------
 Virgl/qemu  |  8.4 (6.5, 11.6)    |  353  |   0.17     | Some artifacts (Scenes 10-13)
 Virgl/vtest |  2.9 (2.3, 4.3)     |  123  |   0.07     | Slows the system down
 Host        | 50.5 (22.7, 86.4)   | 2112  |   1        |

## Gputest Furmark Windowed: 1024x640 

| Driver      | FPS  | Points | Points/host
| Virgl/qemu  | 23   | 1399   |  0.45
| Virgl/vtest |  2   | 150    |  0.05 
| Host        | 52   | 3138   |  1

## Gputest Pixmark Piano Windowed: 1024x640 

| Driver      | FPS  | Points | Points/host 
| Virgl/qemu  | 11   | 672    |   0.68
| Virgl/vtest | 0-1  | 39     |   0.04 
| Host        | 15   |  995   |   1

