[Mesa-dev] Merging experimental r600/nir code

Eero Tamminen eero.t.tamminen at intel.com
Thu Feb 13 09:06:19 UTC 2020


Hi,

On 13.2.2020 10.38, Timur Kristóf wrote:
> I think the question about PGO is this: are the profiles of the users'
> applications gonna be the same as the profile that is collected from
> the benchmarks?
> 
> Eg. if the test benchmark uses different draw calls or triggers
> different shader compiler code paths than your favourite game, in
> theory PGO could harm the performance of your game.
> 
> Also, how do we prevent it from making bad decisions based on the hw
> that the profile was made on?
> 
> For example, if you collect the profiling data from a machine that has
> a Polaris 10 GPU, then the profile will show that chip_class is
> extremely likely to be GFX8 and thus the PGO build will be optimized
> for that case. If I then run the same build on my Navi 10, the PGO
> build might actually be slower, because the driver needs to take a
> different code path than what the PGO build was optimized for.
> 
> What do you guys think about this?

How much HW-specific code can affect things depends on whether that code 
is executed constantly or only once.  If the former, it may be useful to 
(try to) design the driver so that it gets executed only once.

The most CPU-intensive part is shader compilation (with Intel, the linking 
stage more than the steps before it), and the heavy part is, AFAIK, to a 
large extent HW-independent.  In benchmarks, shader compilation is almost 
always done at startup; in games, shader compilation typically also 
happens later.

As to how much PGO can make things worse, I think that depends on how 
independent the non-executed part of the code is.  If it's not mixed 
with code that did get executed, I don't think there will be any visible 
impact.  But if it's badly mixed, hot/cold function identification will 
group things wrong.


	- Eero

> Best regards,
> Timur
> 
> On Thu, 2020-02-13 at 02:40 -0500, Marek Olšák wrote:
>> Can we automate this?
>>
>> Let's say we implement noop ioctls for radeonsi and iris, and then we
>> run the drivers to collect pgo data on any hw.
>>
>> Can meson execute this build sequence:
>> build with pgo=generate
>> run tests
>> clean
>> build with pgo=use
>>
>> automated as buildtype=release-pgo.
>>
>> Marek
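A minimal sketch of the sequence Marek describes, written as a POSIX shell function. The names `run`, `pgo_build` and `DRY_RUN` are hypothetical (meson has no `buildtype=release-pgo`), and it assumes it is run from the Mesa source root. Instead of an explicit clean step, it reconfigures the same build directory with `meson configure -Db_pgo=use`, so the `*.gcda` files from the training run stay where the second compile should find them.

```shell
# Hypothetical helper (not part of Mesa or meson): the two-pass
# LTO+PGO build from this thread. Run it from the Mesa source root.
# With DRY_RUN=1 it only prints the commands it would run.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# pgo_build BUILD_DIR TRAIN_CMD [ARGS...]
pgo_build() {
    build_dir=$1
    shift
    # Pass 1: instrumented build; running it later writes *.gcda files.
    run meson setup "$build_dir" --buildtype release \
        -Db_lto=true -Db_pgo=generate || return
    run ninja -C "$build_dir" || return
    # Training run: the workload must load the freshly built driver
    # (e.g. after installing it, as in Dieter's steps).
    run "$@" || return
    # Pass 2: reuse the same build dir so the .gcda files are found;
    # changing b_pgo changes the compiler flags, so ninja rebuilds.
    run meson configure "$build_dir" -Db_pgo=use || return
    run ninja -C "$build_dir"
}
```

For example, `DRY_RUN=1 pgo_build build-pgo glmark2` only prints the five commands. Whether the training command exercises representative code paths is left to the caller.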
>>
>> On Wed., Feb. 12, 2020, 23:37 Dieter Nützel, <Dieter at nuetzel-hh.de>
>> wrote:
>>> Hello Marek,
>>>
>>> I hoped you would ask this...
>>> ...but first, sorry for the delay in sending the numbers I announced.
>>> Our family is/was sick, my wife more than me; our children are fine
>>> again.
>>> So please be somewhat lenient with me.
>>>
>>> On 12.02.2020 19:46, Marek Olšák wrote:
>>>> How do you enable LTO+PGO? Is it something we could enable by default
>>>> for release builds?
>>>>
>>>> Marek
>>>
>>> I think we can achieve this.
>>>
>>> I have been running an LTO+PGO 'release' build since late December
>>> (around Christmas).
>>> My KDE Plasma5 (OpenGL 3.0) system/desktop has never been as responsive
>>> and fluid as it has been since then.
>>> Even the numbers (glmark2) show it: the 'glmark2' numbers are the best
>>> I've ever seen on this system.
>>> LTO alone offers only a small size reduction and hardly any speedup.
>>> But LTO+PGO is GREAT.
>>>
>>> First I compile with '-Db_lto=true -Db_pgo=generate'.
>>>
>>> mkdir build
>>> cd build
>>> meson ../ --strip --buildtype release -Ddri-drivers= \
>>>   -Dplatforms=drm,x11 \
>>>   -Dgallium-drivers=r600,radeonsi,swrast -Dvulkan-drivers=amd \
>>>   -Dgallium-nine=true -Dgallium-opencl=standalone -Dglvnd=true \
>>>   -Dgallium-va=true -Dgallium-xvmc=false -Dgallium-omx=disabled \
>>>   -Dgallium-xa=false -Db_lto=true -Db_pgo=generate
>>>
>>> After that my 'build' dir looks like this:
>>>
>>> drwxr-xr-x  8 dieter users    4096 13. Feb 04:34 .
>>> drwxr-xr-x 14 dieter users    4096 13. Feb 04:33 ..
>>> drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 bin
>>> -rw-r--r--  1 dieter users 4369873 13. Feb 04:34 build.ninja
>>> -rw-r--r--  1 dieter users 4236719 13. Feb 04:34 compile_commands.json
>>> drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 include
>>> drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 meson-info
>>> drwxr-xr-x  2 dieter users    4096 13. Feb 04:33 meson-logs
>>> drwxr-xr-x  2 dieter users    4096 13. Feb 04:34 meson-private
>>> drwxr-xr-x 14 dieter users    4096 13. Feb 04:34 src
>>>
>>> time nice -n 19 ninja
>>>
>>> This takes ~15 minutes on my aging/'slow' Intel Xeon X3470 (Nehalem,
>>> 4c/8t, 2.93 GHz), 24 GB, Polaris 20.
>>> Without LTO+PGO it takes ~4-5 minutes. (AMD anyone?)
>>>
>>> Then I remove all files/dirs except 'src'.
>>>
>>> Next I 'install' the newly built files under '/usr/local/' (mostly
>>> symlinked to /usr/lib64/).
>>>
>>> Now I run as many OpenGL/Vulkan programs as I can,
>>> normally starting with glmark2 and vkmark.
>>>
>>> Here is my (whole) list:
>>> Knights
>>> Wireshark
>>> K3b
>>> Skanlite
>>> Kdenlive
>>> GIMP
>>> Krita
>>> FreeCAD
>>> Blender 2.81x
>>> digikam
>>> K4DirStat
>>> Discover
>>> YaST
>>> I do some 'movements'/work in every program.
>>> +
>>> some LibreOffice work (OpenGL enabled)
>>> one or two OpenGL games
>>> and Vulkan games
>>> +
>>> run some WebGL stuff in my browsers (Konqi/FF).
>>>
>>> After that I have the needed '*.gcda' files in 'src'.
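Before starting the second build it can be worth checking that the training run really produced profile data. `count_gcda` and `gcda_by_dir` are hypothetical helper names; they only use `find`, `dirname`, `sort` and `uniq`, and assume the profiles sit under the given directory (Dieter's 'src'). The per-directory view gives a rough idea of which parts of the tree were exercised on the profiling machine, which is relevant to Timur's question about HW-specific code paths.

```shell
# Hypothetical helpers: inspect the *.gcda profile files that a training
# run leaves under a directory (Dieter's 'src' inside the build dir).

# count_gcda DIR: total number of *.gcda files. 0 suggests the instrumented
# libraries were never loaded, or the programs did not exit normally
# (profile data is written when the process exits).
count_gcda() {
    find "${1:-src}" -type f -name '*.gcda' | wc -l | tr -d ' '
}

# gcda_by_dir DIR: *.gcda counts per directory, most files first, to show
# which parts of the tree (e.g. radeonsi vs. r600) were exercised.
gcda_by_dir() {
    find "${1:-src}" -type f -name '*.gcda' -exec dirname {} \; |
        sort | uniq -c | sort -rn
}
```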
>>>
>>> Now the second rebuild, using the profile data in 'src'.
>>> Because I deleted the other files/dirs, I can do a second 'meson'
>>> configure run in my current 'build' dir.
>>>
>>> meson ../ --strip --buildtype release -Ddri-drivers= \
>>>   -Dplatforms=drm,x11 \
>>>   -Dgallium-drivers=r600,radeonsi,swrast -Dvulkan-drivers=amd \
>>>   -Dgallium-nine=true -Dgallium-opencl=standalone -Dglvnd=true \
>>>   -Dgallium-va=true -Dgallium-xvmc=false -Dgallium-omx=disabled \
>>>   -Dgallium-xa=false -Db_lto=true -Db_pgo=use
>>>
>>> After around 5-6 minutes (!!!) I can install the LTO+PGO 'release'
>>> driver files and enjoy the next level of OpenGL speed.
>>> Vulkan does NOT show such GREAT improvements.
>>>
>>> Only the '-Db_lto=true -Db_pgo=generate' build needs ~3 times the
>>> compilation time, mostly spent linking.
>>>
>>> Below are some file size and speed numbers.
>>> Should I send an additional post with a better title to the list?
>>> Hope this helps ;-)))
>>>
>>> -Dieter
>>>
>>> *******************************************************************
>>> ********************************
>>>
>>> Mesa git 21bc16a723 (somewhat older)
>>>
>>> normal
>>>
>>> -rwxr-xr-x   4 root root 9525520 13. Jan 20:00 libvdpau_radeonsi.so.1.0.0
>>> -rwxr-xr-x   4 root root 9525520 13. Jan 20:00 libvdpau_r600.so.1.0.0
>>>
>>> -rwxr-xr-x   8 root root 18444192 13. Jan 20:00 swrast_dri.so
>>> -rwxr-xr-x   8 root root 18444192 13. Jan 20:00 radeonsi_dri.so
>>> -rwxr-xr-x   8 root root 18444192 13. Jan 20:00 r600_dri.so
>>> -rwxr-xr-x   8 root root 18444192 13. Jan 20:00 kms_swrast_dri.so
>>> -rwxr-xr-x   4 root root  9505072 13. Jan 20:00 radeonsi_drv_video.so
>>> -rwxr-xr-x   4 root root  9505072 13. Jan 20:00 r600_drv_video.so
>>>
>>>
>>> -Db_lto=true
>>>
>>> -rwxr-xr-x 2 root root 8078368 13. Jan 21:24 libvdpau_r600.so.1.0.0
>>> -rwxr-xr-x 2 root root 8078368 13. Jan 21:24 libvdpau_radeonsi.so.1.0.0
>>>
>>> -rwxr-xr-x 4 root root 16878368 13. Jan 21:24 kms_swrast_dri.so
>>> -rwxr-xr-x 4 root root 16878368 13. Jan 21:24 r600_dri.so
>>> -rwxr-xr-x 2 root root  8074312 13. Jan 21:24 r600_drv_video.so
>>> -rwxr-xr-x 4 root root 16878368 13. Jan 21:24 radeonsi_dri.so
>>> -rwxr-xr-x 2 root root  8074312 13. Jan 21:24 radeonsi_drv_video.so
>>> -rwxr-xr-x 4 root root 16878368 13. Jan 21:24 swrast_dri.so
>>>
>>>
>>> -Db_lto=true -Db_pgo=use
>>>
>>> -rwxr-xr-x   4 root root 5600328 14. Jan 00:11 libvdpau_radeonsi.so.1.0.0
>>> -rwxr-xr-x   4 root root 5600328 14. Jan 00:11 libvdpau_r600.so.1.0.0
>>>
>>> -rwxr-xr-x   8 root root 11172768 14. Jan 00:11 swrast_dri.so
>>> -rwxr-xr-x   8 root root 11172768 14. Jan 00:11 radeonsi_dri.so
>>> -rwxr-xr-x   8 root root 11172768 14. Jan 00:11 r600_dri.so
>>> -rwxr-xr-x   8 root root 11172768 14. Jan 00:11 kms_swrast_dri.so
>>> -rwxr-xr-x   4 root root  5567640 14. Jan 00:11 radeonsi_drv_video.so
>>> -rwxr-xr-x   4 root root  5567640 14. Jan 00:11 r600_drv_video.so
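The size listings above can be turned into percentages with a one-line `awk` calculation; `pct_smaller` is a hypothetical helper name. For the `*_dri.so` files, LTO alone comes out at about 8.5% smaller and LTO+PGO at about 39.4% smaller, in line with the "~40% smaller" in Dieter's summary.

```shell
# pct_smaller OLD NEW: how much smaller NEW is than OLD, in percent.
# Hypothetical helper; the arguments are the file sizes listed above.
pct_smaller() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) * 100 / before }'
}

pct_smaller 18444192 16878368   # *_dri.so, -Db_lto=true: prints 8.5
pct_smaller 18444192 11172768   # *_dri.so, LTO+PGO: prints 39.4
pct_smaller 9505072 5567640     # *_drv_video.so, LTO+PGO: prints 41.4
```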
>>>
>>> *******************************************************************
>>> ********************************
>>>
>>> normal
>>>
>>> =======================================================
>>>       glmark2 2017.07
>>> =======================================================
>>>       OpenGL Information
>>>       GL_VENDOR:     X.Org
>>>       GL_RENDERER:   Radeon RX 580 Series (POLARIS10, DRM 3.35.0, 5.4.8-1.g582f5cb-default, LLVM 10.0.0)
>>>       GL_VERSION:    4.6 (Compatibility Profile) Mesa 20.0.0-devel (git-3a4f8c8158)
>>> =======================================================
>>> [build] use-vbo=false: FPS: 3332 FrameTime: 0.300 ms
>>> [build] use-vbo=true: FPS: 12144 FrameTime: 0.082 ms
>>> [texture] texture-filter=nearest: FPS: 11661 FrameTime: 0.086 ms
>>> [texture] texture-filter=linear: FPS: 11677 FrameTime: 0.086 ms
>>> [texture] texture-filter=mipmap: FPS: 11967 FrameTime: 0.084 ms
>>> [shading] shading=gouraud: FPS: 12047 FrameTime: 0.083 ms
>>> [shading] shading=blinn-phong-inf: FPS: 12120 FrameTime: 0.083 ms
>>> [shading] shading=phong: FPS: 12103 FrameTime: 0.083 ms
>>> [shading] shading=cel: FPS: 11891 FrameTime: 0.084 ms
>>> [bump] bump-render=high-poly: FPS: 11255 FrameTime: 0.089 ms
>>> [bump] bump-render=normals: FPS: 11747 FrameTime: 0.085 ms
>>> [bump] bump-render=height: FPS: 11574 FrameTime: 0.086 ms
>>> libpng warning: iCCP: known incorrect sRGB profile
>>> [effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 12546 FrameTime: 0.080 ms
>>> libpng warning: iCCP: known incorrect sRGB profile
>>> [effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 11551 FrameTime: 0.087 ms
>>> [pulsar] light=false:quads=5:texture=false: FPS: 11163 FrameTime: 0.090 ms
>>> libpng warning: iCCP: known incorrect sRGB profile
>>> [desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 5829 FrameTime: 0.172 ms
>>> libpng warning: iCCP: known incorrect sRGB profile
>>> [desktop] effect=shadow:windows=4: FPS: 6132 FrameTime: 0.163 ms
>>> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 856 FrameTime: 1.168 ms
>>> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 1136 FrameTime: 0.880 ms
>>> [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 934 FrameTime: 1.071 ms
>>> [ideas] speed=duration: FPS: 3178 FrameTime: 0.315 ms
>>> [jellyfish] <default>: FPS: 9535 FrameTime: 0.105 ms
>>> [terrain] <default>: FPS: 1704 FrameTime: 0.587 ms
>>> [shadow] <default>: FPS: 8704 FrameTime: 0.115 ms
>>> [refract] <default>: FPS: 3307 FrameTime: 0.302 ms
>>> [conditionals] fragment-steps=0:vertex-steps=0: FPS: 11970 FrameTime: 0.084 ms
>>> [conditionals] fragment-steps=5:vertex-steps=0: FPS: 12293 FrameTime: 0.081 ms
>>> [conditionals] fragment-steps=0:vertex-steps=5: FPS: 12059 FrameTime: 0.083 ms
>>> [function] fragment-complexity=low:fragment-steps=5: FPS: 12338 FrameTime: 0.081 ms
>>> [function] fragment-complexity=medium:fragment-steps=5: FPS: 12257 FrameTime: 0.082 ms
>>> [loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 12324 FrameTime: 0.081 ms
>>> [loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 11839 FrameTime: 0.084 ms
>>> [loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 11880 FrameTime: 0.084 ms
>>> =======================================================
>>>                                     glmark2 Score: 9304
>>> =======================================================
>>>
>>> *******************************************************************
>>> ********************************
>>>
>>> -Db_lto=true -Db_pgo=use
>>>
>>> =======================================================
>>>       glmark2 2017.07
>>> =======================================================
>>>       OpenGL Information
>>>       GL_VENDOR:     X.Org
>>>       GL_RENDERER:   Radeon RX 580 Series (POLARIS10, DRM 3.36.0, 5.5.2-1.g3a91916-default, LLVM 10.0.0)
>>>       GL_VERSION:    4.6 (Compatibility Profile) Mesa 20.1.0-devel (git-2799676218)
>>> =======================================================
>>> [build] use-vbo=false: FPS: 3324 FrameTime: 0.301 ms
>>> [build] use-vbo=true: FPS: 14835 FrameTime: 0.067 ms
>>> [texture] texture-filter=nearest: FPS: 14280 FrameTime: 0.070 ms
>>> [texture] texture-filter=linear: FPS: 14398 FrameTime: 0.069 ms
>>> [texture] texture-filter=mipmap: FPS: 14225 FrameTime: 0.070 ms
>>> [shading] shading=gouraud: FPS: 14162 FrameTime: 0.071 ms
>>> [shading] shading=blinn-phong-inf: FPS: 14087 FrameTime: 0.071 ms
>>> [shading] shading=phong: FPS: 14133 FrameTime: 0.071 ms
>>> [shading] shading=cel: FPS: 14116 FrameTime: 0.071 ms
>>> [bump] bump-render=high-poly: FPS: 11632 FrameTime: 0.086 ms
>>> [bump] bump-render=normals: FPS: 14402 FrameTime: 0.069 ms
>>> [bump] bump-render=height: FPS: 14369 FrameTime: 0.070 ms
>>> libpng warning: iCCP: known incorrect sRGB profile
>>> [effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 14696 FrameTime: 0.068 ms
>>> libpng warning: iCCP: known incorrect sRGB profile
>>> [effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 11628 FrameTime: 0.086 ms
>>> [pulsar] light=false:quads=5:texture=false: FPS: 13094 FrameTime: 0.076 ms
>>> libpng warning: iCCP: known incorrect sRGB profile
>>> [desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 6635 FrameTime: 0.151 ms
>>> libpng warning: iCCP: known incorrect sRGB profile
>>> [desktop] effect=shadow:windows=4: FPS: 8023 FrameTime: 0.125 ms
>>> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 866 FrameTime: 1.155 ms
>>> [buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 1126 FrameTime: 0.888 ms
>>> [buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 939 FrameTime: 1.065 ms
>>> [ideas] speed=duration: FPS: 4568 FrameTime: 0.219 ms
>>> [jellyfish] <default>: FPS: 11735 FrameTime: 0.085 ms
>>> [terrain] <default>: FPS: 1691 FrameTime: 0.591 ms
>>> [shadow] <default>: FPS: 11271 FrameTime: 0.089 ms
>>> [refract] <default>: FPS: 3250 FrameTime: 0.308 ms
>>> [conditionals] fragment-steps=0:vertex-steps=0: FPS: 15095 FrameTime: 0.066 ms
>>> [conditionals] fragment-steps=5:vertex-steps=0: FPS: 14874 FrameTime: 0.067 ms
>>> [conditionals] fragment-steps=0:vertex-steps=5: FPS: 14918 FrameTime: 0.067 ms
>>> [function] fragment-complexity=low:fragment-steps=5: FPS: 14995 FrameTime: 0.067 ms
>>> [function] fragment-complexity=medium:fragment-steps=5: FPS: 14879 FrameTime: 0.067 ms
>>> [loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 14910 FrameTime: 0.067 ms
>>> [loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 14969 FrameTime: 0.067 ms
>>> [loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 14804 FrameTime: 0.068 ms
>>> =======================================================
>>>                                     glmark2 Score: 11119
>>> =======================================================
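The same kind of calculation for the two glmark2 runs; `pct_faster` is a hypothetical helper name. Note that the runs differ in more than the build flags (Mesa 20.0.0-devel git-3a4f8c8158 on kernel 5.4.8 vs. Mesa 20.1.0-devel git-2799676218 on kernel 5.5.2), so not all of the difference can necessarily be attributed to LTO+PGO.

```shell
# pct_faster OLD NEW: relative gain of NEW over OLD, in percent
# (negative means slower). Hypothetical helper for the glmark2 numbers.
pct_faster() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (after - before) * 100 / before }'
}

pct_faster 9304 11119    # glmark2 Score: prints 19.5
pct_faster 12144 14835   # [build] use-vbo=true: prints 22.2
pct_faster 1704 1691     # [terrain] <default>: prints -0.8
```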
>>>
>>> *******************************************************************
>>> ********************************
>>>
>>>> On Wed, Feb 12, 2020 at 1:56 AM Dieter Nützel <Dieter at nuetzel-hh.de>
>>>> wrote:
>>>>
>>>>> Hello Gert,
>>>>>
>>>>> your merge 'broke' LTO and, later on, PGO compilation/linking.
>>>>>
>>>>> I generally compile with '-Dgallium-drivers=r600,radeonsi,swrast'
>>>>> to test radeonsi and (your) r600 work. ;-)
>>>>>
>>>>> After your merge I get several warnings in 'addrlib' with LTO and
>>>>> even a compiler error (gcc (SUSE Linux) 9.2.1 20200128).
>>>>>
>>>>> I had to disable 'r600' ('swrast' is needed for 'nine') to get a
>>>>> working LTO, and even better PGO, radeonsi driver.
>>>>> I have been collecting GREAT LTO+PGO numbers (the latter is the
>>>>> greater) over the last 2 months. I'll send my results later today.
>>>>>
>>>>> Summary
>>>>> radeonsi is ~40% smaller and 16-20% faster with PGO (!!!).
>>>>>
>>>>> Honza and the GCC people (Intel's ICC folks) do GREAT things.
>>>>> 'glmark2' numbers are better than 'vkmark'. (Hello, Marek.)
>>>>>
>>>>> Need some sleep.
>>>>>
>>>>> See my log, below.
>>>>>
>>>>> Greetings and GREAT work!
>>>>>
>>>>> -Dieter
>>
>> _______________________________________________
>> mesa-dev mailing list
>> mesa-dev at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 


