[Mesa-dev] [PATCH 00/26] RadeonSI: Primitive culling with async compute

Dieter Nützel Dieter at nuetzel-hh.de
Thu Feb 14 18:43:38 UTC 2019


For the whole series (the updated branch merged in)

Tested-by: Dieter Nützel <Dieter at nuetzel-hh.de>

on Polaris 20

FreeCAD, Blender, UH, UV, US, some VTK apps
No surprising speedup, but also NO slowdown.

The Tested-by also applies to
[Mesa-dev] [PATCH 0/4] RadeonSI: Follow-up for the primitive culling series
(but no SI hardware here).

mplayer / mpv work like a charm again.

ParaView-5.6.0-MPI-Linux-64bit

1920x1080
pd off ~18 fps
pd on ~24 fps ! ;-)

2560x1440
pd off ~14 fps
pd on ~16 fps

./pvbatch ../lib/python2.7/site-packages/paraview/benchmark/manyspheres.py \
    -s 100 -r 726 -v 1920,1080 -f 30

Is this right?
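
A minimal sketch of how the "pd on" / "pd off" runs can be switched, assuming
the AMD_DEBUG=pd and AMD_DEBUG=nopd settings from Marek's cover letter (quoted
below) and assuming nopd is also accepted on a consumer card where the feature
is already off by default. The helper names pd_env and run_pd are hypothetical,
not part of Mesa or ParaView:

```shell
#!/bin/sh
# Hypothetical helper: map "on"/"off" to the AMD_DEBUG setting
# named in the cover letter (pd = enable, nopd = disable).
pd_env() {
    case "$1" in
        on)  echo "AMD_DEBUG=pd" ;;
        off) echo "AMD_DEBUG=nopd" ;;
        *)   echo "usage: pd_env on|off" >&2; return 1 ;;
    esac
}

# Hypothetical wrapper: run any command with primitive discard
# forced on or off for that one process only.
run_pd() {
    mode=$1
    shift
    setting=$(pd_env "$mode") || return 1
    env "$setting" "$@"
}

# Example, using the benchmark command from above:
#   run_pd on  ./pvbatch ../lib/python2.7/site-packages/paraview/benchmark/manyspheres.py \
#       -s 100 -r 726 -v 1920,1080 -f 30
#   run_pd off ./pvbatch ../lib/python2.7/site-packages/paraview/benchmark/manyspheres.py \
#       -s 100 -r 726 -v 1920,1080 -f 30
```

Note that the wrapper overrides any AMD_DEBUG value already set in the
environment, so other debug flags would have to be added to the setting by hand.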

Poor test system:
Intel Xeon X3470, 2.93 GHz, 3.2 GHz turbo, 4c/8t
24 GB
Polaris 20, 8 GB
PCIe 2.1 only (NO PCIe atomics)

Dieter

On 14.02.2019 03:07, Marek Olšák wrote:
> I just updated the branch, fixing video players.
> 
> Marek
> 
> On Wed, Feb 13, 2019 at 8:28 PM Dieter Nützel <Dieter at nuetzel-hh.de>
> wrote:
> 
>> Now with LLVM 9.0 git;-)
>> 
>> Running, except mplayer/mpv (same as before).
>> 
>> mplayer: ../src/gallium/drivers/radeon/radeon_winsys.h:866:
>> radeon_get_heap_index: Assertion `!"32BIT without WC is disallowed"' failed.
>> Aborted (core dumped)
>> 
>> mpv: ../src/gallium/drivers/radeon/radeon_winsys.h:866:
>> radeon_get_heap_index: Assertion `!"32BIT without WC is disallowed"' failed.
>> Aborted (core dumped)
>> 
>> And this after glxgears, Blender, FreeCAD, UH and UV:
>> 
>> [38939.440950] [drm:amdgpu_ctx_mgr_entity_fini [amdgpu]] *ERROR* ctx 00000000679c61fd is still alive
>> [38939.440993] [drm:amdgpu_ctx_mgr_fini [amdgpu]] *ERROR* ctx 00000000679c61fd is still alive
>> [38964.901076] [drm:amdgpu_ctx_mgr_entity_fini [amdgpu]] *ERROR* ctx 000000009c4b659b is still alive
>> [38964.901130] [drm:amdgpu_ctx_mgr_fini [amdgpu]] *ERROR* ctx 000000009c4b659b is still alive
>> [38980.844577] [drm:amdgpu_ctx_mgr_entity_fini [amdgpu]] *ERROR* ctx 000000001bee3a35 is still alive
>> [38980.844642] [drm:amdgpu_ctx_mgr_fini [amdgpu]] *ERROR* ctx 000000001bee3a35 is still alive
>> 
>> Is a newer 'amd-staging-drm-next' needed? I am currently on #0bf64b0a9f78.
>> 
>> If I only had some big triangle apps...;-)
>> 
>> Dieter
>> 
>> On 13.02.2019 17:36, Marek Olšák wrote:
>>> Dieter, you need final LLVM 8.0.
>>> 
>>> Marek
>>> 
>>> On Wed, Feb 13, 2019 at 11:02 AM Dieter Nützel <Dieter at nuetzel-hh.de>
>>> wrote:
>>> 
>>>> GREAT stuff, Marek!
>>>> 
>>>> But sadly some crashes.
>>>> Is my LLVM git version too old?
>>>> 7 Jan 2019 (shortly before the 8.0 cut)
>>>> 
>>>> LLVM (http://llvm.org/):
>>>> LLVM version 8.0.0svn
>>>> Optimized build.
>>>> Default target: x86_64-unknown-linux-gnu
>>>> Host CPU: nehalem
>>>> 
>>>> Registered Targets:
>>>> amdgcn - AMD GCN GPUs
>>>> r600   - AMD GPUs HD2XXX-HD6XXX
>>>> x86    - 32-bit X86: Pentium-Pro and above
>>>> x86-64 - 64-bit X86: EM64T and AMD64
>>>> 
>>>> Please have a look at my post @Phoronix:
>>>> 
>>>> https://www.phoronix.com/forums/forum/phoronix/latest-phoronix-articles/1079916-radeonsi-picks-up-primitive-culling-with-async-compute-for-performance-wins?p=1079984#post1079984
>>>> 
>>>> Thanks,
>>>> Dieter
>>>> 
>>>> On 13.02.2019 06:15, Marek Olšák wrote:
>>>>> Hi,
>>>>> 
>>>>> This patch series uses async compute to do primitive culling before
>>>>> the vertex shader. It significantly improves performance for
>>>>> applications that use a lot of geometry that is invisible because
>>>>> primitives don't intersect sample points or there are a lot of
>>>>> back faces, etc.
>>>>> 
>>>>> It passes 99.9999% of all tests (GL CTS, dEQP, piglit) and is 100%
>>>>> stable.
>>>>> It supports all chips all the way from Sea Islands to Radeon VII.
>>>>> 
>>>>> As you can see in the results marked (ENABLED) in the picture below,
>>>>> it destroys our competition (The GeForce results are from a Phoronix
>>>>> article from 2017, the latest ones I could find):
>>>>> 
>>>>> Benchmark: ParaView - Many Spheres - 2560x1440
>>>>> https://people.freedesktop.org/~mareko/prim-discard-cs-results.png
>>>>> 
>>>>> 
>>>>> The last patch describes the implementation and functional limitations
>>>>> if you can find the huge code comment, so I'm not gonna do that here.
>>>>> 
>>>>> I decided to enable this optimization on all Pro graphics cards.
>>>>> The reason is that I haven't had time to benchmark games.
>>>>> This decision may be changed based on community feedback, etc.
>>>>> 
>>>>> People using the Pro graphics cards can disable this by setting
>>>>> AMD_DEBUG=nopd, and people using consumer graphics cards can enable
>>>>> this by setting AMD_DEBUG=pd. So you always have a choice.
>>>>> 
>>>>> Eventually we might also enable this on consumer graphics cards for
>>>>> those games that benefit. It might decrease performance if there is
>>>>> not enough invisible geometry.
>>>>> 
>>>>> Branch:
>>>>> https://cgit.freedesktop.org/~mareko/mesa/log/?h=prim-discard-cs
>>>>> 
>>>>> Please review.
>>>>> 
>>>>> Thanks,
>>>>> Marek
>>>>> _______________________________________________
>>>>> mesa-dev mailing list
>>>>> mesa-dev at lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev

