[PATCH 0/5] radeon: Write-combined CPU mappings of BOs in GTT

Dieter Nützel Dieter at nuetzel-hh.de
Thu Jul 17 20:58:13 PDT 2014


On 18.07.2014 05:07, Michel Dänzer wrote:
> On 17.07.2014 19:09, Christian König wrote:
>> On 17.07.2014 12:01, Michel Dänzer wrote:
>>> In order to try and improve X(Shm)PutImage performance with glamor, I
>>> implemented support for write-combined CPU mappings of BOs in GTT.
>>> 
>>> This did provide a nice speedup, but to my surprise, using VRAM
>>> instead of write-combined GTT turned out to be even faster in
>>> general on my Kaveri machine, both for the internal GPU and for
>>> discrete GPUs.
>>> 
>>> However, I've kept the changes from GTT to VRAM separated, in case
>>> this turns out to be a loss on other setups.
>>> 
>>> Kernel patches:
>>> 
>>> [PATCH 1/5] drm/radeon: Remove radeon_gart_restore()
>>> [PATCH 2/5] drm/radeon: Pass GART page flags to
>>> [PATCH 3/5] drm/radeon: Allow write-combined CPU mappings of BOs in
>>> [PATCH 4/5] drm/radeon: Use write-combined CPU mappings of rings and
>> 
>> Those four are Reviewed-by: Christian König <christian.koenig at amd.com>
> 
> Thanks!
> 
> 
>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>> 
>> I'm still not very keen on this change since I still don't
>> understand the reason why it's faster than with GTT. Definitely
>> needs more testing on a wider range of systems.
> 
> Sure. If anyone wants to give this patch a spin and see if they can
> measure any performance difference, good or bad, that would be
> interesting.
> 
>> Maybe limit it to APUs for now?
> 
> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an
> even bigger win with dedicated GPUs than with the Kaveri built-in GPU
> on my system. I suspect it may depend on the bandwidth available for
> PCIe vs. system memory though.

Michel,

please, please do NOT change anything about this! ;-)
You all know that I can currently only run this on my poor Duron 1800
with RV730 (AGP), but...

With this, all 'objview' demos (mesa-demos) run at 60 fps (vsync),
even with chipset/CPU power management enabled (athcool on).

If I set vblank_mode=0,
the slowest one, GreatLakesBiplaneHP.obj,
runs at ~100 fps (~16 fps before) => ~6x speedup.
(Even 5 planes run at 30 fps) - Wow!!!

'buddha' went from ~40 fps up to ~175 fps
'bunny' went from ~60 fps up to ~215 fps
'bobcat' does not show as big an improvement, 'only' 70 fps more
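
For anyone who wants to compare their own numbers, the speedup factors
implied by the figures above can be worked out with a small sketch like
the one below. The helper name `speedup` and the table layout are only
illustrative, and the fps values are the approximate ones quoted above,
so the ratios are rough as well. 'bobcat' is left out because only the
difference (70 fps) is given, not the before/after values.

```python
def speedup(before_fps, after_fps):
    """Return the speedup factor going from before_fps to after_fps."""
    return after_fps / before_fps


# Approximate (before, after) fps figures quoted in this mail,
# measured with vblank_mode=0.
results = {
    "GreatLakesBiplaneHP": (16, 100),
    "buddha": (40, 175),
    "bunny": (60, 215),
}

for name, (before, after) in results.items():
    print(f"{name}: {before} -> {after} fps, {speedup(before, after):.2f}x")
```

So the biplane model gains the most in relative terms, while 'buddha'
and 'bunny' gain more in absolute fps.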

R600_HYPERZ=1
helps somewhat too, but not in all cases.

Overall X/KWin eXperience is much better.
Let me know which benchmarks you need.

Cheers,
   Dieter

BTW, does anyone know how I can override the BIOS GTT settings?
I can only set 256 MB max. - BIOS patching?
With agpmode=-1 I can run with 1024 MB GTT.

