[Mesa-dev] [PATCH 0/5] radeon: Write-combined CPU mappings of BOs in GTT
Dieter Nützel
Dieter at nuetzel-hh.de
Thu Jul 17 20:58:13 PDT 2014
On 18.07.2014 05:07, Michel Dänzer wrote:
> On 17.07.2014 19:09, Christian König wrote:
>> On 17.07.2014 12:01, Michel Dänzer wrote:
>>> In order to try and improve X(Shm)PutImage performance with glamor, I
>>> implemented support for write-combined CPU mappings of BOs in GTT.
>>>
>>> This did provide a nice speedup, but to my surprise, using VRAM instead
>>> of write-combined GTT turned out to be even faster in general on my
>>> Kaveri machine, both for the internal GPU and for discrete GPUs.
>>>
>>> However, I've kept the changes from GTT to VRAM separated, in case this
>>> turns out to be a loss on other setups.
>>>
>>> Kernel patches:
>>>
>>> [PATCH 1/5] drm/radeon: Remove radeon_gart_restore()
>>> [PATCH 2/5] drm/radeon: Pass GART page flags to
>>> [PATCH 3/5] drm/radeon: Allow write-combined CPU mappings of BOs in
>>> [PATCH 4/5] drm/radeon: Use write-combined CPU mappings of rings and
>>
>> Those four are Reviewed-by: Christian König <christian.koenig at amd.com>
>
> Thanks!
>
>
>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>
>> I'm still not very keen on this change since I still don't understand
>> the reason why it's faster than with GTT. Definitely needs more testing
>> on a wider range of systems.
>
> Sure. If anyone wants to give this patch a spin and see if they can
> measure any performance difference, good or bad, that would be
> interesting.
>
>> Maybe limit it to APUs for now?
>
> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an even
> bigger win with dedicated GPUs than with the Kaveri built-in GPU on my
> system. I suspect it may depend on the bandwidth available for PCIe vs.
> system memory though.
Michel,
please, please do NOT change anything about this! ;-)
You all know that I can currently only run this on my poor Duron 1800
with an RV730 (AGP), but...
With this, all 'objview' demos (mesa-demos) run at 60 fps (vsync),
even with chipset/CPU power management enabled (athcool on).
If I set vblank_mode=0,
the slowest one, GreatLakesBiplaneHP.obj,
runs at ~100 fps (~16 fps before) => 6x speedup.
(Even 5 planes run at 30 fps) - Wow!!!
'buddha' went from ~40 fps up to ~175 fps
'bunny' went from ~60 fps up to ~215 fps
'bobcat' does not show such a big improvement, 'only' 70 fps more
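For reference, the speedup factors implied by these numbers can be worked out with a small sketch. The fps values are the approximate figures quoted above (vblank_mode=0); the names `results` and `speedup` are only illustrative, and 'bobcat' is left out because only its gain (70 fps) was given, not the before/after values.

```python
# Approximate fps figures quoted above: (before, after), with vblank_mode=0.
# 'bobcat' is omitted: only its absolute gain was reported, not a ratio.
results = {
    "GreatLakesBiplaneHP": (16, 100),
    "buddha": (40, 175),
    "bunny": (60, 215),
}


def speedup(before, after):
    """Return the after/before frame-rate ratio."""
    return after / before


speedups = {name: speedup(b, a) for name, (b, a) in results.items()}

for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x")
```

Since the fps figures are rough readings, the resulting ratios should be treated as rough as well.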
R600_HYPERZ=1
helps somewhat too, but not in all cases.
Overall X/Kwin eXperience is much better.
Let me know which benchmarks you need.
Cheers,
Dieter
BTW, does anyone know how I can override the BIOS GTT settings?
I can only set 256 MB max. - BIOS patching?
With agpmode=-1 I can run with 1024 MB GTT.