[PATCH 0/5] radeon: Write-combined CPU mappings of BOs in GTT

Marek Olšák maraeo at gmail.com
Fri Jul 18 10:47:09 PDT 2014


On Fri, Jul 18, 2014 at 5:47 PM, Christian König
<deathsimple at vodafone.de> wrote:
> On 18.07.2014 05:07, Michel Dänzer wrote:
>>>>
>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>>
>>> I'm still not very keen on this change since I still don't understand
>>> the reason why it's faster than with GTT. Definitely needs more testing
>>> on a wider range of systems.
>>
>> Sure. If anyone wants to give this patch a spin and see if they can
>> measure any performance difference, good or bad, that would be
>> interesting.
>>
>>> Maybe limit it to APUs for now?
>>
>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an even
>> bigger win with dedicated GPUs than with the Kaveri built-in GPU on my
>> system. I suspect it may depend on the bandwidth available for PCIe vs.
>> system memory though.
>
>
> I've made a few tests today with the kernel part of the patches running
> Xonotic on Ultra in 1920 x 1080.
>
> Without any patches I get ~47.0fps on average with my dedicated
> HD7870.
>
> Adding only "drm/radeon: Use write-combined CPU mappings of rings and IBs
> on >= SI" and that goes down to ~45.3fps.
>
> Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >= SI"
> and the frame rate goes down to ~27.74fps.
>
> So enabling this unconditionally is definitely not a good idea. What I don't
> understand yet is why using USWC reduces the fps on SI as well. It looks
> like the reads from the IB buffer for command stream validation on SI affect
> that more than we thought.

Yes, there is a CS parser with SI, but shouldn't the parser read from
the CPU copy that came with the ioctl instead? Anyway, I recommend
only using VRAM for IBs which are not parsed and patched by the CPU
(which reduces it down to CIK graphics and DMA IBs, right?)
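
Roughly, the policy I have in mind could be sketched like this. All names
here are hypothetical and made up for illustration; this is not the actual
radeon code, and which rings or families count as "parsed" is left to the
caller rather than assumed:

```c
#include <stdbool.h>

/* Hypothetical placement domains; not the real RADEON_GEM_DOMAIN_* values. */
enum ib_domain { IB_DOMAIN_GTT, IB_DOMAIN_VRAM };

/* Hypothetical description of how the CPU touches one IB after it is written. */
struct ib_request {
    bool cpu_parses;   /* the CPU-side CS parser reads the IB contents */
    bool cpu_patches;  /* the CPU rewrites commands in the IB in place */
};

/*
 * Place an IB in VRAM only if the CPU never reads it back or patches it.
 * Anything the CPU parses or patches stays in GTT, so those CPU accesses
 * do not have to go to VRAM.
 */
enum ib_domain pick_ib_domain(const struct ib_request *req)
{
    if (req->cpu_parses || req->cpu_patches)
        return IB_DOMAIN_GTT;
    return IB_DOMAIN_VRAM;
}
```

With something like that, the VRAM path would only be taken for IBs that
the CPU writes once and never looks at again.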

Marek

