[Mesa-dev] [PATCH 0/5] radeon: Write-combined CPU mappings of BOs in GTT

Christian König deathsimple at vodafone.de
Wed Jul 23 00:32:04 PDT 2014

On 23.07.2014 09:21, Michel Dänzer wrote:
> On 23.07.2014 15:42, Christian König wrote:
>> On 23.07.2014 05:54, Michel Dänzer wrote:
>>> On 21.07.2014 17:07, Christian König wrote:
>>>> On 19.07.2014 03:15, Michel Dänzer wrote:
>>>>> On 19.07.2014 00:47, Christian König wrote:
>>>>>> On 18.07.2014 05:07, Michel Dänzer wrote:
>>>>>>>>> [PATCH 5/5] drm/radeon: Use VRAM for indirect buffers on >= SI
>>>>>>>> I'm still not very keen on this change since I still don't understand
>>>>>>>> the reason why it's faster than with GTT. Definitely needs more testing
>>>>>>>> on a wider range of systems.
>>>>>>> Sure. If anyone wants to give this patch a spin and see if they can
>>>>>>> measure any performance difference, good or bad, that would be
>>>>>>> interesting.
>>>>>>>> Maybe limit it to APUs for now?
>>>>>>> But IIRC, CPU writes to VRAM vs. write-combined GTT are actually an even
>>>>>>> bigger win with dedicated GPUs than with the Kaveri built-in GPU on my
>>>>>>> system. I suspect it may depend on the bandwidth available for PCIe vs.
>>>>>>> system memory though.
>>>>>> I've made a few tests today with the kernel part of the patches, running
>>>>>> Xonotic on Ultra at 1920x1080.
>>>>>> Without any patches I get ~47.0fps on average with my dedicated HD7870.
>>>>>> Adding only "drm/radeon: Use write-combined CPU mappings of rings and
>>>>>> IBs on >= SI", that goes down to ~45.3fps.
>>>>>> Adding on top of that "drm/radeon: Use VRAM for indirect buffers on >=
>>>>>> SI", the frame rate goes down to ~27.74fps.
>>>>> Hmm, looks like I'll need to do more benchmarking of 3D workloads as
>>>>> well.
>>> I haven't been able to consistently[0] measure any significant
>>> difference between all placements of the rings and IBs with Xonotic or
>>> Reaction Quake with my Bonaire. I'd expect Xonotic to be shader / GPU
>>> memory bandwidth bound rather than CS bound anyway, so a ~40% hit from
>>> that kernel patch alone is very surprising. Are you sure it wasn't just
>>> the same kind of variation as described below?
>> Yes, I've measured that multiple times and the results were quite
>> consistent.
>> But I didn't measure it on a Bonaire, where the bottleneck probably
>> isn't the CPU load. I measured it on a fast Pitcairn
> Ahem, my Bonaire is cranking out ~90fps of Xonotic Ultra at 1920x1080.
> :) (And AFAIK there are even faster Bonaire variants)

My Bonaire only manages around 17fps with Xonotic Ultra at 1920x1080;
it might be a good idea to figure out why at some point.

>> and there Xonotic was clearly affected by the patches.
> Okay, I hadn't realized we're not doing any command stream checking as
> of CIK, that probably explains the difference.

Good point, I should probably test the patch that puts IBs in VRAM with
my Bonaire as well.
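
For reference, the relative slowdowns implied by the Xonotic frame rates quoted earlier in this thread can be worked out with a few lines of Python. This is only a sketch of the arithmetic; the function name `slowdown_percent` is made up for illustration, and the three frame rates are the HD7870 numbers from the quoted mail.

```python
def slowdown_percent(baseline_fps, patched_fps):
    """Relative frame-rate drop versus the baseline, in percent."""
    return (baseline_fps - patched_fps) / baseline_fps * 100.0


# Average frame rates quoted in the thread (HD7870, Xonotic Ultra, 1920x1080).
baseline = 47.0       # no patches applied
wc_mappings = 45.3    # + write-combined CPU mappings of rings and IBs
vram_ibs = 27.74      # + VRAM for indirect buffers on top of that

print(f"WC mappings only: {slowdown_percent(baseline, wc_mappings):.1f}% slower")
print(f"WC + IBs in VRAM: {slowdown_percent(baseline, vram_ibs):.1f}% slower")
```

The second figure comes out at roughly 41%, which is where the "~40% hit" mentioned above comes from.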

>>>> My tests clearly show that we still can use USWC for the ring buffer on
>>>> SI and probably earlier chips as well.
>>> Yeah, that might be the safest approach for now.
>> How about using USWC for the rings on all chips since R600
> Any particular reason against doing it for older chips which support
> unsnooped access as well?

Not really, I just didn't notice that older chips can do this as well.


>> and for the IB only on CIK? As far as I can see that should do the trick
>> quite well.
> Yeah, sounds good.
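
The placement policy agreed on above can be summarised as a small decision sketch. It is illustrative Python, not driver code: the `Family` enum is a hypothetical, heavily coarsened stand-in for the real chip family list, and the helper names are invented. It models only the "rings since R600, IBs only on CIK" proposal; extending ring USWC to older chips that support unsnooped access is left out.

```python
from enum import IntEnum


class Family(IntEnum):
    """Hypothetical, coarse chip generations in release order (assumption)."""
    PRE_R600 = 0
    R600 = 1
    SI = 2
    CIK = 3


def ring_uses_uswc(family: Family) -> bool:
    # Proposal from the thread: USWC for the rings on all chips since R600.
    return family >= Family.R600


def ib_uses_uswc(family: Family) -> bool:
    # Proposal from the thread: USWC for the IBs only on CIK. The thread
    # suspects the missing command stream checking on CIK explains why
    # write-combined IBs do not hurt there.
    return family == Family.CIK


for fam in Family:
    print(f"{fam.name:8} ring USWC={ring_uses_uswc(fam)!s:5} "
          f"IB USWC={ib_uses_uswc(fam)}")
```

Keeping the ring and IB decisions in separate helpers mirrors how the thread treats them: the rings are considered safe for USWC almost everywhere, while the IB placement is the part that showed a measurable regression on SI.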
