[Mesa-dev] [PATCH 15/18] radeonsi: upload constants into VRAM instead of GTT

Nicolai Hähnle nhaehnle at gmail.com
Fri Feb 17 09:24:41 UTC 2017


On 16.02.2017 23:36, Marek Olšák wrote:
> On Thu, Feb 16, 2017 at 4:21 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> On 16.02.2017 13:53, Marek Olšák wrote:
>>>
>>> From: Marek Olšák <marek.olsak at amd.com>
>>>
>>> This lowers lgkm wait cycles by 30% on VI under normal conditions.
>>> There might be a measurable improvement when CE is disabled (radeon)
>>> or under L2 thrashing.
>>
>>
>> Good idea. I'm just wondering if all the users of const upload end up as
>> streaming writes? I hope we don't accidentally hit some place where writes
>> from the CPU end up extremely slow, e.g. where st/mesa uploads some
>> structures.
>
> I think constant buffers always benefit from being in VRAM. If every
> CU loads a value from a constant buffer, you'll get at least 16 TC L2
> read requests on Fiji (each group of 4 CUs submits one), which can be
> misses under thrashing. This is very different from "streaming" where
> you expect to get exactly 1 read request for each piece of data.

Good point.


> The small problem with VRAM uploads may be write combining. I don't
> know the alignment at which it operates and how exactly it works. E.g.
> if we get 2 16-byte uploads aligned to 32, there is an untouched hole
> of 16 bytes. Does the hole have any effect on upload performance?
> u_upload_mgr could fill all holes if it was a problem.

Some quick googling turned up this article:
https://fgiesen.wordpress.com/2013/01/29/write-combining-is-not-your-friend/
Its conclusion gives three main rules:

- Never read from write-combined memory.
- Try to keep writes sequential. This is good style even when it’s not 
strictly necessary. On processors with picky write-combining logic, you 
might also need to use volatile or some other way to cause the compiler 
not to reorder instructions.
- Don’t leave holes. Always write large, contiguous ranges.

All uses of u_upload_data should be fine (it's just a memcpy). Your 
example of 2 16-byte uploads aligned to 32 isn't ideal, but it's 
probably not terrible.
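
To make the hole-filling idea concrete, here is a minimal sketch of a bump 
allocator that zero-fills alignment padding before each copy, so the CPU 
writes to the mapping stay contiguous. The struct wc_uploader and wc_upload 
names are hypothetical and purely illustrative; this is not how 
u_upload_mgr is actually implemented, and whether filling holes helps at 
all depends on the CPU's write-combining behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a mapped write-combined buffer (not a Mesa type). */
struct wc_uploader {
    uint8_t *map;    /* CPU mapping of the buffer */
    size_t   size;   /* total size in bytes */
    size_t   offset; /* next unwritten byte; everything below is written */
};

/* Copy len bytes to the next alignment-aligned offset. The padding between
 * the previous upload and the aligned start is zero-filled first, so no
 * untouched hole is left behind. Returns the offset of the data, or
 * SIZE_MAX if it does not fit. */
static size_t wc_upload(struct wc_uploader *u, const void *data,
                        size_t len, size_t alignment)
{
    /* alignment must be a non-zero power of two */
    assert(alignment && (alignment & (alignment - 1)) == 0);

    size_t start = (u->offset + alignment - 1) & ~(alignment - 1);
    if (start > u->size || len > u->size - start)
        return SIZE_MAX;

    memset(u->map + u->offset, 0, start - u->offset); /* fill the hole */
    memcpy(u->map + start, data, len);                /* then the payload */
    u->offset = start + len;
    return start;
}
```

With two 16-byte uploads aligned to 32, the second call writes bytes 
16..31 as zeros before copying to 32..47, so the mapping sees one 
ascending run of writes instead of two separate ones.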

Scanning st/mesa, I see the following potentially questionable pieces of 
code:

1) st_DrawAtlasBitmaps: If the compiler reorders the verts->XYZ writes, 
that could be bad. But this is obsolete functionality, so we don't care.
2) st_DrawTex: similar
3) st_draw_quad: Used for scissored/windowed clears. Re-ordering by the 
compiler could potentially hurt; we may want to look into that.
4) st_pbo_draw: same as st_draw_quad
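
One way to sidestep the reordering concern in cases like these would be to 
build the vertices in ordinary memory and copy them to the mapping in one 
go, which makes them look like the u_upload_data case. A rough sketch, 
with a made-up vertex layout and a hypothetical emit_quad helper (this is 
not the actual st/mesa code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Made-up vertex layout for illustration; not the st/mesa struct. */
struct quad_vertex {
    float x, y, z;
    float r, g, b, a;
};

/* Build the four vertices on the stack (cacheable memory, where the
 * compiler may reorder stores freely), then hand them to memcpy as one
 * contiguous block. dst stands in for a pointer into a mapped
 * write-combined buffer. */
static void emit_quad(void *dst, float x0, float y0, float x1, float y1,
                      float z, const float color[4])
{
    assert(dst != NULL);

    struct quad_vertex verts[4];
    const float xs[4] = { x0, x1, x1, x0 };
    const float ys[4] = { y0, y0, y1, y1 };

    for (size_t i = 0; i < 4; i++) {
        verts[i].x = xs[i];
        verts[i].y = ys[i];
        verts[i].z = z;
        verts[i].r = color[0];
        verts[i].g = color[1];
        verts[i].b = color[2];
        verts[i].a = color[3];
    }

    /* Single contiguous copy into the destination. */
    memcpy(dst, verts, sizeof(verts));
}
```

The C standard does not promise any particular store order inside memcpy, 
so this relies on the usual implementations copying in ascending order; it 
costs an extra copy of 112 bytes per quad.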

The blitter only uses u_upload_data, which is fine.

Also, none of the issues affect large uploads, so I think we're good.

The patch is: Reviewed-by: Nicolai Hähnle <nicolai.haehnle at amd.com>


> Also, Feral's games upload directly to VRAM all the time. This patch
> is nothing compared to what they're doing.
>
> Marek
>


