[Mesa-dev] [PATCH 00/13] Use the constant engine in radeonsi

Marek Olšák maraeo at gmail.com
Fri Apr 15 18:16:28 UTC 2016


On Thu, Apr 14, 2016 at 8:29 PM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
> On 13.04.2016 20:34, Bas Nieuwenhuizen wrote:
>>
>> This series implements updating descriptors using the constant engine.
>> This should result in a 0%-3% improvement for CPU bound applications,
>> as we only have to upload the change descriptors from the CPU.
>
>
> Thanks again for looking at this!
>
>> There are very slight performance advantages on the GPU too, as the
>> CE uploads the data directly to the L2 cache. This series reduces the
>> number of cycles waited on LGKM counters by about 1/6th across multiple
>> applications. However this translates in almost negligible performance
>> improvements.
>>
>> Only the amdgpu winsys supports this for now. I think the radeon
>> winsys can be extended to support it, but I don't have any radeon
>> hardware so it is probably better that someone else implements that.
>>
>> This series does not include support for partially updating
>> uniforms. My gallium interface proof of concept became kind of messy
>> and the constant engine patches have been requested for descriptors.
>
>
> Interesting coincidence that Jakob wrote an email about "Rework uniform
> storage" on the same day...
>
> Two related general observations on the series:
>
> 1. The CE also has a LOAD_CONST_RAM command. Instead of writing all
> descriptors to the CE IB at CS begin, perhaps you could save space and CPU
> time by using that for the first upload per CS? Though it also might make
> sense to have this depend on the dirty_mask population count.
>
> 2. It seems like it should be possible to get rid of the list_dirty flag
> now, but I actually don't think that's true, because AFAICS doing that on
> top of this series will force a re-upload of all descriptors on every new
> CS, which is clearly bad. Still, logically there really shouldn't be a
> list_dirty _and_ a dirty_mask.
>
> So how about a dirty_mask and a ce_ram_dirty flag? Do not set the dirty_mask
> on begin_new_cs, and use the dirty_mask to gate descriptor uploads. Set the
> ce_ram_dirty flag on begin_new_cs, and use it to either (a) override the
> dirty_mask as appropriate for the first descriptor upload in each CS or (b)
> emit a LOAD_CONST_RAM command.
>
> I'd appreciate if you could address these points - I don't feel too strongly
> about the LOAD_CONST_RAM optimization, but having both list_dirty and
> dirty_mask is weird.

The AMDGPU kernel driver support the preamble CE IB, which is
conditionally executed before the CE IB. It was designed to skip CE
RAM initialization at the beginning of IBs. The usage should be:

The preamble CE IB should initialize CE RAM from memory (LOAD_CONST_RAM).
The main CE & DE IBs should assume CE RAM is initialized.

The kernel driver doesn't schedule the preamble CE IB if the command
submission immediately follows a previous submission from the same
context.

Marek


More information about the mesa-dev mailing list