[Mesa-dev] [PATCH] radeonsi: Enable VGPR spilling for all shader types v3

Wed Jan 21 03:20:42 PST 2015

On Wed, Jan 21, 2015 at 3:03 AM, Michel Dänzer <michel at daenzer.net> wrote:
> On 20.01.2015 22:39, Marek Olšák wrote:
>> The problem with CPDMA (DMA_DATA and WRITE_DATA) is that the ordering
>> of flushes must be correct. First, partial flushes must be done, so
>> that the shaders are idle.
>
> That's only necessary when reusing a single BO for the shader code, not
> when allocating a new BO when the relocations change, right?

Yes.

>
>
>> Then you can use CP DMA to update the binary. After that, ICACHE should
>> be invalidated.
>
> ICACHE has to be invalidated when writing with the CPU as well, right?

Yes, but the invalidation at the beginning of IBs is sufficient for
all CPU accesses, so nothing extra needs to be done.

>
>
>> The problem with mapping VRAM can be avoided by keeping a CPU copy of
>> the binary from the beginning. We would only need a CPU copy of those
>> shaders that use the scratch buffer. Then, you wouldn't have to read
>> VRAM at all, which would make it even simpler.
>
> Right, but CPU writes to the new BO in VRAM could cause stalls anyway.

If CPU writes are the problem, we can create a temporary BO in GTT,
upload and update the shader there, and copy it to the shader BO in
VRAM using CPDMA. In this case, the shader BO in VRAM doesn't have to
be reallocated, and shader state doesn't have to be re-emitted. Only
the ICACHE needs to be invalidated after the CPDMA copy.
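
To make the ordering concrete, here is a rough sketch in C of the
staging flow. It is only a toy model: fake_ctx, update_shader and the
EV_* names are hypothetical and are not radeonsi functions or PM4
packet names. The "GPU" steps are simulated with a memcpy and an event
log. It assumes the partial flush mentioned earlier in the thread is
still done before the copy, since the VRAM BO is reused.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical event kinds for this sketch; NOT real radeonsi/PM4 names. */
enum ev {
    EV_CPU_WRITE_GTT,     /* CPU patches the shader in the GTT staging BO */
    EV_PARTIAL_FLUSH,     /* wait until running shaders are idle */
    EV_CP_DMA_COPY,       /* one copy packet: GTT staging -> VRAM shader BO */
    EV_ICACHE_INVALIDATE  /* drop stale instructions from the ICACHE */
};

#define MAX_EVENTS 8
#define SHADER_DWORDS 4

/* Toy context: plain arrays stand in for the GTT and VRAM buffers. */
struct fake_ctx {
    enum ev log[MAX_EVENTS];
    int num_events;
    uint32_t gtt_staging[SHADER_DWORDS];
    uint32_t vram_shader[SHADER_DWORDS];
};

static void record(struct fake_ctx *c, enum ev e)
{
    assert(c->num_events < MAX_EVENTS);
    c->log[c->num_events++] = e;
}

/* Patch one relocation in the CPU copy and push the result to "VRAM". */
static void update_shader(struct fake_ctx *c, const uint32_t *cpu_copy,
                          unsigned reloc_index, uint32_t reloc_value)
{
    assert(reloc_index < SHADER_DWORDS);

    /* 1. CPU writes only touch the staging buffer, never VRAM. */
    memcpy(c->gtt_staging, cpu_copy, sizeof(c->gtt_staging));
    c->gtt_staging[reloc_index] = reloc_value;
    record(c, EV_CPU_WRITE_GTT);

    /* 2. The VRAM BO is reused, so shaders must be idle first. */
    record(c, EV_PARTIAL_FLUSH);

    /* 3. A single copy replaces the old binary in place. */
    memcpy(c->vram_shader, c->gtt_staging, sizeof(c->vram_shader));
    record(c, EV_CP_DMA_COPY);

    /* 4. Invalidate the ICACHE so the new code is fetched. */
    record(c, EV_ICACHE_INVALIDATE);
}
```

Calling update_shader() once logs the four steps in that order, and the
VRAM array holds the patched binary without the CPU having written to
it directly.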

One copy packet is better than a lot of small WRITE_DATA packets.
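
A back-of-the-envelope comparison of command stream usage, as a C
sketch. The packet sizes are assumptions for illustration (4 dwords of
header/control/address overhead per WRITE_DATA packet, 7 dwords for one
DMA_DATA packet) and should be checked against the hardware docs; the
helper names are made up for this example.

```c
#include <assert.h>

/* Assumed packet sizes in dwords; illustrative, not taken from this thread. */
enum {
    WRITE_DATA_OVERHEAD_DW = 4, /* header + control + address lo/hi */
    DMA_DATA_PACKET_DW = 7      /* header + 6 payload dwords */
};

/* Dwords needed when each non-contiguous patch gets its own WRITE_DATA. */
static unsigned write_data_total_dw(unsigned num_patches, unsigned dw_per_patch)
{
    return num_patches * (WRITE_DATA_OVERHEAD_DW + dw_per_patch);
}

/* Dwords needed for a single copy of the whole binary from staging. */
static unsigned dma_data_total_dw(void)
{
    return DMA_DATA_PACKET_DW;
}
```

Under these assumptions, the WRITE_DATA cost grows with the number of
patched locations, while the copy stays at one fixed-size packet.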

Marek