[Mesa-dev] [PATCH] radeonsi: Enable VGPR spilling for all shader types v3

Tue Jan 27 03:00:08 PST 2015

On Sat, Jan 24, 2015 at 4:32 AM, Michel Dänzer <michel at daenzer.net> wrote:
> On 24.01.2015 11:56, Tom Stellard wrote:
>> On Thu, Jan 22, 2015 at 11:27:32AM +0900, Michel Dänzer wrote:
>>>
>>> Tom, for now I suggest this solution, summarized from Marek's previous
>>> descriptions:
>>>
>>> (At least) for shaders which have relocations, keep a copy of the
>>> machine code in malloced memory. When the relocated values change,
>>> update them in the malloced memory, allocate a new BO, map it, copy the
>>> machine code from the malloced memory to the BO, replace any existing
>>> shader BO with the new one and invalidate the shader state.
>>>
>>
>> Hi,
>>
>> Attached is a WIP patch attempting to implement it this way.
>> Unfortunately, I was unable to get it working, so I wanted to
>> submit it for review in case someone can spot what I'm doing wrong.
>>
>> You can find the broken code wrapped in #if 0 in the
>> si_update_scratch_buffer() function in si_state_shaders.c
>>
>> Based on the dmesg output and other tests I've done, it appears
>> that the GPU is still executing the shader code from the old bo
>> which does not contain the relocations.
>>
>> The code in the #else branch works fine, but it updates the existing
>> bo in place rather than creating a new one.
>>
>> Any idea what I've done wrong?
>
> [...]
>
>> +             /* Update the shaders, so they are using the latest scratch.  The
>> +              * scratch buffer may have been changed since these shaders were
>> +              * last used, so we still need to try to update them, even if
>> +              * they require scratch buffers smaller than the current size.
>> +              */
>> +             if (si_update_scratch_buffer(sctx, sctx->ps_shader))
>> +                     sctx->emitted.named.ps = NULL;
>> +             if (si_update_scratch_buffer(sctx, sctx->gs_shader))
>> +                     sctx->emitted.named.gs = NULL;
>> +             if (si_update_scratch_buffer(sctx, sctx->vs_shader))
>> +                     sctx->emitted.named.vs = NULL;
>
> Does this work instead?
>
>                 if (si_update_scratch_buffer(sctx, sctx->ps_shader))
>                         si_pm4_bind_state(sctx, ps, sctx->ps_shader->current->pm4);
>                 if (si_update_scratch_buffer(sctx, sctx->gs_shader))
>                         si_pm4_bind_state(sctx, gs, sctx->gs_shader->current->pm4);
>                 if (si_update_scratch_buffer(sctx, sctx->vs_shader))
>                         si_pm4_bind_state(sctx, vs, sctx->vs_shader->current->pm4);

Yes, this should work. Reallocating a pm4 state (by calling
si_shader_init_pm4_state) isn't enough to use it. It must be bound
too, because it's a brand new state.

The old pm4 state is leaked though. The attached diff should fix it.

Marek
-------------- next part --------------

diff --git a/src/gallium/drivers/radeonsi/si_pm4.c b/src/gallium/drivers/radeonsi/si_pm4.c
index 954eb6e..0acb257 100644
--- a/src/gallium/drivers/radeonsi/si_pm4.c
+++ b/src/gallium/drivers/radeonsi/si_pm4.c
@@ -103,6 +103,13 @@ void si_pm4_add_bo(struct si_pm4_state *state,
 	state->bo_priority[idx] = priority;
 }
 
+void si_pm4_free_state_simple(struct si_pm4_state *state)
+{
+	for (int i = 0; i < state->nbo; ++i)
+		r600_resource_reference(&state->bo[i], NULL);
+	FREE(state);
+}
+
 void si_pm4_free_state(struct si_context *sctx,
 		       struct si_pm4_state *state,
 		       unsigned idx)
@@ -114,10 +121,7 @@ void si_pm4_free_state(struct si_context *sctx,
 		sctx->emitted.array[idx] = NULL;
 	}
 
-	for (int i = 0; i < state->nbo; ++i) {
-		r600_resource_reference(&state->bo[i], NULL);
-	}
-	FREE(state);
+	si_pm4_free_state_simple(state);
 }
 
 unsigned si_pm4_dirty_dw(struct si_context *sctx)
diff --git a/src/gallium/drivers/radeonsi/si_pm4.h b/src/gallium/drivers/radeonsi/si_pm4.h
index c4d6098..d215882 100644
--- a/src/gallium/drivers/radeonsi/si_pm4.h
+++ b/src/gallium/drivers/radeonsi/si_pm4.h
@@ -71,6 +71,7 @@ void si_pm4_add_bo(struct si_pm4_state *state,
 		   enum radeon_bo_usage usage,
 		   enum radeon_bo_priority priority);
 
+void si_pm4_free_state_simple(struct si_pm4_state *state);
 void si_pm4_free_state(struct si_context *sctx,
 		       struct si_pm4_state *state,
 		       unsigned idx);
diff --git a/src/gallium/drivers/radeonsi/si_state_shaders.c b/src/gallium/drivers/radeonsi/si_state_shaders.c
index 810aba8..53ffb3c 100644
--- a/src/gallium/drivers/radeonsi/si_state_shaders.c
+++ b/src/gallium/drivers/radeonsi/si_state_shaders.c
@@ -326,6 +326,9 @@ static void si_shader_ps(struct si_screen *sscreen, struct si_shader *shader)
 static void si_shader_init_pm4_state(struct si_screen *sscreen,
 				     struct si_shader *shader)
 {
+	if (shader->pm4)
+		si_pm4_free_state_simple(shader->pm4);
+
 	switch (shader->selector->type) {
 	case PIPE_SHADER_VERTEX:
 		if (shader->key.vs.as_es)