[PATCH] drm/amdgpu: Fix Manual Execution of Cleaner Shader in Gang Submissions

Alex Deucher alexdeucher at gmail.com
Fri Mar 28 14:58:15 UTC 2025


On Thu, Mar 27, 2025 at 9:50 AM Christian König
<christian.koenig at amd.com> wrote:
>
> Am 27.03.25 um 10:37 schrieb SRINIVASAN SHANMUGAM:
>
> On 3/27/2025 2:54 PM, Christian König wrote:
>
> Overall, this change doesn't seem to make much sense to me.
>
> Why exactly is isolation->spearhead not pointing to the dummy kernel job we submit?
>
> Does the owner check or gang_submit check in
> amdgpu_device_enforce_isolation() fail to set up the spearhead?
>
> I'm currently debugging exactly that.
>
> Good news is that I can reproduce the problem.
>
>
> I have to take that back. I've tested the cleaner shader functionality a bit this morning and as far as I can see this works exactly as intended.
>
> Srini, what exactly is your use case which doesn't work?
>
> Hi Christian, Good Morning!
>
> The use case is to trigger the cleaner shader through the sysfs "run_cleaner_shader" entry, independent of "enforce_isolation", so that the cleaner shader packet gets submitted to the COMP_1.0.0 ring by default, without first enabling enforce_isolation via sysfs.
>
>
> I've tested exactly that and it seems to work perfectly fine:
>    kworker/u96:1-209     [020] .....    86.655999: amdgpu_isolation: prev=0000000000000000, next=ffffffffffffffff
>    kworker/u96:1-209     [020] .....    86.656190: amdgpu_cleaner_shader: ring=gfx_0.0.0, seqno=2
>            <...>-11      [022] .....   150.607688: amdgpu_isolation: prev=ffffffffffffffff, next=0000000000000000
>    kworker/u96:0-11      [022] .....   150.608228: amdgpu_cleaner_shader: ring=comp_1.0.0, seqno=2
>    kworker/u96:0-11      [022] .....   150.620597: amdgpu_isolation: prev=0000000000000000, next=ffffffffffffffff
>    kworker/u96:0-11      [022] .....   150.620624: amdgpu_cleaner_shader: ring=gfx_0.0.0, seqno=1527
>
>
> The only thing which might be confusing is that when you issue the cleaner shader multiple times while the GPU is idle, it only runs once.
>
> But that should be easy to change if necessary.

The problem is that it doesn't take KFD jobs into account.  We need to
be able to run the cleaner shader even if there have been no KGD jobs.
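
For reference, here is a minimal user-space sketch of the comparison discussed in this thread ("&job->base.s_fence->scheduled == isolation->spearhead"). Only that expression comes from the thread. The struct layouts and the helper names cleaner_shader_needed() and spearhead_demo() are hypothetical stand-ins for illustration, not the actual amdgpu code. It only shows that the comparison stays false until the spearhead pointer has been set to this job's own scheduled fence.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT the real amdgpu/DRM types. */
struct fence { unsigned int seqno; };
struct sched_fence { struct fence scheduled; };
struct sched_job { struct sched_fence *s_fence; };
struct job { struct sched_job base; };
struct isolation { struct fence *spearhead; };

/*
 * Same shape as the comparison quoted in the thread: true only when the
 * isolation state's spearhead is this job's own scheduled fence.
 */
static bool cleaner_shader_needed(const struct job *job,
                                  const struct isolation *isolation)
{
    return job->base.s_fence &&
           &job->base.s_fence->scheduled == isolation->spearhead;
}

/* Walks through both cases; returns true if both behave as described. */
static bool spearhead_demo(void)
{
    struct sched_fence sf = { .scheduled = { .seqno = 1 } };
    struct job job = { .base = { .s_fence = &sf } };
    struct isolation isolation = { .spearhead = NULL };

    /* Spearhead never set for this job: the pointers differ. */
    bool before = cleaner_shader_needed(&job, &isolation);

    /* Spearhead now points at this job's scheduled fence: they match. */
    isolation.spearhead = &sf.scheduled;
    bool after = cleaner_shader_needed(&job, &isolation);

    return !before && after;
}
```

Calling spearhead_demo() from a small main() exercises both cases. Which code path is expected to set the spearhead for a manually triggered cleaner shader job is exactly what is under discussion here, so the sketch makes no claim about that.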

Alex

>
> Regards,
> Christian.
>
> AFAIK, the "isolation->spearhead" initialization is not taken care of in the path "amdgpu_gfx_run_cleaner_shader -> amdgpu_gfx_run_cleaner_shader_job" (i.e., when we trigger the cleaner shader using the sysfs "run_cleaner_shader" entry). The check "&job->base.s_fence->scheduled == isolation->spearhead;" in the "amdgpu_vm_flush()" function is where the problem shows up: the address "&job->base.s_fence->scheduled" does not match the "isolation->spearhead" address, so the check evaluates to zero and the cleaner shader is not emitted when it is run through the "run_cleaner_shader" sysfs entry.
>
> Best regards,
>
> Srini
>
>
> Regards,
> Christian.
>
> Regards,
> Christian.
>
>

