[PATCH 21/24] drm/amdkfd: add queue remapping

Felix Kuehling felix.kuehling at amd.com
Thu Nov 23 19:02:31 UTC 2023


On 2023-11-23 11:25, James Zhu wrote:
>
> On 2023-11-22 17:35, Felix Kuehling wrote:
>>
>> On 2023-11-03 09:11, James Zhu wrote:
>>> Add queue remapping to force the waves in any running
>>> processes to complete a CWSR trap.
>>
>> Please add an explanation why this is needed.
>
> [JZ] Even though the profiling-enabled bits is turned off, the CWSR 
> trap handlers for some kernels with this process may still in running 
> stage, this will
>
> force the waves in any running processes to complete a CWSR trap, and 
> make sure pc sampling is completely stopped with this process.   I 
> will add it later.

It may be confusing to talk specifically about "CWSR trap handler". 
There is only one trap handler that is triggered by different events: 
CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap 
triggers, it serializes with any currently running trap handler in that 
wavefront. So it seems that you're using CWSR as a way to ensure that 
any host trap has completed: CWSR will wait for previous traps to finish 
before trapping again for CWSR, the HWS firmware waits for CWSR 
completion and the driver waits for HWS to finish CWSR with a fence on a 
HIQ QUERY_STATUS packet. Is that correct?

Regards,
   Felix


>
>>
>>
>>>
>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 
>>> +++++++++++
>>>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +++++
>>>   drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c          |  3 +++
>>>   3 files changed, 19 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> index c0e71543389a..a3f57be63f4f 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>> @@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct 
>>> device_queue_manager *dqm)
>>>       return debug_map_and_unlock(dqm);
>>>   }
>>>   +void remap_queue(struct device_queue_manager *dqm,
>>> +                enum kfd_unmap_queues_filter filter,
>>> +                uint32_t filter_param,
>>> +                uint32_t grace_period)
>>
>> Not sure if you need the filter and grace period parameters in this 
>> function. What's the point of exposing that to callers who just want 
>> to trigger a CWSR? You could also change the function name to reflect 
>> the purpose of the function, rather than the implementation.
> [JZ] Just want to create a general function in case that used by 
> others. I am fine to remove passing filter_param/grace_period
>>
>> Regards,
>>   Felix
>>
>>
>>> +{
>>> +    dqm_lock(dqm);
>>> +    if (!dqm->dev->kfd->shared_resources.enable_mes)
>>> +        execute_queues_cpsch(dqm, filter, filter_param, grace_period);
>>> +    dqm_unlock(dqm);
>>> +}
>>> +
>>>   #if defined(CONFIG_DEBUG_FS)
>>>     static void seq_reg_dump(struct seq_file *m,
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
>>> index cf7e182588f8..f8aae3747a36 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
>>> @@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct 
>>> device_queue_manager *dqm);
>>>   int debug_map_and_unlock(struct device_queue_manager *dqm);
>>>   int debug_refresh_runlist(struct device_queue_manager *dqm);
>>>   +void remap_queue(struct device_queue_manager *dqm,
>>> +                enum kfd_unmap_queues_filter filter,
>>> +                uint32_t filter_param,
>>> +                uint32_t grace_period);
>>> +
>>>   static inline unsigned int get_sh_mem_bases_32(struct 
>>> kfd_process_device *pdd)
>>>   {
>>>       return (pdd->lds_base >> 16) & 0xFF;
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
>>> index e8f0559b618e..66670cdb813a 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
>>> @@ -24,6 +24,7 @@
>>>   #include "kfd_priv.h"
>>>   #include "amdgpu_amdkfd.h"
>>>   #include "kfd_pc_sampling.h"
>>> +#include "kfd_device_queue_manager.h"
>>>     struct supported_pc_sample_info {
>>>       uint32_t ip_version;
>>> @@ -164,6 +165,8 @@ static int kfd_pc_sample_stop(struct 
>>> kfd_process_device *pdd,
>>> cancel_work_sync(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work); 
>>>
>>> kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
>>> pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
>>> +        remap_queue(pdd->dev->dqm,
>>> +            KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, 
>>> USE_DEFAULT_GRACE_PERIOD);
>>>             mutex_lock(&pdd->dev->pcs_data.mutex);
>>> pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;


More information about the amd-gfx mailing list