[PATCH 21/24] drm/amdkfd: add queue remapping

James Zhu jamesz at amd.com
Thu Nov 23 19:49:20 UTC 2023


On 2023-11-23 14:02, Felix Kuehling wrote:
> On 2023-11-23 11:25, James Zhu wrote:
>>
>> On 2023-11-22 17:35, Felix Kuehling wrote:
>>>
>>> On 2023-11-03 09:11, James Zhu wrote:
>>>> Add queue remapping to force the waves in any running
>>>> processes to complete a CWSR trap.
>>>
>>> Please add an explanation why this is needed.
>>
>> [JZ] Even though the profiling-enabled bits is turned off, the CWSR 
>> trap handlers for some kernels with this process may still in running 
>> stage, this will
>>
>> force the waves in any running processes to complete a CWSR trap, and 
>> make sure pc sampling is completely stopped with this process.   I 
>> will add it later.
>
> It may be confusing to talk specifically about "CWSR trap handler". 
> There is only one trap handler that is triggered by different events: 
> CWSR, host trap, s_trap instructions, exceptions, etc. When a new trap 
> triggers, it serializes with any currently running trap handler in 
> that wavefront. So it seems that you're using CWSR as a way to ensure 
> that any host trap has completed: CWSR will wait for previous traps to 
> finish before trapping again for CWSR, the HWS firmware waits for CWSR 
> completion and the driver waits for HWS to finish CWSR with a fence on 
> a HIQ QUERY_STATUS packet. Is that correct?
[JZ] I think your explanation is more detail. Need Joseph to confirm.
>
> Regards,
>   Felix
>
>
>>
>>>
>>>
>>>>
>>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 
>>>> +++++++++++
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +++++
>>>>   drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c          |  3 +++
>>>>   3 files changed, 19 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>>> index c0e71543389a..a3f57be63f4f 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
>>>> @@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct 
>>>> device_queue_manager *dqm)
>>>>       return debug_map_and_unlock(dqm);
>>>>   }
>>>>   +void remap_queue(struct device_queue_manager *dqm,
>>>> +                enum kfd_unmap_queues_filter filter,
>>>> +                uint32_t filter_param,
>>>> +                uint32_t grace_period)
>>>
>>> Not sure if you need the filter and grace period parameters in this 
>>> function. What's the point of exposing that to callers who just want 
>>> to trigger a CWSR? You could also change the function name to 
>>> reflect the purpose of the function, rather than the implementation.
>> [JZ] Just want to create a general function in case that used by 
>> others. I am fine to remove passing filter_param/grace_period
>>>
>>> Regards,
>>>   Felix
>>>
>>>
>>>> +{
>>>> +    dqm_lock(dqm);
>>>> +    if (!dqm->dev->kfd->shared_resources.enable_mes)
>>>> +        execute_queues_cpsch(dqm, filter, filter_param, 
>>>> grace_period);
>>>> +    dqm_unlock(dqm);
>>>> +}
>>>> +
>>>>   #if defined(CONFIG_DEBUG_FS)
>>>>     static void seq_reg_dump(struct seq_file *m,
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h 
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
>>>> index cf7e182588f8..f8aae3747a36 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
>>>> @@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct 
>>>> device_queue_manager *dqm);
>>>>   int debug_map_and_unlock(struct device_queue_manager *dqm);
>>>>   int debug_refresh_runlist(struct device_queue_manager *dqm);
>>>>   +void remap_queue(struct device_queue_manager *dqm,
>>>> +                enum kfd_unmap_queues_filter filter,
>>>> +                uint32_t filter_param,
>>>> +                uint32_t grace_period);
>>>> +
>>>>   static inline unsigned int get_sh_mem_bases_32(struct 
>>>> kfd_process_device *pdd)
>>>>   {
>>>>       return (pdd->lds_base >> 16) & 0xFF;
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
>>>> index e8f0559b618e..66670cdb813a 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
>>>> @@ -24,6 +24,7 @@
>>>>   #include "kfd_priv.h"
>>>>   #include "amdgpu_amdkfd.h"
>>>>   #include "kfd_pc_sampling.h"
>>>> +#include "kfd_device_queue_manager.h"
>>>>     struct supported_pc_sample_info {
>>>>       uint32_t ip_version;
>>>> @@ -164,6 +165,8 @@ static int kfd_pc_sample_stop(struct 
>>>> kfd_process_device *pdd,
>>>> cancel_work_sync(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_work); 
>>>>
>>>> kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
>>>> pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
>>>> +        remap_queue(pdd->dev->dqm,
>>>> +            KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, 
>>>> USE_DEFAULT_GRACE_PERIOD);
>>>>             mutex_lock(&pdd->dev->pcs_data.mutex);
>>>> pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;


More information about the amd-gfx mailing list