[PATCH 19/32] drm/amdkfd: add runtime enable operation

Thu Mar 23 19:45:20 UTC 2023

[AMD Official Use Only - General]

> -----Original Message-----
> From: Kuehling, Felix <Felix.Kuehling at amd.com>
> Sent: Monday, March 20, 2023 8:31 PM
> To: Kim, Jonathan <Jonathan.Kim at amd.com>; amd-
> gfx at lists.freedesktop.org; dri-devel at lists.freedesktop.org
> Subject: Re: [PATCH 19/32] drm/amdkfd: add runtime enable operation
>
>
> On 2023-01-25 14:53, Jonathan Kim wrote:
> > The debugger can attach to a process prior to HSA enablement (i.e.
> > inferior is spawned by the debugger and attached to immediately before
> > target process has been enabled for HSA dispatches) or it
> > can attach to a running target that is already HSA enabled.  Either
> > way, the debugger needs to know the enablement status to know when
> > it can inspect queues.
> >
> > For the scenario where the debugger spawns the target process,
> > it will have to wait for ROCr's runtime enable request from the target.
> > The runtime enable request will be able to see that its process has been
> > debug attached.  ROCr raises an EC_PROCESS_RUNTIME signal to the
> > debugger then blocks the target process while waiting the debugger's
> > response. Once the debugger has received the runtime signal, it will
> > unblock the target process.
> >
> > For the scenario where the debugger attaches to a running target
> > process, ROCr will set the target process' runtime status as enabled so
> > that on an attach request, the debugger will be able to see this
> > status and will continue with debug enablement as normal.
> >
> > A secondary requirement is to conditionally enable the trap tempories
> only
> > if the user requests it (env var HSA_ENABLE_DEBUG=1) or if the debugger
> > attaches with HSA runtime enabled.  This is because setting up the trap
> > temporaries incurs a performance overhead that is unacceptable for
> > microbench performance in normal mode for certain customers.
> >
> > In the scenario where the debugger spawns the target process, when ROCr
> > detects that the debugger has attached during the runtime enable
> > request, it will enable the trap temporaries before it blocks the target
> > process while waiting for the debugger to respond.
> >
> > In the scenario where the debugger attaches to a running target process,
> > it will enable to trap temporaries itself.
> >
> > Finally, there is an additional restriction that is required to be
> > enforced with runtime enable and HW debug mode setting. The debugger
> must
> > first ensure that HW debug mode has been enabled before permitting HW
> debug
> > mode operations.
> >
> > With single process debug devices, allowing the debugger to set debug
> > HW modes prior to trap activation means that debug HW mode setting can
> > occur before the KFD has reserved the debug VMID (0xf) from the hardware
> > scheduler's VMID allocation resource pool.  This can result in the
> > hardware scheduler assigning VMID 0xf to a non-debugged process and
> > having that process inherit debug HW mode settings intended for the
> > debugged target process instead, which is both incorrect and potentially
> > fatal for normal mode operation.
> >
> > With multi process debug devices, allowing the debugger to set debug
> > HW modes prior to trap activation means that non-debugged processes
> > migrating to a new VMID could inherit unintended debug settings.
> >
> > All debug operations that touch HW settings must require trap activation
> > where trap activation is triggered by both debug attach and runtime
> > enablement (target has KFD opened and is ready to dispatch work).
> >
> > v2: fix up hierarchy of semantics in description.
> >
> > Signed-off-by: Jonathan Kim <jonathan.kim at amd.com>
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 150
> ++++++++++++++++++++++-
> >   drivers/gpu/drm/amd/amdkfd/kfd_debug.c   |   6 +-
> >   drivers/gpu/drm/amd/amdkfd/kfd_debug.h   |   4 +
> >   drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |   1 +
> >   4 files changed, 157 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > index 09fe8576dc8c..46f9d453dc5e 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> > @@ -2654,11 +2654,147 @@ static int kfd_ioctl_criu(struct file *filep,
> struct kfd_process *p, void *data)
> >     return ret;
> >   }
> >
> > -static int kfd_ioctl_runtime_enable(struct file *filep, struct kfd_process *p,
> void *data)
> > +static int runtime_enable(struct kfd_process *p, uint64_t r_debug,
> > +                   bool enable_ttmp_setup)
> >   {
> > +   int i = 0, ret = 0;
> > +
> > +   if (p->is_runtime_retry)
> > +           goto retry;
> > +
> > +   if (p->runtime_info.runtime_state !=
> DEBUG_RUNTIME_STATE_DISABLED)
> > +           return -EBUSY;
> > +
> > +   for (i = 0; i < p->n_pdds; i++) {
> > +           struct kfd_process_device *pdd = p->pdds[i];
> > +
> > +           if (pdd->qpd.queue_count)
> > +                   return -EEXIST;
> > +   }
> > +
> > +   p->runtime_info.runtime_state =
> DEBUG_RUNTIME_STATE_ENABLED;
> > +   p->runtime_info.r_debug = r_debug;
> > +   p->runtime_info.ttmp_setup = enable_ttmp_setup;
> > +
> > +   if (p->runtime_info.ttmp_setup) {
> > +           for (i = 0; i < p->n_pdds; i++) {
> > +                   struct kfd_process_device *pdd = p->pdds[i];
> > +
> > +                   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev)) {
> > +                           amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
> > +                           pdd->dev->kfd2kgd->enable_debug_trap(
> > +                                           pdd->dev->adev,
> > +                                           true,
> > +                                           pdd->dev-
> >vm_info.last_vmid_kfd);
> > +                   }
> > +
> > +                   if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
>
> Should this be else-if? It seems weird that enable_debug_trap could be
> called twice in a row. If RLC restore is only applicable on
> single-process debug devices, then maybe put the per-VMID case first.
>
>
> > +                           pdd->spi_dbg_override = pdd->dev->kfd2kgd-
> >enable_debug_trap(
> > +                                           pdd->dev->adev,
> > +                                           false,
> > +                                           pdd->dev-
> >vm_info.last_vmid_kfd);
> > +
> > +                           if (!pdd->dev-
> >shared_resources.enable_mes)
> > +                                   debug_refresh_runlist(pdd->dev-
> >dqm);
> > +                           else
> > +                                   kfd_dbg_set_mes_debug_mode(pdd);
>
> Do we really need to update the runlist here? When the runtime gets
> enabled, there are no queues yet for the process. So there should be no
> change to the runlist until the process creates its first queue.

Acked.  Yeah this seems unnecessary and can be dropped.
>
>
> > +                   }
> > +           }
> > +   }
> > +
> > +retry:
> > +   if (p->debug_trap_enabled) {
> > +           if (!p->is_runtime_retry) {
> > +                   kfd_dbg_trap_activate(p);
> > +
>       kfd_dbg_ev_raise(KFD_EC_MASK(EC_PROCESS_RUNTIME),
> > +                                   p, NULL, 0, false, NULL, 0);
> > +           }
> > +
> > +           mutex_unlock(&p->mutex);
> > +           ret = down_interruptible(&p->runtime_enable_sema);
> > +           mutex_lock(&p->mutex);
> > +
> > +           p->is_runtime_retry = !!ret;
> > +   }
> > +
> > +   return ret;
> > +}
> > +
> > +static int runtime_disable(struct kfd_process *p)
> > +{
> > +   int i = 0, ret;
> > +   bool was_enabled = p->runtime_info.runtime_state ==
> DEBUG_RUNTIME_STATE_ENABLED;
> > +
> > +   p->runtime_info.runtime_state =
> DEBUG_RUNTIME_STATE_DISABLED;
> > +   p->runtime_info.r_debug = 0;
> > +
> > +   if (p->debug_trap_enabled) {
> > +           if (was_enabled)
> > +                   kfd_dbg_trap_deactivate(p, false, 0);
>
> Does this call kfd_dbg_trap_deactivate multiple times on retry? Is that
> a problem?

I don't think dbg_trap_deactivate gets called again on retry.  Prior to the deactivate call, the per-process runtime state gets saved as DEBUG_RUNTIME_STATE_DISABLED and early returns on a down_interruptible error.  If the caller retries on the error, this means was_enabled is false so we never deactivate again.

Thanks,

Jon

>
> Regards,
>    Felix
>
>
> > +
> > +           if (!p->is_runtime_retry)
> > +
>       kfd_dbg_ev_raise(KFD_EC_MASK(EC_PROCESS_RUNTIME),
> > +                                   p, NULL, 0, false, NULL, 0);
> > +
> > +           mutex_unlock(&p->mutex);
> > +           ret = down_interruptible(&p->runtime_enable_sema);
> > +           mutex_lock(&p->mutex);
> > +
> > +           p->is_runtime_retry = !!ret;
> > +           if (ret)
> > +                   return ret;
> > +   }
> > +
> > +   if (was_enabled && p->runtime_info.ttmp_setup) {
> > +           for (i = 0; i < p->n_pdds; i++) {
> > +                   struct kfd_process_device *pdd = p->pdds[i];
> > +
> > +                   if (!kfd_dbg_is_rlc_restore_supported(pdd->dev))
> > +                           amdgpu_gfx_off_ctrl(pdd->dev->adev, true);
> > +           }
> > +   }
> > +
> > +   p->runtime_info.ttmp_setup = false;
> > +
> > +   /* disable DISPATCH_PTR save */
> > +   for (i = 0; i < p->n_pdds; i++) {
> > +           struct kfd_process_device *pdd = p->pdds[i];
> > +
> > +           if (kfd_dbg_is_per_vmid_supported(pdd->dev)) {
> > +                   pdd->spi_dbg_override =
> > +                                   pdd->dev->kfd2kgd-
> >disable_debug_trap(
> > +                                   pdd->dev->adev,
> > +                                   false,
> > +                                   pdd->dev->vm_info.last_vmid_kfd);
> > +
> > +                   if (!pdd->dev->shared_resources.enable_mes)
> > +                           debug_refresh_runlist(pdd->dev->dqm);
> > +                   else
> > +                           kfd_dbg_set_mes_debug_mode(pdd);
> > +           }
> > +   }
> > +
> >     return 0;
> >   }
> >
> > +static int kfd_ioctl_runtime_enable(struct file *filep, struct kfd_process
> *p, void *data)
> > +{
> > +   struct kfd_ioctl_runtime_enable_args *args = data;
> > +   int r;
> > +
> > +   mutex_lock(&p->mutex);
> > +
> > +   if (args->mode_mask &
> KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK)
> > +           r = runtime_enable(p, args->r_debug,
> > +                           !!(args->mode_mask &
> KFD_RUNTIME_ENABLE_MODE_TTMP_SAVE_MASK));
> > +   else
> > +           r = runtime_disable(p);
> > +
> > +   mutex_unlock(&p->mutex);
> > +
> > +   return r;
> > +}
> > +
> >   static int kfd_ioctl_set_debug_trap(struct file *filep, struct kfd_process
> *p, void *data)
> >   {
> >     struct kfd_ioctl_dbg_trap_args *args = data;
> > @@ -2720,6 +2856,18 @@ static int kfd_ioctl_set_debug_trap(struct file
> *filep, struct kfd_process *p, v
> >             goto unlock_out;
> >     }
> >
> > +   if (target->runtime_info.runtime_state !=
> DEBUG_RUNTIME_STATE_ENABLED &&
> > +                   (args->op ==
> KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_OVERRIDE ||
> > +                    args->op ==
> KFD_IOC_DBG_TRAP_SET_WAVE_LAUNCH_MODE ||
> > +                    args->op == KFD_IOC_DBG_TRAP_SUSPEND_QUEUES
> ||
> > +                    args->op == KFD_IOC_DBG_TRAP_RESUME_QUEUES
> ||
> > +                    args->op ==
> KFD_IOC_DBG_TRAP_SET_NODE_ADDRESS_WATCH ||
> > +                    args->op ==
> KFD_IOC_DBG_TRAP_CLEAR_NODE_ADDRESS_WATCH ||
> > +                    args->op == KFD_IOC_DBG_TRAP_SET_FLAGS)) {
> > +           r = -EPERM;
> > +           goto unlock_out;
> > +   }
> > +
> >     switch (args->op) {
> >     case KFD_IOC_DBG_TRAP_ENABLE:
> >             if (target != p)
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> > index 4174b479ea6f..47f8425a0db3 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.c
> > @@ -220,7 +220,7 @@ static int kfd_dbg_set_workaround(struct
> kfd_process *target, bool enable)
> >     return r;
> >   }
> >
> > -static int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd)
> > +int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd)
> >   {
> >     uint32_t spi_dbg_cntl = pdd->spi_dbg_override | pdd-
> >spi_dbg_launch_mode;
> >     uint32_t flags = pdd->process->dbg_flags;
> > @@ -240,7 +240,7 @@ static int kfd_dbg_set_mes_debug_mode(struct
> kfd_process_device *pdd)
> >    *                                to unwind
> >    *                else: ignored
> >    */
> > -static void kfd_dbg_trap_deactivate(struct kfd_process *target, bool
> unwind, int unwind_count)
> > +void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind,
> int unwind_count)
> >   {
> >     int i, count = 0;
> >
> > @@ -311,7 +311,7 @@ int kfd_dbg_trap_disable(struct kfd_process
> *target)
> >     return 0;
> >   }
> >
> > -static int kfd_dbg_trap_activate(struct kfd_process *target)
> > +int kfd_dbg_trap_activate(struct kfd_process *target)
> >   {
> >     int i, r = 0, unwind_count = 0;
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
> b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
> > index fefb9dc5cf69..22707f7a2368 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_debug.h
> > @@ -28,6 +28,8 @@
> >   void kgd_gfx_v9_set_wave_launch_stall(struct amdgpu_device *adev,
> >                                     uint32_t vmid,
> >                                     bool stall);
> > +void kfd_dbg_trap_deactivate(struct kfd_process *target, bool unwind,
> int unwind_count);
> > +int kfd_dbg_trap_activate(struct kfd_process *target);
> >   bool kfd_dbg_ev_raise(uint64_t event_mask,
> >                     struct kfd_process *process, struct kfd_dev *dev,
> >                     unsigned int source_id, bool use_worker,
> > @@ -80,4 +82,6 @@ static inline bool kfd_dbg_has_gws_support(struct
> kfd_dev *dev)
> >     /* Assume debugging and cooperative launch supported otherwise.
> */
> >     return true;
> >   }
> > +
> > +int kfd_dbg_set_mes_debug_mode(struct kfd_process_device *pdd);
> >   #endif
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > index 4cb433a21e3d..63c59ad2a4ca 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> > @@ -946,6 +946,7 @@ struct kfd_process {
> >
> >     /* Tracks runtime enable status */
> >     struct semaphore runtime_enable_sema;
> > +   bool is_runtime_retry;
> >     struct kfd_runtime_info runtime_info;
> >
> >   };