[PATCH] drm/amdgpu: keep job->vm in amdgpu_job_prepare_job
Alex Deucher
alexdeucher at gmail.com
Tue Jul 29 13:08:29 UTC 2025
On Mon, Jul 28, 2025 at 10:41 PM YuanShang Mao (River)
<YuanShang.Mao at amd.com> wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Hi Alexander
>
> > I didn't think we actually resubmitting jobs anymore. How are we ending up trying to resubmit in the first place?
>
> It is not about resubmitting. amdgpu_vm_generation will check if the VM is valid for this job to use. For example, if a gfx job depends on an sdma job, which is already cancelled, then the gfx job should be skipped.
Ah, that makes sense. Can you include that in the patch description?
With that included, the patch is
Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
> Perhaps the dependencies between tasks should be resolved by the drm instead of amdgpu.
If we can do that or check the dependencies via the job itself that
would be better in the long term.
Alex
>
> uint64_t amdgpu_vm_generation(struct amdgpu_device *adev, struct amdgpu_vm *vm)
> {
> uint64_t result = (u64)atomic_read(&adev->vram_lost_counter) << 32;
>
> if (!vm)
> return result;
>
> result += lower_32_bits(vm->generation);
> /* Add one if the page tables will be re-generated on next CS */
> if (drm_sched_entity_error(&vm->delayed))
> ++result;
>
> return result;
> }
>
> Thanks
> River
>
>
> -----Original Message-----
> From: Alex Deucher <alexdeucher at gmail.com>
> Sent: Tuesday, July 29, 2025 2:10 AM
> To: YuanShang Mao (River) <YuanShang.Mao at amd.com>
> Cc: Deucher, Alexander <Alexander.Deucher at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; amd-gfx at lists.freedesktop.org; cao, lin <lin.cao at amd.com>; Zhang, Tiantian (Celine) <Tiantian.Zhang at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: keep job->vm in amdgpu_job_prepare_job
>
> On Mon, Jul 28, 2025 at 5:01 AM YuanShang Mao (River) <YuanShang.Mao at amd.com> wrote:
> >
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > Hi Alexander
> >
> > Since Christian is on vacation. Could you help review the below patch?
> > If set job->vm to null in amdgpu_job_prepare_job, the job which should be skipped in amdgpu_job_run will be submitted unexpectedly.
>
> I think we can just remove the amdgpu_vm_generation() check in amdgpu_job_run(). I didn't think we actually resubmitting jobs anymore. How are we ending up trying to resubmit in the first place?
>
> Alex
>
> >
> > Thanks
> > River
> >
> >
> > -----Original Message-----
> > From: YuanShang Mao (River) <YuanShang.Mao at amd.com>
> > Sent: Wednesday, July 23, 2025 5:06 PM
> > To: amd-gfx at lists.freedesktop.org
> > Cc: Koenig, Christian <Christian.Koenig at amd.com>; YuanShang Mao
> > (River) <YuanShang.Mao at amd.com>
> > Subject: [PATCH] drm/amdgpu: keep job->vm in amdgpu_job_prepare_job
> >
> > job->vm is used in function amdgpu_job_run to get the page
> > table re-generation counter and decide whether the job should be skipped.
> >
> > Signed-off-by: YuanShang <YuanShang.Mao at amd.com>
> > ---
> > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 7 -------
> > 1 file changed, 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index 45febdc2f349..18998343815e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -360,13 +360,6 @@ amdgpu_job_prepare_job(struct drm_sched_job *sched_job,
> > dev_err(ring->adev->dev, "Error getting VM ID (%d)\n", r);
> > goto error;
> > }
> > - /*
> > - * The VM structure might be released after the VMID is
> > - * assigned, we had multiple problems with people trying to use
> > - * the VM pointer so better set it to NULL.
> > - */
> > - if (!fence)
> > - job->vm = NULL;
> > return fence;
> > }
> >
> > --
> > 2.25.1
> >
More information about the amd-gfx
mailing list