[Intel-gfx] [PATCH v5 02/20] drm/msm: Fix drm/sched point of no return rules
Daniel Vetter
daniel.vetter at ffwll.ch
Fri Aug 6 16:41:56 UTC 2021
On Fri, Aug 6, 2021 at 12:58 AM Rob Clark <robdclark at gmail.com> wrote:
>
> On Thu, Aug 5, 2021 at 3:47 AM Daniel Vetter <daniel.vetter at ffwll.ch> wrote:
> >
> > Originally drm_sched_job_init was the point of no return, after which
> > drivers must submit a job. I've split that up, which allows us to fix
> > this issue pretty easily.
> >
> > Only thing we have to take care of is to not skip to error paths after
> > that. Other drivers do this the same for out-fence and similar things.
> >
> > Fixes: 1d8a5ca436ee ("drm/msm: Conversion to drm scheduler")
> > Cc: Rob Clark <robdclark at chromium.org>
> > Cc: Rob Clark <robdclark at gmail.com>
> > Cc: Sean Paul <sean at poorly.run>
> > Cc: Sumit Semwal <sumit.semwal at linaro.org>
> > Cc: "Christian König" <christian.koenig at amd.com>
> > Cc: linux-arm-msm at vger.kernel.org
> > Cc: dri-devel at lists.freedesktop.org
> > Cc: freedreno at lists.freedesktop.org
> > Cc: linux-media at vger.kernel.org
> > Cc: linaro-mm-sig at lists.linaro.org
> > Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> > ---
> > drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++++++--------
> > 1 file changed, 7 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> > index 6d6c44f0e1f3..d0ed4ddc509e 100644
> > --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> > @@ -52,9 +52,6 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
> > return ERR_PTR(ret);
> > }
> >
> > - /* FIXME: this is way too early */
> > - drm_sched_job_arm(&job->base);
> > -
> > xa_init_flags(&submit->deps, XA_FLAGS_ALLOC);
> >
> > kref_init(&submit->ref);
> > @@ -883,6 +880,9 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> >
> > submit->user_fence = dma_fence_get(&submit->base.s_fence->finished);
> >
> > + /* point of no return, we _have_ to submit no matter what */
> > + drm_sched_job_arm(&submit->base);
> > +
> > /*
> > * Allocate an id which can be used by WAIT_FENCE ioctl to map back
> > * to the underlying fence.
> > @@ -892,17 +892,16 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
> > if (submit->fence_id < 0) {
> > ret = submit->fence_id = 0;
> > submit->fence_id = 0;
> > - goto out;
> > }
> >
> > - if (args->flags & MSM_SUBMIT_FENCE_FD_OUT) {
> > + if (ret == 0 && args->flags & MSM_SUBMIT_FENCE_FD_OUT) {
> > struct sync_file *sync_file = sync_file_create(submit->user_fence);
> > if (!sync_file) {
> > ret = -ENOMEM;
> > - goto out;
> > + } else {
> > + fd_install(out_fence_fd, sync_file->file);
> > + args->fence_fd = out_fence_fd;
> > }
> > - fd_install(out_fence_fd, sync_file->file);
> > - args->fence_fd = out_fence_fd;
>
> I wonder if instead we should (approximately) undo "drm/msm/submit:
> Simplify out-fence-fd handling" so that the point that it could fail
> is moved up ahead of the drm_sched_job_arm()?
Hm yeah. Up to you how you want to paint this shed, I think either is fine.
> Also, does the dma_fence_get() work before drm_sched_job_arm()? From
> a quick look, it looks like it won't, but I'm still playing catchup
> and haven't had a chance to look at your entire series. If it doesn't
> work before drm_sched_job_arm(), then there is really no way to
> prevent a error path between the fence-init and job-submit.
Yes. I thought I've checked that I put the _arm() in the right spot,
but I guess I screwed up and you need the fence before the point where
I've put the job_arm()? And yes the error path cannot be avoided for
out-fences, that's what I tried to explain in the commit message.
> But, prior to your series, wouldn't a failure after
> drm_sched_job_init() but before the job is submitted just burn a
> fence-id, and otherwise carry on it's merry way?
Maybe? I'm not sure whether the scheduler gets confused about the gap
and freak out abou that. I'm fairly new to that code and learning
(which is part why I'm working on it). Since you look up in
fences/syncobj after job_init() it should be pretty easy to whip up a
testcase and see what happens. Also as long as nothing fails you won't
see an issue, that's for sure.
-Daniel
> BR,
> -R
>
> > }
> >
> > submit_attach_object_fences(submit);
> > --
> > 2.32.0
> >
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
More information about the Intel-gfx
mailing list