[pull] amdgpu, radeon, ttm, sched drm-next-5.13

Alex Deucher alexdeucher at gmail.com
Thu Apr 8 13:03:40 UTC 2021


On Thu, Apr 8, 2021 at 6:28 AM Christian König
<ckoenig.leichtzumerken at gmail.com> wrote:
>
> Am 08.04.21 um 09:13 schrieb Christian König:
> > Am 07.04.21 um 21:04 schrieb Alex Deucher:
> >> On Wed, Apr 7, 2021 at 3:23 AM Dave Airlie <airlied at gmail.com> wrote:
> >>> On Wed, 7 Apr 2021 at 06:54, Alex Deucher <alexdeucher at gmail.com>
> >>> wrote:
> >>>> On Fri, Apr 2, 2021 at 12:22 PM Christian König
> >>>> <ckoenig.leichtzumerken at gmail.com> wrote:
> >>>>> Hey Alex,
> >>>>>
> >>>>> the TTM and scheduler changes should already be in the drm-misc-next
> >>>>> branch (not 100% sure about the TTM patch, need to double check
> >>>>> next week).
> >>>>>
> >>>> The TTM change is not in drm-misc yet.
> >>>>
> >>>>> Could that cause problems when both are merged into drm-next?
> >>>> Dave, Daniel, how do you want to handle this?  The duplicated patch
> >>>> is this one:
> >>>> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=ac4eb83ab255de9c31184df51fd1534ba36fd212
> >>>>
> >>>> amdgpu has changes which depend on it.  The same patch is included
> >>>> in this PR.
> >>> Ouch not sure how best to sync up here, maybe get misc-next into my
> >>> tree then rebase your tree on top of it?
> >> I can do that.
> >
> > Please let me double check later today that we have everything we need
> > in drm-misc-next.
>
> There were two patches for TTM (one from Felix and one from Oak) which
> still needed to be pushed to drm-misc-next. I've done that just a minute
> ago.
>

They were included in this PR.

>
> Then we have this patch which fixes a bug in code removed on
> drm-misc-next. I think it should be dropped when amd-staging-drm-next is
> based on drm-next/drm-misc-next.
>
> Author: xinhui pan <xinhui.pan at amd.com>
> Date:   Wed Feb 24 11:28:08 2021 +0800
>
>      drm/ttm: Do not add non-system domain BO into swap list
>

Ok.

>
> I've also found the following patch which is problematic as well:
>
> commit c8a921d49443025e10794342d4433b3f29616409
> Author: Jack Zhang <Jack.Zhang1 at amd.com>
> Date:   Mon Mar 8 12:41:27 2021 +0800
>
>      drm/amd/amdgpu implement tdr advanced mode
>
>      [Why]
>      Previous tdr design treats the first job in job_timeout as the bad job.
>      But sometimes a later bad compute job can block a good gfx job and
>      cause an unexpected gfx job timeout because gfx and compute ring share
>      internal GC HW mutually.
>
>      [How]
>      This patch implements an advanced tdr mode. It involves an additional
>      synchronous pre-resubmit step (Step0 Resubmit) before the normal resubmit
>      step in order to find the real bad job.
>
>      1. At the Step0 Resubmit stage, it synchronously submits the first job
>      and waits for it to be signaled. If it times out, we identify it as
>      guilty and do a hw reset. After that, we do the normal resubmit step to
>      resubmit the remaining jobs.
>
>      2. For a whole gpu reset (vram lost), resubmit the old way.
>
>      Signed-off-by: Jack Zhang <Jack.Zhang1 at amd.com>
>      Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky at amd.com>
>
> That one modifies both amdgpu and the scheduler code. IIRC I
> actually requested that the patch be split into two, but that was
> somehow not done.
>
> How should we proceed here? Should I separate the patch, push the
> changes to drm-misc-next and then we merge with drm-next and rebase
> amd-staging-drm-next on top of that?
>
> That's most likely the cleanest approach as far as I can see.

That's fine with me.  We could have included them in my PR.  Now we
have to wait for drm-misc-next to be merged again before we can merge
the amdgpu code.  Is anyone planning to do another drm-misc merge at
this point?

Alex

>
> Thanks,
> Christian.
>
> >
> > Regards,
> > Christian.
> >
> >>
> >> Alex
> >>
> >>
> >>> Dave.
> >
>

