[PATCH 1/1] drm/amdgpu: Fix per-IB secure flag GFX hang
Alex Deucher
alexdeucher at gmail.com
Thu Feb 27 21:30:09 UTC 2020
On Thu, Feb 27, 2020 at 4:15 PM Luben Tuikov <luben.tuikov at amd.com> wrote:
>
> On 2020-02-27 6:56 a.m., Huang Rui wrote:
> > On Thu, Feb 27, 2020 at 06:39:03AM +0800, Tuikov, Luben wrote:
> >> Since commit "Move to a per-IB secure flag (TMZ)",
> >> we've been seeing hangs in GFX. Ray H. pointed out
> >> by sending a patch that we need to send FRAME
> >> CONTROL stop/start back-to-back, every time we
> >> flip the TMZ flag as per each IB we submit. That
> >> is, when we transition from TMZ to non-TMZ we have
> >> to send a stop with TMZ followed by a start with
> >> non-TMZ, and similarly for transitioning from
> >> non-TMZ into TMZ.
> >>
> >> This patch implements this, thus fixing the GFX
> >> hang.
> >>
> >> Signed-off-by: Luben Tuikov <luben.tuikov at amd.com>
> >> ---
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 87 +++++++++++++++++-------
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 5 +-
> >> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 15 ++--
> >> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 13 ++--
> >> 4 files changed, 79 insertions(+), 41 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >> index 4b2342d11520..16d6df3304d3 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >> @@ -216,40 +216,75 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
> >> amdgpu_ring_emit_cntxcntl(ring, status);
> >> }
> >>
> >> - secure = false;
> >> + /* Find the first non-preamble IB.
> >> + */
> >> for (i = 0; i < num_ibs; ++i) {
> >> ib = &ibs[i];
> >>
> >> /* drop preamble IBs if we don't have a context switch */
> >> - if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
> >> - skip_preamble &&
> >> - !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
> >> - !amdgpu_mcbp &&
> >> - !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> >> - continue;
> >> -
> >> - /* If this IB is TMZ, add frame TMZ start packet,
> >> - * else, turn off TMZ.
> >> - */
> >> - if (ib->flags & AMDGPU_IB_FLAGS_SECURE && ring->funcs->emit_tmz) {
> >> - if (!secure) {
> >> - secure = true;
> >> - amdgpu_ring_emit_tmz(ring, true);
> >> - }
> >> - } else if (secure) {
> >> + if (!(ib->flags & AMDGPU_IB_FLAG_PREAMBLE) ||
> >> + !skip_preamble ||
> >> + (status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) ||
> >> + amdgpu_mcbp ||
> >> + amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> >> + break;
> >> + }
> >> + if (i >= num_ibs)
> >> + goto Done;
> >> + /* Setup initial TMZiness and send it off.
> >> + */
> >> + secure = false;
> >> + if (job && ring->funcs->emit_frame_cntl) {
> >> + if (ib->flags & AMDGPU_IB_FLAGS_SECURE)
> >> + secure = true;
> >> + else
> >> secure = false;
> >> - amdgpu_ring_emit_tmz(ring, false);
> >> - }
> >> -
> >> - amdgpu_ring_emit_ib(ring, job, ib, status);
> >> - status &= ~AMDGPU_HAVE_CTX_SWITCH;
> >> + amdgpu_ring_emit_frame_cntl(ring, true, secure);
> >> }
> >> + amdgpu_ring_emit_ib(ring, job, ib, status);
> >> + status &= ~AMDGPU_HAVE_CTX_SWITCH;
> >> + i += 1;
> >> + /* Send the rest of the IBs.
> >> + */
> >> + if (job && ring->funcs->emit_frame_cntl) {
> >> + for ( ; i < num_ibs; ++i) {
> >> + ib = &ibs[i];
> >> +
> >> + /* drop preamble IBs if we don't have a context switch */
> >> + if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
> >> + skip_preamble &&
> >> + !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
> >> + !amdgpu_mcbp &&
> >> + !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> >> + continue;
> >
> > Snip.
> >
> >> +
> >> + if (!!secure ^ !!(ib->flags & AMDGPU_IB_FLAGS_SECURE)) {
> >> + amdgpu_ring_emit_frame_cntl(ring, false, secure);
> >> + secure = !secure;
> >> + amdgpu_ring_emit_frame_cntl(ring, true, secure);
> >> + }
> >
> > That's pretty good optimization! I spend quit a few time to understand this.
>
> I know. I know you did. It's called experience.
>
> When I saw you v1, it was a cringe. Seriously?
It may be a good optimization but if it's hard to understand it makes
the code harder to read and comprehend, that can lead to
misinterpretation and maintenance burdens. It took me a while to
understand what was intended here.
Alex
>
> >
> >>
> >> - if (secure) {
> >> - secure = false;
> >> - amdgpu_ring_emit_tmz(ring, false);
> >> + amdgpu_ring_emit_ib(ring, job, ib, status);
> >> + status &= ~AMDGPU_HAVE_CTX_SWITCH;
> >> + }
> >> + amdgpu_ring_emit_frame_cntl(ring, false, secure);
> >> + } else {
> >> + for ( ; i < num_ibs; ++i) {
> >> + ib = &ibs[i];
> >> +
> >> + /* drop preamble IBs if we don't have a context switch */
> >> + if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
> >> + skip_preamble &&
> >> + !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
> >> + !amdgpu_mcbp &&
> >> + !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> >> + continue;
> >> +
> >> + amdgpu_ring_emit_ib(ring, job, ib, status);
> >> + status &= ~AMDGPU_HAVE_CTX_SWITCH;
> >
> > To pull the checking "job && ring->funcs->emit_frame_cntl" out of the loop,
> > that make the implemenation more duplicated like below, we have to write
>
> Yes, that is exactly what we want.
>
> As I explained before and will explain again, and again, and again,
> "job && ring->funcs->emit_frame_cntl" is static to the body of the loop,
> and as such can be pulled OUT of the loop and it should.
>
> This is a *formulaic* optimization exercise. Compilers and optimizers do it first.
>
> To extrapolate, consider that what you'd eventually have is a HUGE long long loop
> and everything inside it. Not good.
>
> Regards,
> Luben
>
>
> > the same codes multiple times. I hope the implementation is more simple and
> > readable. Please see my V2 patch that I just send out. We can try to use
> > minimum changes to fix the issue.
> >
> > for ( ; i < num_ibs; ++i) {
> > ib = &ibs[i];
> >
> > /* drop preamble IBs if we don't have a context switch */
> > if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
> > skip_preamble &&
> > !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
> > !amdgpu_mcbp &&
> > !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> > continue;
> >
> > ...
> > amdgpu_ring_emit_ib(ring, job, ib, status);
> >
> > Thanks,
> > Ray
> >
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
More information about the amd-gfx
mailing list