[PATCH v2] drm/amdgpu: add workarounds for Raven VCN TMZ issue

Tue Mar 8 15:14:54 UTC 2022

On Tue, Mar 8, 2022 at 4:16 AM Christian König <christian.koenig at amd.com> wrote:
>
> Am 08.03.22 um 09:06 schrieb Lang Yu:
> > On 03/08/ , Christian König wrote:
> >> Am 08.03.22 um 08:33 schrieb Lang Yu:
> >>> On 03/08/ , Christian König wrote:
> >>>> Am 08.03.22 um 04:39 schrieb Lang Yu:
> >>>>> It is a hardware issue that VCN can't handle a GTT
> >>>>> backing stored TMZ buffer on Raven.
> >>>>>
> >>>>> Move such a TMZ buffer to VRAM domain before command
> >>>>> submission.
> >>>>>
> >>>>> v2:
> >>>>>     - Use patch_cs_in_place callback.
> >>>>>
> >>>>> Suggested-by: Christian König <christian.koenig at amd.com>
> >>>>> Signed-off-by: Lang Yu <Lang.Yu at amd.com>
> >>>>> ---
> >>>>>     drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 68 +++++++++++++++++++++++++++
> >>>>>     1 file changed, 68 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> >>>>> index 7bbb9ba6b80b..810932abd3af 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> >>>>> @@ -24,6 +24,7 @@
> >>>>>     #include <linux/firmware.h>
> >>>>>     #include "amdgpu.h"
> >>>>> +#include "amdgpu_cs.h"
> >>>>>     #include "amdgpu_vcn.h"
> >>>>>     #include "amdgpu_pm.h"
> >>>>>     #include "soc15.h"
> >>>>> @@ -1905,6 +1906,72 @@ static const struct amd_ip_funcs vcn_v1_0_ip_funcs = {
> >>>>>           .set_powergating_state = vcn_v1_0_set_powergating_state,
> >>>>>     };
> >>>>> +/**
> >>>>> + * It is a hardware issue that Raven VCN can't handle a GTT TMZ buffer.
> >>>>> + * Move such a GTT TMZ buffer to VRAM domain before command submission.
> >>>>> + */
> >>>>> +static int vcn_v1_0_validate_bo(struct amdgpu_cs_parser *parser,
> >>>>> +                         struct amdgpu_job *job,
> >>>>> +                         uint64_t addr)
> >>>>> +{
> >>>>> + struct ttm_operation_ctx ctx = { false, false };
> >>>>> + struct amdgpu_fpriv *fpriv = parser->filp->driver_priv;
> >>>>> + struct amdgpu_vm *vm = &fpriv->vm;
> >>>>> + struct amdgpu_bo_va_mapping *mapping;
> >>>>> + struct amdgpu_bo *bo;
> >>>>> + int r;
> >>>>> +
> >>>>> + addr &= AMDGPU_GMC_HOLE_MASK;
> >>>>> + if (addr & 0x7) {
> >>>>> +         DRM_ERROR("VCN messages must be 8 byte aligned!\n");
> >>>>> +         return -EINVAL;
> >>>>> + }
> >>>>> +
> >>>>> + mapping = amdgpu_vm_bo_lookup_mapping(vm, addr/AMDGPU_GPU_PAGE_SIZE);
> >>>>> + if (!mapping || !mapping->bo_va || !mapping->bo_va->base.bo)
> >>>>> +         return -EINVAL;
> >>>>> +
> >>>>> + bo = mapping->bo_va->base.bo;
> >>>>> + if (!(bo->flags & AMDGPU_GEM_CREATE_ENCRYPTED))
> >>>>> +         return 0;
> >>>>> +
> >>>>> + amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_VRAM);
> >>>>> + r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> >>>>> + if (r) {
> >>>>> +         DRM_ERROR("Failed validating the VCN message BO (%d)!\n", r);
> >>>>> +         return r;
> >>>>> + }
> >>>> Well, exactly that won't work.
> >>>>
> >>>> The message structure isn't TMZ protected because otherwise the driver won't
> >>>> be able to stitch it together.
> >>>>
> >>>> What is TMZ protected are the surfaces the message structure is pointing to.
> >>>> So what you would need to do is to completely parse the structure and then
> >>>> move on the relevant buffers into VRAM.
> >>>>
> >>>> Leo or James, can you help with that?
> >>>   From my observations, when decoding secure contents, register
> >>> GPCOM_VCPU_DATA0 and GPCOM_VCPU_DATA1 are set to a TMZ buffer address.
> >>> And this way works when allocating TMZ buffers in GTT domain.
> >> As far as I remember that's only the case for the decoding, encoding works
> >> by putting the addresses into the message buffer.
> >>
> >> But could be that decoding is sufficient, Leo and James need to comment on
> >> this.
> > It seems that only decode needs TMZ buffers. Only observe si_vid_create_tmz_buffer()
> > was called in rvcn_dec_message_decode() in src/gallium/drivers/radeon/radeon_vcn_dec.c.
>
> Mhm, good point. Let's wait for Leo and James to wake up, when we don't
> need encode support than that would makes things much easier.

If we don't support encrypted encode, we should add a check to prevent
it like we do for compute.

Alex

>
> Regards,
> Christian.
>
> >
> > Regards,
> > Lang
> >
> >> Regards,
> >> Christian.
> >>
> >>> Regards,
> >>> Lang
> >>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>>>> +
> >>>>> + return r;
> >>>>> +}
> >>>>> +
> >>>>> +static int vcn_v1_0_ring_patch_cs_in_place(struct amdgpu_cs_parser *p,
> >>>>> +                                    struct amdgpu_job *job,
> >>>>> +                                    struct amdgpu_ib *ib)
> >>>>> +{
> >>>>> + uint32_t msg_lo = 0, msg_hi = 0;
> >>>>> + int i, r;
> >>>>> +
> >>>>> + for (i = 0; i < ib->length_dw; i += 2) {
> >>>>> +         uint32_t reg = amdgpu_ib_get_value(ib, i);
> >>>>> +         uint32_t val = amdgpu_ib_get_value(ib, i + 1);
> >>>>> +
> >>>>> +         if (reg == PACKET0(p->adev->vcn.internal.data0, 0)) {
> >>>>> +                 msg_lo = val;
> >>>>> +         } else if (reg == PACKET0(p->adev->vcn.internal.data1, 0)) {
> >>>>> +                 msg_hi = val;
> >>>>> +         } else if (reg == PACKET0(p->adev->vcn.internal.cmd, 0)) {
> >>>>> +                 r = vcn_v1_0_validate_bo(p, job,
> >>>>> +                                      ((u64)msg_hi) << 32 | msg_lo);
> >>>>> +                 if (r)
> >>>>> +                         return r;
> >>>>> +         }
> >>>>> + }
> >>>>> +
> >>>>> + return 0;
> >>>>> +}
> >>>>> +
> >>>>> +
> >>>>>     static const struct amdgpu_ring_funcs vcn_v1_0_dec_ring_vm_funcs = {
> >>>>>           .type = AMDGPU_RING_TYPE_VCN_DEC,
> >>>>>           .align_mask = 0xf,
> >>>>> @@ -1914,6 +1981,7 @@ static const struct amdgpu_ring_funcs vcn_v1_0_dec_ring_vm_funcs = {
> >>>>>           .get_rptr = vcn_v1_0_dec_ring_get_rptr,
> >>>>>           .get_wptr = vcn_v1_0_dec_ring_get_wptr,
> >>>>>           .set_wptr = vcn_v1_0_dec_ring_set_wptr,
> >>>>> + .patch_cs_in_place = vcn_v1_0_ring_patch_cs_in_place,
> >>>>>           .emit_frame_size =
> >>>>>                   6 + 6 + /* hdp invalidate / flush */
> >>>>>                   SOC15_FLUSH_GPU_TLB_NUM_WREG * 6 +
>