[PATCH 1/2] drm/amdgpu: make sure to init common IP before gmc

Thu Sep 8 14:21:10 UTC 2022

On Thu, Sep 8, 2022 at 1:11 AM Lazar, Lijo <lijo.lazar at amd.com> wrote:
>
>
>
> On 9/8/2022 9:38 AM, Alex Deucher wrote:
> > Common is mainly golden register setting and HDP register
> > remapping; it shouldn't allocate any GPU memory.  Make sure
> > common init happens before gmc so that the HDP registers are
> > remapped before gmc attempts to access them.
> >
> > This fixes the Unsupported Request error reported through
> > AER during driver load. The error occurs because a write goes
> > to the remap offset before the actual remapping is done.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216373
> >
> > The error went unnoticed before and became visible because of the commit
> > referenced below. This patch doesn't fix anything in that commit; rather, it
> > fixes an issue in amdgpu that the commit exposed. The reference is only
> > there to associate this commit with the one below so that both go together.
> >
> > Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
> >
> > Signed-off-by: Alex Deucher <alexander.deucher at amd.com>
>
> Series is:
>         Reviewed-by: Lijo Lazar <lijo.lazar at amd.com>

@tseewald at gmail.com it would be good if you could verify that this
patch fixes the issue for you as well.

Thanks,

Alex

>
> Thanks,
> Lijo
>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++++++++---
> >   1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 899564ea8b4b..4da85ce9e3b1 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -2375,8 +2375,16 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
> >               }
> >               adev->ip_blocks[i].status.sw = true;
> >
> > -             /* need to do gmc hw init early so we can allocate gpu mem */
> > -             if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
> > +             if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_COMMON) {
> > +                     /* need to do common hw init early so everything is set up for gmc */
> > +                     r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
> > +                     if (r) {
> > +                             DRM_ERROR("hw_init %d failed %d\n", i, r);
> > +                             goto init_failed;
> > +                     }
> > +                     adev->ip_blocks[i].status.hw = true;
> > +             } else if (adev->ip_blocks[i].version->type == AMD_IP_BLOCK_TYPE_GMC) {
> > +                     /* need to do gmc hw init early so we can allocate gpu mem */
> >                       /* Try to reserve bad pages early */
> >                       if (amdgpu_sriov_vf(adev))
> >                               amdgpu_virt_exchange_data(adev);
> > @@ -3062,8 +3070,8 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
> >       int i, r;
> >
> >       static enum amd_ip_block_type ip_order[] = {
> > -             AMD_IP_BLOCK_TYPE_GMC,
> >               AMD_IP_BLOCK_TYPE_COMMON,
> > +             AMD_IP_BLOCK_TYPE_GMC,
> >               AMD_IP_BLOCK_TYPE_PSP,
> >               AMD_IP_BLOCK_TYPE_IH,
> >       };
> >
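
For anyone following the ordering argument in the commit message, here is a
minimal user-space sketch of it. It is not the driver code: every sketch_*
name, the simplified block list, and the early_hdp_accesses counter are
hypothetical stand-ins made up for illustration. It also assumes the common
block sits ahead of gmc in the block list, which is why hooking its hw init
into the first loop is enough. The first pass does sw init for every block
and early hw init for gmc (plus common when the new behavior is enabled); the
second pass does hw init for whatever is left.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the IP block types; not the kernel enums. */
enum sketch_ip_type {
	SKETCH_IP_COMMON,
	SKETCH_IP_GMC,
	SKETCH_IP_IH,
	SKETCH_IP_PSP,
};

#define SKETCH_MAX_BLOCKS 8

struct sketch_ip_block {
	enum sketch_ip_type type;
	int sw_done;
	int hw_done;
};

struct sketch_device {
	struct sketch_ip_block blocks[SKETCH_MAX_BLOCKS];
	size_t num_blocks;
	int hdp_remapped;        /* set once common hw init has run */
	int early_hdp_accesses;  /* gmc accesses made before the remap */
};

/* Build a device with common listed ahead of gmc (an assumption). */
static struct sketch_device sketch_make_device(void)
{
	struct sketch_device dev = { 0 };

	dev.blocks[0].type = SKETCH_IP_COMMON;
	dev.blocks[1].type = SKETCH_IP_GMC;
	dev.blocks[2].type = SKETCH_IP_IH;
	dev.blocks[3].type = SKETCH_IP_PSP;
	dev.num_blocks = 4;
	return dev;
}

/* Models hw init: common remaps HDP, gmc touches the remapped registers. */
static int sketch_hw_init(struct sketch_device *dev, struct sketch_ip_block *b)
{
	if (b->type == SKETCH_IP_COMMON)
		dev->hdp_remapped = 1;
	else if (b->type == SKETCH_IP_GMC && !dev->hdp_remapped)
		dev->early_hdp_accesses++; /* stands in for the AER error */

	b->hw_done = 1;
	return 0;
}

/*
 * Two-pass init. Returns a negative value on failure, otherwise the
 * number of gmc accesses that happened before the HDP remap.
 */
static int sketch_ip_init(struct sketch_device *dev, int common_early)
{
	size_t i;
	int r;

	assert(dev->num_blocks <= SKETCH_MAX_BLOCKS);

	/* Pass 1: sw init everything, hw init the early blocks. */
	for (i = 0; i < dev->num_blocks; i++) {
		struct sketch_ip_block *b = &dev->blocks[i];

		b->sw_done = 1;
		if ((common_early && b->type == SKETCH_IP_COMMON) ||
		    b->type == SKETCH_IP_GMC) {
			r = sketch_hw_init(dev, b);
			if (r)
				return r;
		}
	}

	/* Pass 2: hw init the blocks that were not handled early. */
	for (i = 0; i < dev->num_blocks; i++) {
		struct sketch_ip_block *b = &dev->blocks[i];

		if (b->hw_done)
			continue;
		r = sketch_hw_init(dev, b);
		if (r)
			return r;
	}

	return dev->early_hdp_accesses;
}
```

Calling sketch_ip_init() on a fresh sketch_make_device() with common_early
set to 0 models the old ordering, where gmc runs before the remap; setting it
to 1 models the ordering after this patch.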