[PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init

Lazar, Lijo lijo.lazar at amd.com
Tue Sep 6 15:56:37 UTC 2022



On 9/6/2022 8:55 PM, Alex Deucher wrote:
> On Mon, Sep 5, 2022 at 1:27 AM Lazar, Lijo <lijo.lazar at amd.com> wrote:
>>
>>
>>
>> On 9/2/2022 1:09 AM, Alex Deucher wrote:
>>> How about this?  We should just move HDP remap to gmc hw_init since
>>> that is mainly what uses it anyway.
>>>
>>
>> Sorry, I missed to R-B the previous version. Did you find any problem
>> when common block is initialized first?
>>
> 
> One of the users on the bug report reported an issue with that patch,
> that said, that user seems to be seeing a slightly different issue
> since he is on a gfx9 card and the original reporter was on gfx10.
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&data=05%7C01%7Clijo.lazar%40amd.com%7C5d6d4da1be4a4194b7cc08da901c0bb4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637980747398632310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=hHoh2YpQ2vpjsRTEqI1vIRY4YQmzfFi5%2FG%2Fnh6vNuds%3D&reserved=0
> 

Yes, I see Bjorn already pointed to another potential issue in LTR 
enablement. So regardless of init-common-first patch, the new error will 
be reported and that is unrelated to the original reported error through 
AER. I still think init-common-first patch is good.

Thanks,
Lijo

> Alex
> 
> 
>> I think that patch provides a consistent IP init sequence during cold
>> init and resume.
>>
>> Thanks,
>> Lijo
>>
>>> Alex
>>>
>>> On Tue, Aug 30, 2022 at 2:05 PM Alex Deucher <alexdeucher at gmail.com> wrote:
>>>>
>>>> On Tue, Aug 30, 2022 at 12:06 PM Lazar, Lijo <lijo.lazar at amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 8/30/2022 8:39 PM, Alex Deucher wrote:
>>>>>> On Tue, Aug 30, 2022 at 10:45 AM Lazar, Lijo <lijo.lazar at amd.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 8/30/2022 7:18 PM, Alex Deucher wrote:
>>>>>>>> On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar at amd.com> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 8/29/2022 10:20 PM, Alex Deucher wrote:
>>>>>>>>>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar at amd.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> HDP flush is used early in the init sequence as part of memory controller
>>>>>>>>>>> block initialization. Hence remapping of HDP registers needed for flush
>>>>>>>>>>> needs to happen earlier.
>>>>>>>>>>>
>>>>>>>>>>> This also fixes the Unsupported Request error reported through AER during
>>>>>>>>>>> driver load. The error happens as a write happens to the remap offset
>>>>>>>>>>> before real remapping is done.
>>>>>>>>>>>
>>>>>>>>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&data=05%7C01%7Clijo.lazar%40amd.com%7C5d6d4da1be4a4194b7cc08da901c0bb4%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637980747398632310%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=hHoh2YpQ2vpjsRTEqI1vIRY4YQmzfFi5%2FG%2Fnh6vNuds%3D&reserved=0
>>>>>>>>>>>
>>>>>>>>>>> The error was unnoticed before and got visible because of the commit
>>>>>>>>>>> referenced below. This doesn't fix anything in the commit below, rather
>>>>>>>>>>> fixes the issue in amdgpu exposed by the commit. The reference is only
>>>>>>>>>>> to associate this commit with below one so that both go together.
>>>>>>>>>>>
>>>>>>>>>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
>>>>>>>>>>>
>>>>>>>>>>> Reported-by: Tom Seewald <tseewald at gmail.com>
>>>>>>>>>>> Signed-off-by: Lijo Lazar <lijo.lazar at amd.com>
>>>>>>>>>>> Cc: stable at vger.kernel.org
>>>>>>>>>>
>>>>>>>>>> How about something like the attached patch rather than these two
>>>>>>>>>> patches?  It's a bit bigger but seems cleaner and more defensive in my
>>>>>>>>>> opinion.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Whenever device goes to suspend/reset and then comes back, remap offset
>>>>>>>>> has to be set back to 0 to make sure it doesn't use the wrong offset
>>>>>>>>> when the register assumes default values again.
>>>>>>>>>
>>>>>>>>> To avoid the if-check in hdp_flush (which is more frequent), another way
>>>>>>>>> is to initialize the remap offset to default offset during early init
>>>>>>>>> and hw fini/suspend sequences. It won't be obvious (even with this
>>>>>>>>> patch) as to when remap offset vs default offset is used though.
>>>>>>>>
>>>>>>>> On resume, the common IP is resumed first so it will always be set.
>>>>>>>> The only case that is a problem is init because we init GMC out of
>>>>>>>> order.  We could init common before GMC in amdgpu_device_ip_init().  I
>>>>>>>> think that should be fine, but I wasn't sure if there might be some
>>>>>>>> fallout from that on certain cards.
>>>>>>>>
>>>>>>>
>>>>>>> There are other places where an IP order is forced like in
>>>>>>> amdgpu_device_ip_reinit_early_sriov(). This also may not affect this
>>>>>>> case as remapping is not done for VF.
>>>>>>>
>>>>>>> Agree that a better way is to have the common IP to be inited first.
>>>>>>
>>>>>> How about these patches?
>>>>>>
>>>>>
>>>>> Looks good to me. BTW, is nbio 7.7 for an APU (in which case hdp flush
>>>>> is not expected to be used)?
>>>>
>>>> It would be used in some cases, e.g., GPU VM passthrough where we use
>>>> the BAR rather than the carve out.
>>>>
>>>> Alex
>>>>
>>>>
>>>>>
>>>>> Thanks,
>>>>> Lijo
>>>>>
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Lijo
>>>>>>>
>>>>>>>> Alex
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Lijo
>>>>>>>>>
>>>>>>>>>> Alex
>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>> v2:
>>>>>>>>>>>              Take care of IP resume cases (Alex Deucher)
>>>>>>>>>>>              Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
>>>>>>>>>>>              Add more details in commit message and associate with AER patch (Bjorn
>>>>>>>>>>> Helgaas)
>>>>>>>>>>>
>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/nv.c            |  6 ------
>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/soc15.c         |  6 ------
>>>>>>>>>>>       drivers/gpu/drm/amd/amdgpu/soc21.c         |  6 ------
>>>>>>>>>>>       4 files changed, 24 insertions(+), 18 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>> index ce7d117efdb5..e420118769a5 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>>>>>>>>>>>              return 0;
>>>>>>>>>>>       }
>>>>>>>>>>>
>>>>>>>>>>> +/**
>>>>>>>>>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
>>>>>>>>>>> + *
>>>>>>>>>>> + * @adev: amdgpu_device pointer
>>>>>>>>>>> + *
>>>>>>>>>>> + * Any common hardware initialization sequence that needs to be done before
>>>>>>>>>>> + * hw init of individual IPs is performed here. This is different from the
>>>>>>>>>>> + * 'common block' which initializes a set of IPs.
>>>>>>>>>>> + */
>>>>>>>>>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
>>>>>>>>>>> +{
>>>>>>>>>>> +       /* Remap HDP registers to a hole in mmio space, for the purpose
>>>>>>>>>>> +        * of exposing those registers to process space. This needs to be
>>>>>>>>>>> +        * done before hw init of ip blocks to take care of HDP flush
>>>>>>>>>>> +        * operations through registers during hw_init.
>>>>>>>>>>> +        */
>>>>>>>>>>> +       if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
>>>>>>>>>>> +           !amdgpu_sriov_vf(adev))
>>>>>>>>>>> +               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>>> +}
>>>>>>>>>>>
>>>>>>>>>>>       /**
>>>>>>>>>>>        * amdgpu_device_ip_init - run init for hardware IPs
>>>>>>>>>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>>>>>>>>>>                                      DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
>>>>>>>>>>>                                      goto init_failed;
>>>>>>>>>>>                              }
>>>>>>>>>>> +
>>>>>>>>>>> +                       amdgpu_device_prepare_ip(adev);
>>>>>>>>>>>                              r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
>>>>>>>>>>>                              if (r) {
>>>>>>>>>>>                                      DRM_ERROR("hw_init %d failed %d\n", i, r);
>>>>>>>>>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>>>>>>>>>>>                      AMD_IP_BLOCK_TYPE_IH,
>>>>>>>>>>>              };
>>>>>>>>>>>
>>>>>>>>>>> +       amdgpu_device_prepare_ip(adev);
>>>>>>>>>>>              for (i = 0; i < adev->num_ip_blocks; i++) {
>>>>>>>>>>>                      int j;
>>>>>>>>>>>                      struct amdgpu_ip_block *block;
>>>>>>>>>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
>>>>>>>>>>>       {
>>>>>>>>>>>              int i, r;
>>>>>>>>>>>
>>>>>>>>>>> +       amdgpu_device_prepare_ip(adev);
>>>>>>>>>>>              for (i = 0; i < adev->num_ip_blocks; i++) {
>>>>>>>>>>>                      if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
>>>>>>>>>>>                              continue;
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>>>>>>> index b3fba8dea63c..3ac7fef74277 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>>>>>>>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
>>>>>>>>>>>              nv_program_aspm(adev);
>>>>>>>>>>>              /* setup nbio registers */
>>>>>>>>>>>              adev->nbio.funcs->init_registers(adev);
>>>>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>>>>>>> -        * for the purpose of expose those registers
>>>>>>>>>>> -        * to process space
>>>>>>>>>>> -        */
>>>>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>>>              /* enable the doorbell aperture */
>>>>>>>>>>>              nv_enable_doorbell_aperture(adev, true);
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>>>>>>> index fde6154f2009..a0481e37d7cf 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>>>>>>>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
>>>>>>>>>>>              soc15_program_aspm(adev);
>>>>>>>>>>>              /* setup nbio registers */
>>>>>>>>>>>              adev->nbio.funcs->init_registers(adev);
>>>>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>>>>>>> -        * for the purpose of expose those registers
>>>>>>>>>>> -        * to process space
>>>>>>>>>>> -        */
>>>>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>>>
>>>>>>>>>>>              /* enable the doorbell aperture */
>>>>>>>>>>>              soc15_enable_doorbell_aperture(adev, true);
>>>>>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>>>>>>> index 55284b24f113..16b447055102 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>>>>>>>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
>>>>>>>>>>>              soc21_program_aspm(adev);
>>>>>>>>>>>              /* setup nbio registers */
>>>>>>>>>>>              adev->nbio.funcs->init_registers(adev);
>>>>>>>>>>> -       /* remap HDP registers to a hole in mmio space,
>>>>>>>>>>> -        * for the purpose of expose those registers
>>>>>>>>>>> -        * to process space
>>>>>>>>>>> -        */
>>>>>>>>>>> -       if (adev->nbio.funcs->remap_hdp_registers)
>>>>>>>>>>> -               adev->nbio.funcs->remap_hdp_registers(adev);
>>>>>>>>>>>              /* enable the doorbell aperture */
>>>>>>>>>>>              soc21_enable_doorbell_aperture(adev, true);
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> 2.25.1
>>>>>>>>>>>


More information about the amd-gfx mailing list