[PATCH v2 1/2] drm/amdgpu: Move HDP remapping earlier during init
Lazar, Lijo
lijo.lazar at amd.com
Tue Aug 30 14:45:40 UTC 2022
On 8/30/2022 7:18 PM, Alex Deucher wrote:
> On Tue, Aug 30, 2022 at 12:05 AM Lazar, Lijo <lijo.lazar at amd.com> wrote:
>>
>>
>>
>> On 8/29/2022 10:20 PM, Alex Deucher wrote:
>>> On Mon, Aug 29, 2022 at 4:18 AM Lijo Lazar <lijo.lazar at amd.com> wrote:
>>>>
>>>> HDP flush is used early in the init sequence as part of memory controller
>>>> block initialization. Hence remapping of HDP registers needed for flush
>>>> needs to happen earlier.
>>>>
>>>> This also fixes the Unsupported Request error reported through AER during
>>>> driver load. The error happens as a write happens to the remap offset
>>>> before real remapping is done.
>>>>
>>>> Link: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D216373&data=05%7C01%7Clijo.lazar%40amd.com%7C66066549213c45b1341508da8a8e4f1b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637974641082680316%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=ti%2FktO6Gmrio3TLDoteeJil28PY3D%2BLAinB6TU2mI9s%3D&reserved=0
>>>>
>>>> The error was unnoticed before and got visible because of the commit
>>>> referenced below. This doesn't fix anything in the commit below, rather
>>>> fixes the issue in amdgpu exposed by the commit. The reference is only
>>>> to associate this commit with below one so that both go together.
>>>>
>>>> Fixes: 8795e182b02d ("PCI/portdrv: Don't disable AER reporting in get_port_device_capability()")
>>>>
>>>> Reported-by: Tom Seewald <tseewald at gmail.com>
>>>> Signed-off-by: Lijo Lazar <lijo.lazar at amd.com>
>>>> Cc: stable at vger.kernel.org
>>>
>>> How about something like the attached patch rather than these two
>>> patches? It's a bit bigger but seems cleaner and more defensive in my
>>> opinion.
>>>
>>
>> Whenever device goes to suspend/reset and then comes back, remap offset
>> has to be set back to 0 to make sure it doesn't use the wrong offset
>> when the register assumes default values again.
>>
>> To avoid the if-check in hdp_flush (which is more frequent), another way
>> is to initialize the remap offset to default offset during early init
>> and hw fini/suspend sequences. It won't be obvious (even with this
>> patch) as to when remap offset vs default offset is used though.
>
> On resume, the common IP is resumed first so it will always be set.
> The only case that is a problem is init because we init GMC out of
> order. We could init common before GMC in amdgpu_device_ip_init(). I
> think that should be fine, but I wasn't sure if there might be some
> fallout from that on certain cards.
>
There are other places where an IP order is forced like in
amdgpu_device_ip_reinit_early_sriov(). This also may not affect this
case as remapping is not done for VF.
Agree that a better way is to have the common IP to be inited first.
Thanks,
Lijo
> Alex
>
>>
>> Thanks,
>> Lijo
>>
>>> Alex
>>>
>>>> ---
>>>> v2:
>>>> Take care of IP resume cases (Alex Deucher)
>>>> Add NULL check to nbio.funcs to cover older (GFXv8) ASICs (Felix Kuehling)
>>>> Add more details in commit message and associate with AER patch (Bjorn
>>>> Helgaas)
>>>>
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
>>>> drivers/gpu/drm/amd/amdgpu/nv.c | 6 ------
>>>> drivers/gpu/drm/amd/amdgpu/soc15.c | 6 ------
>>>> drivers/gpu/drm/amd/amdgpu/soc21.c | 6 ------
>>>> 4 files changed, 24 insertions(+), 18 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index ce7d117efdb5..e420118769a5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -2334,6 +2334,26 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev)
>>>> return 0;
>>>> }
>>>>
>>>> +/**
>>>> + * amdgpu_device_prepare_ip - prepare IPs for hardware initialization
>>>> + *
>>>> + * @adev: amdgpu_device pointer
>>>> + *
>>>> + * Any common hardware initialization sequence that needs to be done before
>>>> + * hw init of individual IPs is performed here. This is different from the
>>>> + * 'common block' which initializes a set of IPs.
>>>> + */
>>>> +static void amdgpu_device_prepare_ip(struct amdgpu_device *adev)
>>>> +{
>>>> + /* Remap HDP registers to a hole in mmio space, for the purpose
>>>> + * of exposing those registers to process space. This needs to be
>>>> + * done before hw init of ip blocks to take care of HDP flush
>>>> + * operations through registers during hw_init.
>>>> + */
>>>> + if (adev->nbio.funcs && adev->nbio.funcs->remap_hdp_registers &&
>>>> + !amdgpu_sriov_vf(adev))
>>>> + adev->nbio.funcs->remap_hdp_registers(adev);
>>>> +}
>>>>
>>>> /**
>>>> * amdgpu_device_ip_init - run init for hardware IPs
>>>> @@ -2376,6 +2396,8 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
>>>> DRM_ERROR("amdgpu_vram_scratch_init failed %d\n", r);
>>>> goto init_failed;
>>>> }
>>>> +
>>>> + amdgpu_device_prepare_ip(adev);
>>>> r = adev->ip_blocks[i].version->funcs->hw_init((void *)adev);
>>>> if (r) {
>>>> DRM_ERROR("hw_init %d failed %d\n", i, r);
>>>> @@ -3058,6 +3080,7 @@ static int amdgpu_device_ip_reinit_early_sriov(struct amdgpu_device *adev)
>>>> AMD_IP_BLOCK_TYPE_IH,
>>>> };
>>>>
>>>> + amdgpu_device_prepare_ip(adev);
>>>> for (i = 0; i < adev->num_ip_blocks; i++) {
>>>> int j;
>>>> struct amdgpu_ip_block *block;
>>>> @@ -3139,6 +3162,7 @@ static int amdgpu_device_ip_resume_phase1(struct amdgpu_device *adev)
>>>> {
>>>> int i, r;
>>>>
>>>> + amdgpu_device_prepare_ip(adev);
>>>> for (i = 0; i < adev->num_ip_blocks; i++) {
>>>> if (!adev->ip_blocks[i].status.valid || adev->ip_blocks[i].status.hw)
>>>> continue;
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> index b3fba8dea63c..3ac7fef74277 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
>>>> @@ -1032,12 +1032,6 @@ static int nv_common_hw_init(void *handle)
>>>> nv_program_aspm(adev);
>>>> /* setup nbio registers */
>>>> adev->nbio.funcs->init_registers(adev);
>>>> - /* remap HDP registers to a hole in mmio space,
>>>> - * for the purpose of expose those registers
>>>> - * to process space
>>>> - */
>>>> - if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>> - adev->nbio.funcs->remap_hdp_registers(adev);
>>>> /* enable the doorbell aperture */
>>>> nv_enable_doorbell_aperture(adev, true);
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>> index fde6154f2009..a0481e37d7cf 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
>>>> @@ -1240,12 +1240,6 @@ static int soc15_common_hw_init(void *handle)
>>>> soc15_program_aspm(adev);
>>>> /* setup nbio registers */
>>>> adev->nbio.funcs->init_registers(adev);
>>>> - /* remap HDP registers to a hole in mmio space,
>>>> - * for the purpose of expose those registers
>>>> - * to process space
>>>> - */
>>>> - if (adev->nbio.funcs->remap_hdp_registers && !amdgpu_sriov_vf(adev))
>>>> - adev->nbio.funcs->remap_hdp_registers(adev);
>>>>
>>>> /* enable the doorbell aperture */
>>>> soc15_enable_doorbell_aperture(adev, true);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc21.c b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>> index 55284b24f113..16b447055102 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/soc21.c
>>>> @@ -660,12 +660,6 @@ static int soc21_common_hw_init(void *handle)
>>>> soc21_program_aspm(adev);
>>>> /* setup nbio registers */
>>>> adev->nbio.funcs->init_registers(adev);
>>>> - /* remap HDP registers to a hole in mmio space,
>>>> - * for the purpose of expose those registers
>>>> - * to process space
>>>> - */
>>>> - if (adev->nbio.funcs->remap_hdp_registers)
>>>> - adev->nbio.funcs->remap_hdp_registers(adev);
>>>> /* enable the doorbell aperture */
>>>> soc21_enable_doorbell_aperture(adev, true);
>>>>
>>>> --
>>>> 2.25.1
>>>>
More information about the amd-gfx
mailing list