Re: Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8

Fri Dec 17 21:19:01 UTC 2021

If you could get me a copy of the vbios image from a problematic board,
that would be helpful.  In the meantime, I've applied the patch.

Alex

On Thu, Dec 16, 2021 at 9:38 PM 周宗敏 <zhouzongmin at kylinos.cn> wrote:

> Dear Alex:
>
>
> > Is the issue reproducible with the same board in bare metal on x86? Or
> > does it only happen with passthrough on ARM?
>
>
> Unfortunately, my current environment makes it inconvenient to test this
> GPU board on an x86 platform.
>
> However, I can tell you that the problem still occurs on ARM without
> passthrough to a virtual machine.
>
>
> In addition, at the end of 2020 my colleagues found similar problems on
> MIPS platforms with Radeon R7 340 graphics chips.
>
> So I think it can happen regardless of whether the platform is x86, ARM,
> or MIPS.
>
>
> I hope the above information is helpful to you. I also think it would be
> better for users if this issue could be root-caused.
>
>
> Best regards.
>
>
>
>
> ----
>
>
>
>
>
>
> *Subject:* Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>
> *Date:* 2021-12-16 23:28
> *From:* Alex Deucher
> *To:* 周宗敏
>
>
> Is the issue reproducible with the same board in bare metal on x86?  Or
> does it only happen with passthrough on ARM?  Looking through the archives,
> the SI patch I made was for an x86 laptop.  It would be nice to root
> cause this, but there weren't any gfx8 boards with more than 64G of vram,
> so I think it's safe.  That said, if you see similar issues with newer gfx
> IPs then we have an issue since the upper bit will be meaningful, so it
> would be nice to root cause this.
>
> Alex
>
>
> On Thu, Dec 16, 2021 at 4:36 AM 周宗敏 <zhouzongmin at kylinos.cn> wrote:
>
>> Hi Christian,
>>
>>
>> I'm testing the GPU passthrough feature, so I pass this GPU through to a
>> virtual machine. It is based on an arm64 system.
>>
>> As far as I know, Alex dealt with a similar problem in
>> drm/radeon/si.c. Maybe it has the same cause?
>>
>> The commit in question is below:
>>
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ca223b029a261e82fb2f50c52eb85d510f4260e
>>
>>
>> Thanks very much.
>>
>>
>>
>> ----
>>
>>
>>
>> *Subject:* Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>>
>> *Date:* 2021-12-16 16:15
>> *From:* Christian König
>> *To:* 周宗敏, Alex Deucher
>>
>>
>>
>>
>> Hi Zongmin,
>>
>>    That strongly sounds like the ASIC is not correctly initialized when
>>    trying to read the register.
>>
>>    What board and environment are you using this GPU with? Is that a
>>    normal x86 system?
>>
>>    Regards,
>>    Christian.
>>
>>
>>
>> On 16.12.21 at 04:11, 周宗敏 wrote:
>>
>>
>>
>>    1. The problematic board that I tested is an [AMD/ATI] Lexa PRO
>>       [Radeon RX 550/550X], with vbios version 113-RXF9310-C09-BT.
>>
>>    2. When the problem occurs, the VRAM size read via
>>       RREG32(mmCONFIG_MEMSIZE) changes as shown below; it seems to have
>>       garbage in the upper 16 bits.
>>
>>       [image: image.png]
>>
>>    3. I can also see dmesg output like the following. When the VRAM size
>>       register contains garbage, the message looks like this:
>>
>>       amdgpu 0000:09:00.0: VRAM: 4286582784M 0x000000F400000000 - 0x000FF8F4FFFFFFFF (4286582784M used)
>>
>>       The correct message should look like this:
>>
>>       amdgpu 0000:09:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
>>
>>
>>    If you have any questions, please send me mail.
>>
>>    Thanks very much.
>>
>>
>>
>>
>> ----
>>
>> *Subject:* Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>>
>> *Date:* 2021-12-16 04:23
>> *From:* Alex Deucher
>> *To:* Zongmin Zhou
>>
>>
>>
>>
>> On Wed, Dec 15, 2021 at 10:31 AM Zongmin Zhou wrote:
>>          >
>>          > Some boards (like the RX550) seem to have garbage in the upper
>>          > 16 bits of the vram size register.  Check for
>>          > this and clamp the size properly.  Fixes
>>          > boards reporting bogus amounts of vram.
>>          >
>>          > With this patch applied, the maximum GPU VRAM size is
>>          > limited to 64GB.
>>
>>          Can you provide some examples of problematic boards and possibly a
>>          vbios image from the problematic board?  What values are you seeing?
>>          It would be nice to see what the boards are reporting and whether the
>>          lower 16 bits are actually correct or if it is some other issue.  This
>>          register is undefined until the asic has been initialized.  The vbios
>>          programs it as part of its asic init sequence (either via vesa/gop or
>>          the OS driver).
>>
>>          Alex
>>
>>
>>          >
>>          > Signed-off-by: Zongmin Zhou
>>          > ---
>>          >  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 13 ++++++++++---
>>          >  1 file changed, 10 insertions(+), 3 deletions(-)
>>          >
>>          > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>          > index 492ebed2915b..63b890f1e8af 100644
>>          > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>          > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>          > @@ -515,10 +515,10 @@ static void gmc_v8_0_mc_program(struct amdgpu_device *adev)
>>          >  static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
>>          >  {
>>          >         int r;
>>          > +       u32 tmp;
>>          >
>>          >         adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev);
>>          >         if (!adev->gmc.vram_width) {
>>          > -               u32 tmp;
>>          >                 int chansize, numchan;
>>          >
>>          >                 /* Get VRAM informations */
>>          > @@ -562,8 +562,15 @@ static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
>>          >                 adev->gmc.vram_width = numchan * chansize;
>>          >         }
>>          >         /* size in MB on si */
>>          > -       adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
>>          > -       adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
>>          > +       tmp = RREG32(mmCONFIG_MEMSIZE);
>>          > +       /* some boards may have garbage in the upper 16 bits */
>>          > +       if (tmp & 0xffff0000) {
>>          > +               DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
>>          > +               if (tmp & 0xffff)
>>          > +                       tmp &= 0xffff;
>>          > +       }
>>          > +       adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
>>          > +       adev->gmc.real_vram_size = adev->gmc.mc_vram_size;
>>          >
>>          >         if (!(adev->flags & AMD_IS_APU)) {
>>          >                 r = amdgpu_device_resize_fb_bar(adev);
>>          > --
>>          > 2.25.1
>>
>>
>>
>>
>>
>