Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8

Alex Deucher alexdeucher at gmail.com
Tue Dec 21 20:34:25 UTC 2021


Yes, you can either do that, or if amdgpu is loaded, just read the data
from /sys/kernel/debug/dri/0/amdgpu_vbios

Alex


On Mon, Dec 20, 2021 at 3:06 AM 周宗敏 <zhouzongmin at kylinos.cn> wrote:

>
>
> Dear Alex:
>
>
> I've never dumped a VBIOS before, so could you tell me how to get a copy
> of the vbios image for you?
>
> I searched online and found that it can apparently be obtained in the
> following way:
>
> echo 1 > /sys/devices/pci0000:00/0000:00:02.0/rom
>
> cat /sys/devices/pci0000:00/0000:00:02.0/rom > vbios.dump
>
> echo 0 > /sys/devices/pci0000:00/0000:00:02.0/rom
>
>
> Is that right?
>
>
> Thanks very much.
>
>
> ----
>
>
>
>
>
>
> *Subject:* Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
> *Date:* 2021-12-18 05:19
> *From:* Alex Deucher
> *To:* 周宗敏
>
>
> If you could get me a copy of the vbios image from a problematic board,
> that would be helpful.  In the meantime, I've applied the patch.
>
> Alex
>
>
> On Thu, Dec 16, 2021 at 9:38 PM 周宗敏 <zhouzongmin at kylinos.cn> wrote:
>
>> Dear Alex:
>>
>>
>> > Is the issue reproducible with the same board in bare metal on x86?  Or
>> > does it only happen with passthrough on ARM?
>>
>>
>> Unfortunately, my current environment makes it inconvenient to test this
>> GPU board on an x86 platform, but I can tell you that the problem still
>> occurs on ARM without passthrough to a virtual machine.
>>
>>
>> In addition, at the end of 2020, my colleagues found similar problems on
>> MIPS platforms with Radeon R7 340 graphics chips.
>>
>> So I think it can happen regardless of whether the platform is x86, ARM,
>> or MIPS.
>>
>>
>> I hope the above information is helpful to you. I also think it would be
>> better for users if we can root cause this issue.
>>
>>
>> Best regards.
>>
>>
>>
>>
>> ----
>>
>>
>>
>>
>>
>>
>> *Subject:* Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>> *Date:* 2021-12-16 23:28
>> *From:* Alex Deucher
>> *To:* 周宗敏
>>
>>
>> Is the issue reproducible with the same board in bare metal on x86?  Or
>> does it only happen with passthrough on ARM?  Looking through the archives,
>> the SI patch I made was for an x86 laptop.  It would be nice to root cause
>> this, but there weren't any gfx8 boards with more than 64G of vram, so I
>> think it's safe.  That said, if you see similar issues with newer gfx IPs
>> then we have an issue since the upper bit will be meaningful, so it would
>> be nice to root cause this.
>>
>> Alex
>>
>>
>> On Thu, Dec 16, 2021 at 4:36 AM 周宗敏 <zhouzongmin at kylinos.cn> wrote:
>>
>>> Hi Christian,
>>>
>>>
>>> I'm testing the GPU passthrough feature, so I pass this GPU through to a
>>> virtual machine. It is based on an arm64 system.
>>>
>>> As far as I know, Alex dealt with a similar problem in
>>> dri/radeon/si.c. Maybe they have the same cause?
>>>
>>> The earlier commit is below:
>>>
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ca223b029a261e82fb2f50c52eb85d510f4260e
>>>
>>> [image: image.png]
>>>
>>>
>>> Thanks very much.
>>>
>>>
>>>
>>> ----
>>>
>>>
>>>
>>> *Subject:* Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>>> *Date:* 2021-12-16 16:15
>>> *From:* Christian König
>>> *To:* 周宗敏, Alex Deucher
>>>
>>>
>>>
>>>
>>> Hi Zongmin,
>>>
>>> that strongly sounds like the ASIC is not correctly initialized when
>>> trying to read the register.
>>>
>>> What board and environment are you using this GPU with? Is that a
>>> normal x86 system?
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>
>>> On 16.12.21 at 04:11, 周宗敏 wrote:
>>>
>>>
>>>
>>>    1. The problematic board that I have tested is an [AMD/ATI] Lexa PRO
>>>       [Radeon RX 550/550X]; its vbios version is 113-RXF9310-C09-BT.
>>>
>>>    2. When the problem occurs, I can see the following changes in the
>>>       vram size value read from RREG32(mmCONFIG_MEMSIZE); it seems to
>>>       have garbage in the upper 16 bits:
>>>
>>>       [image: image.png]
>>>
>>>    3. I can also see dmesg output like the following. When the vram size
>>>       register has garbage, the message looks like this:
>>>
>>>       amdgpu 0000:09:00.0: VRAM: 4286582784M 0x000000F400000000 - 0x000FF8F4FFFFFFFF (4286582784M used)
>>>
>>>       The correct message should look like this:
>>>
>>>       amdgpu 0000:09:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
>>>
>>>    If you have any questions, please send me mail.
>>>
>>>    Thanks very much.
>>>
>>>
>>>
>>>
>>> ----
>>>
>>> *Subject:* Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>>> *Date:* 2021-12-16 04:23
>>> *From:* Alex Deucher
>>> *To:* Zongmin Zhou
>>>
>>>
>>>
>>>
>>> On Wed, Dec 15, 2021 at 10:31 AM Zongmin Zhou wrote:
>>> >
>>> > Some boards (like RX550) seem to have garbage in the upper
>>> > 16 bits of the vram size register.  Check for
>>> > this and clamp the size properly.  Fixes
>>> > boards reporting bogus amounts of vram.
>>> >
>>> > After adding this patch, the maximum GPU VRAM size is 64GB;
>>> > otherwise only 64GB vram size will be used.
>>>
>>> Can you provide some examples of problematic boards and possibly a
>>> vbios image from the problematic board?  What values are you seeing?
>>> It would be nice to see what the boards are reporting and whether the
>>> lower 16 bits are actually correct or if it is some other issue.  This
>>> register is undefined until the asic has been initialized.  The vbios
>>> programs it as part of its asic init sequence (either via vesa/gop or
>>> the OS driver).
>>>
>>> Alex
>>>
>>>
>>> >
>>> > Signed-off-by: Zongmin Zhou
>>> > ---
>>> >  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 13 ++++++++++---
>>> >  1 file changed, 10 insertions(+), 3 deletions(-)
>>> >
>>> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> > index 492ebed2915b..63b890f1e8af 100644
>>> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> > @@ -515,10 +515,10 @@ static void gmc_v8_0_mc_program(struct amdgpu_device *adev)
>>> >  static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
>>> >  {
>>> >         int r;
>>> > +       u32 tmp;
>>> >
>>> >         adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev);
>>> >         if (!adev->gmc.vram_width) {
>>> > -               u32 tmp;
>>> >                 int chansize, numchan;
>>> >
>>> >                 /* Get VRAM informations */
>>> > @@ -562,8 +562,15 @@ static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
>>> >                 adev->gmc.vram_width = numchan * chansize;
>>> >         }
>>> >         /* size in MB on si */
>>> > -       adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
>>> > -       adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
>>> > +       tmp = RREG32(mmCONFIG_MEMSIZE);
>>> > +       /* some boards may have garbage in the upper 16 bits */
>>> > +       if (tmp & 0xffff0000) {
>>> > +               DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
>>> > +               if (tmp & 0xffff)
>>> > +                       tmp &= 0xffff;
>>> > +       }
>>> > +       adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
>>> > +       adev->gmc.real_vram_size = adev->gmc.mc_vram_size;
>>> >
>>> >         if (!(adev->flags & AMD_IS_APU)) {
>>> >                 r = amdgpu_device_resize_fb_bar(adev);
>>> > --
>>> > 2.25.1
>>>
>>>
>>>
>>>
>>>
>>