Re: Re: Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
Alex Deucher
alexdeucher at gmail.com
Tue Dec 21 20:34:25 UTC 2021
Yes, you can either do that, or if amdgpu is loaded, just read the data
from /sys/kernel/debug/dri/0/amdgpu_vbios
Alex
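
Whichever way the image is dumped, it can be sanity-checked before it is
sent. The sketch below is a minimal example and not part of amdgpu. It
assumes the dump is a standard PCI expansion ROM image, which starts with
the bytes 0x55 0xAA. The helper names has_option_rom_signature and
check_vbios_file are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Return 1 if buf starts with the PCI expansion ROM signature 0x55 0xAA. */
static int has_option_rom_signature(const unsigned char *buf, size_t len)
{
	return len >= 2 && buf[0] == 0x55 && buf[1] == 0xAA;
}

/*
 * Hypothetical helper: read the first two bytes of the file at `path`.
 * Returns 1 if the signature is present, 0 if it is not (or the file is
 * shorter than two bytes), and -1 if the file cannot be opened.
 */
static int check_vbios_file(const char *path)
{
	unsigned char hdr[2];
	size_t n;
	FILE *f = fopen(path, "rb");

	if (!f)
		return -1;
	n = fread(hdr, 1, sizeof(hdr), f);
	fclose(f);
	return has_option_rom_signature(hdr, n);
}
```

For example, check_vbios_file("vbios.dump") could be run on the file
produced by the sysfs commands quoted below. A result of 0 suggests the
dump is empty or is not a ROM image.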
On Mon, Dec 20, 2021 at 3:06 AM 周宗敏 <zhouzongmin at kylinos.cn> wrote:
>
>
> Dear Alex:
>
>
> I've never dumped a VBIOS before. Could you tell me how to get a copy of
> the vbios image for you?
>
> From searching online, it seems the image can be obtained the following
> way:
>
> echo 1 > /sys/devices/pci0000:00/0000:00:02.0/rom
>
> cat /sys/devices/pci0000:00/0000:00:02.0/rom > vbios.dump
>
> echo 0 > /sys/devices/pci0000:00/0000:00:02.0/rom
>
>
> Is that right?
>
>
> Thanks very much.
>
>
> ----
>
>
>
>
>
>
> *Subject:* Re: Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
> *Date:* 2021-12-18 05:19
> *From:* Alex Deucher
> *To:* 周宗敏
>
>
> If you could get me a copy of the vbios image from a problematic board,
> that would be helpful. In the meantime, I've applied the patch.
>
> Alex
>
>
> On Thu, Dec 16, 2021 at 9:38 PM 周宗敏 <zhouzongmin at kylinos.cn> wrote:
>
>> Dear Alex:
>>
>>
>> > Is the issue reproducible with the same board in bare metal on x86? Or
>> > does it only happen with passthrough on ARM?
>>
>>
>> Unfortunately, my current environment makes it inconvenient to test this
>> GPU board on an x86 platform.
>>
>> But I can tell you the problem still occurs on ARM without passthrough to
>> a virtual machine.
>>
>>
>> In addition, at the end of 2020, my colleagues found similar problems on
>> MIPS platforms with Radeon R7 340 graphics chips.
>>
>> So I think it can happen regardless of whether the platform is x86, ARM,
>> or MIPS.
>>
>>
>> I hope the above information is helpful to you. I also think it would be
>> better for users if we can root cause this issue.
>>
>>
>> Best regards.
>>
>>
>>
>>
>> ----
>>
>>
>>
>>
>>
>>
>> *Subject:* Re: Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>> *Date:* 2021-12-16 23:28
>> *From:* Alex Deucher
>> *To:* 周宗敏
>>
>>
>> Is the issue reproducible with the same board in bare metal on x86? Or
>> does it only happen with passthrough on ARM? Looking through the archives,
>> the SI patch I made was for an x86 laptop. It would be nice to root cause
>> this, but there weren't any gfx8 boards with more than 64G of vram, so I
>> think it's safe. That said, if you see similar issues with newer gfx IPs
>> then we have an issue since the upper bit will be meaningful, so it would
>> be nice to root cause this.
>>
>> Alex
>>
>>
>> On Thu, Dec 16, 2021 at 4:36 AM 周宗敏 <zhouzongmin at kylinos.cn> wrote:
>>
>>> Hi Christian,
>>>
>>>
>>> I'm testing the GPU passthrough feature, so I pass this GPU through to
>>> a virtual machine. The host is an arm64 system.
>>>
>>> As far as I know, Alex dealt with a similar problem in
>>> drm/radeon/si.c. Maybe the two have the same cause?
>>>
>>> The earlier commit is here:
>>>
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0ca223b029a261e82fb2f50c52eb85d510f4260e
>>>
>>> [image: image.png]
>>>
>>>
>>> Thanks very much.
>>>
>>>
>>>
>>> ----
>>>
>>>
>>>
>>> *Subject:* Re: Reply: Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>>>
>>> *Date:* 2021-12-16 16:15
>>> *From:* Christian König
>>> *To:* 周宗敏, Alex Deucher
>>>
>>>
>>>
>>>
>>> Hi Zongmin,
>>>
>>> that strongly sounds like the ASIC is not correctly initialized when
>>> trying to read the register.
>>>
>>> What board and environment are you using this GPU with? Is that a
>>> normal x86 system?
>>>
>>> Regards,
>>> Christian.
>>>
>>>
>>>
>>> On 16.12.21 at 04:11, 周宗敏 wrote:
>>>
>>>
>>>
>>> 1. The problematic board that I have tested is an [AMD/ATI] Lexa PRO
>>>    [Radeon RX 550/550X]; the vbios version is 113-RXF9310-C09-BT.
>>>
>>> 2. When the problem occurs, I can see the following changes in the
>>>    vram size value read from RREG32(mmCONFIG_MEMSIZE). It seems to
>>>    have garbage in the upper 16 bits:
>>>
>>>    [image: image.png]
>>>
>>> 3. I can also see dmesg output like the following. When the vram size
>>>    register has garbage, the message looks like this:
>>>
>>>    amdgpu 0000:09:00.0: VRAM: 4286582784M 0x000000F400000000 - 0x000FF8F4FFFFFFFF (4286582784M used)
>>>
>>>    The correct message should look like this:
>>>
>>>    amdgpu 0000:09:00.0: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
>>>
>>>
>>>
>>>
>>> If you have any questions, please email me.
>>>
>>> Thanks very much.
>>>
>>>
>>>
>>>
>>> ----
>>>
>>> *Subject:* Re: [PATCH] drm/amdgpu: fixup bad vram size on gmc v8
>>>
>>> *Date:* 2021-12-16 04:23
>>> *From:* Alex Deucher
>>> *To:* Zongmin Zhou
>>>
>>>
>>>
>>>
>>> On Wed, Dec 15, 2021 at 10:31 AM Zongmin Zhou wrote:
>>> >
>>> > Some boards (like RX550) seem to have garbage in the upper
>>> > 16 bits of the vram size register. Check for this and clamp
>>> > the size properly. Fixes boards reporting bogus amounts of vram.
>>> >
>>> > After this patch, the maximum supported GPU VRAM size is 64GB;
>>> > at most 64GB of vram will be used.
>>>
>>> Can you provide some examples of problematic boards and possibly a
>>> vbios image from the problematic board? What values are you seeing?
>>> It would be nice to see what the boards are reporting and whether the
>>> lower 16 bits are actually correct or if it is some other issue. This
>>> register is undefined until the asic has been initialized. The vbios
>>> programs it as part of its asic init sequence (either via vesa/gop or
>>> the OS driver).
>>>
>>> Alex
>>>
>>>
>>> >
>>> > Signed-off-by: Zongmin Zhou
>>> > ---
>>> >  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 13 ++++++++++---
>>> >  1 file changed, 10 insertions(+), 3 deletions(-)
>>> >
>>> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> > index 492ebed2915b..63b890f1e8af 100644
>>> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>> > @@ -515,10 +515,10 @@ static void gmc_v8_0_mc_program(struct amdgpu_device *adev)
>>> >  static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
>>> >  {
>>> >  	int r;
>>> > +	u32 tmp;
>>> >
>>> >  	adev->gmc.vram_width = amdgpu_atombios_get_vram_width(adev);
>>> >  	if (!adev->gmc.vram_width) {
>>> > -		u32 tmp;
>>> >  		int chansize, numchan;
>>> >
>>> >  		/* Get VRAM informations */
>>> > @@ -562,8 +562,15 @@ static int gmc_v8_0_mc_init(struct amdgpu_device *adev)
>>> >  		adev->gmc.vram_width = numchan * chansize;
>>> >  	}
>>> >  	/* size in MB on si */
>>> > -	adev->gmc.mc_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
>>> > -	adev->gmc.real_vram_size = RREG32(mmCONFIG_MEMSIZE) * 1024ULL * 1024ULL;
>>> > +	tmp = RREG32(mmCONFIG_MEMSIZE);
>>> > +	/* some boards may have garbage in the upper 16 bits */
>>> > +	if (tmp & 0xffff0000) {
>>> > +		DRM_INFO("Probable bad vram size: 0x%08x\n", tmp);
>>> > +		if (tmp & 0xffff)
>>> > +			tmp &= 0xffff;
>>> > +	}
>>> > +	adev->gmc.mc_vram_size = tmp * 1024ULL * 1024ULL;
>>> > +	adev->gmc.real_vram_size = adev->gmc.mc_vram_size;
>>> >
>>> >  	if (!(adev->flags & AMD_IS_APU)) {
>>> >  		r = amdgpu_device_resize_fb_bar(adev);
>>> > --
>>> > 2.25.1
>>> >
>>> >
>>>
>>>
>>>
>>>
>>>
>>
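
As a worked check of the numbers quoted above: 4286582784 is 0xFF801000,
whose lower 16 bits are 0x1000, i.e. 4096 MB. The bad dmesg line is
therefore consistent with a correct size in the lower 16 bits and garbage
in the upper 16. The standalone sketch below mirrors the clamp from the
patch in plain user-space C. clamp_vram_size_mb and vram_end are
hypothetical helper names, not kernel functions, and the register value
is simply passed in as an integer instead of being read with RREG32.

```c
#include <stdint.h>

/*
 * Mirror of the clamp in the patch: the register holds the VRAM size in MB.
 * If any of the upper 16 bits are set and the lower 16 bits are non-zero,
 * keep only the lower 16 bits.
 */
static uint32_t clamp_vram_size_mb(uint32_t reg)
{
	if ((reg & 0xffff0000u) && (reg & 0xffffu))
		reg &= 0xffffu;
	return reg;
}

/*
 * Address of the last byte of a VRAM range that begins at `start` and is
 * `size_mb` megabytes long, as in the "VRAM: ... start - end" dmesg line.
 */
static uint64_t vram_end(uint64_t start, uint32_t size_mb)
{
	return start + (uint64_t)size_mb * 1024ULL * 1024ULL - 1;
}
```

With start 0x000000F400000000, vram_end gives 0x000FF8F4FFFFFFFF for the
raw value 0xFF801000 and 0x000000F4FFFFFFFF for the clamped value 4096,
matching both dmesg lines. Note that a value such as 0x00020000 (128G, with
the lower 16 bits all zero) passes through the clamp unchanged.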