[PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

Fri Jan 18 11:07:47 UTC 2019

Hi Christian,

Regarding the range 0 -> hole_start being occupied by ATC: can you share where you found this limitation? Is there a hardware document?

/Monk
From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Koenig, Christian
Sent: Friday, January 18, 2019 5:02 PM
To: Lou, Wentao <Wentao.Lou at amd.com>; Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org; Zhu, Rex <Rex.Zhu at amd.com>
Cc: Deng, Emily <Emily.Deng at amd.com>
Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

0xFFFF FFF0 0000 should be inside the hole, so why is it a correct address?

I was assuming that this is the linear address which is then "sign extended" to 0xFFFF FFFF FFF0 0000.

If that address is used somewhere without extending bit 47 into bits 48-63 then we have found the root cause of the issue.

Regards,
Christian.

On 18.01.19 at 05:44, Lou, Wentao wrote:
Hi Christian,

Thanks for the explanation. I have another question about the hole.

As you mentioned “Trying to access the hole results in a range fault interrupt IIRC.”
0x8000 0000 0000
.... hole
0xFFFF 8000 0000 0000

But you said “0xFFFF FFF0 0000 is the correct address, if that is causing a problem then there is a bug somewhere else.”
0xFFFF FFF0 0000 should be inside the hole, so why is it a correct address?
Many thanks.

BR,
Wentao

From: Koenig, Christian <Christian.Koenig at amd.com>
Sent: Thursday, January 17, 2019 4:30 PM
To: Liu, Monk <Monk.Liu at amd.com>; Lou, Wentao <Wentao.Lou at amd.com>; amd-gfx at lists.freedesktop.org; Zhu, Rex <Rex.Zhu at amd.com>
Cc: Deng, Emily <Emily.Deng at amd.com>
Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

Hi Monk,

OK, let me explain a bit more how the hardware works.

The GMC manages a virtual 64-bit address space, but only 48 bits of that virtual address space are handled by the page table walker.

Those 48 bits of address space are sign extended, so bit 47 is copied into bits 48-63.

This gives us the following memory layout:
0x0
.... virtual address space
0x8000 0000 0000
.... hole
0xFFFF 8000 0000 0000
.... virtual address space
0xFFFF FFFF FFFF FFFF

Trying to access the hole results in a range fault interrupt IIRC.

When doing the VM page table walk the topmost 16 bits are ignored, so when programming the page table walker you cut those off and use a linear address again. This is what AMDGPU_GMC_HOLE_MASK is good for.
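
As a rough illustration of the layout and the mask described above, here is a minimal sketch. It is not the driver code: the macro and function names are made up for this example, and only the three constants are taken from the layout in this mail (the mask is simply "keep the low 48 bits").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants taken from the layout above; the names are local to this sketch. */
#define MC_HOLE_START 0x0000800000000000ULL /* first address inside the hole */
#define MC_HOLE_END   0xFFFF800000000000ULL /* first address above the hole  */
#define MC_HOLE_MASK  0x0000FFFFFFFFFFFFULL /* keeps the low 48 bits         */

/* Linear 48-bit address -> 64-bit MC address: copy bit 47 into bits 48-63. */
uint64_t mc_sign_extend(uint64_t linear)
{
    assert((linear & ~MC_HOLE_MASK) == 0); /* caller must pass a 48-bit value */
    if (linear >= MC_HOLE_START)
        linear |= MC_HOLE_END;
    return linear;
}

/* 64-bit MC address -> linear address, dropping the topmost 16 bits. */
uint64_t mc_to_linear(uint64_t mc_addr)
{
    return mc_addr & MC_HOLE_MASK;
}

/* True when a 64-bit MC address falls inside the hole. */
bool mc_in_hole(uint64_t mc_addr)
{
    return mc_addr >= MC_HOLE_START && mc_addr < MC_HOLE_END;
}
```

With these helpers the linear value 0xFFFF FFF0 0000 extends to 0xFFFF FFFF FFF0 0000, which is above the hole. The same value used as a 64-bit MC address without the extension is classified as inside the hole.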

Now on Vega/Raven/Picasso etc. (everything with GFX9) the lower range (0x0-0x8000 0000 0000) is reserved for SVA/ATC use. Since we unfortunately didn't know that initially, we exposed that range to older userspace as usable and also put the CSA in there.

The most likely cause of the problem is that we still have a bug somewhere in this handling, e.g. not correctly using sign extended addresses *OR* using sign extended addresses where we should use linear ones instead.

Regards,
Christian.

On 17.01.19 at 09:04, Liu, Monk wrote:
Hi Christian

I believe Wentao can fix the issue we hit with the steps below:

  1.  Return virtual_address_max (used by the UMD) as HOLE_START - RESERVED_SIZE
  2.  [optional] Still keep virtual_address_offset at RESERVED_SIZE (the current way; I think that is because we previously put the CSA in the 0 --> RESERVED_SIZE space)
  3.  Put the CSA in HOLE_START - RESERVED_SIZE ==> HOLE_START (it’s the current design)

I don’t see where the above scheme is incorrect … can you explain GMC_HOLE_START in more detail?

E.g.:

  1.  Why did you set GMC_HOLE_START to 0x8’000’0000’0000 (half of the maximum 48-bit address space)? Is it for HSA purposes, to make sure a GPU address can also be used as a CPU address?
  2.  Now MAX_PFN is 1’000’0000’0000; do you need to change GMC_HOLE_START?

Thanks, we need to catch up on this.

/Monk

From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Koenig, Christian
Sent: Thursday, January 17, 2019 3:39 PM
To: Lou, Wentao <Wentao.Lou at amd.com>; Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org; Zhu, Rex <Rex.Zhu at amd.com>
Cc: Deng, Emily <Emily.Deng at amd.com>
Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

On 17.01.19 at 04:17, Lou, Wentao wrote:
Hi Christian,

Your solution was:
addr = (max_pfn - (AMDGPU_VA_RESERVED_SIZE >> AMDGPU_PAGE_SHIFT)) << AMDGPU_PAGE_SHIFT;
now max_pfn = 0x10 0000 0000, AMDGPU_VA_RESERVED_SIZE = 0x10 0000, AMDGPU_PAGE_SHIFT = 12
I still got addr = 0xFFFF FFF0 0000, which causes a ring gfx timeout.

But 0xFFFF FFF0 0000 is the correct address, if that is causing a problem then there is a bug somewhere else.

Please try to use AMDGPU_GMC_HOLE_START-AMDGPU_VA_RESERVED_SIZE as well. Does that work?

Before commit 1bf621c42137926ac249af761c0190a9258aa0db, vm_size was 32GB, and csa_addr was under AMDGPU_GMC_HOLE_START.

Wait a second, why was the vm_size 32GB? This is on a Vega10, isn't it?

I didn’t understand why csa_addr needs to be above AMDGPU_GMC_HOLE_START now.

On Vega10 the lower range, i.e. everything below AMDGPU_GMC_HOLE_START, is reserved for SVA.

Regards,
Christian.

Thanks.

BR,
Wentao

From: Koenig, Christian <Christian.Koenig at amd.com>
Sent: Wednesday, January 16, 2019 5:48 PM
To: Lou, Wentao <Wentao.Lou at amd.com>; Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org; Zhu, Rex <Rex.Zhu at amd.com>
Cc: Deng, Emily <Emily.Deng at amd.com>
Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

Hi Wentao,

well, the problem is you don't seem to understand how the hardware works.

The engines see an MC address space with a hole in the middle, similar to how the x86 64-bit CPU address space works. But the page tables are programmed linearly.

So the calculation in amdgpu_driver_open_kms() is correct because it takes the MC address and makes a linear page table index from it again.

The only thing we might need to fix here is shifting max_pfn before the subtraction, and I doubt that even that is necessary.

Regards,
Christian.

On 16.01.19 at 10:34, Lou, Wentao wrote:

Hi Christian,

vm_size is now set to 0x4 0000 GB by the commit below:

1bf621c42137926ac249af761c0190a9258aa0db drm/amdgpu: Remove unnecessary VM size calculations

As a result, max_pfn is 0x10 0000 0000.

amdgpu_csa_vaddr computes max_pfn << 12 to get 0x1 0000 0000 0000 and then subtracts AMDGPU_VA_RESERVED_SIZE to get 0xFFFF FFF0 0000.

Unfortunately this number lies between AMDGPU_GMC_HOLE_START and AMDGPU_GMC_HOLE_END, so amdgpu_gmc_sign_extend is called to turn it into 0xFFFF FFFF FFF0 0000.

In amdgpu_driver_open_kms, the extended csa_addr cannot be passed to amdgpu_map_static_csa directly, because it would be above the max_pfn limit.

So csa_addr is masked with AMDGPU_GMC_HOLE_MASK to make it usable for amdgpu_vm_alloc_pts.

But masking with AMDGPU_GMC_HOLE_MASK makes the address fall back into the AMDGPU_GMC_HOLE again, which causes a GPU reset.

We just put amdgpu_csa_vaddr back to AMDGPU_GMC_HOLE_START, to avoid the address touching the AMDGPU_GMC_HOLE.

By the way, if max_pfn were shifted too far to the left, the result would always be zero, with or without min(*,*).
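
The numbers in this walk-through can be reproduced with a small sketch. It assumes that "GB" means 2^30 bytes and that pages are 4 KiB (shift of 12), which is consistent with the values quoted in this mail; the function and macro names are hypothetical and not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define CSA_PAGE_SHIFT    12
#define CSA_RESERVED_SIZE 0x100000ULL           /* 0x10 0000, as quoted in this thread */
#define CSA_HOLE_START    0x0000800000000000ULL /* start of the hole                   */

/* Assumption: 1 GB = 2^30 bytes = 2^18 pages of 4 KiB each. */
uint64_t csa_max_pfn(uint64_t vm_size_gb)
{
    return vm_size_gb << 18;
}

/* Linear CSA address: top of the VM range minus the reserved size. */
uint64_t csa_linear_addr(uint64_t max_pfn)
{
    return (max_pfn << CSA_PAGE_SHIFT) - CSA_RESERVED_SIZE;
}

/* True when the linear address is at or above the start of the hole. */
bool csa_at_or_above_hole_start(uint64_t linear)
{
    return linear >= CSA_HOLE_START;
}
```

For vm_size = 0x4 0000 GB this gives max_pfn = 0x10 0000 0000 and a linear CSA address of 0xFFFF FFF0 0000, which is at or above the hole start. For the older vm_size of 32 GB it gives 0x7 FFF0 0000, which stays below it.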

BR,

Wentao

-----Original Message-----
From: Koenig, Christian <Christian.Koenig at amd.com>
Sent: Tuesday, January 15, 2019 4:02 PM
To: Liu, Monk <Monk.Liu at amd.com>; Lou, Wentao <Wentao.Lou at amd.com>; amd-gfx at lists.freedesktop.org; Zhu, Rex <Rex.Zhu at amd.com>
Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

On 15.01.19 at 07:19, Liu, Monk wrote:

> max_pfn is now 1'0000'0000'0000'0000 (bytes), which is above 48 bits, and ANDing it with AMDGPU_GMC_HOLE_MASK makes it zero ....
>
> And in "amdgpu_driver_open_kms()" I saw that @Zhu, Rex wrote the code as:
>
> "csa_addr = amdgpu_csa_vaddr(adev) & AMDGPU_GMC_HOLE_MASK". I think this is wrong since you intentionally place the csa above the GMC hole, right?

The fix is just completely incorrect since min(adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT, AMDGPU_GMC_HOLE_START) still gives you 0 when we shift max_pfn too much to the left.

The correct solution is to subtract the reserved size first and then shift. E.g.:

addr = (max_pfn - (AMDGPU_VA_RESERVED_SIZE >> AMDGPU_PAGE_SHIFT)) << AMDGPU_PAGE_SHIFT;
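
To make the two cases concrete, here is a minimal sketch with hypothetical helper names (not the driver's functions). It assumes a page shift of 12 and uses the hole start value from this thread. The first helper mimics the clamp from the patch, where the shift happens before min() sees the value. The second helper uses the subtract-then-shift order.

```c
#include <stdint.h>

#define SK_PAGE_SHIFT 12
#define SK_HOLE_START 0x0000800000000000ULL

/* Clamp as in the patch: shift first, then take the minimum. */
uint64_t sk_clamped_top(uint64_t max_pfn)
{
    /* The shift wraps to 0 in 64 bits once max_pfn reaches 2^52. */
    uint64_t top = max_pfn << SK_PAGE_SHIFT;

    return top < SK_HOLE_START ? top : SK_HOLE_START;
}

/* Suggested order: subtract the reserved size in page units, then shift. */
uint64_t sk_sub_then_shift(uint64_t max_pfn, uint64_t reserved)
{
    return (max_pfn - (reserved >> SK_PAGE_SHIFT)) << SK_PAGE_SHIFT;
}
```

With a max_pfn of 2^52 the clamp returns 0 instead of the hole start, because the wrapped value is the smaller operand of min(). For max_pfn = 0x10 0000 0000 and a reserved size of 0x10 0000, sk_sub_then_shift() returns 0xFFFF FFF0 0000.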

Regards,

Christian.

>
> Looks like we should modify this place
>
> /Monk
>
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Christian König
> Sent: Monday, January 14, 2019 9:05 PM
> To: Lou, Wentao <Wentao.Lou at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START
>
> On 14.01.19 at 09:40, wentalou wrote:
>> After removing unnecessary VM size calculations, vm_manager.max_pfn
>> would reach 0x10,0000,0000 max_pfn << AMDGPU_GPU_PAGE_SHIFT exceeding
>> AMDGPU_GMC_HOLE_START would caused GPU reset.
>>
>> Change-Id: I47ad0be2b0bd9fb7490c4e1d7bb7bdacf71132cb
>> Signed-off-by: wentalou <Wentao.Lou at amd.com>
> NAK, that is incorrect. We intentionally place the csa above the GMC hole.
>
> Regards,
> Christian.
>

>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> index 7e22be7..dd3bd01 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>> @@ -26,7 +26,8 @@
>>
>>  uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)
>>  {
>> -        uint64_t addr = adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT;
>> +        uint64_t addr = min(adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT,
>> +                            AMDGPU_GMC_HOLE_START);
>>
>>          addr -= AMDGPU_VA_RESERVED_SIZE;
>>          addr = amdgpu_gmc_sign_extend(addr);
