[PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START

Fri Jan 18 09:11:47 UTC 2019

Hi Monk,

> You see that for UMD, it can use 0 to HOLE_START
Let me say it once more: The UMD nor anybody else CAN'T use 0 to 
HOLE_START, that region is reserved for the ATC hardware!

We unfortunately didn't knew that initially and also didn't used the 
ATC, so we didn't ran into a problem.

But ROCm now uses the ATC on Raven/Picasso and I have a branch where I 
enable it on Vega as well. So when we don't fix that we will run into 
problems here.

The ATC isn't usable in combination with SRIOV and I don't think Windows 
uses it either, so they probably never ran into any issues.

> Do you mean even UMD should  not use virtual address that dropped in 
> range from 0 to HOLE_START ?
Yes, exactly! That in combination with ATC use can have quite a bunch of 
strange and hard to track down effects because two parts of the driver 
are using the same address space.

> That way where should UMD work in ?and I assume our UMD now still 
> using this range, this part make me puzzle 
At least Mesa now uses the high address space from HOLE_END..0xFFFF FFFF 
FFFF FFFF.

Regards,
Christian.

Am 18.01.19 um 02:32 schrieb Liu, Monk:
>
> Thanks Christian,
>
> Questions I have now:
>
>  1. You see that for UMD, it can use 0 to HOLE_START, so why CSA
>     cannot use that range although the range is as you said reserved
>     to ATC h/w ? Be note that for windows KMD, the CSA is allocated by
>     UMD driver so CSA shares the same aperture /space range with other
>     UMD BO, which mean CSA in windows also located in ATC range, if
>     that’s a problem why windows still works well.
>      1. Can you illustrate this limitation with more details ? we need
>         to understand why CSA couldn’t be put in ATC range.
>  2. According to your previous description :” Now on
>     Vega/Raven/Picasso etc.. (everything with a GFX9) the lower range
>     (0x0-0x8000 0000 0000) is reserved for SVA/ATC use. Since we
>     *unfortunately didn't knew that initially we exposed those to
>     older user space as usable* and also put the CSA in there.”
>      1. Do you mean even UMD should not use virtual address that
>         dropped in range from 0 to HOLE_START ?
>
> that way where should UMD work in ?and I assume our UMD now still 
> using this range, this part make me puzzle
>
> /Monk
>
> *From:*amd-gfx <amd-gfx-bounces at lists.freedesktop.org> *On Behalf Of 
> *Koenig, Christian
> *Sent:* Thursday, January 17, 2019 9:26 PM
> *To:* Liu, Monk <Monk.Liu at amd.com>; Lou, Wentao <Wentao.Lou at amd.com>; 
> amd-gfx at lists.freedesktop.org; Zhu, Rex <Rex.Zhu at amd.com>
> *Cc:* Deng, Emily <Emily.Deng at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than 
> AMDGPU_GMC_HOLE_START
>
> Hi Monk,
>
>
>     Regarding with above sentence, do you mean this range
>     (0->HOLE_START) shouldn’t be exposed to user space ? I don’t get
>     your point here …
>
> Yes exactly. As I said the problem is that 0->HOLE_START is reserved 
> for the ATC hardware, we should not touch it at all.
>
>
>     Putting CSA in 0~HOLD_START is the legacy approach we selected for
>     a long time since very early stage, how comes that you think it is
>     a problem now ?
>
> That turned out to be never a good idea in the first place.
>
> What we could do is reduce the max_pfn for SRIOV because the ATC 
> doesn't work in that configuration anyway. But I would only do this as 
> last resort.
>
> Any idea why an address above the hole doesn't work with SRIOV? It 
> seems to work fine in the bare metal case.
>
> Regards,
> Christian.
>
> Am 17.01.19 um 14:19 schrieb Liu, Monk:
>
>     Hi Christian
>
>     Thanks for explaining the HOLD for us,
>
>     My understanding is we still could put CSA to 0~HOLE_START,
>     because we can report UMD the max space is HOLD_START-CSA_SIZE ,
>     thus no colliding will hit.
>
>     > Now on Vega/Raven/Picasso etc.. (everything with a GFX9) the
>     lower range (0x0-0x8000 0000 0000) is reserved for SVA/ATC use.
>     Since we unfortunately didn't knew that initially we exposed those
>     to older userspace as usable and also put the CSA in there.
>
>
>     Regarding with above sentence, do you mean this range
>     (0->HOLE_START) shouldn’t be exposed to user space ? I don’t get
>     your point here …
>
>     Putting CSA in 0~HOLD_START is the legacy approach we selected for
>     a long time since very early stage, how comes that you think it is
>     a problem now ?
>
>     /Monk
>
>     *From:*amd-gfx <amd-gfx-bounces at lists.freedesktop.org>
>     <mailto:amd-gfx-bounces at lists.freedesktop.org> *On Behalf Of
>     *Koenig, Christian
>     *Sent:* Thursday, January 17, 2019 4:30 PM
>     *To:* Liu, Monk <Monk.Liu at amd.com> <mailto:Monk.Liu at amd.com>; Lou,
>     Wentao <Wentao.Lou at amd.com> <mailto:Wentao.Lou at amd.com>;
>     amd-gfx at lists.freedesktop.org
>     <mailto:amd-gfx at lists.freedesktop.org>; Zhu, Rex <Rex.Zhu at amd.com>
>     <mailto:Rex.Zhu at amd.com>
>     *Cc:* Deng, Emily <Emily.Deng at amd.com> <mailto:Emily.Deng at amd.com>
>     *Subject:* Re: [PATCH] drm/amdgpu: csa_vaddr should not larger
>     than AMDGPU_GMC_HOLE_START
>
>     Hi Monk,
>
>     ok let me explain a bit more how the hardware works.
>
>     The GMC manages a virtual 64bit address space, but only 48bit of
>     that virtual address space are handled by the page table walker.
>
>     The 48bits of address space are sign extended, so bit 47 of that
>     are extended into bits 48-63.
>
>     This gives us the following memory layout:
>     0x0
>     .... virtual address space
>     0x8000 0000 0000
>     .... hole
>     0xFFFF 8000 0000 0000
>     .... virtual address space
>     0xFFFF FFFF FFFF FFFF
>
>     Trying to access the hole results in a range fault interrupt IIRC.
>
>     When doing the VM page table walk the topmost 16bits are ignored,
>     so when programming the page table walker you cut those of and use
>     a linear address again. This is what AMDGPU_GMC_HOLE_MASK is good for.
>
>     Now on Vega/Raven/Picasso etc.. (everything with a GFX9) the lower
>     range (0x0-0x8000 0000 0000) is reserved for SVA/ATC use. Since we
>     unfortunately didn't knew that initially we exposed those to older
>     userspace as usable and also put the CSA in there.
>
>     The most likely cause of this is that we still have a bug
>     somewhere about this, e.g. not correctly using sign extended
>     addresses *OR* using sign extended addresses where we should use
>     linear instead.
>
>     Regards,
>     Christian.
>
>     Am 17.01.19 um 09:04 schrieb Liu, Monk:
>
>         Hi Christian
>
>         I believe Wentao can fix the issue we it by below step:
>
>          1. Return *Virtual_address_max* (UMD use it) to HOLE_START –
>             RESERVED_SIZE
>          2. [optional] Still Keep virtual_address_offset to
>             RESERVED_SIZE (current way, I think it’s because
>             previously we put CSA in 0 à RESERVED_SIZE space)
>          3. Put CSA in HOLE_START – RESERVED_SIZE è HOLE_START (it’s
>             current design)
>
>         I don’t get where above scheme is not correct … can you give
>         more explain for the GMC_HOLE_START ?
>
>         e.g.
>
>          1. why you set GMC_HOLE_START to 0x8’000’0000’0000 (half size
>             of MAX of 48bit address space) ? is it for HSA purpose to
>             make sure GPU address can also be used for CPU address ?
>          2. now MAX_PFN is 1’000’0000’0000, do you need to change
>             GMC_HOLE_START ?
>
>         thanks
>
>         we need some catch up
>
>         /Monk
>
>         *From:*amd-gfx <amd-gfx-bounces at lists.freedesktop.org>
>         <mailto:amd-gfx-bounces at lists.freedesktop.org> *On Behalf Of
>         *Koenig, Christian
>         *Sent:* Thursday, January 17, 2019 3:39 PM
>         *To:* Lou, Wentao <Wentao.Lou at amd.com>
>         <mailto:Wentao.Lou at amd.com>; Liu, Monk <Monk.Liu at amd.com>
>         <mailto:Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
>         <mailto:amd-gfx at lists.freedesktop.org>; Zhu, Rex
>         <Rex.Zhu at amd.com> <mailto:Rex.Zhu at amd.com>
>         *Cc:* Deng, Emily <Emily.Deng at amd.com> <mailto:Emily.Deng at amd.com>
>         *Subject:* Re: [PATCH] drm/amdgpu: csa_vaddr should not larger
>         than AMDGPU_GMC_HOLE_START
>
>         Am 17.01.19 um 04:17 schrieb Lou, Wentao:
>
>             Hi Christian,
>
>             Your solution as:
>
>             addr = (max_pfn - (AMDGPU_VA_RESERVED_SIZE >>
>             AMDGPU_PAGE_SHIFT)) << AMDGPU_PAGE_SHIFT;
>
>             now max_pfn = 0x10 0000 0000, AMDGPU_VA_RESERVED_SIZE =
>             0x10 0000, AMDGPU_PAGE_SHIFT = 12
>
>             Still got addr = 0xFFFF FFF0 0000, which would cause ring
>             gfx timeout.
>
>
>         But 0xFFFF FFF0 0000 is the correct address, if that is
>         causing a problem then there is a bug somewhere else.
>
>         Please try to use
>         AMDGPU_GMC_HOLE_START-AMDGPU_VA_RESERVED_SIZE as well. Does
>         that work?
>
>
>
>
>             Before commit 1bf621c42137926ac249af761c0190a9258aa0db,
>             vm_size was 32GB, and csa_addr was under
>             AMDGPU_GMC_HOLE_START.
>
>
>         Wait a second why was the vm_size 32GB? This is on a Vega10
>         isn't it?
>
>
>
>
>             I didn’t understand why csa_addr need to be above
>             AMDGPU_GMC_HOLE_START now.
>
>
>         On Vega10 the lower range, e.g. everything below
>         AMDGPU_GMC_HOLE_START is reserved for SVA.
>
>         Regards,
>         Christian.
>
>
>
>
>             Thanks.
>
>             BR,
>
>             Wentao
>
>             *From:* Koenig, Christian <Christian.Koenig at amd.com>
>             <mailto:Christian.Koenig at amd.com>
>             *Sent:* Wednesday, January 16, 2019 5:48 PM
>             *To:* Lou, Wentao <Wentao.Lou at amd.com>
>             <mailto:Wentao.Lou at amd.com>; Liu, Monk <Monk.Liu at amd.com>
>             <mailto:Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
>             <mailto:amd-gfx at lists.freedesktop.org>; Zhu, Rex
>             <Rex.Zhu at amd.com> <mailto:Rex.Zhu at amd.com>
>             *Cc:* Deng, Emily <Emily.Deng at amd.com>
>             <mailto:Emily.Deng at amd.com>
>             *Subject:* Re: [PATCH] drm/amdgpu: csa_vaddr should not
>             larger than AMDGPU_GMC_HOLE_START
>
>             Hi Wentao,
>
>             well the problem is you don't seem to understand how the
>             hardware works.
>
>             See the engines see an MC address space with a hole in the
>             middle, similar to the how x86 64bit CPU address space
>             works. But the page tables are programmed linearly.
>
>             So the calculation in amdgpu_driver_open_kms() is correct
>             because it takes the MC address and mages a linear page
>             table index from it again.
>
>             The only thing we might need to fix here is shifting
>             max_pfn before the subtraction and I doubt that even that
>             is necessary.
>
>             Regards,
>             Christian.
>
>             Am 16.01.19 um 10:34 schrieb Lou, Wentao:
>
>                 Hi Christian,
>
>                 Now vm_size was set to 0x4 0000 GB by below commit:
>
>                 1bf621c42137926ac249af761c0190a9258aa0db drm/amdgpu:
>                 Remove unnecessary VM size calculations
>
>                 So that max_pfn would be 0x10 0000 0000.
>
>                 amdgpu_csa_vaddr would make max_pfn << 12 to get 0x1
>                 0000 0000 0000, and then minus
>                 AMDGPU_VA_RESERVED_SIZE, to get 0xFFFF FFF0 0000
>
>                 unfortunately this number was between
>                 AMDGPU_GMC_HOLE_START and AMDGPU_GMC_HOLE_END, so that
>                 amdgpu_gmc_sign_extend was called to make it 0xFFFF
>                 FFFF FFF0 0000
>
>                 in amdgpu_driver_open_kms, extended csa_addr cannot be
>                 passed into amdgpu_map_static_csa directly, it would
>                 be above the limit of max_pfn.
>
>                 So that csa_addr was restricted by
>                 AMDGPU_GMC_HOLE_MASK to make it possible for
>                 amdgpu_vm_alloc_pts.
>
>                 But this restriction by AMDGPU_GMC_HOLE_MASK would
>                 make the address fall back into AMDGPU_GMC_HOLE again,
>                  which causing GPU reset.
>
>                 We just put amdgpu_csa_vaddr back to
>                 AMDGPU_GMC_HOLE_START, to avoid the address touching
>                 AMDGPU_GMC_HOLE.
>
>                 By the way, if max_pfn was shift much to the left, it
>                 would always get zero, with or without min(*,*).
>
>                 BR,
>
>                 Wentao
>
>                 -----Original Message-----
>                 From: Koenig, Christian <Christian.Koenig at amd.com>
>                 <mailto:Christian.Koenig at amd.com>
>                 Sent: Tuesday, January 15, 2019 4:02 PM
>                 To: Liu, Monk <Monk.Liu at amd.com>
>                 <mailto:Monk.Liu at amd.com>; Lou, Wentao
>                 <Wentao.Lou at amd.com> <mailto:Wentao.Lou at amd.com>;
>                 amd-gfx at lists.freedesktop.org
>                 <mailto:amd-gfx at lists.freedesktop.org>; Zhu, Rex
>                 <Rex.Zhu at amd.com> <mailto:Rex.Zhu at amd.com>
>                 Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not
>                 larger than AMDGPU_GMC_HOLE_START
>
>                 Am 15.01.19 um 07:19 schrieb Liu, Monk:
>
>                 > The max_pfn is now 1'0000'0000'0000'0000 (bytes)
>                 which is above 48 bit now, and it with
>                 AMDGPU_GMC_HOLE_MASK make it to zero ....
>
>                 >
>
>                 > And in code "amdgpu_driver_open_kms()" I saw @Zhu,
>                 Rex write the code as :
>
>                 >
>
>                 > "csa_addr = amdgpu_csa_vadr(adev) &
>                 AMDGPU_GMC_HOLE_MASK", I think this is wrong since you
>                 intentionally place the csa above GMC hole, right ?
>
>                 The fix is just completely incorrect since
>                 min(adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT,
>                 AMDGPU_GMC_HOLE_START) still gives you 0 when we shift
>                 max_pfn to much to the left.
>
>                 The correct solution is to substract the reserved size
>                 first and then shift. E.g.:
>
>                 addr = (max_pfn - (AMDGPU_VA_RESERVED_SIZE >>
>                 AMDGPU_PAGE_SHIFT)) << AMDGPU_PAGE_SHIFT;
>
>                 Regards,
>
>                 Christian.
>
>                 >
>
>                 > Looks like  we should modify this place
>
>                 >
>
>                 > /Monk
>
>                 >
>
>                 > -----Original Message-----
>
>                 > From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org
>                 <mailto:amd-gfx-bounces at lists.freedesktop.org>> On
>                 Behalf Of
>
>                 > Christian K?nig
>
>                 > Sent: Monday, January 14, 2019 9:05 PM
>
>                 > To: Lou, Wentao <Wentao.Lou at amd.com
>                 <mailto:Wentao.Lou at amd.com>>;
>                 amd-gfx at lists.freedesktop.org
>                 <mailto:amd-gfx at lists.freedesktop.org>
>
>                 > Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should
>                 not larger than
>
>                 > AMDGPU_GMC_HOLE_START
>
>                 >
>
>                 > Am 14.01.19 um 09:40 schrieb wentalou:
>
>                 >> After removing unnecessary VM size calculations,
>                 vm_manager.max_pfn
>
>                 >> would reach 0x10,0000,0000 max_pfn <<
>                 AMDGPU_GPU_PAGE_SHIFT exceeding
>
>                 >> AMDGPU_GMC_HOLE_START would caused GPU reset.
>
>                 >>
>
>                 >> Change-Id: I47ad0be2b0bd9fb7490c4e1d7bb7bdacf71132cb
>
>                 >> Signed-off-by: wentalou <Wentao.Lou at amd.com
>                 <mailto:Wentao.Lou at amd.com>>
>
>                 > NAK, that is incorrect. We intentionally place the
>                 csa above the GMC hole.
>
>                 >
>
>                 > Regards,
>
>                 > Christian.
>
>                 >
>
>                 >> ---
>
>                 >> drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 3 ++-
>
>                 >>    1 file changed, 2 insertions(+), 1 deletion(-)
>
>                 >>
>
>                 >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>
>                 >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>
>                 >> index 7e22be7..dd3bd01 100644
>
>                 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>
>                 >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>
>                 >> @@ -26,7 +26,8 @@
>
>                 >>
>
>                 >>    uint64_t amdgpu_csa_vaddr(struct amdgpu_device
>                 *adev)
>
>                 >>    {
>
>                 >> -        uint64_t addr = adev->vm_manager.max_pfn
>                 << AMDGPU_GPU_PAGE_SHIFT;
>
>                 >> +       uint64_t addr =
>                 min(adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT,
>
>                 >> + AMDGPU_GMC_HOLE_START);
>
>                 >>
>
>                 >>          addr -= AMDGPU_VA_RESERVED_SIZE;
>
>                 >>          addr = amdgpu_gmc_sign_extend(addr);
>
>                 > _______________________________________________
>
>                 > amd-gfx mailing list
>
>                 > amd-gfx at lists.freedesktop.org
>                 <mailto:amd-gfx at lists.freedesktop.org>
>
>                 > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190118/a498957d/attachment-0001.html>