[PATCH] drm/amdgpu: csa_vaddr should not larger than AMDGPU_GMC_HOLE_START
Christian König
ckoenig.leichtzumerken at gmail.com
Fri Jan 18 09:11:47 UTC 2019
Hi Monk,
> You see that for UMD, it can use 0 to HOLE_START
Let me say it once more: The UMD nor anybody else CAN'T use 0 to
HOLE_START, that region is reserved for the ATC hardware!
We unfortunately didn't knew that initially and also didn't used the
ATC, so we didn't ran into a problem.
But ROCm now uses the ATC on Raven/Picasso and I have a branch where I
enable it on Vega as well. So when we don't fix that we will run into
problems here.
The ATC isn't usable in combination with SRIOV and I don't think Windows
uses it either, so they probably never ran into any issues.
> Do you mean even UMD should not use virtual address that dropped in
> range from 0 to HOLE_START ?
Yes, exactly! That in combination with ATC use can have quite a bunch of
strange and hard to track down effects because two parts of the driver
are using the same address space.
> That way where should UMD work in ?and I assume our UMD now still
> using this range, this part make me puzzle
At least Mesa now uses the high address space from HOLE_END..0xFFFF FFFF
FFFF FFFF.
Regards,
Christian.
Am 18.01.19 um 02:32 schrieb Liu, Monk:
>
> Thanks Christian,
>
> Questions I have now:
>
> 1. You see that for UMD, it can use 0 to HOLE_START, so why CSA
> cannot use that range although the range is as you said reserved
> to ATC h/w ? Be note that for windows KMD, the CSA is allocated by
> UMD driver so CSA shares the same aperture /space range with other
> UMD BO, which mean CSA in windows also located in ATC range, if
> that’s a problem why windows still works well.
> 1. Can you illustrate this limitation with more details ? we need
> to understand why CSA couldn’t be put in ATC range.
> 2. According to your previous description :” Now on
> Vega/Raven/Picasso etc.. (everything with a GFX9) the lower range
> (0x0-0x8000 0000 0000) is reserved for SVA/ATC use. Since we
> *unfortunately didn't knew that initially we exposed those to
> older user space as usable* and also put the CSA in there.”
> 1. Do you mean even UMD should not use virtual address that
> dropped in range from 0 to HOLE_START ?
>
> that way where should UMD work in ?and I assume our UMD now still
> using this range, this part make me puzzle
>
> /Monk
>
> *From:*amd-gfx <amd-gfx-bounces at lists.freedesktop.org> *On Behalf Of
> *Koenig, Christian
> *Sent:* Thursday, January 17, 2019 9:26 PM
> *To:* Liu, Monk <Monk.Liu at amd.com>; Lou, Wentao <Wentao.Lou at amd.com>;
> amd-gfx at lists.freedesktop.org; Zhu, Rex <Rex.Zhu at amd.com>
> *Cc:* Deng, Emily <Emily.Deng at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: csa_vaddr should not larger than
> AMDGPU_GMC_HOLE_START
>
> Hi Monk,
>
>
> Regarding with above sentence, do you mean this range
> (0->HOLE_START) shouldn’t be exposed to user space ? I don’t get
> your point here …
>
> Yes exactly. As I said the problem is that 0->HOLE_START is reserved
> for the ATC hardware, we should not touch it at all.
>
>
> Putting CSA in 0~HOLD_START is the legacy approach we selected for
> a long time since very early stage, how comes that you think it is
> a problem now ?
>
> That turned out to be never a good idea in the first place.
>
> What we could do is reduce the max_pfn for SRIOV because the ATC
> doesn't work in that configuration anyway. But I would only do this as
> last resort.
>
> Any idea why an address above the hole doesn't work with SRIOV? It
> seems to work fine in the bare metal case.
>
> Regards,
> Christian.
>
> Am 17.01.19 um 14:19 schrieb Liu, Monk:
>
> Hi Christian
>
> Thanks for explaining the HOLD for us,
>
> My understanding is we still could put CSA to 0~HOLE_START,
> because we can report UMD the max space is HOLD_START-CSA_SIZE ,
> thus no colliding will hit.
>
> > Now on Vega/Raven/Picasso etc.. (everything with a GFX9) the
> lower range (0x0-0x8000 0000 0000) is reserved for SVA/ATC use.
> Since we unfortunately didn't knew that initially we exposed those
> to older userspace as usable and also put the CSA in there.
>
>
> Regarding with above sentence, do you mean this range
> (0->HOLE_START) shouldn’t be exposed to user space ? I don’t get
> your point here …
>
> Putting CSA in 0~HOLD_START is the legacy approach we selected for
> a long time since very early stage, how comes that you think it is
> a problem now ?
>
> /Monk
>
> *From:*amd-gfx <amd-gfx-bounces at lists.freedesktop.org>
> <mailto:amd-gfx-bounces at lists.freedesktop.org> *On Behalf Of
> *Koenig, Christian
> *Sent:* Thursday, January 17, 2019 4:30 PM
> *To:* Liu, Monk <Monk.Liu at amd.com> <mailto:Monk.Liu at amd.com>; Lou,
> Wentao <Wentao.Lou at amd.com> <mailto:Wentao.Lou at amd.com>;
> amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>; Zhu, Rex <Rex.Zhu at amd.com>
> <mailto:Rex.Zhu at amd.com>
> *Cc:* Deng, Emily <Emily.Deng at amd.com> <mailto:Emily.Deng at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: csa_vaddr should not larger
> than AMDGPU_GMC_HOLE_START
>
> Hi Monk,
>
> ok let me explain a bit more how the hardware works.
>
> The GMC manages a virtual 64bit address space, but only 48bit of
> that virtual address space are handled by the page table walker.
>
> The 48bits of address space are sign extended, so bit 47 of that
> are extended into bits 48-63.
>
> This gives us the following memory layout:
> 0x0
> .... virtual address space
> 0x8000 0000 0000
> .... hole
> 0xFFFF 8000 0000 0000
> .... virtual address space
> 0xFFFF FFFF FFFF FFFF
>
> Trying to access the hole results in a range fault interrupt IIRC.
>
> When doing the VM page table walk the topmost 16bits are ignored,
> so when programming the page table walker you cut those of and use
> a linear address again. This is what AMDGPU_GMC_HOLE_MASK is good for.
>
> Now on Vega/Raven/Picasso etc.. (everything with a GFX9) the lower
> range (0x0-0x8000 0000 0000) is reserved for SVA/ATC use. Since we
> unfortunately didn't knew that initially we exposed those to older
> userspace as usable and also put the CSA in there.
>
> The most likely cause of this is that we still have a bug
> somewhere about this, e.g. not correctly using sign extended
> addresses *OR* using sign extended addresses where we should use
> linear instead.
>
> Regards,
> Christian.
>
> Am 17.01.19 um 09:04 schrieb Liu, Monk:
>
> Hi Christian
>
> I believe Wentao can fix the issue we it by below step:
>
> 1. Return *Virtual_address_max* (UMD use it) to HOLE_START –
> RESERVED_SIZE
> 2. [optional] Still Keep virtual_address_offset to
> RESERVED_SIZE (current way, I think it’s because
> previously we put CSA in 0 à RESERVED_SIZE space)
> 3. Put CSA in HOLE_START – RESERVED_SIZE è HOLE_START (it’s
> current design)
>
> I don’t get where above scheme is not correct … can you give
> more explain for the GMC_HOLE_START ?
>
> e.g.
>
> 1. why you set GMC_HOLE_START to 0x8’000’0000’0000 (half size
> of MAX of 48bit address space) ? is it for HSA purpose to
> make sure GPU address can also be used for CPU address ?
> 2. now MAX_PFN is 1’000’0000’0000, do you need to change
> GMC_HOLE_START ?
>
> thanks
>
> we need some catch up
>
> /Monk
>
> *From:*amd-gfx <amd-gfx-bounces at lists.freedesktop.org>
> <mailto:amd-gfx-bounces at lists.freedesktop.org> *On Behalf Of
> *Koenig, Christian
> *Sent:* Thursday, January 17, 2019 3:39 PM
> *To:* Lou, Wentao <Wentao.Lou at amd.com>
> <mailto:Wentao.Lou at amd.com>; Liu, Monk <Monk.Liu at amd.com>
> <mailto:Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>; Zhu, Rex
> <Rex.Zhu at amd.com> <mailto:Rex.Zhu at amd.com>
> *Cc:* Deng, Emily <Emily.Deng at amd.com> <mailto:Emily.Deng at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: csa_vaddr should not larger
> than AMDGPU_GMC_HOLE_START
>
> Am 17.01.19 um 04:17 schrieb Lou, Wentao:
>
> Hi Christian,
>
> Your solution as:
>
> addr = (max_pfn - (AMDGPU_VA_RESERVED_SIZE >>
> AMDGPU_PAGE_SHIFT)) << AMDGPU_PAGE_SHIFT;
>
> now max_pfn = 0x10 0000 0000, AMDGPU_VA_RESERVED_SIZE =
> 0x10 0000, AMDGPU_PAGE_SHIFT = 12
>
> Still got addr = 0xFFFF FFF0 0000, which would cause ring
> gfx timeout.
>
>
> But 0xFFFF FFF0 0000 is the correct address, if that is
> causing a problem then there is a bug somewhere else.
>
> Please try to use
> AMDGPU_GMC_HOLE_START-AMDGPU_VA_RESERVED_SIZE as well. Does
> that work?
>
>
>
>
> Before commit 1bf621c42137926ac249af761c0190a9258aa0db,
> vm_size was 32GB, and csa_addr was under
> AMDGPU_GMC_HOLE_START.
>
>
> Wait a second why was the vm_size 32GB? This is on a Vega10
> isn't it?
>
>
>
>
> I didn’t understand why csa_addr need to be above
> AMDGPU_GMC_HOLE_START now.
>
>
> On Vega10 the lower range, e.g. everything below
> AMDGPU_GMC_HOLE_START is reserved for SVA.
>
> Regards,
> Christian.
>
>
>
>
> Thanks.
>
> BR,
>
> Wentao
>
> *From:* Koenig, Christian <Christian.Koenig at amd.com>
> <mailto:Christian.Koenig at amd.com>
> *Sent:* Wednesday, January 16, 2019 5:48 PM
> *To:* Lou, Wentao <Wentao.Lou at amd.com>
> <mailto:Wentao.Lou at amd.com>; Liu, Monk <Monk.Liu at amd.com>
> <mailto:Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>; Zhu, Rex
> <Rex.Zhu at amd.com> <mailto:Rex.Zhu at amd.com>
> *Cc:* Deng, Emily <Emily.Deng at amd.com>
> <mailto:Emily.Deng at amd.com>
> *Subject:* Re: [PATCH] drm/amdgpu: csa_vaddr should not
> larger than AMDGPU_GMC_HOLE_START
>
> Hi Wentao,
>
> well the problem is you don't seem to understand how the
> hardware works.
>
> See the engines see an MC address space with a hole in the
> middle, similar to the how x86 64bit CPU address space
> works. But the page tables are programmed linearly.
>
> So the calculation in amdgpu_driver_open_kms() is correct
> because it takes the MC address and mages a linear page
> table index from it again.
>
> The only thing we might need to fix here is shifting
> max_pfn before the subtraction and I doubt that even that
> is necessary.
>
> Regards,
> Christian.
>
> Am 16.01.19 um 10:34 schrieb Lou, Wentao:
>
> Hi Christian,
>
> Now vm_size was set to 0x4 0000 GB by below commit:
>
> 1bf621c42137926ac249af761c0190a9258aa0db drm/amdgpu:
> Remove unnecessary VM size calculations
>
> So that max_pfn would be 0x10 0000 0000.
>
> amdgpu_csa_vaddr would make max_pfn << 12 to get 0x1
> 0000 0000 0000, and then minus
> AMDGPU_VA_RESERVED_SIZE, to get 0xFFFF FFF0 0000
>
> unfortunately this number was between
> AMDGPU_GMC_HOLE_START and AMDGPU_GMC_HOLE_END, so that
> amdgpu_gmc_sign_extend was called to make it 0xFFFF
> FFFF FFF0 0000
>
> in amdgpu_driver_open_kms, extended csa_addr cannot be
> passed into amdgpu_map_static_csa directly, it would
> be above the limit of max_pfn.
>
> So that csa_addr was restricted by
> AMDGPU_GMC_HOLE_MASK to make it possible for
> amdgpu_vm_alloc_pts.
>
> But this restriction by AMDGPU_GMC_HOLE_MASK would
> make the address fall back into AMDGPU_GMC_HOLE again,
> which causing GPU reset.
>
> We just put amdgpu_csa_vaddr back to
> AMDGPU_GMC_HOLE_START, to avoid the address touching
> AMDGPU_GMC_HOLE.
>
> By the way, if max_pfn was shift much to the left, it
> would always get zero, with or without min(*,*).
>
> BR,
>
> Wentao
>
> -----Original Message-----
> From: Koenig, Christian <Christian.Koenig at amd.com>
> <mailto:Christian.Koenig at amd.com>
> Sent: Tuesday, January 15, 2019 4:02 PM
> To: Liu, Monk <Monk.Liu at amd.com>
> <mailto:Monk.Liu at amd.com>; Lou, Wentao
> <Wentao.Lou at amd.com> <mailto:Wentao.Lou at amd.com>;
> amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>; Zhu, Rex
> <Rex.Zhu at amd.com> <mailto:Rex.Zhu at amd.com>
> Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should not
> larger than AMDGPU_GMC_HOLE_START
>
> Am 15.01.19 um 07:19 schrieb Liu, Monk:
>
> > The max_pfn is now 1'0000'0000'0000'0000 (bytes)
> which is above 48 bit now, and it with
> AMDGPU_GMC_HOLE_MASK make it to zero ....
>
> >
>
> > And in code "amdgpu_driver_open_kms()" I saw @Zhu,
> Rex write the code as :
>
> >
>
> > "csa_addr = amdgpu_csa_vadr(adev) &
> AMDGPU_GMC_HOLE_MASK", I think this is wrong since you
> intentionally place the csa above GMC hole, right ?
>
> The fix is just completely incorrect since
> min(adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT,
> AMDGPU_GMC_HOLE_START) still gives you 0 when we shift
> max_pfn to much to the left.
>
> The correct solution is to substract the reserved size
> first and then shift. E.g.:
>
> addr = (max_pfn - (AMDGPU_VA_RESERVED_SIZE >>
> AMDGPU_PAGE_SHIFT)) << AMDGPU_PAGE_SHIFT;
>
> Regards,
>
> Christian.
>
> >
>
> > Looks like we should modify this place
>
> >
>
> > /Monk
>
> >
>
> > -----Original Message-----
>
> > From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org
> <mailto:amd-gfx-bounces at lists.freedesktop.org>> On
> Behalf Of
>
> > Christian K?nig
>
> > Sent: Monday, January 14, 2019 9:05 PM
>
> > To: Lou, Wentao <Wentao.Lou at amd.com
> <mailto:Wentao.Lou at amd.com>>;
> amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>
>
> > Subject: Re: [PATCH] drm/amdgpu: csa_vaddr should
> not larger than
>
> > AMDGPU_GMC_HOLE_START
>
> >
>
> > Am 14.01.19 um 09:40 schrieb wentalou:
>
> >> After removing unnecessary VM size calculations,
> vm_manager.max_pfn
>
> >> would reach 0x10,0000,0000 max_pfn <<
> AMDGPU_GPU_PAGE_SHIFT exceeding
>
> >> AMDGPU_GMC_HOLE_START would caused GPU reset.
>
> >>
>
> >> Change-Id: I47ad0be2b0bd9fb7490c4e1d7bb7bdacf71132cb
>
> >> Signed-off-by: wentalou <Wentao.Lou at amd.com
> <mailto:Wentao.Lou at amd.com>>
>
> > NAK, that is incorrect. We intentionally place the
> csa above the GMC hole.
>
> >
>
> > Regards,
>
> > Christian.
>
> >
>
> >> ---
>
> >> drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 3 ++-
>
> >> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> >>
>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>
> >> index 7e22be7..dd3bd01 100644
>
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c
>
> >> @@ -26,7 +26,8 @@
>
> >>
>
> >> uint64_t amdgpu_csa_vaddr(struct amdgpu_device
> *adev)
>
> >> {
>
> >> - uint64_t addr = adev->vm_manager.max_pfn
> << AMDGPU_GPU_PAGE_SHIFT;
>
> >> + uint64_t addr =
> min(adev->vm_manager.max_pfn << AMDGPU_GPU_PAGE_SHIFT,
>
> >> + AMDGPU_GMC_HOLE_START);
>
> >>
>
> >> addr -= AMDGPU_VA_RESERVED_SIZE;
>
> >> addr = amdgpu_gmc_sign_extend(addr);
>
> > _______________________________________________
>
> > amd-gfx mailing list
>
> > amd-gfx at lists.freedesktop.org
> <mailto:amd-gfx at lists.freedesktop.org>
>
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20190118/a498957d/attachment-0001.html>
More information about the amd-gfx
mailing list