Re: 答复: [V3 04/11] drm/amdgpu/virt: use kiq to access registers

Christian König deathsimple at vodafone.de
Thu Jan 12 11:13:36 UTC 2017


I would go a step further and put the BUG_ON(in_interrupt()) into 
amdgpu_virt_kiq_rreg().

And BTW the problem is even worse, you also can't lock a mutex inside an 
atomic section and we could have plenty of that as well in the driver code.

Regards,
Christian.

Am 12.01.2017 um 04:21 schrieb Liu, Monk:
>
> Xiangliang
>
> please BUG() when register access occured in RUNTIME and IRQ context, 
> e.g.:
>
> for register read:
> if (amdgpu_sriov_runtime(adev)) {
>    if (in_interrupt())
>         BUG();
>    else
> return amdgpu_virt_kiq_rreg(adev, reg, v);
> }
>
> and also for register write, with above addressed, Reviewed-by: Monk 
> Liu <monk.liu at amd.com>
>
> ------------------------------------------------------------------------
> *发件人:* Liu, Monk
> *发送时间:* 2017年1月12日 11:19:40
> *收件人:* Christian König; Yu, Xiangliang; amd-gfx at lists.freedesktop.org
> *主题:* 答复: [V3 04/11] drm/amdgpu/virt: use kiq to access registers
>
> Xiangliang
>
>
> please BUG() when register access occured in RUNTIME and IRQ context, 
> e.g.:
>
>
> if (amdgpu_sriov_runtime(adev)) {
>
>
> }
>   return amdgpu_virt_kiq_wreg(adev, reg, v);
>
>
> with above addressed, Reviewed-by: Monk Liu <monk.liu at amd.com>
>
> ------------------------------------------------------------------------
> *发件人:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> 代表 Liu, Monk 
> <Monk.Liu at amd.com>
> *发送时间:* 2017年1月11日 22:54:52
> *收件人:* Christian König; Yu, Xiangliang; amd-gfx at lists.freedesktop.org
> *主题:* RE: [V3 04/11] drm/amdgpu/virt: use kiq to access registers
> We should BUG and not access register at all if found during RUNTIME 
> && IRQ_context
> By far.
>
> BR Monk
> -----Original Message-----
> From: Christian König [mailto:deathsimple at vodafone.de]
> Sent: Wednesday, January 11, 2017 10:44 PM
> To: Liu, Monk <Monk.Liu at amd.com>; Yu, Xiangliang 
> <Xiangliang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [V3 04/11] drm/amdgpu/virt: use kiq to access registers
>
> > do you think that's okay ?
> Fine with me, but I'm not sure if the hypervisor doesn't gets a hickup 
> when a client writes to some registers from interrupt context.
>
> Christian.
>
> Am 11.01.2017 um 15:38 schrieb Liu, Monk:
> > Yeah, make sense
> >
> > I think before we have a full version of KIQ reg accessing without 
> context switch, we should totally disable KIQ reg read/write in IRQ 
> context, do you think that's okay ?
> >
> > BR Monk
> >
> > -----Original Message-----
> > From: Christian König [mailto:deathsimple at vodafone.de]
> > Sent: Wednesday, January 11, 2017 10:16 PM
> > To: Liu, Monk <Monk.Liu at amd.com>; Yu, Xiangliang
> > <Xiangliang.Yu at amd.com>; amd-gfx at lists.freedesktop.org
> > Subject: Re: [V3 04/11] drm/amdgpu/virt: use kiq to access registers
> >
> >> Because if we are in interrupt , we are forbid to do schedule, and 
> use kiq to read register will invoke fence_wait() ...
> > That won't work. Locking a mutext like it is done in the write path 
> can cause scheduling as well.
> >
> > If we need to push writes through the KIQ in interrupt context we 
> need to change the code to use a spinlock with disabled interrupts 
> instead of a mutex.
> >
> > Otherwise the whole thing can easily deadlock when an interrupt 
> happens to come while the lock is taken.
> >
> > Please also test the patchset with lockdep enabled, that should 
> yield a bunch of error messages when you try to do something like this.
> >
> > Regards,
> > Christian.
> >
> > Am 11.01.2017 um 14:51 schrieb Liu, Monk:
> >> Because if we are in interrupt , we are forbid to do schedule, and 
> use kiq to read register will invoke fence_wait() ...
> >>
> >> If you think that's odds, we can use BUG_ON() to stop driver 
> running if we need read registers in SRIOV case.
> >>
> >> And to fully support register reading, we need to implement a new
> >> method to use KIQ access register without any schedule() chance, but
> >> that's overhead for current stage, and we don't observe registers
> >> reading in IRQ context now on 3D apps. (kfd observed such case, and
> >> @Shaoyun dispatches a kernel thread/tasklet in IRQ to do the register
> >> dumping, instead of directly reading them via KIQ )
> >>
> >> BR Monk
> >>
> >> -----Original Message-----
> >> From: Christian König [mailto:deathsimple at vodafone.de]
> >> Sent: Wednesday, January 11, 2017 9:38 PM
> >> To: Yu, Xiangliang <Xiangliang.Yu at amd.com>;
> >> amd-gfx at lists.freedesktop.org
> >> Cc: Liu, Monk <Monk.Liu at amd.com>
> >> Subject: Re: [V3 04/11] drm/amdgpu/virt: use kiq to access registers
> >>
> >> Am 11.01.2017 um 14:18 schrieb Xiangliang Yu:
> >>> For virtualization, it is must for driver to use KIQ to access
> >>> registers when it is out of GPU full access mode.
> >>>
> >>> Signed-off-by: Xiangliang Yu <Xiangliang.Yu at amd.com>
> >>> Signed-off-by: Monk Liu <Monk.Liu at amd.com>
> >>> ---
> >>> drivers/gpu/drm/amd/amdgpu/Makefile        |  2 +-
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  6 +++
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   | 82 
> ++++++++++++++++++++++++++++++
> >>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   |  5 ++
> >>> drivers/gpu/drm/amd/amdgpu/vi.c            |  3 ++
> >>>     5 files changed, 97 insertions(+), 1 deletion(-)
> >>>     create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
> >>> b/drivers/gpu/drm/amd/amdgpu/Makefile
> >>> index 4185b03..0b8e470 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> >>> @@ -30,7 +30,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
> >>>      atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
> >>>      amdgpu_prime.o amdgpu_vm.o amdgpu_ib.o amdgpu_pll.o \
> >>>      amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
> >>> -   amdgpu_gtt_mgr.o amdgpu_vram_mgr.o
> >>> +   amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o
> >>>
> >>>     # add asic specific block
> >>>     amdgpu-$(CONFIG_DRM_AMDGPU_CIK)+= cik.o cik_ih.o kv_smc.o
> >>> kv_dpm.o \ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> index f82919d..9a2fd3e 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>> @@ -95,6 +95,9 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device 
> *adev, uint32_t reg,
> >>>     {
> >>>      uint32_t ret;
> >>>
> >>> +   if (amdgpu_sriov_runtime(adev) && !in_interrupt())
> >>> +           return amdgpu_virt_kiq_rreg(adev, reg);
> >>> +
> >> Mhm, why is the in_interrupt() necessary here?
> >>
> >>>      if ((reg * 4) < adev->rmmio_size && !always_indirect)
> >>>              ret = readl(((void __iomem *)adev->rmmio) + (reg * 4));
> >>>      else {
> >>> @@ -114,6 +117,9 @@ void amdgpu_mm_wreg(struct amdgpu_device 
> *adev, uint32_t reg, uint32_t v,
> >>>     {
> >>> trace_amdgpu_mm_wreg(adev->pdev->device, reg, v);
> >>>
> >>> +   if (amdgpu_sriov_runtime(adev))
> >>> +           return amdgpu_virt_kiq_wreg(adev, reg, v);
> >>> +
> >> And why it isn't necessary here? What happens when we write a 
> register from an interrupt routine?
> >>
> >> Christian.
> >>
> >>>      if ((reg * 4) < adev->rmmio_size && !always_indirect)
> >>>              writel(v, ((void __iomem *)adev->rmmio) + (reg * 4));
> >>>      else {
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> >>> new file mode 100644
> >>> index 0000000..7861b6b
> >>> --- /dev/null
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> >>> @@ -0,0 +1,82 @@
> >>> +/*
> >>> + * Copyright 2017 Advanced Micro Devices, Inc.
> >>> + *
> >>> + * Permission is hereby granted, free of charge, to any person
> >>> +obtaining a
> >>> + * copy of this software and associated documentation files (the
> >>> +"Software"),
> >>> + * to deal in the Software without restriction, including without
> >>> +limitation
> >>> + * the rights to use, copy, modify, merge, publish, distribute,
> >>> +sublicense,
> >>> + * and/or sell copies of the Software, and to permit persons to
> >>> +whom the
> >>> + * Software is furnished to do so, subject to the following 
> conditions:
> >>> + *
> >>> + * The above copyright notice and this permission notice shall be
> >>> +included in
> >>> + * all copies or substantial portions of the Software.
> >>> + *
> >>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >>> +EXPRESS OR
> >>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> >>> +MERCHANTABILITY,
> >>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
> >>> +EVENT SHALL
> >>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
> >>> +DAMAGES OR
> >>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> >>> +OTHERWISE,
> >>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
> >>> +USE OR
> >>> + * OTHER DEALINGS IN THE SOFTWARE.
> >>> + */
> >>> +
> >>> +#include "amdgpu.h"
> >>> +#include "amdgpu_virt.h"
> >>> +
> >>> +void amdgpu_virt_init_setting(struct amdgpu_device *adev) {
> >>> +   mutex_init(&adev->virt.lock);
> >>> +}
> >>> +
> >>> +uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, uint32_t
> >>> +reg) {
> >>> +   signed long r;
> >>> +   uint32_t val;
> >>> +   struct fence *f;
> >>> +   struct amdgpu_kiq *kiq = &adev->gfx.kiq;
> >>> +   struct amdgpu_ring *ring = &kiq->ring;
> >>> +
> >>> +   BUG_ON(!ring->funcs->emit_rreg);
> >>> +
> >>> +   mutex_lock(&adev->virt.lock);
> >>> +   amdgpu_ring_alloc(ring, 32);
> >>> +   amdgpu_ring_emit_hdp_flush(ring);
> >>> +   amdgpu_ring_emit_rreg(ring, reg);
> >>> +   amdgpu_ring_emit_hdp_invalidate(ring);
> >>> +   amdgpu_fence_emit(ring, &f);
> >>> +   amdgpu_ring_commit(ring);
> >>> +   mutex_unlock(&adev->virt.lock);
> >>> +
> >>> +   r = fence_wait(f, false);
> >>> +   if (r)
> >>> +           DRM_ERROR("wait for kiq fence error: %ld.\n", r);
> >>> +   fence_put(f);
> >>> +
> >>> +   val = adev->wb.wb[adev->virt.reg_val_offs];
> >>> +
> >>> +   return val;
> >>> +}
> >>> +
> >>> +void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg,
> >>> +uint32_t v) {
> >>> +   signed long r;
> >>> +   struct fence *f;
> >>> +   struct amdgpu_kiq *kiq = &adev->gfx.kiq;
> >>> +   struct amdgpu_ring *ring = &kiq->ring;
> >>> +
> >>> +   BUG_ON(!ring->funcs->emit_wreg);
> >>> +
> >>> +   mutex_lock(&adev->virt.lock);
> >>> +   amdgpu_ring_alloc(ring, 32);
> >>> +   amdgpu_ring_emit_hdp_flush(ring);
> >>> +   amdgpu_ring_emit_wreg(ring, reg, v);
> >>> +   amdgpu_ring_emit_hdp_invalidate(ring);
> >>> +   amdgpu_fence_emit(ring, &f);
> >>> +   amdgpu_ring_commit(ring);
> >>> +   mutex_unlock(&adev->virt.lock);
> >>> +
> >>> +   r = fence_wait(f, false);
> >>> +   if (r)
> >>> +           DRM_ERROR("wait for kiq fence error: %ld.\n", r);
> >>> +   fence_put(f);
> >>> +}
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> >>> index 3d38a9f..2869980 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
> >>> @@ -33,6 +33,7 @@
> >>>     struct amdgpu_virt {
> >>>      uint32_t                caps;
> >>>      uint32_t                reg_val_offs;
> >>> +   struct mutex            lock;
> >>>     };
> >>>
> >>>     #define amdgpu_sriov_enabled(adev) \ @@ -59,4 +60,8 @@ static
> >>> inline bool is_virtual_machine(void)
> >>>     #endif
> >>>     }
> >>>
> >>> +void amdgpu_virt_init_setting(struct amdgpu_device *adev); uint32_t
> >>> +amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, uint32_t reg);
> >>> +void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg,
> >>> +uint32_t v);
> >>> +
> >>>     #endif
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
> >>> b/drivers/gpu/drm/amd/amdgpu/vi.c index 7350a8f..dc0d4fa 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> >>> @@ -892,6 +892,9 @@ static int vi_common_early_init(void *handle)
> >>>              (amdgpu_ip_block_mask & (1 << AMD_IP_BLOCK_TYPE_SMC)))
> >>>              smc_enabled = true;
> >>>
> >>> +   if (amdgpu_sriov_vf(adev))
> >>> +           amdgpu_virt_init_setting(adev);
> >>> +
> >>>      adev->rev_id = vi_get_rev_id(adev);
> >>>      adev->external_rev_id = 0xFF;
> >>>      switch (adev->asic_type) {
> >> _______________________________________________
> >> amd-gfx mailing list
> >> amd-gfx at lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> >
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx at lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20170112/bfa855b4/attachment-0001.html>


More information about the amd-gfx mailing list