Re: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt

Christian König ckoenig.leichtzumerken at gmail.com
Wed Nov 20 13:36:57 UTC 2019


Ok in this case we should just drop this patch.

Any objections on the other semaphore patch? IIRC we added the 
amdgpu_ring_emit_reg_write_reg_wait() especially to make sure that an 
invalidation can't be interrupted by a world switch.

When we add manual semaphore acquire/release before and after the 
invalidation that could break this quite badly.

Regards,
Christian.

Am 20.11.19 um 14:30 schrieb Liu, Monk:
>>> Question for Emily and Monk: Do we support power gating of the MMHUB with SRIOV? I don't think so and when that's correct we could just drop this patch.
> Any power gating if now allowed to be controlled by a VF in a guest VM ....
>
> It is hypervisor driver's (gim) responsibility to conduct when can our hardware entering a power circle (e.g. BACO reset), and we have software mechanism to make sure
> Power gating off circle shall only happen when all engine is idle (or any of them was hang) state.
>
> And even some engine isn't hang (e.g. KIQ is still doing things like read register or gpu_flush_tlb , etc...) if GIM decide to power off GPU (BACO reset) then that's okay for KIQ, since
> After BACO all engines would be re-init anyway
>
> Thanks
>
> /Monk
>
> -----邮件原件-----
> 发件人: Christian König <ckoenig.leichtzumerken at gmail.com>
> 发送时间: 2019年11月20日 19:24
> 收件人: Zhu, Changfeng <Changfeng.Zhu at amd.com>; Koenig, Christian <Christian.Koenig at amd.com>; Xiao, Jack <Jack.Xiao at amd.com>; Zhou1, Tao <Tao.Zhou1 at amd.com>; Huang, Ray <Ray.Huang at amd.com>; Huang, Shimmer <Xinmei.Huang at amd.com>; amd-gfx at lists.freedesktop.org; Deng, Emily <Emily.Deng at amd.com>; Liu, Monk <Monk.Liu at amd.com>
> 主题: Re: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt
>
> Hi Changfeng,
>
> [adding Monk and Emily as well].
>
> I thought more about this and came to the conclusion that this won't work and might result in a lockup as well.
>
> We are using the KIQ on SRIOV for GPUVM invalidation because we need an atomic read/modify/write cycle since we found that the invalidation engine is resetted with every world switch.
>
> Now accessing the semaphore registers is not atomic any more and we could have a world switch in between grabbing the semaphore and sending the VM invalidation. That either won't work or could result in a lockup as well.
>
> Question for Emily and Monk: Do we support power gating of the MMHUB with SRIOV? I don't think so and when that's correct we could just drop this patch.
>
> Regards,
> Christian.
>
> Am 20.11.19 um 10:14 schrieb Changfeng.Zhu:
>> From: changzhu <Changfeng.Zhu at amd.com>
>>
>> It may lose gpuvm invalidate acknowldege state across power-gating off
>> cycle. To avoid this issue in virt invalidation, add semaphore acquire
>> before invalidation and semaphore release after invalidation.
>>
>> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
>> Signed-off-by: changzhu <Changfeng.Zhu at amd.com>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++++++++++++++++++++++--
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  3 ++-
>>    3 files changed, 28 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> index f04eb1a64271..70ffaf91cd12 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device
>> *adev, uint32_t reg, uint32_t v)
>>    
>>    void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>    					uint32_t reg0, uint32_t reg1,
>> -					uint32_t ref, uint32_t mask)
>> +					uint32_t ref, uint32_t mask,
>> +					uint32_t sem)
>>    {
>>    	struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>    	struct amdgpu_ring *ring = &kiq->ring; @@ -144,9 +145,30 @@ void
>> amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>    	uint32_t seq;
>>    
>>    	spin_lock_irqsave(&kiq->ring_lock, flags);
>> -	amdgpu_ring_alloc(ring, 32);
>> +	amdgpu_ring_alloc(ring, 60);
>> +
>> +	/*
>> +	 * It may lose gpuvm invalidate acknowldege state across power-gating
>> +	 * off cycle, add semaphore acquire before invalidation and semaphore
>> +	 * release after invalidation to avoid entering power gated state
>> +	 * to WA the Issue
>> +	 */
>> +
>> +	/* a read return value of 1 means semaphore acuqire */
>> +	if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
>> +	    ring->funcs->vmhub == AMDGPU_MMHUB_1)
>> +	amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
>> +
>>    	amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
>>    					    ref, mask);
>> +	/*
>> +	 * add semaphore release after invalidation,
>> +	 * write with 0 means semaphore release
>> +	 */
>> +	if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
>> +	    ring->funcs->vmhub == AMDGPU_MMHUB_1)
>> +	amdgpu_ring_emit_wreg(ring, sem, 0);
>> +
>>    	amdgpu_fence_emit_polling(ring, &seq);
>>    	amdgpu_ring_commit(ring);
>>    	spin_unlock_irqrestore(&kiq->ring_lock, flags); diff --git
>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> index b0b2bdc750df..bda6a2f37dc0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> @@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, uint32_t reg);
>>    void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v);
>>    void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>    					uint32_t reg0, uint32_t rreg1,
>> -					uint32_t ref, uint32_t mask);
>> +					uint32_t ref, uint32_t mask,
>> +					uint32_t sem);
>>    int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool init);
>>    int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, bool init);
>>    int amdgpu_virt_reset_gpu(struct amdgpu_device *adev); diff --git
>> a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index f25cd97ba5f2..1ae59af7836a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device *adev, uint32_t vmid,
>>    			!adev->in_gpu_reset) {
>>    		uint32_t req = hub->vm_inv_eng0_req + eng;
>>    		uint32_t ack = hub->vm_inv_eng0_ack + eng;
>> +		uint32_t sem = hub->vm_inv_eng0_sem + eng;
>>    
>>    		amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, tmp,
>> -				1 << vmid);
>> +						   1 << vmid, sem);
>>    		return;
>>    	}
>>    
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



More information about the amd-gfx mailing list