[PATCH] drm/amdgpu: capture invalid hardware access v2
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Tue Mar 9 19:13:08 UTC 2021
On 2021-03-09 1:51 p.m., Christian König wrote:
> Am 09.03.21 um 19:26 schrieb Andrey Grodzovsky:
>> On 2021-03-09 12:47 p.m., Christian König wrote:
>>> No it won't. Accessing the hardware without the lock is ok as long as
>>> the write side isn't taken.
>>
>> Oh, forgot about the trylock part, sorry...
>>
>>>
>>> But that approach is illegal anyway because we suspend the hardware
>>> without proper protection from concurrent access.
>>
>> For my understanding, looking again at his steps related to this:
>>
>> Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) - [AG] protects from
>> other TDR threads
>>
>> Step 1: cancel all delayed work, stop the drm scheduler, complete all
>> unreceived fences and so on. Call amdgpu_device_pre_asic_reset... etc.
>> - [AG] this is the HW suspend part
>>
>> Step 2: call down_write(&adev->reset_sem) to hold the write lock, which
>> will block the recovery thread until other threads release their read locks.
>>
>> Is the problem here that the HW is suspended while some other threads
>> that rely on the read side of the lock still access the HW?
>
> Exactly that, yes.
>
>> Mostly what I am thinking about are IOCTLs - we can't 'wait for them to
>> complete', but they might be accessing the HW when we start the suspend.
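>>
>> To make this concrete, the problematic interleaving would look roughly
>> like this (a simplified sketch of the steps above, not the actual
>> driver code):
>>
>>     reset thread                          IOCTL thread
>>     ------------                          ------------
>>     atomic_cmpxchg(&adev->in_gpu_reset,
>>                    0, 1)
>>     /* step 1: suspend HW */              WREG32(reg, v);  /* pokes HW
>>     amdgpu_device_pre_asic_reset()           that is being suspended */
>>     /* step 2: only now */
>>     down_write(&adev->reset_sem)
>>
>> Nothing stops the IOCTL thread between steps 1 and 2, because the write
>> side of the lock is only taken after the HW is already suspended.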
>
> Well, the bigger problem with the IOCTLs is that I don't seem to be able
> to properly explain the dependency here.
>
> Maybe let me describe it from the other side once more to make it more
> understandable:
>
> Core Linux memory management depends on waiting for dma_fences. Amdgpu's
> dma_fences in turn depend on the GPU scheduler and the recovery/reset
> functionality to work properly.
>
> So every call to kmalloc() or get_free_page() depends on the amdgpu GPU
> recover/reset function, and with it on every lock taken in that function.
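>
> Spelled out, the dependency chain is:
>
>     kmalloc() / get_free_page()
>         -> memory reclaim, which may wait on dma_fences
>         -> amdgpu's dma_fences, which need the GPU scheduler
>         -> GPU recover/reset, and so every lock taken in it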
>
> Because of this you can't take any lock under which memory allocation
> might happen while holding the reset lock.
>
> This is a hard and by now very well documented limitation, and Dennis'
> changes here don't magically remove it.
>
> Because of this, taking the reset lock or any other lock used in the GPU
> reset functionality at the beginning of our IOCTLs will never work
> correctly.
>
> What you need to do is to have all locks which depend on MM taken before
> you take the reset lock, e.g. dma_resv, mmap_sem, VM locks etc. You also
> must not call functions like kmalloc(), get_free_page(), dma_fence_wait()
> etc. while holding it.
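>
> As an illustration, a minimal sketch of that ordering (hypothetical
> helper, names invented for the example; not taken from the patch):
>
>     int amdgpu_foo_locked_hw_access(struct amdgpu_device *adev,
>                                     struct amdgpu_bo *bo)
>     {
>             int r;
>
>             /* First take all locks under which allocations may happen. */
>             r = dma_resv_lock(bo->tbo.base.resv, NULL);
>             if (r)
>                     return r;
>
>             /* Only then take the reset lock, read side. */
>             down_read(&adev->reset_sem);
>
>             /*
>              * From here on: no kmalloc(), get_free_page(),
>              * dma_fence_wait() or anything else that depends on the MM.
>              */
>
>             up_read(&adev->reset_sem);
>             dma_resv_unlock(bo->tbo.base.resv);
>             return 0;
>     }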
>
> I hope that somehow makes it clearer now. We should probably try to
> merge Daniel's patch set for teaching lockdep to complain about such
> things sooner or later.
It is now clearer than in the previous discussion. One thing that would
also be helpful is to see the kernel documentation about this limitation -
can you maybe point me to it in the kernel docs or even in the code?
Andrey
>
> With those patches merged, lockdep starts to complain about the problems
> as soon as you try to run a kernel with them enabled.
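>
> Roughly, the annotations mark the fence signalling critical section (a
> simplified sketch of how the dma_fence annotation API is used):
>
>     bool cookie = dma_fence_begin_signalling();
>     /* fence signalling critical section: GPU reset lives here */
>     /* lockdep now complains about kmalloc(GFP_KERNEL),
>      * dma_fence_wait() and similar dependencies */
>     dma_fence_end_signalling(cookie);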
>
> Regards,
> Christian.
>
>>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>> Am 09.03.21 um 17:40 schrieb Andrey Grodzovsky:
>>>> Because he takes the write side lock post amdgpu_pre_asic_reset -
>>>> where HW suspend sequence happens (touching registers) - so i think
>>>> it will assert.
>>>>
>>>> Andrey
>>>>
>>>> On 2021-03-09 7:56 a.m., Christian König wrote:
>>>>> Hi Dennis,
>>>>>
>>>>> why do you think that this will always assert in the reset thread?
>>>>>
>>>>> In the reset thread, while we are holding the write side of the reset
>>>>> lock, lockdep_assert_held() should be satisfied and not cause any
>>>>> splat in the system log.
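>>>>>
>>>>> Concretely, a sketch of the reset path:
>>>>>
>>>>>     down_write(&adev->reset_sem);  /* reset thread, write side */
>>>>>     ...
>>>>>     RREG32(reg);  /* -> amdgpu_device_skip_hw_access() */
>>>>>     /* down_read_trylock() fails here, but lockdep_assert_held()
>>>>>      * is satisfied because the write side counts as held,
>>>>>      * so no splat. */
>>>>>     ...
>>>>>     up_write(&adev->reset_sem);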
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 09.03.21 um 03:03 schrieb Li, Dennis:
>>>>>> [AMD Official Use Only - Internal Distribution Only]
>>>>>>
>>>>>> Hi, Christian,
>>>>>> amdgpu_device_skip_hw_access() will always assert in the reset
>>>>>> thread, which does not seem like a good idea.
>>>>>>
>>>>>> Best Regards
>>>>>> Dennis Li
>>>>>> -----Original Message-----
>>>>>> From: Christian König <ckoenig.leichtzumerken at gmail.com>
>>>>>> Sent: Tuesday, March 9, 2021 2:07 AM
>>>>>> To: amd-gfx at lists.freedesktop.org
>>>>>> Cc: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>; Li, Dennis
>>>>>> <Dennis.Li at amd.com>
>>>>>> Subject: [PATCH] drm/amdgpu: capture invalid hardware access v2
>>>>>>
>>>>>> From: Dennis Li <Dennis.Li at amd.com>
>>>>>>
>>>>>> When the recovery thread has begun a GPU reset, no other threads
>>>>>> should access the hardware, otherwise the system may randomly hang.
>>>>>>
>>>>>> v2 (chk): rewritten from scratch, use trylock and lockdep instead of
>>>>>> hand wiring the logic.
>>>>>>
>>>>>> Signed-off-by: Dennis Li <Dennis.Li at amd.com>
>>>>>> Signed-off-by: Christian König <christian.koenig at amd.com>
>>>>>> ---
>>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 74 +++++++++++++++++-----
>>>>>>  1 file changed, 57 insertions(+), 17 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> index e247c3a2ec08..c990af6a43ca 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>> @@ -326,6 +326,34 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
>>>>>>  /*
>>>>>>   * register access helper functions.
>>>>>>   */
>>>>>> +
>>>>>> +/* Check if hw access should be skipped because of hotplug or device error */
>>>>>> +static bool amdgpu_device_skip_hw_access(struct amdgpu_device *adev)
>>>>>> +{
>>>>>> +        if (adev->in_pci_err_recovery)
>>>>>> +                return true;
>>>>>> +
>>>>>> +#ifdef CONFIG_LOCKDEP
>>>>>> +        /*
>>>>>> +         * This is a bit complicated to understand, so worth a comment.
>>>>>> +         * What we assert here is that the GPU reset is not running on
>>>>>> +         * another thread in parallel.
>>>>>> +         *
>>>>>> +         * For this we trylock the read side of the reset semaphore; if
>>>>>> +         * that succeeds we know that the reset is not running in parallel.
>>>>>> +         *
>>>>>> +         * If the trylock fails we assert that we are either already
>>>>>> +         * holding the read side of the lock or are the reset thread
>>>>>> +         * itself and hold the write side of the lock.
>>>>>> +         */
>>>>>> +        if (down_read_trylock(&adev->reset_sem))
>>>>>> +                up_read(&adev->reset_sem);
>>>>>> +        else
>>>>>> +                lockdep_assert_held(&adev->reset_sem);
>>>>>> +#endif
>>>>>> +
>>>>>> +        return false;
>>>>>> +}
>>>>>> +
>>>>>>  /**
>>>>>>   * amdgpu_device_rreg - read a memory mapped IO or indirect register
>>>>>>   *
>>>>>> @@ -340,7 +368,7 @@ uint32_t amdgpu_device_rreg(struct amdgpu_device *adev,
>>>>>>  {
>>>>>>          uint32_t ret;
>>>>>>
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return 0;
>>>>>>
>>>>>>          if ((reg * 4) < adev->rmmio_size) {
>>>>>> @@ -377,7 +405,7 @@ uint32_t amdgpu_device_rreg(struct amdgpu_device *adev,
>>>>>>   */
>>>>>>  uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset)
>>>>>>  {
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return 0;
>>>>>>
>>>>>>          if (offset < adev->rmmio_size)
>>>>>> @@ -402,7 +430,7 @@ uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset)
>>>>>>   */
>>>>>>  void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t offset, uint8_t value)
>>>>>>  {
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return;
>>>>>>
>>>>>>          if (offset < adev->rmmio_size)
>>>>>> @@ -425,7 +453,7 @@ void amdgpu_device_wreg(struct amdgpu_device *adev,
>>>>>>                          uint32_t reg, uint32_t v,
>>>>>>                          uint32_t acc_flags)
>>>>>>  {
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return;
>>>>>>
>>>>>>          if ((reg * 4) < adev->rmmio_size) {
>>>>>> @@ -452,7 +480,7 @@ void amdgpu_device_wreg(struct amdgpu_device *adev,
>>>>>>  void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device *adev,
>>>>>>                               uint32_t reg, uint32_t v)
>>>>>>  {
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return;
>>>>>>
>>>>>>          if (amdgpu_sriov_fullaccess(adev) &&
>>>>>> @@ -475,7 +503,7 @@ void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device *adev,
>>>>>>   */
>>>>>>  u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg)
>>>>>>  {
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return 0;
>>>>>>
>>>>>>          if ((reg * 4) < adev->rio_mem_size)
>>>>>> @@ -497,7 +525,7 @@ u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg)
>>>>>>   */
>>>>>>  void amdgpu_io_wreg(struct amdgpu_device *adev, u32 reg, u32 v)
>>>>>>  {
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return;
>>>>>>
>>>>>>          if ((reg * 4) < adev->rio_mem_size)
>>>>>> @@ -519,7 +547,7 @@ void amdgpu_io_wreg(struct amdgpu_device *adev, u32 reg, u32 v)
>>>>>>   */
>>>>>>  u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
>>>>>>  {
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return 0;
>>>>>>
>>>>>>          if (index < adev->doorbell.num_doorbells) {
>>>>>> @@ -542,7 +570,7 @@ u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
>>>>>>   */
>>>>>>  void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
>>>>>>  {
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return;
>>>>>>
>>>>>>          if (index < adev->doorbell.num_doorbells) {
>>>>>> @@ -563,7 +591,7 @@ void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
>>>>>>   */
>>>>>>  u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
>>>>>>  {
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return 0;
>>>>>>
>>>>>>          if (index < adev->doorbell.num_doorbells) {
>>>>>> @@ -586,7 +614,7 @@ u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
>>>>>>   */
>>>>>>  void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v)
>>>>>>  {
>>>>>> -        if (adev->in_pci_err_recovery)
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>>                  return;
>>>>>>
>>>>>>          if (index < adev->doorbell.num_doorbells) {
>>>>>> @@ -610,10 +638,13 @@ u32 amdgpu_device_indirect_rreg(struct amdgpu_device *adev,
>>>>>>                                  u32 pcie_index, u32 pcie_data,
>>>>>>                                  u32 reg_addr)
>>>>>>  {
>>>>>> -        unsigned long flags;
>>>>>> -        u32 r;
>>>>>>          void __iomem *pcie_index_offset;
>>>>>>          void __iomem *pcie_data_offset;
>>>>>> +        unsigned long flags;
>>>>>> +        u32 r;
>>>>>> +
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>> +                return 0;
>>>>>>
>>>>>>          spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>>>>>>          pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4;
>>>>>> @@ -641,10 +672,13 @@ u64 amdgpu_device_indirect_rreg64(struct amdgpu_device *adev,
>>>>>>                                    u32 pcie_index, u32 pcie_data,
>>>>>>                                    u32 reg_addr)
>>>>>>  {
>>>>>> -        unsigned long flags;
>>>>>> -        u64 r;
>>>>>>          void __iomem *pcie_index_offset;
>>>>>>          void __iomem *pcie_data_offset;
>>>>>> +        unsigned long flags;
>>>>>> +        u64 r;
>>>>>> +
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>> +                return 0;
>>>>>>
>>>>>>          spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>>>>>>          pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4;
>>>>>> @@ -677,9 +711,12 @@ void amdgpu_device_indirect_wreg(struct amdgpu_device *adev,
>>>>>>                                   u32 pcie_index, u32 pcie_data,
>>>>>>                                   u32 reg_addr, u32 reg_data)
>>>>>>  {
>>>>>> -        unsigned long flags;
>>>>>>          void __iomem *pcie_index_offset;
>>>>>>          void __iomem *pcie_data_offset;
>>>>>> +        unsigned long flags;
>>>>>> +
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>> +                return;
>>>>>>
>>>>>>          spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>>>>>>          pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4;
>>>>>> @@ -706,9 +743,12 @@ void amdgpu_device_indirect_wreg64(struct amdgpu_device *adev,
>>>>>>                                     u32 pcie_index, u32 pcie_data,
>>>>>>                                     u32 reg_addr, u64 reg_data)
>>>>>>  {
>>>>>> -        unsigned long flags;
>>>>>>          void __iomem *pcie_index_offset;
>>>>>>          void __iomem *pcie_data_offset;
>>>>>> +        unsigned long flags;
>>>>>> +
>>>>>> +        if (amdgpu_device_skip_hw_access(adev))
>>>>>> +                return;
>>>>>>
>>>>>>          spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>>>>>>          pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4;
>>>>>> --
>>>>>> 2.25.1
>>>>>>
>>>>>
>>>
>