[PATCH 5/8] drm/amdgpu: use the new cursor in amdgpu_ttm_access_memory

Christian König ckoenig.leichtzumerken at gmail.com
Sat Mar 20 09:08:04 UTC 2021


Yeah, Nirmoy already stumbled over the correct fix.

The size check is off by one. A patch fixing this should be pushed on Monday.
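
To illustrate the kind of off-by-one such a check is prone to, here is a
minimal sketch of the node walk involved (walk_to_node() is a made-up name
used only for illustration; the removed amdgpu_find_mm_node() in the hunk
quoted below is the real helper that the cursor code replaces):

/* Sketch: advance to the drm_mm_node that contains 'start' and make
 * 'start' relative to that node.  The comparison has to be ">=" -- with
 * ">" an offset that lands exactly on a node boundary is still charged to
 * the previous node, and every position derived from it is one node off. */
static struct drm_mm_node *walk_to_node(struct drm_mm_node *node,
                                        uint64_t *start)
{
        while (*start >= (node->size << PAGE_SHIFT)) {
                *start -= node->size << PAGE_SHIFT;
                ++node;
        }
        return node;
}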

Regards,
Christian.

Am 19.03.21 um 22:23 schrieb Felix Kuehling:
> This is causing a deadlock in amdgpu_ttm_access_memory during the 
> PtraceAccess test in kfdtest. Unfortunately it doesn't get flagged by 
> LOCKDEP. See the kernel log snippet below. I don't have a good 
> explanation of what's going on, other than maybe some data structure 
> corruption.
>
> With this patch reverted, the PtraceAccess test still fails, but it 
> doesn't hang any more. If I also revert "use new cursor in 
> amdgpu_ttm_io_mem_pfn" (which is used via amdgpu_find_mm_node in 
> amdgpu_ttm_access_memory), Ptrace access starts working correctly. 
> That tells me that there is some fundamental bug in the resource 
> cursor implementation that's breaking several users.
>
> Regards,
>   Felix
>
>
> [  129.446085] watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [kfdtest:3588]
> [  129.455379] Modules linked in: ip6table_filter ip6_tables iptable_filter amdgpu x86_pkg_temp_thermal drm_ttm_helper ttm iommu_v2 gpu_sched ip_tables x_tables
> [  129.455428] irq event stamp: 75294000
> [  129.455432] hardirqs last  enabled at (75293999): [<ffffffffa9dcfd7d>] _raw_spin_unlock_irqrestore+0x2d/0x40
> [  129.455447] hardirqs last disabled at (75294000): [<ffffffffa9dbef8a>] sysvec_apic_timer_interrupt+0xa/0xa0
> [  129.455457] softirqs last  enabled at (75184000): [<ffffffffaa000306>] __do_softirq+0x306/0x429
> [  129.455467] softirqs last disabled at (75183995): [<ffffffffa9e00f8f>] asm_call_irq_on_stack+0xf/0x20
> [  129.455477] CPU: 8 PID: 3588 Comm: kfdtest Not tainted 5.11.0-kfd-fkuehlin #194
> [  129.455485] Hardware name: ASUS All Series/X99-E WS/USB 3.1, BIOS 3201 06/17/2016
> [  129.455490] RIP: 0010:_raw_spin_lock_irqsave+0xb/0x50
> [  129.455498] Code: d2 31 f6 e8 e7 e9 31 ff 48 89 df 58 5b e9 7d 32 32 ff 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 fd 53 9c <5b> fa f6 c7 02 74 05 e8 59 06 3e ff 65 ff 05 92 73 24 56 48 8d 7d
> [  129.455505] RSP: 0018:ffffa3eb407f3c58 EFLAGS: 00000246
> [  129.455513] RAX: ffff96466e2010a0 RBX: ffff96466e200000 RCX: 0000000000000000
> [  129.455519] RDX: ffffa3eb407f3e70 RSI: 00000000019ffff0 RDI: ffff96466e2010a0
> [  129.455524] RBP: ffff96466e2010a0 R08: 0000000000000000 R09: 0000000000000001
> [  129.455528] R10: ffffa3eb407f3c60 R11: ffff96466e2010b8 R12: 00000000019ffff0
> [  129.455533] R13: 00000000019ffff0 R14: ffffa3eb407f3e70 R15: 00000000019ffff0
> [  129.455538] FS:  00007f5aad0f0740(0000) GS:ffff96467fc00000(0000) knlGS:0000000000000000
> [  129.455544] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  129.455549] CR2: 0000563ea76ad0f0 CR3: 00000007c6e92005 CR4: 00000000001706e0
> [  129.455554] Call Trace:
> [  129.455563]  amdgpu_device_vram_access+0xc1/0x200 [amdgpu]
> [  129.455820]  ? _raw_spin_unlock_irqrestore+0x2d/0x40
> [  129.455834]  amdgpu_ttm_access_memory+0x29e/0x320 [amdgpu]
> [  129.456063]  ttm_bo_vm_access+0x1c8/0x3a0 [ttm]
> [  129.456089]  __access_remote_vm+0x289/0x390
> [  129.456112]  ptrace_access_vm+0x98/0xc0
> [  129.456127]  generic_ptrace_peekdata+0x31/0x80
> [  129.456138]  ptrace_request+0x13b/0x5d0
> [  129.456155]  arch_ptrace+0x24f/0x2f0
> [  129.456165]  __x64_sys_ptrace+0xc9/0x140
> [  129.456177]  do_syscall_64+0x2d/0x40
> [  129.456185]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  129.456194] RIP: 0033:0x7f5aab861a3f
> [  129.456199] Code: 48 89 44 24 18 48 8d 44 24 30 c7 44 24 10 18 00 00 00 8b 70 08 48 8b 50 10 48 89 44 24 20 4c 0f 43 50 18 b8 65 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 48 85 c0 78 06 41 83 f8 02 76 1e 48 8b 4c
> [  129.456205] RSP: 002b:00007ffd27b68750 EFLAGS: 00000293 ORIG_RAX: 0000000000000065
> [  129.456214] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f5aab861a3f
> [  129.456219] RDX: 00007f5aab3ffff0 RSI: 0000000000000dfa RDI: 0000000000000002
> [  129.456224] RBP: 00007ffd27b68870 R08: 0000000000000001 R09: 0000000000000000
> [  129.456228] R10: 00007ffd27b68758 R11: 0000000000000293 R12: 0000563ea764e2aa
> [  129.456233] R13: 0000000000000000 R14: 0000000000000021 R15: 0000000000000000
>
> On 2021-03-08 8:40 a.m., Christian König wrote:
>> Separate the drm_mm_node walking from the actual handling.
>>
>> Signed-off-by: Christian König <christian.koenig at amd.com>
>> Acked-by: Oak Zeng <Oak.Zeng at amd.com>
>> Tested-by: Nirmoy Das <nirmoy.das at amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 67 +++++++------------------
>>   1 file changed, 18 insertions(+), 49 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 517611b709fa..2cbe4ace591f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -178,26 +178,6 @@ static int amdgpu_verify_access(struct ttm_buffer_object *bo, struct file *filp)
>>                         filp->private_data);
>>   }
>>   -/**
>> - * amdgpu_find_mm_node - Helper function finds the drm_mm_node corresponding to
>> - * @offset. It also modifies the offset to be within the drm_mm_node returned
>> - *
>> - * @mem: The region where the bo resides.
>> - * @offset: The offset that drm_mm_node is used for finding.
>> - *
>> - */
>> -static struct drm_mm_node *amdgpu_find_mm_node(struct ttm_resource *mem,
>> -                           uint64_t *offset)
>> -{
>> -    struct drm_mm_node *mm_node = mem->mm_node;
>> -
>> -    while (*offset >= (mm_node->size << PAGE_SHIFT)) {
>> -        *offset -= (mm_node->size << PAGE_SHIFT);
>> -        ++mm_node;
>> -    }
>> -    return mm_node;
>> -}
>> -
>>   /**
>>    * amdgpu_ttm_map_buffer - Map memory into the GART windows
>>    * @bo: buffer object to map
>> @@ -1478,41 +1458,36 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>>    * access for debugging purposes.
>>    */
>>   static int amdgpu_ttm_access_memory(struct ttm_buffer_object *bo,
>> -                    unsigned long offset,
>> -                    void *buf, int len, int write)
>> +                    unsigned long offset, void *buf, int len,
>> +                    int write)
>>   {
>>       struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
>>       struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
>> -    struct drm_mm_node *nodes;
>> +    struct amdgpu_res_cursor cursor;
>> +    unsigned long flags;
>>       uint32_t value = 0;
>>       int ret = 0;
>> -    uint64_t pos;
>> -    unsigned long flags;
>>         if (bo->mem.mem_type != TTM_PL_VRAM)
>>           return -EIO;
>>   -    pos = offset;
>> -    nodes = amdgpu_find_mm_node(&abo->tbo.mem, &pos);
>> -    pos += (nodes->start << PAGE_SHIFT);
>> -
>> -    while (len && pos < adev->gmc.mc_vram_size) {
>> -        uint64_t aligned_pos = pos & ~(uint64_t)3;
>> -        uint64_t bytes = 4 - (pos & 3);
>> -        uint32_t shift = (pos & 3) * 8;
>> +    amdgpu_res_first(&bo->mem, offset, len, &cursor);
>> +    while (cursor.remaining) {
>> +        uint64_t aligned_pos = cursor.start & ~(uint64_t)3;
>> +        uint64_t bytes = 4 - (cursor.start & 3);
>> +        uint32_t shift = (cursor.start & 3) * 8;
>>           uint32_t mask = 0xffffffff << shift;
>>   -        if (len < bytes) {
>> -            mask &= 0xffffffff >> (bytes - len) * 8;
>> -            bytes = len;
>> +        if (cursor.size < bytes) {
>> +            mask &= 0xffffffff >> (bytes - cursor.size) * 8;
>> +            bytes = cursor.size;
>>           }
>>             if (mask != 0xffffffff) {
>>               spin_lock_irqsave(&adev->mmio_idx_lock, flags);
>>               WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)aligned_pos) | 0x80000000);
>>               WREG32_NO_KIQ(mmMM_INDEX_HI, aligned_pos >> 31);
>> -            if (!write || mask != 0xffffffff)
>> -                value = RREG32_NO_KIQ(mmMM_DATA);
>> +            value = RREG32_NO_KIQ(mmMM_DATA);
>>               if (write) {
>>                   value &= ~mask;
>>                   value |= (*(uint32_t *)buf << shift) & mask;
>> @@ -1524,21 +1499,15 @@ static int amdgpu_ttm_access_memory(struct ttm_buffer_object *bo,
>>                   memcpy(buf, &value, bytes);
>>               }
>>           } else {
>> -            bytes = (nodes->start + nodes->size) << PAGE_SHIFT;
>> -            bytes = min(bytes - pos, (uint64_t)len & ~0x3ull);
>> -
>> -            amdgpu_device_vram_access(adev, pos, (uint32_t *)buf,
>> -                          bytes, write);
>> +            bytes = cursor.size & 0x3ull;
>> +            amdgpu_device_vram_access(adev, cursor.start,
>> +                          (uint32_t *)buf, bytes,
>> +                          write);
>>           }
>>             ret += bytes;
>>           buf = (uint8_t *)buf + bytes;
>> -        pos += bytes;
>> -        len -= bytes;
>> -        if (pos >= (nodes->start + nodes->size) << PAGE_SHIFT) {
>> -            ++nodes;
>> -            pos = (nodes->start << PAGE_SHIFT);
>> -        }
>> +        amdgpu_res_next(&cursor, bytes);
>>       }
>>         return ret;
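
For context on how the PtraceAccess path above is reached: PTRACE_PEEKDATA on
an address backed by a VRAM BO cannot be served by get_user_pages(), so the
kernel falls back to vma->vm_ops->access, i.e. ttm_bo_vm_access ->
amdgpu_ttm_access_memory, exactly as the call trace shows. A rough userspace
sketch, assuming the kfdtest PtraceAccess test does something along these
lines (peek_word() is a made-up helper and error handling is omitted):

#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Read one word from another process at 'addr'; if 'addr' lies in a VRAM
 * (PFNMAP) mapping, the read takes the vm_ops->access route shown in the
 * soft-lockup trace above. */
static long peek_word(pid_t child, void *addr)
{
        long word;

        ptrace(PTRACE_ATTACH, child, NULL, NULL);
        waitpid(child, NULL, 0);
        word = ptrace(PTRACE_PEEKDATA, child, addr, NULL);
        ptrace(PTRACE_DETACH, child, NULL, NULL);
        return word;
}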


