[PATCH] Revert "drm/amdgpu: use the BAR if possible in amdgpu_device_vram_access v2"

Russell, Kent Kent.Russell at amd.com
Tue Apr 14 14:19:14 UTC 2020



On VG20 or MI100, the dmesg output below appears as soon as we run the subtest, and the kernel then hangs. I don't know enough about the test itself to say why this happens, but Jon Kim and Felix were discussing it on a separate thread when the issue was first reported, so hopefully they can provide additional information.

 Kent

> -----Original Message-----
> From: Christian König <ckoenig.leichtzumerken at gmail.com>
> Sent: Tuesday, April 14, 2020 9:52 AM
> To: Russell, Kent <Kent.Russell at amd.com>; amd-gfx at lists.freedesktop.org
> Subject: Re: [PATCH] Revert "drm/amdgpu: use the BAR if possible in
> amdgpu_device_vram_access v2"
> 
> Am 13.04.20 um 20:20 schrieb Kent Russell:
> > This reverts commit c12b84d6e0d70f1185e6daddfd12afb671791b6e.
> > The original patch causes a RAS event and a subsequent kernel hard hang
> > when running KFDMemoryTest.PtraceAccessInvisibleVram on VG20 and
> > Arcturus.
> >
> > dmesg output at hang time:
> > [drm] RAS event of type ERREVENT_ATHUB_INTERRUPT detected!
> > amdgpu 0000:67:00.0: GPU reset begin!
> > Evicting PASID 0x8000 queues
> > Started evicting pasid 0x8000
> > qcm fence wait loop timeout expired
> > The cp might be in an unrecoverable state due to an unsuccessful queues preemption
> > Failed to evict process queues
> > Failed to suspend process 0x8000
> > Finished evicting pasid 0x8000
> > Started restoring pasid 0x8000
> > Finished restoring pasid 0x8000
> > [drm] UVD VCPU state may lost due to RAS ERREVENT_ATHUB_INTERRUPT
> > amdgpu: [powerplay] Failed to send message 0x26, response 0x0
> > amdgpu: [powerplay] Failed to set soft min gfxclk !
> > amdgpu: [powerplay] Failed to upload DPM Bootup Levels!
> > amdgpu: [powerplay] Failed to send message 0x7, response 0x0
> > amdgpu: [powerplay] [DisableAllSMUFeatures] Failed to disable all smu features!
> > amdgpu: [powerplay] [DisableDpmTasks] Failed to disable all smu features!
> > amdgpu: [powerplay] [PowerOffAsic] Failed to disable DPM!
> > [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <powerplay> failed -5
> 
> Do you have more information on what's going wrong here? This is a really
> important patch for KFD debugging.
> 
> >
> > Signed-off-by: Kent Russell <kent.russell at amd.com>
> 
> Reviewed-by: Christian König <christian.koenig at amd.com>
> 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 ----------------------
> >   1 file changed, 26 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index cf5d6e585634..a3f997f84020 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -254,32 +254,6 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> >   	uint32_t hi = ~0;
> >   	uint64_t last;
> >
> > -
> > -#ifdef CONFIG_64BIT
> > -	last = min(pos + size, adev->gmc.visible_vram_size);
> > -	if (last > pos) {
> > -		void __iomem *addr = adev->mman.aper_base_kaddr + pos;
> > -		size_t count = last - pos;
> > -
> > -		if (write) {
> > -			memcpy_toio(addr, buf, count);
> > -			mb();
> > -			amdgpu_asic_flush_hdp(adev, NULL);
> > -		} else {
> > -			amdgpu_asic_invalidate_hdp(adev, NULL);
> > -			mb();
> > -			memcpy_fromio(buf, addr, count);
> > -		}
> > -
> > -		if (count == size)
> > -			return;
> > -
> > -		pos += count;
> > -		buf += count / 4;
> > -		size -= count;
> > -	}
> > -#endif
> > -
> >   	spin_lock_irqsave(&adev->mmio_idx_lock, flags);
> >   	for (last = pos + size; pos < last; pos += 4) {
> >   		uint32_t tmp = pos >> 31;
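
For readers following the removed hunk: it first serves whatever part of the request lies below adev->gmc.visible_vram_size through the BAR mapping, then advances pos, buf and size so the MMIO index loop handles the rest. The range arithmetic alone can be sketched in plain userspace C as below. This is only an illustration, not driver code: struct vram_split and split_vram_range() are hypothetical names that do not exist in amdgpu, and the sketch does no I/O at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type, for illustration only (not an amdgpu type). */
struct vram_split {
	size_t bar_bytes;   /* bytes below the visible-VRAM limit */
	uint64_t mmio_pos;  /* offset where the fallback path would resume */
	size_t mmio_bytes;  /* bytes left for the fallback path */
};

/*
 * Hypothetical helper mirroring the arithmetic of the removed hunk:
 * last = min(pos + size, visible_size); count = last - pos if last > pos.
 */
static struct vram_split split_vram_range(uint64_t pos, size_t size,
					  uint64_t visible_size)
{
	struct vram_split s;
	uint64_t end, last;

	/* Guard against wrap-around of pos + size (assumption of this sketch). */
	assert(size <= UINT64_MAX - pos);

	end = pos + size;
	last = end < visible_size ? end : visible_size;

	s.bar_bytes = last > pos ? (size_t)(last - pos) : 0;
	s.mmio_pos = pos + s.bar_bytes;
	s.mmio_bytes = size - s.bar_bytes;
	return s;
}
```

In the removed code the buffer pointer advances by count / 4 because buf is a uint32_t pointer, whereas pos and size are tracked in bytes. With a 256-byte visible limit, a 16-byte request at offset 248 would split into 8 bytes through the BAR and 8 bytes through the fallback path starting at offset 256.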

