[PATCH] drm/amdgpu: disable UVD/VCE for some polaris 12 variants

Wed Nov 28 03:48:15 UTC 2018

On Nov 28, 2018, at 00:11, Alex Deucher <alexdeucher at gmail.com> wrote:
> 
> On Tue, Nov 27, 2018 at 4:56 AM Christian König
> <ckoenig.leichtzumerken at gmail.com> wrote:
>> 
>> Am 27.11.18 um 02:47 schrieb Zhang, Jerry(Junwei):
>> 
>> On 11/26/18 5:28 PM, Christian König wrote:
>> 
>> Am 26.11.18 um 03:38 schrieb Zhang, Jerry(Junwei):
>> 
>> On 11/24/18 3:32 AM, Deucher, Alexander wrote:
>> 
>> Is this required?  Are the harvesting fuses incorrect?  If the blocks are harvested, we should bail out of the blocks properly during init.  Also, please make this more explicit if we still need it.  E.g.,
>> 
>> 
>> 
>> The harvest fuse is indeed disabling UVD and VCE, as it's a mining card.
>> Any command sent to UVD/VCE then causes a NULL pointer issue, e.g. with amdgpu_test.
>> 
>> 
>> In this case we should fix the NULL pointer issue instead. Do you have a backtrace for this?
>> 
>> 
>> Sorry for missing the detail.
>> The NULL pointer is caused by UVD not being initialized, as it's disabled in the VBIOS for this kind of card.
>> 
>> 
>> Yeah, but that should be handled correctly.
>> 
>> 
>> On CS submission, amdgpu_cs_ib_fill() checks ring->funcs->parse_cs.
>> However, uvd_v6_0_early_init() skips setting the ring functions, because CC_HARVEST_FUSES marks UVD/VCE as disabled.
>> Any access to the UVD/VCE ring's funcs then causes a NULL pointer dereference.
>> 
>> BTW, Windows driver disables UVD/VCE for it as well.
>> 
>> 
>> You are approaching this from the wrong side. The fact that UVD/VCE is disabled should already be handled correctly.
>> 
>> The problem is rather that in a couple of places (amdgpu_ctx_init for example) we assume that we have at least one UVD/VCE ring.
>> 
>> Alex is right that checking the fuses should be sufficient and we rather need to fix the handling here instead of adding another workaround.
> 
> Exactly.  There are already cards out there with no UVD or VCE, so we
> need to fix this if it's a problem.  It sounds like userspace is
> submitting work to the VCE or UVD rings without checking whether or
> not the device supports them in the first place.  We should do a
> better job of guarding against that in the kernel.

Thanks, all.
I understand the point now.
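
As a rough sketch of the kind of guard described above (this is not the actual amdgpu code; `struct ring`, `struct ring_funcs`, `demo_parse_cs()` and `submit_ib()` are hypothetical stand-ins, and the choice of -ENODEV is an assumption), the submission path could refuse a ring whose function table was never set rather than dereferencing it:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real amdgpu types. */
struct ring_funcs {
	int (*parse_cs)(unsigned int ib_idx);
};

struct ring {
	const struct ring_funcs *funcs;	/* stays NULL when the IP block is harvested */
};

/* Example parser for a ring that was set up normally. */
static int demo_parse_cs(unsigned int ib_idx)
{
	(void)ib_idx;
	return 0;
}

/*
 * Reject a submission to a ring whose function table was never set,
 * instead of dereferencing ring->funcs unconditionally.
 */
static int submit_ib(const struct ring *ring, unsigned int ib_idx)
{
	assert(ring != NULL);

	if (!ring->funcs)
		return -ENODEV;		/* block harvested or not initialized */

	if (ring->funcs->parse_cs)
		return ring->funcs->parse_cs(ib_idx);

	return 0;			/* nothing to parse for this ring type */
}
```

With this shape, a ring whose `funcs` points at a table using `demo_parse_cs` is parsed as usual, while a ring left with `funcs == NULL` gets an error back to userspace instead of an oops.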

We may also want to print a message when UVD/VCE is not initialized, since the log currently makes it look as if it was initialized successfully:
```
[   15.730219] [drm] add ip block number 7 <uvd_v6_0>
```
I can look into it after my vacation (back next week).
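
To illustrate the idea only, here is a small userspace sketch; `struct ip_block_state`, `format_block_status()` and the message wording are all hypothetical, and the real driver would use its own DRM logging helpers rather than snprintf():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-block state; the real driver tracks this differently. */
struct ip_block_state {
	const char *name;
	bool harvested;		/* true when the fuses disable the block */
};

/*
 * Format the status line for one IP block. Returns what snprintf()
 * returns: the length the full message needs, excluding the NUL.
 */
static int format_block_status(const struct ip_block_state *blk,
			       char *buf, size_t len)
{
	assert(blk != NULL && buf != NULL);

	if (blk->harvested)
		return snprintf(buf, len,
				"[drm] %s is harvested, skipping init",
				blk->name);

	return snprintf(buf, len, "[drm] %s initialized", blk->name);
}
```

The point is just that a harvested block gets a distinct line in the log, so "add ip block" alone is not read as a successful init.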

BTW, is that handled by the patch series "[PATCH 1/6] drm/amdgpu: add VCN JPEG support amdgpu_ctx_num_entities"?
I tried applying the patches, and amdgpu_test seems to hang at the Userptr Test (verified on the latest staging build).
Please confirm.

[ 4388.759743] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 4388.759782] IP: amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched]
[ 4388.759807] PGD 0 P4D 0
[ 4388.759820] Oops: 0000 [#1] SMP PTI
[ 4388.759834] Modules linked in: amdgpu(OE) amdchash(OE) amdttm(OE) amd_sched(OE) amdkcl(OE) amd_iommu_v2 drm_kms_helper drm i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt nls_utf8 cifs ccm rpcsec_gss_krb5 nfsv4 nfs fscache binfmt_misc nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel snd_hda_codec_hdmi kvm snd_hda_intel irqbypass crct10dif_pclmul snd_hda_codec crc32_pclmul snd_hda_core snd_hwdep ghash_clmulni_intel snd_seq_midi snd_seq_midi_event pcbc snd_pcm snd_rawmidi snd_seq snd_seq_device snd_timer aesni_intel aes_x86_64 crypto_simd eeepc_wmi glue_helper snd cryptd asus_wmi intel_cstate soundcore shpchp intel_rapl_perf mei_me wmi_bmof intel_wmi_thunderbolt sparse_keymap serio_raw mei acpi_pad mac_hid sch_fq_codel
[ 4388.760141]  nfsd auth_rpcgss nfs_acl parport_pc lockd ppdev grace lp sunrpc parport ip_tables x_tables autofs4 mxm_wmi e1000e psmouse ptp pps_core ahci libahci wmi video
[ 4388.760212] CPU: 7 PID: 915 Comm: amdgpu_test Tainted: G           OE    4.15.0-39-generic #42-Ubuntu
[ 4388.760250] Hardware name: System manufacturer System Product Name/Z170-A, BIOS 1302 11/09/2015
[ 4388.760287] RIP: 0010:amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched]
[ 4388.760314] RSP: 0018:ffffa37b8166bd38 EFLAGS: 00010246
[ 4388.760337] RAX: 0000000000000000 RBX: ffff88776740e5f8 RCX: 0000000000000000
[ 4388.760366] RDX: 0000000000000000 RSI: 00000000000000fa RDI: ffff88776740e5f8
[ 4388.760396] RBP: ffffa37b8166bd88 R08: ffff8877765dab10 R09: 0000000000000000
[ 4388.760425] R10: 0000000000000000 R11: 0000000000000064 R12: 00000000000000fa
[ 4388.760455] R13: ffff8877606fdf18 R14: ffff8877606fdef8 R15: 00000000000000fa
[ 4388.760484] FS:  00007f05b21a1580(0000) GS:ffff8877765c0000(0000) knlGS:0000000000000000
[ 4388.760518] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4388.760542] CR2: 0000000000000008 CR3: 000000003020a005 CR4: 00000000003606e0
[ 4388.760572] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4388.760601] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4388.760630] Call Trace:
[ 4388.760644]  ? wait_woken+0x80/0x80
[ 4388.760701]  amdgpu_ctx_mgr_entity_flush+0x7b/0xc0 [amdgpu]
[ 4388.760747]  amdgpu_flush+0x23/0x30 [amdgpu]
[ 4388.760767]  filp_close+0x2f/0x80
[ 4388.760782]  put_files_struct+0x78/0xf0
[ 4388.760967]  exit_files+0x49/0x50
[ 4388.760976]  do_exit+0x2ca/0xb40
[ 4388.760985]  ? __do_page_fault+0x270/0x4d0
[ 4388.760994]  do_group_exit+0x43/0xb0
[ 4388.761003]  SyS_exit_group+0x14/0x20
[ 4388.761013]  do_syscall_64+0x73/0x130
[ 4388.761023]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[ 4388.761034] RIP: 0033:0x7f05b143fe06
[ 4388.761043] RSP: 002b:00007ffd0fde5fa8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 4388.761059] RAX: ffffffffffffffda RBX: 00007f05b1742740 RCX: 00007f05b143fe06
[ 4388.761074] RDX: 0000000000000000 RSI: 000000000000003c RDI: 0000000000000000
[ 4388.761088] RBP: 0000000000000000 R08: 00000000000000e7 R09: ffffffffffffff80
[ 4388.761103] R10: 00007f05b135a140 R11: 0000000000000246 R12: 00007f05b1742740
[ 4388.761117] R13: 0000000000000001 R14: 00007f05b174b628 R15: 0000000000000000
[ 4388.761132] Code: 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 48 89 fb 49 89 f4 48 83 ec 30 65 48 8b 04 25 28 00 00 00 48 89 45 d8 31 c0 48 8b 47 10 <4c> 8b 68 08 65 48 8b 04 25 00 5c 01 00 f6 40 24 04 0f 84 1b 01 
[ 4388.761188] RIP: amddrm_sched_entity_flush+0x2d/0x1d0 [amd_sched] RSP: ffffa37b8166bd38
[ 4388.761204] CR2: 0000000000000008
[ 4388.761212] ---[ end trace 7f1dd38e3cb86992 ]---
[ 4388.761222] Fixing recursive fault but reboot is needed!

Regards,
Jerry

> 
> Alex
> 
>> 
>> Regards,
>> Christian.
>> 
>> 
>> Regards,
>> Jerry
>> 
>> 
>> Regards,
>> Christian.
>> 
>> 
>> AFAIK, Windows also disables UVD and VCE during initialization.
>> 
>>       if ((adev->pdev->device == 0x67df) &&
>>              (adev->pdev->revision == 0xf7)) {
>> 
>>        /* Some polaris12 variants don't support UVD/VCE */
>> 
>>      } else  {
>> 
>>                 amdgpu_device_ip_block_add(adev, &uvd_v6_3_ip_block);
>> 
>>                 amdgpu_device_ip_block_add(adev, &vce_v3_4_ip_block);
>> 
>>    }
>> 
>> 
>> 
>> OK, I will make the process explicit.
>> 
>> Regards,
>> Jerry
>> 
>> That way if we re-arrange the order later, it will be easier to track.
>> 
>> 
>> Alex
>> 
>> ________________________________
>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of Junwei Zhang <Jerry.Zhang at amd.com>
>> Sent: Friday, November 23, 2018 3:32:27 AM
>> To: amd-gfx at lists.freedesktop.org
>> Cc: Zhang, Jerry
>> Subject: [PATCH] drm/amdgpu: disable UVD/VCE for some polaris 12 variants
>> 
>> Some variants don't support UVD and VCE.
>> 
>> Signed-off-by: Junwei Zhang <Jerry.Zhang at amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/vi.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c b/drivers/gpu/drm/amd/amdgpu/vi.c
>> index f3a4cf1f013a..3338b013ded4 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
>> @@ -1660,6 +1660,10 @@ int vi_set_ip_blocks(struct amdgpu_device *adev)
>>                         amdgpu_device_ip_block_add(adev, &dce_v11_2_ip_block);
>>                 amdgpu_device_ip_block_add(adev, &gfx_v8_0_ip_block);
>>                 amdgpu_device_ip_block_add(adev, &sdma_v3_1_ip_block);
>> +               /* Some polaris12 variants don't support UVD/VCE */
>> +               if ((adev->pdev->device == 0x67df) &&
>> +                     (adev->pdev->revision == 0xf7))
>> +                       break;
>>                 amdgpu_device_ip_block_add(adev, &uvd_v6_3_ip_block);
>>                 amdgpu_device_ip_block_add(adev, &vce_v3_4_ip_block);
>>                 break;
>> --
>> 2.17.1
>> 
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx