[PATCH RESEND] drm/amd/display: Fix nested FPU context in dcn21_validate_bandwidth()

Holger Hoffstätte holger at applied-asynchrony.com
Fri Mar 5 14:07:01 UTC 2021


On 2021-03-05 13:23, Holger Hoffstätte wrote:
> On 2021-03-05 12:39, Holger Hoffstätte wrote:
>>
>> Commit 41401ac67791 added FPU wrappers to dcn21_validate_bandwidth(),
>> which was correct. Unfortunately a nested function already contained
>> DC_FP_START()/DC_FP_END() calls, which results in nested FPU context
>> enter/exit and complaints by kernel_fpu_begin_mask().
>> This can be observed e.g. with 5.10.20, which backported 41401ac67791
>> and now emits the following warning on boot:
>>
>> WARNING: CPU: 6 PID: 858 at arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xa5/0xc0
>> Call Trace:
>>   dcn21_calculate_wm+0x47/0xa90 [amdgpu]
>>   dcn21_validate_bandwidth_fp+0x15d/0x2b0 [amdgpu]
>>   dcn21_validate_bandwidth+0x29/0x40 [amdgpu]
>>   dc_validate_global_state+0x3c7/0x4c0 [amdgpu]
>>
>> The warning is emitted due to the additional DC_FP_START/END calls in
>> patch_bounding_box(), which is inlined into dcn21_calculate_wm(),
>> its only caller. Removing the calls brings the code in line with
>> dcn20 and makes the warning disappear.
>>
>> Fixes: 41401ac67791 ("drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()")
>> Signed-off-by: Holger Hoffstätte <holger at applied-asynchrony.com>
>> ---
>>   drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 4 ----
>>   1 file changed, 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
>> index 072f8c880924..68be73fe2e23 100644
>> --- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
>> +++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
>> @@ -1062,8 +1062,6 @@ static void patch_bounding_box(struct dc *dc, struct _vcs_dpi_soc_bounding_box_s
>>   {
>>       int i;
>>
>> -    DC_FP_START();
>> -
>>       if (dc->bb_overrides.sr_exit_time_ns) {
>>           for (i = 0; i < WM_SET_COUNT; i++) {
>>                 dc->clk_mgr->bw_params->wm_table.entries[i].sr_exit_time_us =
>> @@ -1088,8 +1086,6 @@ static void patch_bounding_box(struct dc *dc, struct _vcs_dpi_soc_bounding_box_s
>>                   dc->bb_overrides.dram_clock_change_latency_ns / 1000.0;
>>           }
>>       }
>> -
>> -    DC_FP_END();
>>   }
>>
>>   void dcn21_calculate_wm(
> 
> Hmm..this is getting confusing since I was just greeted by the following for
> no obvious reason (probably when playing a browser video or something):
> 
> [Mar 5 12:38] ------------[ cut here ]------------
> [  +0.000006] WARNING: CPU: 8 PID: 3803 at arch/x86/kernel/fpu/core.c:155 kernel_fpu_end+0x19/0x20
> [  +0.000001] Modules linked in: auth_rpcgss nfsv4 dns_resolver lz4 lz4_compress lz4_decompress nfs lockd grace nfs_ssc sunrpc tcp_bbr2 iwlmvm pkcs8_key_parser amdgpu mac80211 lm92 libarc4 snd_hda_codec_realtek wmi_bmof drivetemp iommu_v2 snd_hda_codec_generic gpu_sched ttm i2c_algo_bit btusb btrtl drm_kms_helper snd_hda_codec_hdmi btbcm btintel uvcvideo cec videobuf2_vmalloc videobuf2_memops iwlwifi videobuf2_v4l2 edac_mce_amd snd_hda_intel videobuf2_common crct10dif_pclmul snd_intel_dspcfg crc32_pclmul drm bluetooth crc32c_intel snd_hda_codec videodev ghash_clmulni_intel syscopyarea snd_rn_pci_acp3x snd_hwdep sysfillrect ecdh_generic rapl serio_raw mc ecc snd_hda_core k10temp sysimgblt snd_pci_acp3x fb_sys_fops i2c_piix4 cfg80211 snd_pcm snd_timer r8169 ccp ipmi_devintf ipmi_msghandler realtek thinkpad_acpi ucsi_acpi typec_ucsi snd typec soundcore wmi ledtrig_audio rfkill ac battery video i2c_scmi pinctrl_amd button
> [  +0.000036] CPU: 8 PID: 3803 Comm: X Not tainted 5.10.20 #1
> [  +0.000001] Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
> [  +0.000001] RIP: 0010:kernel_fpu_end+0x19/0x20
> [  +0.000001] Code: ae 47 40 b8 01 00 00 00 c3 0f 0b eb d7 0f 0b eb c9 0f 1f 44 00 00 65 8a 05 dc 42 ff 7e 84 c0 74 09 65 c6 05 d0 42 ff 7e 00 c3 <0f> 0b eb f3 0f 1f 00 0f 1f 44 00 00 8b 15 95 d2 03 02 31 f6 e8 0e
> [  +0.000001] RSP: 0018:ffffc900007b78d0 EFLAGS: 00010246
> [  +0.000001] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000027d46
> [  +0.000000] RDX: 0000000000027d45 RSI: ffffffffa0d6873d RDI: 000000000002ab00
> [  +0.000001] RBP: ffff888349ac0000 R08: 0000000000000480 R09: 00000000000003bf
> [  +0.000001] R10: ffffc900007b77e8 R11: 0000000000000000 R12: 0000000000000001
> [  +0.000000] R13: ffff88810b2e0000 R14: 0000000000000002 R15: 0000000080000000
> [  +0.000001] FS:  00007f6f002558c0(0000) GS:ffff8883ff600000(0000) knlGS:0000000000000000
> [  +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  +0.000000] CR2: 00007f255134f8d0 CR3: 000000010431a000 CR4: 0000000000350ee0
> [  +0.000001] Call Trace:
> [  +0.000053]  dcn21_validate_bandwidth+0x31/0x40 [amdgpu]
> [  +0.000028]  dc_commit_updates_for_stream+0x9d9/0x2aa0 [amdgpu]
> [  +0.000033]  amdgpu_dm_atomic_commit_tail+0x1374/0x2260 [amdgpu]
> [  +0.000005]  commit_tail+0x8f/0x120 [drm_kms_helper]
> [  +0.000003]  drm_atomic_helper_commit+0x1d3/0x200 [drm_kms_helper]
> [  +0.000005]  drm_mode_obj_set_property_ioctl+0x118/0x380 [drm]
> [  +0.000004]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
> [  +0.000003]  drm_ioctl_kernel+0x8a/0x120 [drm]
> [  +0.000004]  drm_ioctl+0x1f1/0x3b0 [drm]
> [  +0.000003]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
> [  +0.000019]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
> [  +0.000002]  __x64_sys_ioctl+0x152/0x920
> [  +0.000002]  ? _copy_from_user+0x28/0x60
> [  +0.000002]  ? restore_altstack+0x19/0xd0
> [  +0.000003]  do_syscall_64+0x2d/0x40
> [  +0.000002]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [  +0.000001] RIP: 0033:0x7f6f007549b7
> [  +0.000002] Code: 1f 40 00 48 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b1 e8 0c ff ff ff 85 c0 78 b6 5b 4c 89 e0 5d 41 5c c3 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 c4 0c 00 f7 d8 64 89 01 48
> [  +0.000000] RSP: 002b:00007ffe9c0f4788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [  +0.000001] RAX: ffffffffffffffda RBX: 00007ffe9c0f47c0 RCX: 00007f6f007549b7
> [  +0.000000] RDX: 00007ffe9c0f47c0 RSI: 00000000c01864ba RDI: 000000000000000b
> [  +0.000001] RBP: 00000000c01864ba R08: 000000000000006d R09: 00000000cccccccc
> [  +0.000000] R10: 0000000000000fff R11: 0000000000000246 R12: 000055a9b98d6720
> [  +0.000000] R13: 000000000000000b R14: 0000000000000000 R15: 0000000000000003
> [  +0.000001] ---[ end trace 9f0368711896f6eb ]---
> 
> ..which indicates that there is another spurious kernel_fpu_begin()/end() somewhere,
> or I'm misreading things.
> 
> It's curious that these warnings only appeared after 41401ac67791; apparently this
> is more messy than it seems.
> 
> Any clues welcome..

Looks like this is a replay of f41ed88cbd ("drm/amdgpu/display: use GFP_ATOMIC
in dcn20_validate_bandwidth_internal"), but this time for dcn21, which still
uses GFP_KERNEL. I'll send a patch.

-h

