[PATCH RESEND] drm/amd/display: Fix nested FPU context in dcn21_validate_bandwidth()

Holger Hoffstätte holger at applied-asynchrony.com
Fri Mar 5 12:23:14 UTC 2021


On 2021-03-05 12:39, Holger Hoffstätte wrote:
> 
> Commit 41401ac67791 added FPU wrappers to dcn21_validate_bandwidth(),
> which was correct. Unfortunately a nested function already contained
> DC_FP_START()/DC_FP_END() calls, which results in nested FPU context
> enter/exit and complaints by kernel_fpu_begin_mask().
> This can be observed e.g. with 5.10.20, which backported 41401ac67791
> and now emits the following warning on boot:
> 
> WARNING: CPU: 6 PID: 858 at arch/x86/kernel/fpu/core.c:129 kernel_fpu_begin_mask+0xa5/0xc0
> Call Trace:
>   dcn21_calculate_wm+0x47/0xa90 [amdgpu]
>   dcn21_validate_bandwidth_fp+0x15d/0x2b0 [amdgpu]
>   dcn21_validate_bandwidth+0x29/0x40 [amdgpu]
>   dc_validate_global_state+0x3c7/0x4c0 [amdgpu]
> 
> The warning is emitted due to the additional DC_FP_START/END calls in
> patch_bounding_box(), which is inlined into dcn21_calculate_wm(),
> its only caller. Removing the calls brings the code in line with
> dcn20 and makes the warning disappear.
> 
> Fixes: 41401ac67791 ("drm/amd/display: Add FPU wrappers to dcn21_validate_bandwidth()")
> Signed-off-by: Holger Hoffstätte <holger at applied-asynchrony.com>
> ---
>   drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 4 ----
>   1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
> index 072f8c880924..68be73fe2e23 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
> @@ -1062,8 +1062,6 @@ static void patch_bounding_box(struct dc *dc, struct _vcs_dpi_soc_bounding_box_s
>   {
>       int i;
> 
> -    DC_FP_START();
> -
>       if (dc->bb_overrides.sr_exit_time_ns) {
>           for (i = 0; i < WM_SET_COUNT; i++) {
>                 dc->clk_mgr->bw_params->wm_table.entries[i].sr_exit_time_us =
> @@ -1088,8 +1086,6 @@ static void patch_bounding_box(struct dc *dc, struct _vcs_dpi_soc_bounding_box_s
>                   dc->bb_overrides.dram_clock_change_latency_ns / 1000.0;
>           }
>       }
> -
> -    DC_FP_END();
>   }
> 
>   void dcn21_calculate_wm(

Hmm... this is getting confusing: I was just greeted by the following for
no obvious reason (probably while playing a browser video or something):

[Mar 5 12:38] ------------[ cut here ]------------
[  +0.000006] WARNING: CPU: 8 PID: 3803 at arch/x86/kernel/fpu/core.c:155 kernel_fpu_end+0x19/0x20
[  +0.000001] Modules linked in: auth_rpcgss nfsv4 dns_resolver lz4 lz4_compress lz4_decompress nfs lockd grace nfs_ssc sunrpc tcp_bbr2 iwlmvm pkcs8_key_parser amdgpu mac80211 lm92 libarc4 snd_hda_codec_realtek wmi_bmof drivetemp iommu_v2 snd_hda_codec_generic gpu_sched ttm i2c_algo_bit btusb btrtl drm_kms_helper snd_hda_codec_hdmi btbcm btintel uvcvideo cec videobuf2_vmalloc videobuf2_memops iwlwifi videobuf2_v4l2 edac_mce_amd snd_hda_intel videobuf2_common crct10dif_pclmul snd_intel_dspcfg crc32_pclmul drm bluetooth crc32c_intel snd_hda_codec videodev ghash_clmulni_intel syscopyarea snd_rn_pci_acp3x snd_hwdep sysfillrect ecdh_generic rapl serio_raw mc ecc snd_hda_core k10temp sysimgblt snd_pci_acp3x fb_sys_fops i2c_piix4 cfg80211 snd_pcm snd_timer r8169 ccp ipmi_devintf ipmi_msghandler realtek thinkpad_acpi ucsi_acpi typec_ucsi snd typec soundcore wmi ledtrig_audio rfkill ac battery video i2c_scmi pinctrl_amd button
[  +0.000036] CPU: 8 PID: 3803 Comm: X Not tainted 5.10.20 #1
[  +0.000001] Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
[  +0.000001] RIP: 0010:kernel_fpu_end+0x19/0x20
[  +0.000001] Code: ae 47 40 b8 01 00 00 00 c3 0f 0b eb d7 0f 0b eb c9 0f 1f 44 00 00 65 8a 05 dc 42 ff 7e 84 c0 74 09 65 c6 05 d0 42 ff 7e 00 c3 <0f> 0b eb f3 0f 1f 00 0f 1f 44 00 00 8b 15 95 d2 03 02 31 f6 e8 0e
[  +0.000001] RSP: 0018:ffffc900007b78d0 EFLAGS: 00010246
[  +0.000001] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000027d46
[  +0.000000] RDX: 0000000000027d45 RSI: ffffffffa0d6873d RDI: 000000000002ab00
[  +0.000001] RBP: ffff888349ac0000 R08: 0000000000000480 R09: 00000000000003bf
[  +0.000001] R10: ffffc900007b77e8 R11: 0000000000000000 R12: 0000000000000001
[  +0.000000] R13: ffff88810b2e0000 R14: 0000000000000002 R15: 0000000080000000
[  +0.000001] FS:  00007f6f002558c0(0000) GS:ffff8883ff600000(0000) knlGS:0000000000000000
[  +0.000001] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  +0.000000] CR2: 00007f255134f8d0 CR3: 000000010431a000 CR4: 0000000000350ee0
[  +0.000001] Call Trace:
[  +0.000053]  dcn21_validate_bandwidth+0x31/0x40 [amdgpu]
[  +0.000028]  dc_commit_updates_for_stream+0x9d9/0x2aa0 [amdgpu]
[  +0.000033]  amdgpu_dm_atomic_commit_tail+0x1374/0x2260 [amdgpu]
[  +0.000005]  commit_tail+0x8f/0x120 [drm_kms_helper]
[  +0.000003]  drm_atomic_helper_commit+0x1d3/0x200 [drm_kms_helper]
[  +0.000005]  drm_mode_obj_set_property_ioctl+0x118/0x380 [drm]
[  +0.000004]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[  +0.000003]  drm_ioctl_kernel+0x8a/0x120 [drm]
[  +0.000004]  drm_ioctl+0x1f1/0x3b0 [drm]
[  +0.000003]  ? drm_mode_obj_find_prop_id+0x40/0x40 [drm]
[  +0.000019]  amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[  +0.000002]  __x64_sys_ioctl+0x152/0x920
[  +0.000002]  ? _copy_from_user+0x28/0x60
[  +0.000002]  ? restore_altstack+0x19/0xd0
[  +0.000003]  do_syscall_64+0x2d/0x40
[  +0.000002]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  +0.000001] RIP: 0033:0x7f6f007549b7
[  +0.000002] Code: 1f 40 00 48 89 d8 49 8d 3c 1c 48 f7 d8 49 39 c4 72 b1 e8 0c ff ff ff 85 c0 78 b6 5b 4c 89 e0 5d 41 5c c3 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 79 c4 0c 00 f7 d8 64 89 01 48
[  +0.000000] RSP: 002b:00007ffe9c0f4788 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  +0.000001] RAX: ffffffffffffffda RBX: 00007ffe9c0f47c0 RCX: 00007f6f007549b7
[  +0.000000] RDX: 00007ffe9c0f47c0 RSI: 00000000c01864ba RDI: 000000000000000b
[  +0.000001] RBP: 00000000c01864ba R08: 000000000000006d R09: 00000000cccccccc
[  +0.000000] R10: 0000000000000fff R11: 0000000000000246 R12: 000055a9b98d6720
[  +0.000000] R13: 000000000000000b R14: 0000000000000000 R15: 0000000000000003
[  +0.000001] ---[ end trace 9f0368711896f6eb ]---

...which suggests that there is another spurious kernel_fpu_begin()/end() pair
somewhere, or I'm misreading things.
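
For reference, a minimal user-space sketch of the bookkeeping involved. It
assumes that kernel_fpu_begin_mask() warns when a per-CPU "in kernel FPU" flag
is already set and that kernel_fpu_end() warns when the flag is already clear.
All names below (model_fpu_begin, model_fpu_end, the counters) are hypothetical
stand-ins, not kernel API. Under that assumption a single nested begin/end pair
yields one warning on the begin side and one on the end side:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU "in kernel FPU" flag. */
static bool in_kernel_fpu;
static int begin_warnings;
static int end_warnings;

/* Models the begin side: warn if an FPU section is already open. */
static void model_fpu_begin(void)
{
	if (in_kernel_fpu)
		begin_warnings++;
	in_kernel_fpu = true;
}

/* Models the end side: warn if no FPU section is open. */
static void model_fpu_end(void)
{
	if (!in_kernel_fpu)
		end_warnings++;
	in_kernel_fpu = false;
}

/* Inner function with its own begin/end pair, like the nested callee. */
static void model_inner(void)
{
	model_fpu_begin();	/* flag already set: begin-side warning */
	model_fpu_end();	/* clears the flag for the outer caller too */
}

/* Outer wrapper that already holds the FPU section. */
static void model_outer(void)
{
	model_fpu_begin();
	model_inner();
	model_fpu_end();	/* flag already clear: end-side warning */
}

/* Runs the nested case once and checks the warning counts. */
static void check_nested(void)
{
	in_kernel_fpu = false;
	begin_warnings = 0;
	end_warnings = 0;
	model_outer();
	assert(begin_warnings == 1);
	assert(end_warnings == 1);
}
```

Whether the kernel_fpu_end() warning above comes from the same nesting or from
a separate begin/end pair cannot be told from this model alone.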

It's curious that these warnings only appeared after 41401ac67791; apparently this
is messier than it seems.

Any clues welcome..

-h