[PATCH v4 00/14] drm: Add a driver for CSF-based Mali GPUs

Danilo Krummrich dakr at redhat.com
Mon Feb 5 09:54:05 UTC 2024


On 2/5/24 10:03, Boris Brezillon wrote:
> +Danilo for the panthor gpuvm-needs update.
> 
> On Sun, 4 Feb 2024 09:14:44 +0800 (CST)
> "Andy Yan" <andyshrk at 163.com> wrote:
> 
>> Hi Boris:
>> I saw this warning sometimes(Run on a armbain based bookworm),not sure is a know issue or something else。
>> [15368.293031] systemd-journald[715]: Received client request to relinquish /var/log/journal/1bc4a340506142af9bd31a6a3d2170ba access.
>> [37743.040737] ------------[ cut here ]------------
>> [37743.040764] panthor fb000000.gpu: drm_WARN_ON(shmem->pages_use_count)
>> [37743.040890] WARNING: CPU: 2 PID: 5702 at drivers/gpu/drm/drm_gem_shmem_helper.c:158 drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
>> [37743.040929] Modules linked in: joydev rfkill sunrpc lz4hc lz4 zram binfmt_misc hantro_vpu crct10dif_ce v4l2_vp9 v4l2_h264 snd_soc_simple_amplifier v4l2_mem2mem videobuf2_dma_contig snd_soc_es8328_i2c videobuf2_memops rk_crypto2 snd_soc_es8328 videobuf2_v4l2 sm3_generic videodev crypto_engine sm3 rockchip_rng videobuf2_common nvmem_rockchip_otp snd_soc_rockchip_i2s_tdm snd_soc_hdmi_codec snd_soc_simple_card mc snd_soc_simple_card_utils snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_pcm snd_timer snd soundcore dm_mod ip_tables x_tables autofs4 dw_hdmi_qp_i2s_audio dw_hdmi_qp_cec rk808_regulator rockchipdrm dw_mipi_dsi dw_hdmi_qp dw_hdmi analogix_dp drm_dma_helper fusb302 display_connector rk8xx_spi drm_display_helper phy_rockchip_snps_pcie3 phy_rockchip_samsung_hdptx_hdmi panthor tcpm rk8xx_core cec drm_gpuvm gpu_sched drm_kms_helper drm_shmem_helper drm_exec r8169 drm pwm_bl adc_keys
>> [37743.041108] CPU: 2 PID: 5702 Comm: kworker/u16:8 Not tainted 6.8.0-rc1-edge-rockchip-rk3588 #2
>> [37743.041115] Hardware name: Rockchip RK3588 EVB1 V10 Board (DT)
>> [37743.041120] Workqueue: panthor-cleanup panthor_vm_bind_job_cleanup_op_ctx_work [panthor]
>> [37743.041151] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [37743.041157] pc : drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
>> [37743.041169] lr : drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
>> [37743.041181] sp : ffff80008d37bcc0
>> [37743.041184] x29: ffff80008d37bcc0 x28: ffff800081d379c0 x27: ffff800081d37000
>> [37743.041196] x26: ffff00019909a280 x25: ffff00019909a2c0 x24: ffff0001017a4c05
>> [37743.041206] x23: dead000000000100 x22: dead000000000122 x21: ffff0001627ac1a0
>> [37743.041217] x20: 0000000000000000 x19: ffff0001627ac000 x18: 0000000000000000
>> [37743.041227] x17: 000000040044ffff x16: 005000f2b5503510 x15: fffffffffff91b77
>> [37743.041238] x14: 0000000000000001 x13: 00000000000003c5 x12: 00000000ffffffea
>> [37743.041248] x11: 00000000ffffdfff x10: 00000000ffffdfff x9 : ffff800081e0e818
>> [37743.041259] x8 : 000000000002ffe8 x7 : c0000000ffffdfff x6 : 00000000000affa8
>> [37743.041269] x5 : 0000000000001fff x4 : 0000000000000000 x3 : ffff8000819a6008
>> [37743.041279] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff00018465e900
>> [37743.041290] Call trace:
>> [37743.041293]  drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
>> [37743.041306]  panthor_gem_free_object+0x24/0xa0 [panthor]
>> [37743.041321]  drm_gem_object_free+0x1c/0x30 [drm]
>> [37743.041452]  panthor_vm_bo_put+0xc4/0x12c [panthor]
>> [37743.041475]  panthor_vm_cleanup_op_ctx.constprop.0+0xb0/0x104 [panthor]
>> [37743.041491]  panthor_vm_bind_job_cleanup_op_ctx_work+0x28/0xd0 [panthor]
> 
> Ok, I think I found the culprit: there's a race between
> the drm_gpuvm_bo_put() call in panthor_vm_bo_put() and the list
> iteration done by drm_gpuvm_prepare_objects(). Because we're not
> setting DRM_GPUVM_RESV_PROTECTED, the code goes through the 'lockless'
> iteration loop, and takes/release a vm_bo ref at each iteration. This
> means our 'were we the last vm_bo user?' test in panthor_vm_bo_put()
> might return false even if we were actually the last user, and when
> for_each_vm_bo_in_list() releases the ref it acquired, it not only leaks
> the pin reference, thus leaving GEM pages pinned (which explains this
> WARN_ON() splat), but it also calls drm_gpuvm_bo_destroy() in a path
> where we don't hold the GPUVA list lock, which is bad.

IIUC, GPUVM generally behaves correctly. It's just that the return value
of drm_gpuvm_bo_put() needs to be treated with care.

Maybe we should extend c50a291d621a ("drm/gpuvm: Let drm_gpuvm_bo_put()
report when the vm_bo object is destroyed") by a note explaining this
unexpected case, or, if not required anymore, simply revert the patch.

> 
> Long story short, I'll have to use DRM_GPUVM_RESV_PROTECTED, which is
> fine because I'm deferring vm_bo removal to a work where taking the VM
> resv lock is allowed. Since I was the one asking for this lockless
> iterator in the first place, I wonder if we should kill that and make
> DRM_GPUVM_RESV_PROTECTED the default (this would greatly simplify
> the code). AFAICT, The PowerVR driver shouldn't be impacted because it's
> using drm_gpuvm in synchronous mode only, and Xe already uses the
> resv-protected mode. That leaves Nouveau, but IIRC, it's also doing VM
> updates in the ioctl path.
> 
> Danilo, any opinions?

I agree that we should remove it once we got enough evidence that updating
the VA space in the asynchronous path isn't going to be a thing. I'm not
entirely sure, whether we'll attempt to re-work Nouveau, but I'd like to try
this approach with Nova. Hence, I'd still like to leave it in for a while.

OOC, knowing that you went a little back and forth with one or the other
approach, last I heard was that using DRM_GPUVM_RESV_PROTECTED didn't work
out well with shmem buffers. How did you fix it?

> 
> Andy, I pushed a new version to the panthor-next [1] and
> panthor-next+rk3588 [2] branches. The fix I'm talking about is [3], but
> you probably want to consider taking all the fixups in your branch.
> 
> Regards,
> 
> Boris
> 
> [1]https://gitlab.freedesktop.org/panfrost/linux/-/commits/panthor-next
> [2]https://gitlab.freedesktop.org/panfrost/linux/-/commits/panthor-next+rk3588
> [3]https://gitlab.freedesktop.org/panfrost/linux/-/commit/df48c09662a403275e76e679ee004085badea7c1
> 



More information about the dri-devel mailing list