[PATCH v2 00/10] drm/msm: probe deferral fixes

Steev Klimaszewski steev at kali.org
Tue Sep 13 20:23:10 UTC 2022


Hi Johan,

On 9/13/22 3:53 AM, Johan Hovold wrote:
> The MSM DRM driver is currently broken in multiple ways with respect to
> probe deferral. Not only does the driver currently fail to probe again
> after a late deferral, but due to a related use-after-free bug this also
> triggers NULL-pointer dereferences.
>
> These bugs are not new but have become critical with the release of
> 5.19 where probe is deferred in case the aux-bus EP panel driver has not
> yet been loaded.
>
> The underlying problem is lifetime issues due to careless use of
> device-managed resources.
>
> Specifically, device-managed resources allocated post component bind
> must be tied to the lifetime of the aggregate DRM device or they will
> not necessarily be released when binding of the aggregate device is
> deferred.
>
> The following call chain and pseudo code serves as an illustration of
> the problem:
>
>   - platform_probe(pdev1)
>     - dp_display_probe()
>       - component_add()
>
>   - platform_probe(pdev2) 				// last component
>     - dp_display_probe()					// d0
>         - component_add()
>           - try_to_bring_up_aggregate_device()
> 	   - devres_open_group(adev->parent)		// d1
>
> 	   - msm_drm_bind()
> 	     - msm_drm_init()
> 	       - component_bind_all()
> 	         - for_each_component()
> 		   - component_bind()
> 		     - devres_open_group(&pdev->dev)	// d2
> 	             - dp_display_bind()
> 		       - devm_kzalloc(&pdev->dev)	// a1, OK
> 		     - devres_close_group(&pdev->dev)	// d3
>
> 	       - dpu_kms_hw_init()
> 	         - for_each_panel()
> 	           - msm_dp_modeset_init()
> 		     - dp_display_request_irq()
> 		       - devm_request_irq(&pdev->dev)	// a2, BUG
> 		     - if (pdev == pdev2 && condition)
> 		       - return -EPROBE_DEFER;
>
> 	      - if (error)
>   	        - component_unbind_all()
> 	          - for_each_component()
> 		    - component_unbind()
> 		      - dp_display_unbind()
> 		      - devres_release_group(&pdev->dev) // d4, only a1 is freed
>
>             - if (error)
> 	     - devres_release_group(adev->parent)	// d5
>
> The device-managed allocation a2 is buggy as its lifetime is tied to the
> component platform device and will not be released when the aggregate
> device bind fails (e.g. due to a probe deferral).
>
> When pdev2 is later probed again, the attempt to allocate the IRQ a
> second time will fail for pdev1 (which is still bound to its platform
> driver).
>
> This series fixes the lifetime issues by tying the lifetime of a2 (and
> similar allocations) to the lifetime of the aggregate device so that a2
> is released at d5.
>
> In some cases, such has for the DP IRQ, the above situation can also be
> avoided by moving the allocation in question to the platform driver
> probe (d0) or component bind (between d2 and d3). But as doing so is not
> a general fix, this can be done later as a cleanup/optimisation.
>
> Johan
>
> Changes in v2
>   - use a custom devres action instead of amending the AUX bus interface
>     (Doug)
>   - split sanity check fixes and cleanups per bridge type (Dmitry)
>   - add another Fixes tag for the missing bridge counter reset (Dmitry)
>
>
> Johan Hovold (10):
>    drm/msm: fix use-after-free on probe deferral
>    drm/msm/dp: fix memory corruption with too many bridges
>    drm/msm/dsi: fix memory corruption with too many bridges
>    drm/msm/hdmi: fix memory corruption with too many bridges
>    drm/msm/dp: fix IRQ lifetime
>    drm/msm/dp: fix aux-bus EP lifetime
>    drm/msm/dp: fix bridge lifetime
>    drm/msm/hdmi: fix IRQ lifetime
>    drm/msm/dp: drop modeset sanity checks
>    drm/msm/dsi: drop modeset sanity checks
>
>   drivers/gpu/drm/msm/dp/dp_display.c | 26 +++++++++++++++++++-------
>   drivers/gpu/drm/msm/dp/dp_parser.c  |  6 +++---
>   drivers/gpu/drm/msm/dp/dp_parser.h  |  5 +++--
>   drivers/gpu/drm/msm/dsi/dsi.c       |  9 +++++----
>   drivers/gpu/drm/msm/hdmi/hdmi.c     |  7 ++++++-
>   drivers/gpu/drm/msm/msm_drv.c       |  1 +
>   6 files changed, 37 insertions(+), 17 deletions(-)
>
I've tested this on both sc8180x (Lenovo Flex 5G) and sdm850 (Lenovo 
Yoga C630), and both of them show the same issue:

[ 7.449305] platform ae9a000.displayport-controller: Fixing up cyclic 
dependency with ae01000.mdp [ 7.454344] Unable to handle kernel NULL 
pointer dereference at virtual address 0000000000000008 [ 7.454406] Mem 
abort info: [ 7.454423] ESR = 0x0000000096000004 [ 7.454446] EC = 0x25: 
DABT (current EL), IL = 32 bits [ 7.454475] SET = 0, FnV = 0 [ 7.454494] 
EA = 0, S1PTW = 0 [ 7.454512] FSC = 0x04: level 0 translation fault [ 
7.454539] Data abort info: [ 7.454556] ISV = 0, ISS = 0x00000004 [ 
7.454577] CM = 0, WnR = 0 [ 7.454595] user pgtable: 4k pages, 48-bit 
VAs, pgdp=0000000101504000 [ 7.454629] [0000000000000008] 
pgd=0000000000000000, p4d=0000000000000000 [ 7.454669] Internal error: 
Oops: 96000004 [#1] PREEMPT SMP [ 7.454700] Modules linked in: 
i2c_hid_of i2c_hid leds_qcom_lpg led_class_multicolor rtc_pm8xxx msm 
mdt_loader gpu_sched drm_dp_aux_bus drm_display_helper drm_kms_helper 
drm phy_qcom_edp llcc_qcom i2c_qcom_geni phy_qcom_qmp_combo 
phy_qcom_snps_femto_v2 phy_qcom_qmp_ufs phy_qcom_qmp_pcie ufs_qcom 
pwm_bl [ 7.454860] CPU: 2 PID: 76 Comm: kworker/u16:2 Not tainted 
5.19.0-rc8-next-20220728 #26 [ 7.454902] Hardware name: LENOVO 
82AK/LNVNB161216, BIOS EACN43WW(V1.15) 09/13/2021 [ 7.454941] Workqueue: 
events_unbound deferred_probe_work_func [ 7.454982] pstate: 40400005 
(nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 7.455020] pc : 
dp_display_request_irq+0x50/0xdc [msm] [ 7.455145] lr : 
dp_display_request_irq+0x2c/0xdc [msm] [ 7.455265] sp : ffff800008c1bb30 
[ 7.455285] x29: ffff800008c1bb30 x28: 0000000000000000 x27: 
0000000000000000 [ 7.455327] x26: ffffc9c918420000 x25: ffffc9c9186ec570 
x24: 000000000000003a [ 7.455368] x23: ffffc9c918811d30 x22: 
ffff2a5806baa998 x21: ffff2a5806ba3410 [ 7.455410] x20: ffff2a5806baa880 
x19: ffff2a5806baa998 x18: ffffffffffffffff [ 7.455451] x17: 
0000000000000038 x16: ffffc9c9164eeb24 x15: ffffffffffffffff [ 7.455492] 
x14: ffff2a5806bc3004 x13: ffff2a5806bc3000 x12: 0000000000000000 [ 
7.455533] x11: 0000000000000040 x10: ffffc9c918493080 x9 : 
ffffc9c918493078 [ 7.455574] x8 : ffff2a5800681b88 x7 : 0000000000000000 
x6 : ffff2a5806baa880 [ 7.455616] x5 : ffffc9c8ca2de000 x4 : 
0000000000080004 x3 : 0000000000000000 [ 7.455656] x2 : ffffc9c8ca296000 
x1 : 00000000000000a8 x0 : 0000000000000000 [ 7.455698] Call trace: [ 
7.455714] dp_display_request_irq+0x50/0xdc [msm] [ 7.455834] 
dp_display_probe+0xf8/0x4a4 [msm] [ 7.455950] platform_probe+0x6c/0xc4 [ 
7.455976] really_probe+0xbc/0x2d4 [ 7.455999] 
__driver_probe_device+0x78/0xe0 [ 7.456025] 
driver_probe_device+0x3c/0x13c [ 7.456051] 
__device_attach_driver+0xb8/0x120 [ 7.456077] bus_for_each_drv+0x78/0xd0 
[ 7.456105] __device_attach+0x9c/0x1a0 [ 7.456129] 
device_initial_probe+0x18/0x2c [ 7.456154] bus_probe_device+0x9c/0xa4 [ 
7.456178] deferred_probe_work_func+0x88/0xc0 [ 7.456204] 
process_one_work+0x1d4/0x330 [ 7.456231] worker_thread+0x70/0x42c [ 
7.456255] kthread+0x10c/0x110 [ 7.456278] ret_from_fork+0x10/0x20 [ 
7.456306] Code: aa1403e6 f2a00104 f0000225 f0ffffe2 (f9400400) [ 
7.456341] ---[ end trace 0000000000000000 ]---

This is from the sc8180x, sdm850 is the same call stack, just with 
different addresses.

I do have 
https://lore.kernel.org/all/20220712132258.671263-1-dmitry.baryshkov@linaro.org/ 
applied here which makes the 10th patch not apply cleanly.

It fails actually, but I applied it manually here.



More information about the dri-devel mailing list