drm/imx: Crash in drm_mode_config_cleanup
stefan at agner.ch
Thu Sep 13 16:52:00 UTC 2018
On 13.09.2018 01:36, Philipp Zabel wrote:
> Hi Stefan,
> thank you for the report. I think what happens is the following:
Thanks for looking into it!
> On Wed, 2018-09-12 at 15:26 -0700, Stefan Agner wrote:
>> While working on Apalis iMX6 with four displays, I encountered the
>> following crash:
>> [ 3.781163] imx-drm display-subsystem: bound 120000.hdmi (ops dw_hdmi_imx_ops)
>> [ 3.788441] imx-drm display-subsystem: binding ldb (ops imx_ldb_ops)
>> [ 3.794902] imx_ldb_bind, imx_ldb ec0da010
> component_bind calls devres_open_group right before component->ops->bind
> The devmem_kzalloc'ed imx_ldb_channel that contains the encoder is
> allocated in this devres group.
>> [ 3.799818] drm_encoder_init, encoder ec0da388
>> [ 3.804363] drm_encoder_init, ret 0
> drm_encoder_init adds the encoder in imx_ldb to the
>> [ 3.807908] imx_ldb_register, encoder ec0da388
>> [ 3.812432] imx_ldb_register, ret 0
>> [ 3.815951] imx_ldb_bind encoder LVDS-46, ec0da388 ->func c0e6371c
>> [ 3.822227] imx_ldb_bind, done
>> [ 3.825331] imx-drm display-subsystem: bound ldb (ops imx_ldb_ops)
>> [ 3.831614] imx-drm display-subsystem: binding parallel-display-controller1 (ops imx_pd_ops)
>> [ 3.840285] drm_encoder_init, encoder ec8a2b78
>> [ 3.844931] imx-drm display-subsystem: Linked as a consumer to panel
>> [ 3.851472] imx-drm display-subsystem: bound parallel-display-controller1 (ops imx_pd_ops)
>> [ 3.859785] imx-drm display-subsystem: binding parallel-display-controller0 (ops imx_pd_ops)
>> [ 3.868561] imx-drm display-subsystem: failed to bind parallel-display-controller0 (ops imx_pd_ops): -517
>> [ 3.878225] imx-drm display-subsystem: Dropping the link to panel
>> [ 3.884679] imx_ldb_unbind
>> [ 3.887421] imx_ldb_unbind, encoder LVDS-46, ec0da388 ->func c0e6371c
>> [ 3.893950] imx_ldb_unbind, encoder (null), ec0da838 ->func 0
>> [ 3.899723] imx_ldb_unbind, done
> component_unbind calls devres_release_group after component->ops-
>>unbind, freeing imx_ldb and with it the encoder that is still in the
Oh I see, I did not realize that component_unbind releases devres...
>> [ 3.908345] imx_drm_bind, component_bind_all, ret -517
>> [ 3.913638] drm_mode_config_cleanup, encoder TMDS-44, ec861894 ->funcs c0e63a18
>> [ 3.921260] drm_mode_config_cleanup, calling encoder->funcs->destroy!
>> [ 3.927897] drm_encoder_cleanup, encoder ec861894
>> [ 3.932717] drm_mode_config_cleanup, encoder (null), ec0da388 ->funcs 0
> Use after free.
That all makes sense now...
>> [ 3.939356] drm_mode_config_cleanup, calling encoder->funcs->destroy!
>> [ 3.946050] Unable to handle kernel NULL pointer dereference at virtual address 00000004
>> The Apalis iMX6 device tree I work with uses a dumb VGA bridge and a LVDS
>> display directly specified in the ldb node. In the case above I did not
>> compile in the dumb VGA bridge driver, that is the reason why binding
>> parallel-display-controller0 fails.
> I suppose this means that component drivers must not allocate the
> encoder in the component devres group during bind. Not only their own
> failure, but any other component's failure can cause component_unbind to
> be called in the cleanup path of component_bind_all, freeing the memory
> before drm_mode_config_cleanup is called.
Indeed, moving allocation into driver bind fixes the crash!
I wonder, how does devres knows that imx_ldb got allocated in probe and does not free on unbind? I guess that is the group mechanism which makes sure of that?
ldb would not the only driver affected, also drivers/gpu/drm/imx/parallel-display.c allocates encoders in component bind, there are probably more.
Actually, when thinking more about it, if we would have a way to call framework cleanup functions (drm_mode_config_cleanup() in this case) between the bind and unbind loops in component_bind_all(), we would not have that problem.
[adding Russel, since he introduced the component infrastructure]
>> The LVDS encoder (0xec1a0388 in that case) is somehow already cleaned up
>> when calling drm_mode_config_cleanup(), which leads to a null pointer
>> deference when trying to call destroy.
>> I was unable to figure out why the DRM encoder in the second
>> drm_mode_config_cleanup() call is already zeroed out. The component
>> framework calls imx_ldb_unbind() first, but that should not free
>> up the encoder afaict. Any idea?
>> I was also wondering in the deferred probing case, functions like
>> imx_ldb_bind() call devm_kzalloc on every try. Since the underlying
>> device is not removed, doesn't this leads to multiple active allocations?
> See above. To me it looks like we have to allocate imx_ldb in probe and
> deal with possibly partially preinitialzied structure when bind is
> called a second time.
Yeah with the fact that component framework does free devres at unbind time, my concern stated here is null...
More information about the dri-devel