Linux 6.12.4 - crash dma_alloc_attrs+0x12b via ipu6

Stanislaw Gruszka stanislaw.gruszka at linux.intel.com
Tue Dec 10 15:18:32 UTC 2024


On Tue, Dec 10, 2024 at 01:37:11PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Dec 10, 2024 at 02:24:56PM +0200, Jani Nikula wrote:
> > On Tue, 10 Dec 2024, Genes Lists <lists at sapience.com> wrote:
> > > On Tue, 2024-12-10 at 10:58 +0200, Jani Nikula wrote:
> > >> On Tue, 10 Dec 2024, Sakari Ailus <sakari.ailus at linux.intel.com>
> > >> wrote:
> > >> > Hi,
> > >> > 
> > >> > > ...
> > >> > > FYI 6.12.4 got a crash shortly after booting in dma_alloc_attrs -
> > >> > > maybe
> > >> > > triggered in ipu6_probe. Crash only happened on laptop with ipu6.
> > >> > > All
> > >> > > other machines are running fine.
> > >> > 
> > >> > Have you read the dmesg further than the IPU6 related warning? The
> > >> > IPU6
> > >> > driver won't work (maybe not even probe?) but if the system
> > >> > crashes, it
> > >> > appears unlikely the IPU6 drivers would have something to do with
> > >> > that.
> > >> > Look for warnings on linked list corruption later, they seem to be
> > >> > coming
> > >> > from the i915 driver.
> > >> 
> > >> And the list corruption is actually happening in
> > >> cpu_latency_qos_update_request(). I don't see any i915 changes in
> > >> 6.12.4
> > >> that could cause it.
> > >> 
> > >> I guess the question is, when did it work? Did 6.12.3 work?
> > >> 
> > >> 
> > >> BR,
> > >> Jani.
> > >
> > >
> > >  - 6.12.1 worked
> > >
> > >  - mainline - works (but only with i915 patch set [1] otherwise there
> > > are no graphics at all)
> > >
> > >     [1] https://patchwork.freedesktop.org/series/141911/
> > >
> > > - 6.12.3 - crashed (i see i915 not ipu6) and again it has       
> > >     cpu_latency_qos_update_request+0x61/0xc0
> > 
> > Thanks for testing.
> > 
> > There are no changes to either i915 or kernel/power between 6.12.1 and
> > 6.12.4.
> > 
> > There are some changes to drm core, but none that could explain this.
> > 
> > Maybe try the same kernels a few more times to see if it's really
> > deterministic? Not that I have obvious ideas where to go from there, but
> > it's a clue nonetheless.
> 
> 'git bisect' would be nice to run if possible...

I've reproduced the issue. It's caused by 6.12.y commit:

commit 6ac269abab9ca5ae910deb2d3ca54351c3467e99
Author: Bingbu Cao <bingbu.cao at intel.com>
Date:   Wed Oct 16 15:53:01 2024 +0800

    media: ipu6: not override the dma_ops of device in driver

    [ Upstream commit daabc5c64703432c4a8798421a3588c2c142c51b ]


It makes alloc_fw_msg_bufs() fail in isys_probe():

	cpu_latency_qos_add_request(&isys->pm_qos, PM_QOS_DEFAULT_VALUE);

	ret = alloc_fw_msg_bufs(isys, 20);
	if (ret < 0)
		goto out_remove_pkg_dir_shared_buffer;

On the error path we do not call cpu_latency_qos_remove_request(),
which causes pm_qos_request list corruption (it is a use-after-free
bug).

The problem will disappear after applying:
https://lore.kernel.org/stable/20241209175416.59433-1-stanislaw.gruszka@linux.intel.com/
since the allocation will no longer fail.

But we also need to handle the failure case correctly by adding
cpu_latency_qos_remove_request() to the error path. This requires a
mainline fix; I'll post it.
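
To illustrate the error-path handling described here, below is a minimal
user-space sketch, not the actual driver patch. Everything prefixed with
mock_ is a hypothetical stand-in: mock_qos_add_request() and
mock_qos_remove_request() imitate cpu_latency_qos_add_request() and
cpu_latency_qos_remove_request() with a plain doubly linked list, and
mock_alloc_fw_msg_bufs() fails on demand. The sketch assumes the request is
embedded in a structure that is freed when probe fails, so it must be
unlinked first.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a PM QoS request: just a list node. */
struct mock_qos_request {
	struct mock_qos_request *prev, *next;
};

/* Global list of active requests (circular, with a sentinel head). */
static struct mock_qos_request mock_qos_list = { &mock_qos_list, &mock_qos_list };

static void mock_qos_add_request(struct mock_qos_request *req)
{
	req->next = mock_qos_list.next;
	req->prev = &mock_qos_list;
	mock_qos_list.next->prev = req;
	mock_qos_list.next = req;
}

static void mock_qos_remove_request(struct mock_qos_request *req)
{
	assert(req->prev && req->next);	/* must currently be linked */
	req->prev->next = req->next;
	req->next->prev = req->prev;
	req->prev = req->next = NULL;
}

/* Number of requests currently on the global list. */
static size_t mock_qos_count(void)
{
	size_t n = 0;

	for (struct mock_qos_request *r = mock_qos_list.next;
	     r != &mock_qos_list; r = r->next)
		n++;
	return n;
}

/* Hypothetical device structure with an embedded request. */
struct mock_isys {
	struct mock_qos_request pm_qos;
};

/* Stand-in for alloc_fw_msg_bufs(); fails when asked to. */
static int mock_alloc_fw_msg_bufs(struct mock_isys *isys, bool fail)
{
	(void)isys;
	return fail ? -ENOMEM : 0;
}

static int mock_isys_probe(bool fail_alloc, struct mock_isys **out)
{
	struct mock_isys *isys = calloc(1, sizeof(*isys));
	int ret;

	if (!isys)
		return -ENOMEM;

	mock_qos_add_request(&isys->pm_qos);

	ret = mock_alloc_fw_msg_bufs(isys, fail_alloc);
	if (ret < 0)
		goto out_remove_qos;

	*out = isys;
	return 0;

out_remove_qos:
	/* Unlink before freeing; otherwise the list keeps a dangling node. */
	mock_qos_remove_request(&isys->pm_qos);
	free(isys);
	return ret;
}

static void mock_isys_remove(struct mock_isys *isys)
{
	mock_qos_remove_request(&isys->pm_qos);
	free(isys);
}
```

If the mock_qos_remove_request() call under out_remove_qos is dropped, the
global list still points at the freed structure, and the next list walk or
update touches freed memory, which is the kind of corruption seen later in
cpu_latency_qos_update_request().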

Regards
Stanislaw

