[Intel-gfx] [PATCH v1] drm/i915: fix null pointer dereference
Ville Syrjälä
ville.syrjala at linux.intel.com
Wed Feb 9 10:31:33 UTC 2022
On Wed, Feb 09, 2022 at 02:02:05AM +0000, Sripada, Radhakrishna wrote:
>
>
> > -----Original Message-----
> > From: Łukasz Bartosik <lb at semihalf.com>
> > Sent: Tuesday, February 8, 2022 8:20 AM
> > To: Jani Nikula <jani.nikula at linux.intel.com>; Joonas Lahtinen
> > <joonas.lahtinen at linux.intel.com>; Vivi, Rodrigo <rodrigo.vivi at intel.com>;
> > Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> > Cc: Sripada, Radhakrishna <radhakrishna.sripada at intel.com>; intel-
> > gfx at lists.freedesktop.org; upstream at semihalf.com; Ville Syrjälä
> > <ville.syrjala at linux.intel.com>; Roper, Matthew D
> > <matthew.d.roper at intel.com>; Srivatsa, Anusha <anusha.srivatsa at intel.com>
> > Subject: Re: [PATCH v1] drm/i915: fix null pointer dereference
> >
> > Have you had a chance to review the patch? The crash is still there
> > on v5.17-rc3.
> >
> > Thanks,
> > Lukasz
> >
> > On Tue, 1 Feb 2022 at 16:49, Jani Nikula <jani.nikula at linux.intel.com> wrote:
> > >
> > >
> > > Thanks for the patch, adding some Cc's from the commit that regressed.
> > >
> > > BR,
> > > Jani.
> > >
> > > On Tue, 01 Feb 2022, Lukasz Bartosik <lb at semihalf.com> wrote:
> > > > From: Łukasz Bartosik <lb at semihalf.com>
> > > >
> > > > Asus chromebook CX550 crashes during boot on v5.17-rc1 kernel.
> > > > > The root cause is null pointer dereference of bi_next
> > > > in tgl_get_bw_info() in drivers/gpu/drm/i915/display/intel_bw.c.
> > > >
> > > > BUG: kernel NULL pointer dereference, address: 000000000000002e
> > > > PGD 0 P4D 0
> > > > Oops: 0002 [#1] PREEMPT SMP NOPTI
> > > > CPU: 0 PID: 1 Comm: swapper/0 Tainted: G U 5.17.0-rc1
> > > > > Hardware name: Google Delbin/Delbin, BIOS Google_Delbin.13672.156.3 05/14/2021
> > > > RIP: 0010:tgl_get_bw_info+0x2de/0x510
> > > > ...
> > > > [ 2.554467] Call Trace:
> > > > [ 2.554467] <TASK>
> > > > [ 2.554467] intel_bw_init_hw+0x14a/0x434
> > > > [ 2.554467] ? _printk+0x59/0x73
> > > > [ 2.554467] ? _dev_err+0x77/0x91
> > > > [ 2.554467] i915_driver_hw_probe+0x329/0x33e
> > > > [ 2.554467] i915_driver_probe+0x4c8/0x638
> > > > [ 2.554467] i915_pci_probe+0xf8/0x14e
> > > > [ 2.554467] ? _raw_spin_unlock_irqrestore+0x12/0x2c
> > > > [ 2.554467] pci_device_probe+0xaa/0x142
> > > > [ 2.554467] really_probe+0x13f/0x2f4
> > > > [ 2.554467] __driver_probe_device+0x9e/0xd3
> > > > [ 2.554467] driver_probe_device+0x24/0x7c
> > > > [ 2.554467] __driver_attach+0xba/0xcf
> > > > [ 2.554467] ? driver_attach+0x1f/0x1f
> > > > [ 2.554467] bus_for_each_dev+0x8c/0xc0
> > > > [ 2.554467] bus_add_driver+0x11b/0x1f7
> > > > [ 2.554467] driver_register+0x60/0xea
> > > > [ 2.554467] ? mipi_dsi_bus_init+0x16/0x16
> > > > [ 2.554467] i915_init+0x2c/0xb9
> > > > [ 2.554467] ? mipi_dsi_bus_init+0x16/0x16
> > > > [ 2.554467] do_one_initcall+0x12e/0x2b3
> > > > [ 2.554467] do_initcall_level+0xd6/0xf3
> > > > [ 2.554467] do_initcalls+0x4e/0x79
> > > > [ 2.554467] kernel_init_freeable+0xed/0x14d
> > > > [ 2.554467] ? rest_init+0xc1/0xc1
> > > > [ 2.554467] kernel_init+0x1a/0x120
> > > > [ 2.554467] ret_from_fork+0x1f/0x30
> > > > [ 2.554467] </TASK>
> > > > ...
> > > > Kernel panic - not syncing: Fatal exception
> > > >
> > > > Fixes: c64a9a7c05be ("drm/i915: Update memory bandwidth formulae")
>
> LGTM,
> Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada at intel.com>
>
> > > > Signed-off-by: Łukasz Bartosik <lb at semihalf.com>
> > > > ---
> > > > drivers/gpu/drm/i915/display/intel_bw.c | 16 +++++++++-------
> > > > 1 file changed, 9 insertions(+), 7 deletions(-)
> > > >
> > > > > diff --git a/drivers/gpu/drm/i915/display/intel_bw.c b/drivers/gpu/drm/i915/display/intel_bw.c
> > > > index 2da4aacc956b..bd0ed68b7faa 100644
> > > > --- a/drivers/gpu/drm/i915/display/intel_bw.c
> > > > +++ b/drivers/gpu/drm/i915/display/intel_bw.c
> > > > @@ -404,15 +404,17 @@ static int tgl_get_bw_info(struct
> > drm_i915_private *dev_priv, const struct intel
> > > > int clpchgroup;
> > > > int j;
> > > >
> > > > - if (i < num_groups - 1)
> > > > - bi_next = &dev_priv->max_bw[i + 1];
> > > > -
> > > > clpchgroup = (sa->deburst * qi.deinterleave / num_channels) << i;
> > > >
> > > > - if (i < num_groups - 1 && clpchgroup < clperchgroup)
> > > > - bi_next->num_planes = (ipqdepth - clpchgroup) / clpchgroup + 1;
> > > > - else
> > > > - bi_next->num_planes = 0;
> > > > + if (i < num_groups - 1) {
> > > > + bi_next = &dev_priv->max_bw[i + 1];
> > > > +
> > > > + if (clpchgroup < clperchgroup)
> > > > + bi_next->num_planes = (ipqdepth - clpchgroup) /
> > > > + clpchgroup + 1;
> > > > + else
> > > > + bi_next->num_planes = 0;
> > > > + }
BTW this code makes me rather suspicious overall. num_planes==0 means
no planes can be enabled at all. Is that even correct? IIRC the icl
code did not have qgv points that had num_planes==0.
Also IIRC I added that 'num_planes = ... + 1' to the icl code
to make it actually sensible. The icl sample code didn't have
that +1 and instead it used '>' as opposed to '>=' in the
comparison to the actual number of enabled planes thus
implying the +1. But now here in the tgl+ code we have the
+1 in one branch of the if, but the other branch just has
a 0 (so no +1).
And it doesn't help that the code is doing this weird [i] + [i+1]
stuff inside the single loop. Would be a lot more legible if we
just did two loops I think. Though I see the same awkward construct
is used in spec sample code as well.
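
For illustration, the two-loop shape could look roughly like the sketch
below. This is a standalone toy model, not the i915 code: struct bw_info,
NUM_GROUPS, planes_for(), fill_single_loop(), fill_two_loops() and
same_result() are all made-up names, and 'base' stands in for the
'sa->deburst * qi.deinterleave / num_channels' term from the patch. It only
tries to show that moving the num_planes computation into its own loop
yields the same values without the [i] + [i+1] juggling.

```c
#include <assert.h>
#include <string.h>

#define NUM_GROUPS 4	/* hypothetical group count, fixed for the sketch */

/* Toy stand-in for the driver's per-group bandwidth info. */
struct bw_info {
	int num_planes;
};

/* The num_planes arithmetic from the patch; the helper name is made up. */
static int planes_for(int clpchgroup, int clperchgroup, int ipqdepth)
{
	assert(clpchgroup > 0);	/* the divisor below must be non-zero */

	if (clpchgroup < clperchgroup)
		return (ipqdepth - clpchgroup) / clpchgroup + 1;
	return 0;
}

/* Patched single-loop shape: iteration i fills entry i + 1, guarded. */
static void fill_single_loop(struct bw_info *bw, int base,
			     int clperchgroup, int ipqdepth)
{
	for (int i = 0; i < NUM_GROUPS; i++) {
		int clpchgroup = base << i;

		if (i < NUM_GROUPS - 1)
			bw[i + 1].num_planes =
				planes_for(clpchgroup, clperchgroup, ipqdepth);

		/* ... per-group work on bw[i] would go here ... */
	}
}

/* Two-loop shape: the first loop writes only entry i, from group i - 1. */
static void fill_two_loops(struct bw_info *bw, int base,
			   int clperchgroup, int ipqdepth)
{
	for (int i = 1; i < NUM_GROUPS; i++)
		bw[i].num_planes =
			planes_for(base << (i - 1), clperchgroup, ipqdepth);

	for (int i = 0; i < NUM_GROUPS; i++) {
		/* ... per-group work on bw[i] would go here ... */
	}
}

/* Returns 1 if both shapes produce identical tables for the inputs. */
static int same_result(int base, int clperchgroup, int ipqdepth)
{
	struct bw_info a[NUM_GROUPS], b[NUM_GROUPS];

	memset(a, 0, sizeof(a));
	memset(b, 0, sizeof(b));
	fill_single_loop(a, base, clperchgroup, ipqdepth);
	fill_two_loops(b, base, clperchgroup, ipqdepth);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

Neither variant writes entry 0 in this sketch, so the caller is assumed to
have initialized it. Whether 0 is the right value for the 'else' branch is
a separate question that the sketch does not try to answer.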
--
Ville Syrjälä
Intel