[Intel-gfx] [PATCH v4 9/9] drm/i915: Add missing deinitialization cases of load failure

Souza, Jose jose.souza at intel.com
Tue Apr 14 20:50:30 UTC 2020


On Tue, 2020-04-14 at 21:29 +0300, Imre Deak wrote:
> On Mon, Apr 13, 2020 at 09:45:15AM -0700, José Roberto de Souza
> wrote:
> > The intel_display_power_put_async() used in TC cold sequences made
> > easy to hit the missing deinitialization of driver in case of load
> > failure as seen in the stack trace bellow.
> > 
> > intel_modeset_driver_remove_noirq() had to be removed from
> > i915_driver_modeset_remove_noirq() as those are different
> > initialialition steps with IRQ in between then.
> 
> Looks ok with some comments below, but this fix should be applied
> before
> the rest of the changes in the patchset, could you resend it
> separately?

Sure

> 
> Is it needed in -fixes too?
> 
> > [drm:__intel_engine_init_ctx_wa [i915]] Initialized 3 context
> > workarounds on rcs'0
> > [drm:__i915_inject_probe_error [i915]] Injecting failure -19 at
> > checkpoint 36 [__uc_init:294]
> > [drm:i915_hdcp_component_unbind [i915]] I915 HDCP comp unbind
> > [drm:edp_panel_vdd_off_sync [i915]] Turning [ENCODER:275:DDI A] VDD
> > off
> > [drm:edp_panel_vdd_off_sync [i915]] PP_STATUS: 0x00000000
> > PP_CONTROL: 0x00000060
> > [drm:intel_power_well_disable [i915]] disabling AUX A
> > general protection fault, probably for non-canonical address
> > 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
> > CPU: 3 PID: 1142 Comm: kworker/u16:20 Tainted:
> > G     U            5.6.0-CI-Patchwork_17226+ #1
> > Hardware name: Intel Corporation Tiger Lake Client
> > Platform/TigerLake U DDR4 SODIMM RVP, BIOS
> > TGLSFWI1.R00.2457.A16.1912270059 12/27/2019
> > Workqueue: events_unbound intel_display_power_put_async_work [i915]
> > RIP: 0010:__intel_display_power_put_domain+0xa5/0x180 [i915]
> > Code: 48 85 c0 78 54 44 89 e1 41 bd 01 00 00 00 49 c7 c4 80 44 41
> > a0 49 d3 e5 eb 0d 48 83 eb 10 48 3b 9d 08 ad 00 00 78 32 48 8b 03
> > <4c> 85 68 10 74 ea 8b 53 08 85 d2 74 2d 83 ea 01 85 d2 89 53 08 75
> > RSP: 0018:ffffc9000061fdb0 EFLAGS: 00010206
> > RAX: 6b6b6b6b6b6b6b6b RBX: ffff8884948f5df0 RCX: 000000000000003d
> > RDX: 0000000080000001 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffff888479be0000 R08: ffff88849a180920 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0414480
> > R13: 2000000000000000 R14: ffff888479beb320 R15: 2000000000000000
> > FS:  0000000000000000(0000) GS:ffff88849ff80000(0000)
> > knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 00005634fa8ed670 CR3: 0000000005610004 CR4: 0000000000760ee0
> > PKRU: 55555554
> > Call Trace:
> >  release_async_put_domains+0x9b/0x110 [i915]
> >  intel_display_power_put_async_work+0x91/0xf0 [i915]
> >  process_one_work+0x260/0x600
> >  ? worker_thread+0xc9/0x380
> >  worker_thread+0x37/0x380
> >  ? process_one_work+0x600/0x600
> >  kthread+0x119/0x130
> >  ? kthread_park+0x80/0x80
> >  ret_from_fork+0x24/0x50
> > Modules linked in: i915(+) vgem snd_hda_codec_hdmi mei_hdcp
> > x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul
> > cdc_ether usbnet mii snd_intel_dspcfg ghash_clmulni_intel
> > snd_hda_codec snd_hwdep snd_hda_core e1000e ptp mei_me snd_pcm
> > pps_core mei intel_lpss_pci prime_numbers [last unloaded: i915]
> > ---[ end trace b402d1b4060f8b97 ]---
> > BUG: sleeping function called from invalid context at
> > kernel/sched/completion.c:99
> > in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 1142, name:
> > kworker/u16:20
> > INFO: lockdep is turned off.
> > Preemption disabled at:
> > [<0000000000000000>] 0x0
> > CPU: 3 PID: 1142 Comm: kworker/u16:20 Tainted:
> > G     UD           5.6.0-CI-Patchwork_17226+ #1
> > Hardware name: Intel Corporation Tiger Lake Client
> > Platform/TigerLake U DDR4 SODIMM RVP, BIOS
> > TGLSFWI1.R00.2457.A16.1912270059 12/27/2019
> > Workqueue: events_unbound intel_display_power_put_async_work [i915]
> > Call Trace:
> >  dump_stack+0x71/0x9b
> >  ___might_sleep+0x178/0x260
> >  wait_for_completion+0x37/0x1a0
> >  virt_efi_query_variable_info+0x161/0x1b0
> >  efi_query_variable_store+0xb3/0x1a0
> >  ? efivar_entry_set_safe+0x19c/0x220
> >  efivar_entry_set_safe+0x19c/0x220
> >  ? efi_pstore_write+0x10b/0x150
> >  ? efi_pstore_write+0xa0/0x150
> >  efi_pstore_write+0x10b/0x150
> >  pstore_dump+0x123/0x340
> >  kmsg_dump+0x87/0x1b0
> >  oops_end+0x3e/0x90
> >  do_general_protection+0x1c3/0x2f0
> >  general_protection+0x2d/0x40
> > RIP: 0010:__intel_display_power_put_domain+0xa5/0x180 [i915]
> > Code: 48 85 c0 78 54 44 89 e1 41 bd 01 00 00 00 49 c7 c4 80 44 41
> > a0 49 d3 e5 eb 0d 48 83 eb 10 48 3b 9d 08 ad 00 00 78 32 48 8b 03
> > <4c> 85 68 10 74 ea 8b 53 08 85 d2 74 2d 83 ea 01 85 d2 89 53 08 75
> > RSP: 0018:ffffc9000061fdb0 EFLAGS: 00010206
> > RAX: 6b6b6b6b6b6b6b6b RBX: ffff8884948f5df0 RCX: 000000000000003d
> > RDX: 0000000080000001 RSI: 0000000000000000 RDI: 0000000000000000
> > RBP: ffff888479be0000 R08: ffff88849a180920 R09: 0000000000000000
> > R10: 0000000000000000 R11: 0000000000000000 R12: ffffffffa0414480
> > R13: 2000000000000000 R14: ffff888479beb320 R15: 2000000000000000
> >  release_async_put_domains+0x9b/0x110 [i915]
> >  intel_display_power_put_async_work+0x91/0xf0 [i915]
> >  process_one_work+0x260/0x600
> >  ? worker_thread+0xc9/0x380
> >  worker_thread+0x37/0x380
> >  ? process_one_work+0x600/0x600
> >  kthread+0x119/0x130
> >  ? kthread_park+0x80/0x80
> >  ret_from_fork+0x24/0x50
> > ------------[ cut here ]------------
> > WARNING: CPU: 3 PID: 1142 at kernel/rcu/tree_plugin.h:293
> > rcu_note_context_switch+0x87/0x650
> > Modules linked in: i915(+) vgem snd_hda_codec_hdmi mei_hdcp
> > x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul
> > cdc_ether usbnet mii snd_intel_dspcfg ghash_clmulni_intel
> > snd_hda_codec snd_hwdep snd_hda_core e1000e ptp mei_me snd_pcm
> > pps_core mei intel_lpss_pci prime_numbers [last unloaded: i915]
> > 
> > Cc: Imre Deak <imre.deak at intel.com>
> > Cc: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > Cc: Jani Nikula <jani.nikula at intel.com>
> > Signed-off-by: José Roberto de Souza <jose.souza at intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.c | 15 +++++++++------
> >  1 file changed, 9 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c
> > b/drivers/gpu/drm/i915/i915_drv.c
> > index a7a3b4b98572..9727689e264b 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -248,8 +248,11 @@ static int
> > i915_driver_modeset_probe_noirq(struct drm_i915_private *i915)
> >  	return 0;
> >  
> >  cleanup_vga_client:
> > +	intel_csr_ucode_fini(i915);
> > +	intel_power_domains_driver_remove(i915);
> >  	intel_vga_unregister(i915);
> >  out:
> > +	intel_bios_driver_remove(i915);
> 
> It's also called now incorrectly, needs its own label.

Thanks.
Easier just return instead of goto in case of error in
drm_vblank_init().

> 
> >  	return ret;
> >  }
> >  
> > @@ -308,13 +311,13 @@ static void i915_driver_modeset_remove(struct
> > drm_i915_private *i915)
> >  /* part #2: call after irq uninstall */
> >  static void i915_driver_modeset_remove_noirq(struct
> > drm_i915_private *i915)
> >  {
> > -	intel_modeset_driver_remove_noirq(i915);
> > +	intel_csr_ucode_fini(i915);
> >  
> > -	intel_bios_driver_remove(i915);
> > +	intel_power_domains_driver_remove(i915);
> >  
> >  	intel_vga_unregister(i915);
> >  
> > -	intel_csr_ucode_fini(i915);
> > +	intel_bios_driver_remove(i915);
> >  }
> >  
> >  static void intel_init_dpio(struct drm_i915_private *dev_priv)
> > @@ -994,7 +997,7 @@ int i915_driver_probe(struct pci_dev *pdev,
> > const struct pci_device_id *ent)
> >  out_cleanup_irq:
> >  	intel_irq_uninstall(i915);
> >  out_cleanup_modeset:
> > -	/* FIXME */
> > +	i915_driver_modeset_remove_noirq(i915);
> >  out_cleanup_hw:
> >  	i915_driver_hw_remove(i915);
> >  	intel_memory_regions_driver_release(i915);
> > @@ -1031,13 +1034,13 @@ void i915_driver_remove(struct
> > drm_i915_private *i915)
> >  
> >  	intel_irq_uninstall(i915);
> >  
> > +	intel_modeset_driver_remove_noirq(i915);
> > +
> >  	i915_driver_modeset_remove_noirq(i915);
> >  
> >  	i915_reset_error_state(i915);
> >  	i915_gem_driver_remove(i915);
> >  
> > -	intel_power_domains_driver_remove(i915);
> > -
> 
> Reordering the above wrt. i915_gem_driver_remove() makes it not
> symmetric
> with i915_gem_init(), which happens after
> intel_power_domains_init_hw().
> GEM may use now some power domain after we deinited power domains.

Hum so moving i915_gem_driver_remove() before
i915_driver_modeset_remove_noirq() will fix it and will match the error
handling in i915_driver_modeset_probe()

void i915_driver_remove(struct drm_i915_private *i915)
{
	disable_rpm_wakeref_asserts(&i915->runtime_pm);

	i915_driver_unregister(i915);

	/* Flush any external code that still may be under the RCU lock
*/
	synchronize_rcu();

	i915_gem_suspend(i915);

	drm_atomic_helper_shutdown(&i915->drm);

	intel_gvt_driver_remove(i915);

	i915_driver_modeset_remove(i915);

	intel_irq_uninstall(i915);

	intel_modeset_driver_remove_noirq(i915);

	i915_gem_driver_remove(i915);

	i915_driver_modeset_remove_noirq(i915);

	i915_reset_error_state(i915);

	i915_driver_hw_remove(i915);

	enable_rpm_wakeref_asserts(&i915->runtime_pm);
}

Will run some tests before send it separated.

Thanks for the comments.

> 
> >  	i915_driver_hw_remove(i915);
> >  
> >  	enable_rpm_wakeref_asserts(&i915->runtime_pm);
> > -- 
> > 2.26.0
> > 


More information about the Intel-gfx mailing list