[Intel-gfx] [PATCH] drm/i915: Prevent NULL after failed PPGTT
Ben Widawsky
benjamin.widawsky at intel.com
Fri Nov 15 02:01:44 CET 2013
If an object was bound in the PPGTT and we do a GPU reset, but the
PPGTT was not brought back up on reset, trying to unbind the object
later will result in a NULL pointer dereference. Ideally this (failed
PPGTT) should never happen, but the code allows it, and therefore we
should prevent the OOPS.
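
For readers who want the failure mode in isolation, here is a minimal
userspace sketch of the unbind path with the guard applied. All model_*
names are hypothetical stand-ins, not the real i915 types; the assert()
marks the spot where the kernel would oops on a NULL PPGTT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures; not the real i915 types. */
struct model_ppgtt {
	int unbind_calls;	/* counts how often the unbind hook ran */
};

struct model_device {
	/* NULL models a PPGTT that failed to come back up after reset. */
	struct model_ppgtt *aliasing_ppgtt;
};

struct model_object {
	bool has_aliasing_ppgtt_mapping;
};

/* Models i915_ppgtt_unbind_object(): it uses ppgtt unconditionally. */
static void model_ppgtt_unbind_object(struct model_ppgtt *ppgtt,
				      struct model_object *obj)
{
	(void)obj;
	assert(ppgtt != NULL);	/* in the kernel this would be the oops */
	ppgtt->unbind_calls++;
}

/* Models the PPGTT part of the unbind path, including the guard. */
static void model_vma_unbind(struct model_device *dev,
			     struct model_object *obj)
{
	/* Guard: a stale flag with no PPGTT behind it is simply dropped. */
	if (!dev->aliasing_ppgtt && obj->has_aliasing_ppgtt_mapping)
		obj->has_aliasing_ppgtt_mapping = false;

	if (obj->has_aliasing_ppgtt_mapping) {
		model_ppgtt_unbind_object(dev->aliasing_ppgtt, obj);
		obj->has_aliasing_ppgtt_mapping = false;
	}
}
```

With aliasing_ppgtt set to NULL and the flag still set, model_vma_unbind()
clears the flag and never reaches the dereference; with a valid PPGTT the
normal unbind runs once.
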
Since Broadwell hang/reset handling is still under development, and
apparently so is aliasing PPGTT after reset, this helps alleviate some
of the pain.
NOTE: With the coming PPGTT patches this can't ever occur: if PPGTT is
supposed to come up and doesn't, the driver will fail to load (since it
will make context loading fail).
Here is an example splat:
[ 588.795571] ---[ end trace f23239922ecdffbc ]---
[ 598.427072] [drm] stuck on render ring
[ 598.473116] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 598.550946] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 598.663996] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 598.772830] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 598.891172] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 599.006668] [drm] Simulated gpu hang, resetting stop_rings
[ 599.084004] [drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear.
[ 599.204258] [drm] PPGTT enable failed. This is not fatal, but unexpected
[ 599.287287] BUG: unable to handle kernel NULL pointer dereference at 0000000000000108
[ 599.389563] IP: [<ffffffffa0559794>] i915_ppgtt_unbind_object+0x14/0x60 [i915]
[ 599.484426] PGD 34ab6067 PUD 50b2a067 PMD 0
[ 599.542964] Oops: 0000 [#1] PREEMPT SMP
[ 599.597171] Modules linked in: i915 drm_kms_helper drm intel_gtt agpgart i2c_algo_bit i2c_core netconsole configfs ext4 x86_pkg_temp_thermal coretemp crc16 mbcache kvm_intel jbd2 kvm ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd microcode serio_raw evdev thermal fan battery e1000e acpi_cpufreq video ptp button pps_core ac processor snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_timer snd soundcore hid_generic usbhid hid btrfs libcrc32c xor raid6_pq sd_mod ehci_pci ehci_hcd ahci libahci crc32c_intel libata usbcore scsi_mod usb_common
[ 600.268119] CPU: 0 PID: 2612 Comm: kms_flip Tainted: G W 3.12.0-BEN+ #38
[ 600.366889] Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1, BIOS BDW-E1R1.86C.0048.R02.1310291000 10/29/2013
[ 600.527990] task: ffff88009d28b0c0 ti: ffff88003b5d4000 task.ti: ffff88003b5d4000
[ 600.626121] RIP: 0010:[<ffffffffa0559794>] [<ffffffffa0559794>] i915_ppgtt_unbind_object+0x14/0x60 [i915]
[ 600.750986] RSP: 0018:ffff88003b5d5ca0 EFLAGS: 00010202
[ 600.822380] RAX: 0000000000000004 RBX: ffff88008eac5a40 RCX: 00000000000000fe
[ 600.916281] RDX: 0000000000000000 RSI: ffff88008eac5a40 RDI: ffff88008eac5a40
[ 601.010181] RBP: ffff88003b5d5cb8 R08: 0000000000000000 R09: 0000000000000000
[ 601.104083] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 601.197983] R13: ffff88008eac5b30 R14: ffff880145208000 R15: ffff88008eac5ac8
[ 601.291686] FS: 00007fabc555a8c0(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
[ 601.397339] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 601.474153] CR2: 0000000000000108 CR3: 000000008eadd000 CR4: 00000000003407f0
[ 601.568035] Stack:
[ 601.599051] ffff88008eac5a40 ffff88005cb96000 ffff88008eac5b30 ffff88003b5d5cf0
[ 601.696409] ffffffffa055003f ffff88008eac5a40 ffff88005cb96000 ffff88008eac5b30
[ 601.793636] ffff880145208000 ffff88008eac5ac8 ffff88003b5d5d30 ffffffffa05513fe
[ 601.890889] Call Trace:
[ 601.927323] [<ffffffffa055003f>] i915_vma_unbind+0x28f/0x340 [i915]
[ 602.011520] [<ffffffffa05513fe>] i915_gem_free_object+0x9e/0x340 [i915]
[ 602.100135] [<ffffffff810b81cd>] ? trace_hardirqs_on+0xd/0x10
[ 602.178010] [<ffffffffa04c248a>] drm_gem_object_free+0x2a/0x30 [drm]
[ 602.263249] [<ffffffffa04c29fa>] drm_gem_object_handle_unreference_unlocked+0x11a/0x130 [drm]
[ 602.375214] [<ffffffffa04c2ae6>] drm_gem_handle_delete+0xd6/0x1d0 [drm]
[ 602.463759] [<ffffffffa04c3358>] drm_gem_close_ioctl+0x28/0x30 [drm]
[ 602.549031] [<ffffffffa04c0d92>] drm_ioctl+0x502/0x640 [drm]
[ 602.625820] [<ffffffff8115ac70>] ? might_fault+0xa0/0xb0
[ 602.698152] [<ffffffff8115ac27>] ? might_fault+0x57/0xb0
[ 602.770831] [<ffffffff8100f0ec>] ? __restore_xstate_sig+0x13c/0x600
[ 602.855035] [<ffffffff811bb6c5>] do_vfs_ioctl+0x305/0x530
[ 602.928680] [<ffffffff811c73a7>] ? fget_light+0x387/0x4f0
[ 603.001415] [<ffffffff811bb971>] SyS_ioctl+0x81/0xa0
[ 603.069506] [<ffffffff814dd6d6>] system_call_fastpath+0x1a/0x1f
[ 603.148332] Code: 89 e7 89 c2 41 ff d6 5b 41 5c 41 5d 41 5e 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 49 89 fc 48 89 f7 53 <4d> 8b ac 24 08 01 00 00 48 8b 56 08 48 8b 9e b8 00 00 00 48 8b
[ 603.392564] RIP [<ffffffffa0559794>] i915_ppgtt_unbind_object+0x14/0x60 [i915]
[ 603.487472] RSP <ffff88003b5d5ca0>
[ 603.535732] CR2: 0000000000000108
[ 603.622175] ---[ end trace f23239922ecdffbd ]---
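
The splat is consistent with a NULL PPGTT pointer being passed in: R12 is
0 and CR2 is 0x108, and the bytes at the fault marker in the Code: line
(4d 8b ac 24 08 01 00 00) appear to decode to a load from 0x108(%r12).
The sketch below only illustrates the arithmetic; the struct layout and
member names are hypothetical, and which real member sits at that offset
is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the offset matters, not the member names. */
struct layout_model {
	unsigned char earlier_members[0x108];
	void *member_at_0x108;
};

/* Compile-time check that the model places the member at offset 0x108. */
static_assert(offsetof(struct layout_model, member_at_0x108) == 0x108,
	      "model member must sit at offset 0x108");

/* Address a load of a member touches, given the struct's base address. */
static uintptr_t member_address(uintptr_t base, size_t offset)
{
	return base + offset;
}
```

With a base of 0 (the NULL pointer) and an offset of 0x108, the computed
address equals the value reported in CR2.
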
Signed-off-by: Ben Widawsky <ben at bwidawsk.net>
---
drivers/gpu/drm/i915/i915_gem.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 40d9dcf..2b9245d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2741,6 +2741,13 @@ int i915_vma_unbind(struct i915_vma *vma)
 	if (obj->has_global_gtt_mapping)
 		i915_gem_gtt_unbind_object(obj);
+
+	if (unlikely(!dev_priv->mm.aliasing_ppgtt &&
+		     obj->has_aliasing_ppgtt_mapping)) {
+		DRM_DEBUG_DRIVER("Leftover PPGTT mapping after reset\n");
+		obj->has_aliasing_ppgtt_mapping = 0;
+	}
+
 	if (obj->has_aliasing_ppgtt_mapping) {
 		i915_ppgtt_unbind_object(dev_priv->mm.aliasing_ppgtt, obj);
 		obj->has_aliasing_ppgtt_mapping = 0;
--
1.8.4.2