[Intel-gfx] [PATCH v4 6/6] drm/i915: fix context/engine cleanup order

Thu Feb 11 15:02:39 UTC 2016

Chris Wilson <chris at chris-wilson.co.uk> writes:

> On Sat, Jan 30, 2016 at 11:17:07AM +0000, Chris Wilson wrote:
>> On Fri, Jan 29, 2016 at 07:19:31PM +0000, Dave Gordon wrote:
>> > From: Nick Hoath <nicholas.hoath at intel.com>
>> > 
>> > Swap the order of context & engine cleanup, so that contexts are cleaned
>> > up first, and *then* engines. This is a more sensible order anyway, but
>> > in particular has become necessary since the 'intel_ring_initialized()
>> > must be simple and inline' patch, which now uses ring->dev as an
>> > 'initialised' flag, so it can now be NULL after engine teardown. This in
>> > turn can cause a problem in the context code, which (used to) check the
>> > ring->dev->struct_mutex -- causing a fault if ring->dev was NULL.
>> > 
>> > Also rename the cleanup function to reflect what it actually does
>> > (cleanup engines, not a ringbuffer), and fix an annoying whitespace
>> > issue.
>> > 
>> > v2: Also make the fix in i915_load_modeset_init, not just in
>> >     i915_driver_unload (Chris Wilson)
>> > v3: Had extra stuff in it.
>> > v4: Reverted extra stuff (so we're back to v2).
>> >     Rebased and updated commentary above (Dave Gordon).
>> > 
>> > Signed-off-by: Nick Hoath <nicholas.hoath at intel.com>
>> > Signed-off-by: David Gordon <david.s.gordon at intel.com>
>> > Reviewed-by: Chris Wilson <chris at chris-wilson.co.uk>
>> 
>> Have to drop that, with the recent context changes.
>> 
>> You have to move the gpu-reset now for execlists.
>> 
>> Basically pull it out into:
>> 
>> static void i915_unload_gem(struct drm_device *dev)
>> {
>>        /*
>>         * Neither the BIOS, ourselves or any other kernel
>>         * expects the system to be in execlists mode on startup,
>>         * so we need to reset the GPU back to legacy mode. And the only
>>         * known way to disable logical contexts is through a GPU reset.
>>         *
>>         * So in order to leave the system in a known default configration,
>>         * always reset the GPU upon unload. This also cleans up the GEM
>>         * state tracking, flushing off the requests and leaving the system
>>         * idle.
>>         *
>>         * Note that is of the upmost importance that the GPU is idle and
>>         * all stray writes are flushed *before* we dismantle the backing
>>         * storage for the pinned objects.
>>         */
>>        i915_reset(dev);
>> 
>>        mutex_lock(&dev->struct_mutex);
>>        i915_gem_context_fini(dev);
>>        i915_gem_cleanup_ringbuffer(dev);
>>        mutex_unlock(&dev->struct_mutex);
>> }
>> 
>> and then kill the intel_gpu_reset along both the cleanup pathsh
>
> It appears this patch was applied without dropping my r-b for the issue
> I pointed out above.

Now causes a splat in intel_logical_ring_cleanup when unloading module.

Best to revert and rework on top of Dave's cleanup set?
-Mika

[   58.170369] BUG: unable to handle kernel NULL pointer dereference at
(null)
[   58.170374] IP: [<ffffffffa00e04d3>]
intel_logical_ring_cleanup+0x83/0x100 [i915]
[   58.170389] PGD 0 
[   58.170391] Oops: 0000 [#1] PREEMPT SMP 
[   58.170394] Modules linked in: x86_pkg_temp_thermal intel_powerclamp
coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915(-)
mei_me mei i2c_hid e1000e ptp pps_core
[   58.170404] CPU: 0 PID: 5766 Comm: rmmod Not tainted 4.5.0-rc3+ #43
[   58.170407] Hardware name: System manufacturer System Product
Name/Z170M-PLUS, BIOS 0505 11/16/2015
[   58.170410] task: ffff880036aab380 ti: ffff8802374c0000 task.ti:
ffff8802374c0000
[   58.170412] RIP: 0010:[<ffffffffa00e04d3>]  [<ffffffffa00e04d3>]
intel_logical_ring_cleanup+0x83/0x100 [i915]
[   58.170424] RSP: 0018:ffff8802374c3d30  EFLAGS: 00010282
[   58.170425] RAX: 0000000000000000 RBX: ffff880237203788 RCX:
0000000000000001
[   58.170428] RDX: 0000000087654321 RSI: 000000000000000d RDI:
ffff8802372037c0
[   58.170430] RBP: ffff8802374c3d40 R08: 0000000000000000 R09:
0000000ad856c000
[   58.170432] R10: 0000000000000000 R11: 0000000000000001 R12:
ffff880237200000
[   58.170434] R13: ffff880237209638 R14: ffff880237323000 R15:
0000558a770bd090
[   58.170438] FS:  00007f8b439ff700(0000) GS:ffff880239800000(0000)
knlGS:0000000000000000
[   58.170442] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   58.170444] CR2: 0000000000000000 CR3: 000000023532d000 CR4:
00000000003406f0
[   58.170447] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[   58.170450] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7:
0000000000000400
[   58.170453] Stack:
[   58.170455]  ffff880237203788 ffff880237200000 ffff8802374c3d70
ffffffffa00d0ed4
[   58.170460]  ffff880237200000 ffff880237323000 ffff880237323060
ffffffffa0194360
[   58.170465]  ffff8802374c3d98 ffffffffa0154520 ffff880237323000
ffff880237323000
[   58.170469] Call Trace:
[   58.170479]  [<ffffffffa00d0ed4>] i915_gem_cleanup_engines+0x34/0x60
[i915]
[   58.170493]  [<ffffffffa0154520>] i915_driver_unload+0x140/0x220
[i915]
[   58.170497]  [<ffffffff8154a4f4>] drm_dev_unregister+0x24/0xa0
[   58.170501]  [<ffffffff8154aace>] drm_put_dev+0x1e/0x60
[   58.170506]  [<ffffffffa00912a0>] i915_pci_remove+0x10/0x20 [i915]
[   58.170510]  [<ffffffff814766e4>] pci_device_remove+0x34/0xb0
[   58.170514]  [<ffffffff8156e7d5>] __device_release_driver+0x95/0x140
[   58.170518]  [<ffffffff8156e97c>] driver_detach+0xbc/0xc0
[   58.170521]  [<ffffffff8156d883>] bus_remove_driver+0x53/0xd0
[   58.170525]  [<ffffffff8156f3a7>] driver_unregister+0x27/0x50
[   58.170528]  [<ffffffff81475725>] pci_unregister_driver+0x25/0x70
[   58.170531]  [<ffffffff8154c274>] drm_pci_exit+0x74/0x90
[   58.170543]  [<ffffffffa0154cb0>] i915_exit+0x20/0x1aa [i915]
[   58.170548]  [<ffffffff8111846f>] SyS_delete_module+0x18f/0x1f0

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre