[Intel-gfx] NPE in i915_gemfs_init
Heiner Kallweit
hkallweit1 at gmail.com
Sun Jul 14 12:39:15 UTC 2019
On 14.07.2019 14:34, Chris Wilson wrote:
> Quoting Heiner Kallweit (2019-07-13 12:12:56)
>> On 13.07.2019 12:38, Heiner Kallweit wrote:
>>> For the past few days I've been getting the following on an N3450-based headless system with linux-next.
>>> linux-next from Jul 4th was still ok.
>>> Is this a known issue?
>>>
>>> [ 4.818139] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>> [ 4.818165] #PF: supervisor instruction fetch in kernel mode
>>> [ 4.818178] #PF: error_code(0x0010) - not-present page
>>> [ 4.818192] PGD 0 P4D 0
>>> [ 4.818203] Oops: 0010 [#1] SMP
>>> [ 4.818214] CPU: 2 PID: 2008 Comm: systemd-udevd Not tainted 5.2.0-next-20190712 #1
>>> [ 4.818232] Hardware name: NA ZBOX-CI327NANO-GS-01/ZBOX-CI327NANO-GS-01, BIOS 5.12 04/26/2018
>>> [ 4.818253] RIP: 0010:0x0
>>> [ 4.818265] Code: Bad RIP value.
>>> [ 4.818275] RSP: 0018:ffffacd84023f918 EFLAGS: 00010287
>>> [ 4.818288] RAX: 0000000000000000 RBX: ffff8b11f7f60000 RCX: 00000000000000ae
>>> [ 4.818304] RDX: ffffacd84023f92d RSI: ffffacd84023f928 RDI: ffff8b11f884a000
>>> [ 4.818320] RBP: ffffacd84023f950 R08: 0000000000000001 R09: 0000000000000000
>>> [ 4.818336] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b11f882c7a0
>>> [ 4.818352] R13: 0000000000000000 R14: ffff8b11f7f60000 R15: ffffffffc06b71e0
>>> [ 4.818369] FS: 00007fb45c549840(0000) GS:ffff8b11fbb00000(0000) knlGS:0000000000000000
>>> [ 4.818387] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 4.818404] CR2: ffffffffffffffd6 CR3: 0000000177d4a000 CR4: 00000000003406e0
>>> [ 4.818419] Call Trace:
>>> [ 4.818508] i915_gemfs_init+0x80/0xc0 [i915]
>>> [ 4.818582] i915_gem_init_early+0x126/0x140 [i915]
>>> [ 4.818647] i915_driver_load+0x362/0x1740 [i915]
>>> [ 4.818663] ? find_held_lock+0x37/0x90
>>> [ 4.818675] ? _raw_spin_unlock_irqrestore+0x45/0x50
>>> [ 4.818689] ? __pm_runtime_resume+0x5e/0x90
>>> [ 4.818701] ? lockdep_hardirqs_on+0xf2/0x180
>>> [ 4.818713] ? _raw_spin_unlock_irqrestore+0x45/0x50
>>> [ 4.818778] i915_pci_probe+0x45/0x120 [i915]
>>> [ 4.818792] pci_device_probe+0xab/0x120
>>> [ 4.818804] really_probe+0xf4/0x290
>>> [ 4.818815] driver_probe_device+0x53/0xa0
>>> [ 4.818826] device_driver_attach+0x59/0x60
>>> [ 4.818838] __driver_attach+0x53/0xc0
>>> [ 4.818849] ? device_driver_attach+0x60/0x60
>>> [ 4.818862] bus_for_each_dev+0x82/0xd0
>>> [ 4.818873] driver_attach+0x1f/0x30
>>> [ 4.818883] bus_add_driver+0x174/0x1c0
>>> [ 4.818895] driver_register+0x71/0xc0
>>> [ 4.818906] __pci_register_driver+0x73/0x80
>>> [ 4.818971] i915_init+0x5c/0x67 [i915]
>>> [ 4.818982] ? 0xffffffffc06fb000
>>> [ 4.818993] do_one_initcall+0x5f/0x2e5
>>> [ 4.819005] ? do_init_module+0x23/0x220
>>> [ 4.819018] ? rcu_read_lock_sched_held+0x76/0x80
>>> [ 4.819032] ? kmem_cache_alloc_trace+0x234/0x260
>>> [ 4.819045] do_init_module+0x5d/0x220
>>> [ 4.819056] load_module+0x2220/0x2560
>>> [ 4.819068] ? kernel_read+0x52/0x80
>>> [ 4.819079] __do_sys_finit_module+0xda/0x100
>>> [ 4.819091] ? __do_sys_finit_module+0xda/0x100
>>> [ 4.819104] __x64_sys_finit_module+0x19/0x20
>>> [ 4.819116] do_syscall_64+0x50/0x1a0
>>> [ 4.819127] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>> [ 4.819140] RIP: 0033:0x7fb45dd2ce3d
>>> [ 4.819152] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 23 50 0c 00 f7 d8 64 89 01 48
>>> [ 4.819190] RSP: 002b:00007fff3b90ee48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>> [ 4.819208] RAX: ffffffffffffffda RBX: 00005586f27eccb0 RCX: 00007fb45dd2ce3d
>>> [ 4.819224] RDX: 0000000000000000 RSI: 00007fb45d9a284d RDI: 0000000000000015
>>> [ 4.819240] RBP: 00007fb45d9a284d R08: 0000000000000000 R09: 0000000000000001
>>> [ 4.819256] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000000000
>>> [ 4.819272] R13: 00005586f27e23b0 R14: 0000000000020000 R15: 00005586f27eccb0
>>> [ 4.819289] Modules linked in: aes_x86_64(+) glue_helper crypto_simd i915(+) cryptd snd_hda_intel intel_gtt i2c_algo_bit drm_kms_helper syscopyarea snd_hda_codec sysfillrect sysimgblt snd_hda_core fb_sys_fops r8169 i2c_i801 snd_pcm realtek snd_timer libphy drm snd mei_me mei sch_fq_codel crypto_user efivarfs ipv6 serio_raw atkbd libps2 i8042 serio ums_realtek ext4 crc32c_intel mbcache jbd2 ahci libahci libata
>>> [ 4.819387] CR2: 0000000000000000
>>> [ 4.819399] ---[ end trace 531c4d73e2bf857e ]---
>>> [ 4.819412] RIP: 0010:0x0
>>> [ 4.819424] Code: Bad RIP value.
>>> [ 4.819434] RSP: 0018:ffffacd84023f918 EFLAGS: 00010287
>>> [ 4.819447] RAX: 0000000000000000 RBX: ffff8b11f7f60000 RCX: 00000000000000ae
>>> [ 4.819463] RDX: ffffacd84023f92d RSI: ffffacd84023f928 RDI: ffff8b11f884a000
>>> [ 4.819479] RBP: ffffacd84023f950 R08: 0000000000000001 R09: 0000000000000000
>>> [ 4.819495] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8b11f882c7a0
>>> [ 4.819511] R13: 0000000000000000 R14: ffff8b11f7f60000 R15: ffffffffc06b71e0
>>> [ 4.819527] FS: 00007fb45c549840(0000) GS:ffff8b11fbb00000(0000) knlGS:0000000000000000
>>> [ 4.819545] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 4.819559] CR2: ffffffffffffffd6 CR3: 0000000177d4a000 CR4: 00000000003406e0
>>> [ 4.819576] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:38
>>> [ 4.819596] in_atomic(): 0, irqs_disabled(): 1, pid: 2008, name: systemd-udevd
>>> [ 4.819613] INFO: lockdep is turned off.
>>> [ 4.819623] irq event stamp: 31896
>>> [ 4.819637] hardirqs last enabled at (31895): [<ffffffff9f40e875>] kfree+0xc5/0x2a0
>>> [ 4.819656] hardirqs last disabled at (31896): [<ffffffff9f201c3d>] trace_hardirqs_off_thunk+0x1a/0x1c
>>> [ 4.819679] softirqs last enabled at (31866): [<ffffffff9fc00327>] __do_softirq+0x327/0x424
>>> [ 4.819700] softirqs last disabled at (31859): [<ffffffff9f26f8b3>] irq_exit+0xb3/0xc0
>>> [ 4.819719] CPU: 2 PID: 2008 Comm: systemd-udevd Tainted: G D 5.2.0-next-20190712 #1
>>> [ 4.819739] Hardware name: NA ZBOX-CI327NANO-GS-01/ZBOX-CI327NANO-GS-01, BIOS 5.12 04/26/2018
>>> [ 4.819758] Call Trace:
>>> [ 4.819770] dump_stack+0x70/0xa0
>>> [ 4.819782] ___might_sleep.cold+0x9f/0xb0
>>> [ 4.819794] __might_sleep+0x46/0x80
>>> [ 4.819805] exit_signals+0x2f/0x330
>>> [ 4.819816] do_exit+0xb3/0xb60
>>> [ 4.819827] rewind_stack_do_exit+0x17/0x20
>>> [ 4.819838] RIP: 0033:0x7fb45dd2ce3d
>>> [ 4.819849] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 23 50 0c 00 f7 d8 64 89 01 48
>>> [ 4.819887] RSP: 002b:00007fff3b90ee48 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>> [ 4.819905] RAX: ffffffffffffffda RBX: 00005586f27eccb0 RCX: 00007fb45dd2ce3d
>>> [ 4.819921] RDX: 0000000000000000 RSI: 00007fb45d9a284d RDI: 0000000000000015
>>> [ 4.819937] RBP: 00007fb45d9a284d R08: 0000000000000000 R09: 0000000000000001
>>> [ 4.819953] R10: 0000000000000015 R11: 0000000000000246 R12: 0000000000000000
>>> [ 4.819969] R13: 00005586f27e23b0 R14: 0000000000020000 R15: 00005586f27eccb0
>>>
>>
>> I debugged a little bit and remount_fs isn't set in sb->s_op.
>> The following at least avoids the NPE, not sure whether it's the correct fix.
>
> I take it you don't have CONFIG_TMPFS set?
>
This option is set:
[root@zotac linux-next]# grep TMPFS .config
CONFIG_DEVTMPFS=y
CONFIG_DEVTMPFS_MOUNT=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_TMPFS_XATTR=y
> In which case we can't use the remount_fs trick and we can't pass
> options to kern_mount(). We could however just set the option directly
> in our superblock -- sadly, the defines are private so we will just have
> to hope they don't change :)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gemfs.c b/drivers/gpu/drm/i915/gem/i915_gemfs.c
> index 099f3397aada..5910315f2069 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gemfs.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gemfs.c
> @@ -7,6 +7,7 @@
>  #include <linux/fs.h>
>  #include <linux/mount.h>
>  #include <linux/pagemap.h>
> +#include <linux/shmem_fs.h>
>
>  #include "i915_drv.h"
>  #include "i915_gemfs.h"
> @@ -33,17 +34,10 @@ int i915_gemfs_init(struct drm_i915_private *i915)
>  	 */
>
>  	if (has_transparent_hugepage()) {
> -		struct super_block *sb = gemfs->mnt_sb;
> +		struct shmem_sb_info *sb_info = gemfs->mnt_sb->s_fs_info;
> +
>  		/* FIXME: Disabled until we get W/A for read BW issue. */
> -		char options[] = "huge=never";
> -		int flags = 0;
> -		int err;
> -
> -		err = sb->s_op->remount_fs(sb, &flags, options);
> -		if (err) {
> -			kern_unmount(gemfs);
> -			return err;
> -		}
> +		sb_info->huge = 0; /* SHMEM_HUGE_NEVER */
>  	}
>
>  	i915->mm.gemfs = gemfs;
>
> -Chris
>
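For anyone following along, the two ways out discussed in this thread
(check the optional hook before calling it, or bypass the hook and write
the setting directly) can be modelled in plain userspace C. This is only a
sketch under stated assumptions: struct demo_sb, demo_sb_ops, demo_sb_info
and both helper functions are made-up stand-ins for illustration, not the
kernel's super_block, super_operations or shmem_sb_info, and the value 0 is
simply taken from the SHMEM_HUGE_NEVER comment in the diff above.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical stand-ins for illustration; not the kernel's real types. */
struct demo_sb;

struct demo_sb_ops {
	/* Optional hook: may legitimately be left NULL by the filesystem. */
	int (*remount)(struct demo_sb *sb, const char *options);
};

struct demo_sb_info {
	int huge;	/* 0 plays the role of SHMEM_HUGE_NEVER here */
};

struct demo_sb {
	const struct demo_sb_ops *ops;
	struct demo_sb_info *info;
};

/*
 * Variant 1: only call the optional hook if it is populated.
 * Calling a NULL hook is what produces an instruction fetch at address 0.
 */
static int demo_remount_checked(struct demo_sb *sb, const char *options)
{
	if (!sb->ops || !sb->ops->remount)
		return -EOPNOTSUPP;

	return sb->ops->remount(sb, options);
}

/* Variant 2: skip the hook entirely and set the private field directly. */
static void demo_disable_huge(struct demo_sb *sb)
{
	sb->info->huge = 0;
}
```

Variant 1 avoids the crash but leaves the setting unchanged when the hook
is missing; variant 2 always applies the setting but depends on the layout
and constants of a private structure staying stable.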
Heiner