KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL

Mauro Carvalho Chehab mauro.chehab at linux.intel.com
Fri Nov 4 12:47:03 UTC 2022


On Fri, 4 Nov 2022 08:49:55 +0100
Mauro Carvalho Chehab <mauro.chehab at linux.intel.com> wrote:

> On Thu, 3 Nov 2022 15:43:26 -0700
> Daniel Latypov <dlatypov at google.com> wrote:
> 
> > On Thu, Nov 3, 2022 at 8:23 AM Mauro Carvalho Chehab
> > <mauro.chehab at linux.intel.com> wrote:  
> > >
> > > Hi,
> > >
> > > I'm facing a couple of issues when testing KUnit with the i915 driver.
> > >
> > > The DRM subsystem and the i915 driver have, for a long time, had their
> > > own way of doing unit tests, which seems to have been added before KUnit.
> > >
> > > I'm now checking whether it is worth starting to use KUnit in i915. So,
> > > I wrote an RFC with some patches adding support for the tests we have to
> > > be reported using Kernel TAP and KUnit.
> > >
> > > There are basically 3 groups of tests there:
> > >
> > > - mock tests - check i915 hardware-independent logic;
> > > - live tests - run some hardware-specific tests;
> > > - perf tests - check perf support - also hardware-dependent.
> > >
> > > As they depend on i915 driver, they run only on x86, with PCI
> > > stack enabled, but the mock tests run nicely via qemu.
> > >
> > > The live and perf tests require real hardware. As we run them
> > > as part of our CI, which, among other things, tests module
> > > unload/reload and loading the i915 driver with different
> > > modprobe parameters, the KUnit tests should be able to run as
> > > a module.
> > >
> > > While testing KUnit, I noticed a couple of issues:
> > >
> > > 1. kunit.py parser is currently broken when used with modules
> > >
> > > the parser expects "TAP version xx" output, but this won't
> > > happen when loading the kunit test driver.
> > >
> > > Are there any plans or patches fixing this issue?    
> > 
> > Partially.
> > Note: we need a header to look for so we can strip prefixes (like timestamps).
> > 
> > But there is a patch in the works to add a TAP header for each
> > subtest, hopefully in time for 6.2.  
> 
> Good to know.
> 
> > This is to match the KTAP spec:
> > https://kernel.org/doc/html/latest/dev-tools/ktap.html  
> 
> I see.
> 
> > That should fix it so you can parse one suite's results at a time.
> > I'm pretty sure it won't fix the case where there's multiple suites
> > and/or you're trying to parse all test results at once via
> > 
> > $ find /sys/kernel/debug/kunit/ -type f | xargs cat |
> > ./tools/testing/kunit/kunit.py parse  
> 
> Could you point me to the changeset? Perhaps I can write a follow-up
> patch addressing this case.
> 
> > I think that in-kernel code change + some more python changes could
> > make the above command work, but no one has actively started looking
> > at that just yet.
> > Hopefully we can pick this up and also get it done for 6.2 (unless I'm
> > underestimating how complicated this is).
> >   
> > >
> > > 2. current->mm is not initialized
> > >
> > > Some tests do mmap(). They need the mm user context to be initialized,
> > > but this is not happening right now.
> > >
> > > Is there a way to properly initialize it for KUnit?    
> > 
> > Right, this is a consequence of how early built-in KUnit tests are run
> > after boot.
> > I think for now, the answer is to make the test module-only.
> > 
> > I know David had some ideas here, but I can't speak to them.  
> 
> This also happens when test-i915 is built as a module.
> 
> I suspect that the function which initializes it is mm_alloc() inside 
> kernel/fork.c:
> 
> 	struct mm_struct *mm_alloc(void)
> 	{
> 	        struct mm_struct *mm;
> 
> 	        mm = allocate_mm();
> 	        if (!mm)
> 	                return NULL;
> 
> 	        memset(mm, 0, sizeof(*mm));
> 	        return mm_init(mm, current, current_user_ns());
> 	}
> 
> As modprobing a test won't fork until all tests have run, mm_alloc() is
> never called for the test thread.
> 
> It seems that the normal usage is at fs/exec.c:
> 
> 	fs/exec.c:      bprm->mm = mm = mm_alloc();
> 
> but other places also call it:
> 
> 	arch/arm/mach-rpc/ecard.c:      struct mm_struct * mm = mm_alloc();
> 	drivers/dma-buf/dma-resv.c:     struct mm_struct *mm = mm_alloc();
> 	include/linux/sched/mm.h:extern struct mm_struct *mm_alloc(void);
> 	mm/debug_vm_pgtable.c:  args->mm = mm_alloc();
> 
> Probably the solution would be to call it inside kunit executor code,
> adding support for modules to use it.


Hmm... it is not that simple... I tried the enclosed patch, but it caused
another issue in the live/mman/mmap test:

<snip>
[  152.815543] test_i915: 0000:00:02.0: it is a i915 device.
[  152.816456]     # Subtest: i915 live selftests
[  152.816463]     1..1
[  152.816835] kunit_try_run_case: allocating user context
[  152.816978] CPU: 1 PID: 1139 Comm: kunit_try_catch Tainted: G                 N 6.1.0-rc2-drm-110e9bebcbcc+ #20
[  152.817063] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
[  152.817583] i915: Performing live_mman selftests with st_random_seed=0x11aaba4d st_timeout=500
[  152.817735] test_i915: Setting dangerous option KUnit live_mman - tainting kernel
[  152.817819] test_i915: Running live_mman on 0000:00:02.0
[  152.817899] i915: Running i915_gem_mman_live_selftests/igt_partial_tiling
[  153.346653] check_partial_mappings: timed out after tiling=0 stride=0
[  153.847696] check_partial_mappings: timed out after tiling=1 stride=262144
[  154.348615] check_partial_mappings: timed out after tiling=2 stride=262144
[  154.376677] i915: Running i915_gem_mman_live_selftests/igt_smoke_tiling
[  154.877686] igt_smoke_tiling: Completed 3465 trials
[  155.025764] i915: Running i915_gem_mman_live_selftests/igt_mmap_offset_exhaustion
[  155.050908] i915: Running i915_gem_mman_live_selftests/igt_mmap
[  155.052056] BUG: kernel NULL pointer dereference, address: 0000000000000000
[  155.052080] #PF: supervisor instruction fetch in kernel mode
[  155.052095] #PF: error_code(0x0010) - not-present page
[  155.052110] PGD 0 P4D 0 
[  155.052121] Oops: 0010 [#1] PREEMPT SMP NOPTI
[  155.052135] CPU: 5 PID: 1139 Comm: kunit_try_catch Tainted: G     U           N 6.1.0-rc2-drm-110e9bebcbcc+ #20
[  155.052162] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
[  155.052191] RIP: 0010:0x0
[  155.052207] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  155.052223] RSP: 0018:ffffc900019ebbe8 EFLAGS: 00010246
[  155.052238] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000100000
[  155.052257] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881111a6840
[  155.052275] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[  155.052292] R10: ffff8881049ad000 R11: 00000000ffffffff R12: 0000000000000002
[  155.052309] R13: ffff8881111a6840 R14: 0000000000100000 R15: 0000000000000000
[  155.052327] FS:  0000000000000000(0000) GS:ffff8883a3a80000(0000) knlGS:0000000000000000
[  155.052347] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.052361] CR2: ffffffffffffffd6 CR3: 000000011118c004 CR4: 0000000000770ee0
[  155.052379] PKRU: 55555554
[  155.052387] Call Trace:
[  155.052396]  <TASK>
[  155.052403]  get_unmapped_area+0x80/0x130
[  155.052422]  do_mmap+0xe5/0x530
[  155.052439]  vm_mmap_pgoff+0xab/0x150
[  155.052457]  igt_mmap_offset+0x133/0x1e0 [i915]
[  155.052875]  __igt_mmap+0xfe/0x680 [i915]
[  155.053233]  ? __i915_gem_object_create_user_ext+0x49c/0x550 [i915]
[  155.053614]  igt_mmap+0xd8/0x290 [i915]
[  155.054057]  ? __trace_bprintk+0x8c/0xa0
[  155.054080]  __i915_subtests.cold+0x53/0xd5 [i915]
[  155.054648]  ? __i915_nop_teardown+0x20/0x20 [i915]
[  155.055127]  ? __i915_live_setup+0x60/0x60 [i915]
[  155.055608]  ? singleton_release+0x40/0x40 [i915]
[  155.056060]  i915_gem_mman_live_selftests+0x4e/0x60 [i915]
[  155.056503]  run_pci_test.cold+0x4d/0x163 [test_i915]
[  155.056535]  ? kunit_try_catch_throw+0x20/0x20
[  155.056557]  live_mman+0x19/0x26 [test_i915]
[  155.056581]  kunit_try_run_case+0xf0/0x145
[  155.056607]  kunit_generic_run_threadfn_adapter+0x13/0x30
[  155.057715]  kthread+0xf2/0x120
[  155.058864]  ? kthread_complete_and_exit+0x20/0x20
[  155.060014]  ret_from_fork+0x1f/0x30
[  155.061108]  </TASK>
[  155.062174] Modules linked in: test_i915 x86_pkg_temp_thermal coretemp snd_hda_codec_hdmi mei_hdcp kvm_intel snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep kvm snd_hda_core mei_me irqbypass wmi_bmof snd_pcm i2c_i801 mei i2c_smbus intel_lpss_pci crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 prime_numbers drm_buddy drm_display_helper drm_kms_helper syscopyarea e1000e sysfillrect sysimgblt ptp fb_sys_fops pps_core ttm video wmi fuse
[  155.064354] CR2: 0000000000000000
[  155.065413] ---[ end trace 0000000000000000 ]---
[  155.074555] RIP: 0010:0x0
[  155.075437] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[  155.076313] RSP: 0018:ffffc900019ebbe8 EFLAGS: 00010246
[  155.077195] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000100000
[  155.078124] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881111a6840
[  155.079013] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[  155.079898] R10: ffff8881049ad000 R11: 00000000ffffffff R12: 0000000000000002
[  155.080785] R13: ffff8881111a6840 R14: 0000000000100000 R15: 0000000000000000
[  155.081668] FS:  0000000000000000(0000) GS:ffff8883a3a80000(0000) knlGS:0000000000000000
[  155.082565] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  155.083451] CR2: ffffffffffffffd6 CR3: 0000000110904006 CR4: 0000000000770ee0
[  155.084348] PKRU: 55555554
</snip>

It seems that something else is needed to properly initialize the user
context.
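
A possible reading of the oops above, offered as an assumption rather than
a verified diagnosis: RIP is 0x0 and the fault is an instruction fetch
raised from get_unmapped_area(), which is what a call through a NULL
function pointer looks like. mm_alloc() returns a zeroed, freshly
initialized mm, and the per-mm get_unmapped_area callback appears to be
filled in only when the exec path picks an mmap layout, a step the enclosed
patch never performs. The userspace toy below models just that pattern. All
names in it (toy_mm, toy_mm_alloc, toy_pick_layout, toy_get_unmapped_area)
are hypothetical and none of it is kernel code.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for the single mm_struct field that matters here. */
struct toy_mm {
	unsigned long (*get_unmapped_area)(unsigned long addr,
					   unsigned long len);
};

/* Hypothetical layout callback: accept the address hint unchanged. */
static unsigned long toy_area_from_hint(unsigned long addr, unsigned long len)
{
	(void)len;
	return addr;
}

/* Models mm_alloc(): zeroed memory, so the callback starts out NULL. */
static struct toy_mm *toy_mm_alloc(void)
{
	return calloc(1, sizeof(struct toy_mm));
}

/* Models the layout-selection step assumed to happen on the exec path. */
static void toy_pick_layout(struct toy_mm *mm)
{
	mm->get_unmapped_area = toy_area_from_hint;
}

/*
 * Models the caller. Assumption: the real code calls the pointer without
 * a NULL check. The check here exists only so the toy reports the problem
 * (returns -1) instead of jumping to address 0.
 */
static int toy_get_unmapped_area(const struct toy_mm *mm, unsigned long addr,
				 unsigned long len, unsigned long *out)
{
	if (!mm->get_unmapped_area)
		return -1;
	*out = mm->get_unmapped_area(addr, len);
	return 0;
}
```

If this reading is right, allocating the mm is not enough on its own: some
layout-selection step (and possibly more setup) would have to follow
mm_alloc() before mmap-based tests can run in the KUnit thread.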

Regards,
Mauro

---

[PATCH] kunit: allocate user context mm

Without that, tests involving mmap won't work.

Signed-off-by: Mauro Carvalho Chehab <mchehab at kernel.org>

diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 90640a43cf62..809522e110c5 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -14,6 +14,7 @@
 #include <linux/moduleparam.h>
 #include <linux/panic.h>
 #include <linux/sched/debug.h>
+#include <linux/sched/mm.h>
 #include <linux/sched.h>
 
 #include "debugfs.h"
@@ -381,9 +382,23 @@ static void kunit_try_run_case(void *data)
 	struct kunit *test = ctx->test;
 	struct kunit_suite *suite = ctx->suite;
 	struct kunit_case *test_case = ctx->test_case;
+	struct mm_struct *mm = NULL;
 
 	current->kunit_test = test;
 
+	if (!current->mm) {
+		pr_info("%s: allocating user context\n", __func__);
+		mm = mm_alloc();
+		if (!mm) {
+			kunit_err(suite, KUNIT_SUBTEST_INDENT
+				"# failed to allocate mm user context");
+			return;
+		}
+		current->mm = mm;
+	} else {
+		pr_info("%s: using already-existing user context\n", __func__);
+	}
+
 	/*
 	 * kunit_run_case_internal may encounter a fatal error; if it does,
 	 * abort will be called, this thread will exit, and finally the parent
@@ -392,6 +407,11 @@ static void kunit_try_run_case(void *data)
 	kunit_run_case_internal(test, suite, test_case);
 	/* This line may never be reached. */
 	kunit_run_case_cleanup(test, suite);
+
+	if (mm) {
+		mmdrop(mm);
+		current->mm = NULL;
+	}
 }
 
 static void kunit_catch_run_case(void *data)


