[Intel-gfx] KUnit issues - Was: [igt-dev] [PATCH RFC v2 8/8] drm/i915: check if current->mm is not NULL
Mauro Carvalho Chehab
mauro.chehab at linux.intel.com
Fri Nov 4 12:47:03 UTC 2022
On Fri, 4 Nov 2022 08:49:55 +0100
Mauro Carvalho Chehab <mauro.chehab at linux.intel.com> wrote:
> On Thu, 3 Nov 2022 15:43:26 -0700
> Daniel Latypov <dlatypov at google.com> wrote:
>
> > On Thu, Nov 3, 2022 at 8:23 AM Mauro Carvalho Chehab
> > <mauro.chehab at linux.intel.com> wrote:
> > >
> > > Hi,
> > >
> > > I'm facing a couple of issues when testing KUnit with the i915 driver.
> > >
> > > The DRM subsystem and the i915 driver has, for a long time, his own
> > > way to do unit tests, which seems to be added before KUnit.
> > >
> > > I'm now checking if it is worth start using KUnit at i915. So, I wrote
> > > a RFC with some patches adding support for the tests we have to be
> > > reported using Kernel TAP and KUnit.
> > >
> > > There are basically 3 groups of tests there:
> > >
> > > - mock tests - check i915 hardware-independent logic;
> > > - live tests - run some hardware-specific tests;
> > > - perf tests - check perf support - also hardware-dependent.
> > >
> > > As they depend on i915 driver, they run only on x86, with PCI
> > > stack enabled, but the mock tests run nicely via qemu.
> > >
> > > The live and perf tests require a real hardware. As we run them
> > > together with our CI, which, among other things, test module
> > > unload/reload and test loading i915 driver with different
> > > modprobe parameters, the KUnit tests should be able to run as
> > > a module.
> > >
> > > While testing KUnit, I noticed a couple of issues:
> > >
> > > 1. kunit.py parser is currently broken when used with modules
> > >
> > > the parser expects "TAP version xx" output, but this won't
> > > happen when loading the kunit test driver.
> > >
> > > Are there any plans or patches fixing this issue?
> >
> > Partially.
> > Note: we need a header to look for so we can strip prefixes (like timestamps).
> >
> > But there is a patch in the works to add a TAP header for each
> > subtest, hopefully in time for 6.2.
>
> Good to know.
>
> > This is to match the KTAP spec:
> > https://kernel.org/doc/html/latest/dev-tools/ktap.html
>
> I see.
>
> > That should fix it so you can parse one suite's results at a time.
> > I'm pretty sure it won't fix the case where there's multiple suites
> > and/or you're trying to parse all test results at once via
> >
> > $ find /sys/kernel/debug/kunit/ -type f | xargs cat |
> > ./tools/testing/kunit/kunit.py parse
>
> Could you point me to the changeset? perhaps I can write a followup
> patch addressing this case.
>
> > I think that in-kernel code change + some more python changes could
> > make the above command work, but no one has actively started looking
> > at that just yet.
> > Hopefully we can pick this up and also get it done for 6.2 (unless I'm
> > underestimating how complicated this is).
> >
> > >
> > > 2. current->mm is not initialized
> > >
> > > Some tests do mmap(). They need the mm user context to be initialized,
> > > but this is not happening right now.
> > >
> > > Are there a way to properly initialize it for KUnit?
> >
> > Right, this is a consequence of how early built-in KUnit tests are run
> > after boot.
> > I think for now, the answer is to make the test module-only.
> >
> > I know David had some ideas here, but I can't speak to them.
>
> This is happening when test-i915 is built as module as well.
>
> I suspect that the function which initializes it is mm_alloc() inside
> kernel/fork.c:
>
> struct mm_struct *mm_alloc(void)
> {
> struct mm_struct *mm;
>
> mm = allocate_mm();
> if (!mm)
> return NULL;
>
> memset(mm, 0, sizeof(*mm));
> return mm_init(mm, current, current_user_ns());
> }
>
> As modprobing a test won't fork until all tests run, this never runs.
>
> It seems that the normal usage is at fs/exec.c:
>
> fs/exec.c: bprm->mm = mm = mm_alloc();
>
> but other places also call it:
>
> arch/arm/mach-rpc/ecard.c: struct mm_struct * mm = mm_alloc();
> drivers/dma-buf/dma-resv.c: struct mm_struct *mm = mm_alloc();
> include/linux/sched/mm.h:extern struct mm_struct *mm_alloc(void);
> mm/debug_vm_pgtable.c: args->mm = mm_alloc();
>
> Probably the solution would be to call it inside kunit executor code,
> adding support for modules to use it.
Hmm... it is not that simple... I tried the enclosed patch, but it caused
another issue at the live/mman/mmap test:
<snip>
[ 152.815543] test_i915: 0000:00:02.0: it is a i915 device.
[ 152.816456] # Subtest: i915 live selftests
[ 152.816463] 1..1
[ 152.816835] kunit_try_run_case: allocating user context
[ 152.816978] CPU: 1 PID: 1139 Comm: kunit_try_catch Tainted: G N 6.1.0-rc2-drm-110e9bebcbcc+ #20
[ 152.817063] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
[ 152.817583] i915: Performing live_mman selftests with st_random_seed=0x11aaba4d st_timeout=500
[ 152.817735] test_i915: Setting dangerous option KUnit live_mman - tainting kernel
[ 152.817819] test_i915: Running live_mman on 0000:00:02.0
[ 152.817899] i915: Running i915_gem_mman_live_selftests/igt_partial_tiling
[ 153.346653] check_partial_mappings: timed out after tiling=0 stride=0
[ 153.847696] check_partial_mappings: timed out after tiling=1 stride=262144
[ 154.348615] check_partial_mappings: timed out after tiling=2 stride=262144
[ 154.376677] i915: Running i915_gem_mman_live_selftests/igt_smoke_tiling
[ 154.877686] igt_smoke_tiling: Completed 3465 trials
[ 155.025764] i915: Running i915_gem_mman_live_selftests/igt_mmap_offset_exhaustion
[ 155.050908] i915: Running i915_gem_mman_live_selftests/igt_mmap
[ 155.052056] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 155.052080] #PF: supervisor instruction fetch in kernel mode
[ 155.052095] #PF: error_code(0x0010) - not-present page
[ 155.052110] PGD 0 P4D 0
[ 155.052121] Oops: 0010 [#1] PREEMPT SMP NOPTI
[ 155.052135] CPU: 5 PID: 1139 Comm: kunit_try_catch Tainted: G U N 6.1.0-rc2-drm-110e9bebcbcc+ #20
[ 155.052162] Hardware name: Intel Corporation Tiger Lake Client Platform/TigerLake Y LPDDR4x T4 Crb, BIOS TGLSFWI1.R00.3243.A01.2006102133 06/10/2020
[ 155.052191] RIP: 0010:0x0
[ 155.052207] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 155.052223] RSP: 0018:ffffc900019ebbe8 EFLAGS: 00010246
[ 155.052238] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000100000
[ 155.052257] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881111a6840
[ 155.052275] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[ 155.052292] R10: ffff8881049ad000 R11: 00000000ffffffff R12: 0000000000000002
[ 155.052309] R13: ffff8881111a6840 R14: 0000000000100000 R15: 0000000000000000
[ 155.052327] FS: 0000000000000000(0000) GS:ffff8883a3a80000(0000) knlGS:0000000000000000
[ 155.052347] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 155.052361] CR2: ffffffffffffffd6 CR3: 000000011118c004 CR4: 0000000000770ee0
[ 155.052379] PKRU: 55555554
[ 155.052387] Call Trace:
[ 155.052396] <TASK>
[ 155.052403] get_unmapped_area+0x80/0x130
[ 155.052422] do_mmap+0xe5/0x530
[ 155.052439] vm_mmap_pgoff+0xab/0x150
[ 155.052457] igt_mmap_offset+0x133/0x1e0 [i915]
[ 155.052875] __igt_mmap+0xfe/0x680 [i915]
[ 155.053233] ? __i915_gem_object_create_user_ext+0x49c/0x550 [i915]
[ 155.053614] igt_mmap+0xd8/0x290 [i915]
[ 155.054057] ? __trace_bprintk+0x8c/0xa0
[ 155.054080] __i915_subtests.cold+0x53/0xd5 [i915]
[ 155.054648] ? __i915_nop_teardown+0x20/0x20 [i915]
[ 155.055127] ? __i915_live_setup+0x60/0x60 [i915]
[ 155.055608] ? singleton_release+0x40/0x40 [i915]
[ 155.056060] i915_gem_mman_live_selftests+0x4e/0x60 [i915]
[ 155.056503] run_pci_test.cold+0x4d/0x163 [test_i915]
[ 155.056535] ? kunit_try_catch_throw+0x20/0x20
[ 155.056557] live_mman+0x19/0x26 [test_i915]
[ 155.056581] kunit_try_run_case+0xf0/0x145
[ 155.056607] kunit_generic_run_threadfn_adapter+0x13/0x30
[ 155.057715] kthread+0xf2/0x120
[ 155.058864] ? kthread_complete_and_exit+0x20/0x20
[ 155.060014] ret_from_fork+0x1f/0x30
[ 155.061108] </TASK>
[ 155.062174] Modules linked in: test_i915 x86_pkg_temp_thermal coretemp snd_hda_codec_hdmi mei_hdcp kvm_intel snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep kvm snd_hda_core mei_me irqbypass wmi_bmof snd_pcm i2c_i801 mei i2c_smbus intel_lpss_pci crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 prime_numbers drm_buddy drm_display_helper drm_kms_helper syscopyarea e1000e sysfillrect sysimgblt ptp fb_sys_fops pps_core ttm video wmi fuse
[ 155.064354] CR2: 0000000000000000
[ 155.065413] ---[ end trace 0000000000000000 ]---
[ 155.074555] RIP: 0010:0x0
[ 155.075437] Code: Unable to access opcode bytes at 0xffffffffffffffd6.
[ 155.076313] RSP: 0018:ffffc900019ebbe8 EFLAGS: 00010246
[ 155.077195] RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000100000
[ 155.078124] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff8881111a6840
[ 155.079013] RBP: 0000000000000001 R08: 0000000000000001 R09: 0000000000000000
[ 155.079898] R10: ffff8881049ad000 R11: 00000000ffffffff R12: 0000000000000002
[ 155.080785] R13: ffff8881111a6840 R14: 0000000000100000 R15: 0000000000000000
[ 155.081668] FS: 0000000000000000(0000) GS:ffff8883a3a80000(0000) knlGS:0000000000000000
[ 155.082565] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 155.083451] CR2: ffffffffffffffd6 CR3: 0000000110904006 CR4: 0000000000770ee0
[ 155.084348] PKRU: 55555554
</snip>
It sounds that something else is needed to properly initialize the user
context.
Regards,
Mauro
---
[PATCH] kunit: allocate user context mm
Without that, tests envolving mmap won't work.
Signed-off-by: Mauro Carvalho Chehab <mchehab at kernel.org>
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 90640a43cf62..809522e110c5 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -14,6 +14,7 @@
#include <linux/moduleparam.h>
#include <linux/panic.h>
#include <linux/sched/debug.h>
+#include <linux/sched/mm.h>
#include <linux/sched.h>
#include "debugfs.h"
@@ -381,9 +382,23 @@ static void kunit_try_run_case(void *data)
struct kunit *test = ctx->test;
struct kunit_suite *suite = ctx->suite;
struct kunit_case *test_case = ctx->test_case;
+ struct mm_struct *mm = NULL;
current->kunit_test = test;
+ if (!current->mm) {
+ pr_info("%s: allocating user context\n", __func__);
+ mm = mm_alloc();
+ if (!mm) {
+ kunit_err(suite, KUNIT_SUBTEST_INDENT
+ "# failed to allocate mm user context");
+ return;
+ }
+ current->mm = mm;
+ } else {
+ pr_info("%s: using already-existing user context\n", __func__);
+ }
+
/*
* kunit_run_case_internal may encounter a fatal error; if it does,
* abort will be called, this thread will exit, and finally the parent
@@ -392,6 +407,11 @@ static void kunit_try_run_case(void *data)
kunit_run_case_internal(test, suite, test_case);
/* This line may never be reached. */
kunit_run_case_cleanup(test, suite);
+
+ if (mm) {
+ mmdrop(mm);
+ current->mm = NULL;
+ }
}
static void kunit_catch_run_case(void *data)
More information about the Intel-gfx
mailing list