[Intel-gfx] [PATCH 1/2] drmtest: introduce kmsg_error functions

Mon Nov 11 19:48:26 CET 2013

On Mon, Nov 11, 2013 at 04:18:04PM -0200, Paulo Zanoni wrote:
> 2013/11/11 Daniel Vetter <daniel at ffwll.ch>:
> > On Mon, Nov 11, 2013 at 03:06:09PM -0200, Paulo Zanoni wrote:
> >> From: Paulo Zanoni <paulo.r.zanoni at intel.com>
> >>
> > >> These functions should help you check for new Kernel error
> >> messages. One of the problems I had while writing the runtime PM test
> >> suite is that when you read the sysfs and debugfs files, the only way
> >> to detect errors is by checking dmesg, so I was always getting SUCCESS
> >> even if the test caught a bug. Also, we have so many debugfs/sysfs
> >> files that it was not easy to discover which file caused the error
> >> messages I was seeing. So this commit adds some infrastructure to
> >> allow us to automatically check for new errors on dmesg.
> >>
> >> Use it like this:
> >>
> >> int main(int argc, char *argv[]) {
> >>       int fd, i;
> >>
> >>       igt_fixture
> >>               fd = kmsg_error_setup();
> >>
> >>       igt_subtest("t1") {
> >>               kmsg_error_reset(fd);
> >>               do_something();
> >>               kmsg_error_detect("");
> >>       }
> >>
> >>       igt_subtest("t2") {
> >>               for (i = 0; i < 10; i++) {
> >>                       char *file_name = get_file(i);
> >>                       kmsg_error_reset(fd);
> >>                       process_file(file_name);
> >>                       kmsg_error_detect(file_name):
> >>               }
> >>       }
> >>
> >>       igt_fixture
> >>               kmsg_error_teardown(fd);
> >> }
> >
> > Imo that's the wrong approach. _Every_ test should fail if we end up with
> > errors/backtraces in dmesg.
> 
> That's exactly why I wrote code to check dmesg! We could, in the
> future, make the igt_subtest macros call this code automatically.

That leaves out tests that aren't yet converted over to subtests ...
> 
> 
> > And if you look for very specific stuff (like
> > gpu hang or missed irq warnings) the approach thus far has been to expose
> > that information somehow through debugfs files.
> 
> So you're suggesting I should create some sort of debugfs interface to
> expose every single WARN our driver does? Doesn't really sound like a
> good idea, unless we invent our own I915_WARN, I915_ERROR, etc. And
> we still won't catch the WARNs and ERRORs spit by drm.ko or anything
> outside i915.ko.

Nope. I'm suggesting to do this in the piglit runner instead so that all
tests profit automatically. Since as you say, we can't patch lockdep.

> > That way we're independent
> > from the exact string used in the kernel output.
> 
> ZZ_check_dmesg already parses dmesg strings, I just copied it. Also,
> the whole IGT already relies way too much on being run against
> very-recent libdrm/Kernels, we're just adding one more dependency. And
> we can also add the newer strings if somebody ever changes the WARN or
> DRM_ERROR output: it's not like our code will completely break, it
> just won't be as good. And we always require everybody to use IGT git
> master anyway. I don't see the problem.

ZZ_check_dmesg was a quick hack done two years ago that should have been
ported to something sane a long time ago. With piglit's non-deterministic
test ordering it's pretty useless nowadays.

> > I think the right approach is to add this to the test runner, i.e. piglit.
> > There's already very basic support to capture the (new) dmesg output for
> > each test with the --dmesg option. Have you played around with that and
> > tried to extend it to your liking?
> 
> My goal is that I want to know, inside a test program, which line of
> code introduced the dmesg error, and if we use some sort of external
> approach like what you're suggesting this won't be possible. I have
> code that opens hundreds of sysfs and debugfs files, and I want to
> check dmesg after I open/close every single file, to be able to detect
> which one exactly causes the problem. I'm already using this locally
> and it *really* saved a lot of time for me. If we don't accept this
> code inside drmtest.c, I'm gonna ask if I can push it directly to
> pm_pc8.c.

Hm, thus far I've just looked at the functions in the backtrace. Can you
give an example of which kinds of bugs you're hunting that need this?

But in general I certainly don't want this in pc8.c. If it's indeed useful
to check dmesg after each step in some tests then having some helpers in
igt/lib makes sense. Doing it only for the overall test, though, just
duplicates functionality which already exists in piglit.
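
For illustration only, here is a rough sketch of what a per-step helper in
igt/lib might look like. It is not the code from the patch: the function
names are made up, and it assumes /dev/kmsg is readable and that "error"
means any record at warning level (4) or more severe, instead of matching
specific strings. It relies on the documented /dev/kmsg record format
"prio,seq,timestamp,flags;message", with one record returned per read().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: return the syslog level (0-7) of one /dev/kmsg
 * record, or -1 if the record does not start with a number. */
static int kmsg_record_level(const char *record)
{
	char *end;
	long prio = strtol(record, &end, 10);

	if (end == record || prio < 0)
		return -1;
	return (int)(prio & 7); /* low 3 bits: level, upper bits: facility */
}

/* Hypothetical helper: open /dev/kmsg non-blocking and skip everything
 * that has already been logged. Returns -1 on failure. */
static int kmsg_open_at_end(void)
{
	int fd = open("/dev/kmsg", O_RDONLY | O_NONBLOCK);

	if (fd >= 0 && lseek(fd, 0, SEEK_END) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Hypothetical helper: count records at warning level or more severe
 * that were logged since the previous call (or since open). */
static int kmsg_count_new_errors(int fd)
{
	char buf[8192];
	int count = 0;

	assert(fd >= 0);
	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf) - 1);
		int level;

		if (n < 0) {
			if (errno == EPIPE || errno == EINTR)
				continue; /* records were overwritten, or interrupted */
			break; /* EAGAIN: nothing new to read */
		}
		if (n == 0)
			break;
		buf[n] = '\0';
		level = kmsg_record_level(buf);
		if (level >= 0 && level <= 4)
			count++;
	}
	return count;
}
```

A test would call kmsg_open_at_end() once, then call
kmsg_count_new_errors(fd) after each step (for example after each
sysfs/debugfs file is processed) and fail the step if the count is nonzero.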
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch