[PATCH i-g-t 1/2] lib/igt_kmemleak: library to interact with kmemleak

Mon Jan 27 16:28:51 UTC 2025

-----Original Message-----
From: Peter Senna Tschudin <peter.senna at linux.intel.com> 
Sent: Monday, January 27, 2025 3:20 AM
To: Cavitt, Jonathan <jonathan.cavitt at intel.com>; igt-dev at lists.freedesktop.org
Cc: stylon.wang at amd.com; Rodrigo.Siqueira at amd.com; Gandi, Ramadevi <ramadevi.gandi at intel.com>; Knop, Ryszard <ryszard.knop at intel.com>; Lattannavar, Sameer <sameer.lattannavar at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Saarinen, Jani <jani.saarinen at intel.com>; Piecielska, Katarzyna <katarzyna.piecielska at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; gregory.f.germano at intel.com; Taylor, Clinton A <clinton.a.taylor at intel.com>; Yu, Jianshui <jianshui.yu at intel.com>; Vivekanandan, Balasubramani <balasubramani.vivekanandan at intel.com>
Subject: Re: [PATCH i-g-t 1/2] lib/igt_kmemleak: library to interact with kmemleak
> 
> Hi Jonathan,
> 
> I sent v2 that took care of all but one of your comments here. Please see below.
> 
> On 22.01.2025 18:09, Cavitt, Jonathan wrote:
> > I left some notes below, though there are only two required changes:
> > 1. Move fclose in igt_kmemleak_found_leaks to after the conditional lseek to
> >     avoid undefined behavior.
> 
> Thank you for that. Fixed on v2.
> 
> > 2. Pass igt_kmemleak_sync as a function variable to igt_kmemleak instead
> >     of keeping it stored as a global variable initialized during igt_kmemelak_init.
> 
> Fixed on v2 as well.
> 
> > Once those are fixed:
> > Reviewed-by: Jonathan Cavitt <jonathan.cavitt at intel.com>
> > 
> > -----Original Message-----
> > From: Peter Senna Tschudin <peter.senna at linux.intel.com> 
> > Sent: Tuesday, January 21, 2025 3:30 AM
> > To: igt-dev at lists.freedesktop.org
> > Cc: Peter Senna Tschudin <peter.senna at linux.intel.com>; stylon.wang at amd.com; Rodrigo.Siqueira at amd.com; Gandi, Ramadevi <ramadevi.gandi at intel.com>; Knop, Ryszard <ryszard.knop at intel.com>; Lattannavar, Sameer <sameer.lattannavar at intel.com>; De Marchi, Lucas <lucas.demarchi at intel.com>; Saarinen, Jani <jani.saarinen at intel.com>; Piecielska, Katarzyna <katarzyna.piecielska at intel.com>; Roper, Matthew D <matthew.d.roper at intel.com>; gregory.f.germano at intel.com; Taylor, Clinton A <clinton.a.taylor at intel.com>; Yu, Jianshui <jianshui.yu at intel.com>; Vivekanandan, Balasubramani <balasubramani.vivekanandan at intel.com>; Cavitt, Jonathan <jonathan.cavitt at intel.com>
> > Subject: [PATCH i-g-t 1/2] lib/igt_kmemleak: library to interact with kmemleak

[...]

> >> +{
> >> +	/* Scan to collect results */
> >> +	if (igt_kmemleak_scan())
> > 
> > NIT:
> > Should this function also fail if igt_kmemleak_scan fails?  AFAICT,
> > igt_kmemleak_scan fails in one of two cases:
> > 
> > 1. Error when sending command.  This should probably result in a failure
> > because an error has occurred.
> > 2. No leaks found.  Assuming we're expecting to find leaks, this should
> > probably also constitute a failure.
> > 
> > Though in the case where only one of the two above causes issues, maybe
> > we should add return values to igt_kmemleak_scan?  Perhaps 0 on success,
> > -EBUSY for a write failure, and -EIO if no leaks are found?  I'm just choosing
> > some arbitrary error values here; these aren't set in stone.
> 
> This is the one I did not act upon. First we do not want igt_runner to crash or
> abort if something goes wrong with kmemleak. So we don't really need the extra
> details about the error. Currently if the call to igt_kmemleak_init() fails, it
> will disable kmemleaks on the runner. My take was that errors detected on
> igt_kmemleak() would either mean that the system is in a really bad shape or
> that something unimportant happend. Either way nothing to worry about.
> 
> Second, we actually expect that no mem leaks will be found on the vast majority
> of calls. It took me hundreds of attempts to learn how to trigger one reliably,
> and with xe tests it happens but not that often.
> 
> So I left this as is for now. Please let me know if any of this is nonsensical.

That makes sense.  If we're expecting to not see any kmemleak logs in the general case,
then failing on not finding the logs would be bad.  And if igt_kmemleak_cmd fails, that
probably means one of two things:
1. kmemleaks was disabled on the system
2. The system was broken in some unrelated way
Blocking on the former doesn't make sense, and the latter should've been caught by the
system or tests earlier in execution.

I mean, we probably still want to report in the "unrelated system break" case since we
(unfortunately) can't make any assertions on whether or not the system break has
already been reported.  And if we wanted to do something like that, we'd likely need
to check that kmemleak has been enabled separately before running the scan here.
However, given the explanation that was provided, this is just a possible suggestion
and not something I'm particularly adamant about.
-Jonathan Cavitt

> 
> > 
> >> +		if (!igt_kmemleak_append_to(last_test, resdirfd, kmemleak_each))
> >> +			return false;
> >> +
> >> +	if (kmemleak_each)
> >> +		igt_kmemleak_clear();
> >> +
> >> +	return true;
> >> +}

[...]

> >> -- 
> >> 2.34.1
> >>
> >>
> 
>