[PATCH i-g-t v4 1/2] lib/igt_kmemleak: library to interact with kmemleak

Wed Feb 12 08:36:40 UTC 2025

On Tue, Feb 11, 2025 at 08:12:16PM +0100, Peter Senna Tschudin wrote:
> Hi Zbigniew,
> 
> Thank you for your comments, please see my answers below.
> 
> On 11.02.2025 17:17, Zbigniew Kempczyński wrote:
> > On Mon, Feb 03, 2025 at 10:23:00AM +0100, Peter Senna Tschudin wrote:
> >> Hi Kamil,
> >>
> >> Thank you for your review. Please see below.
> >>
> >> On 31.01.2025 13:34, Kamil Konieczny wrote:
> >>
> >> [...]
> >>
> >>>> +#include <stdio.h>
> >>>> +#include <string.h>
> >>>> +#include <unistd.h>
> >>>> +#include <fcntl.h>
> >>>> +#include <errno.h>
> >>>
> >>> Sort this alphabetically with exception of unistd.h - this
> >>> header could be first.
> >>
> >> That is picky, but ok.
> >>
> >>
> >>>
> >>>> +
> >>>> +#include "igt_core.h"
> >>>> +#include "igt_kmemleak.h"
> >>>> +
> >>>> +/* We can change the path for unit testing, see igt_kmemleak_init() */
> >>>> +static char igt_kmemleak_file[256] = "/sys/kernel/debug/kmemleak";
> >>>> +
> >>>> +#define MAX_WRITE_RETRIES 5
> >>>> +/**
> >>>> + * igt_kmemleak_write - Writes the buffer to the file descriptor retrying when
> >>>> + * possible.
> >>>> + * @fd: The file descriptor to write to.
> >>>> + * @buf: Pointer to the data to write.
> >>>> + * @count: Total number of bytes to write.
> >>>> + *
> >>>> + * Returns the total number of bytes written on success, or -1 on failure.
> >>>> + */
> >>>> +static bool igt_kmemleak_write(int fd, const void *buf, size_t count)
> >>>> +{
> >>>> +	const char *ptr = buf;
> >>>> +	size_t remaining = count;
> >>>> +	ssize_t written;
> >>>> +	int retries = 0;
> >>>> +
> >>>> +	while (remaining > 0) {
> >>>> +		written = write(fd, ptr, remaining);
> >>>> +		if (written > 0) {
> >>>> +			ptr += written;
> >>>> +			remaining -= written;
> >>>> +		} else if (written == -1) {
> >>>> +			if (errno == EINTR || errno == EAGAIN) {
> >>>> +				/* Retry for recoverable errors */
> >>>> +				if (++retries > MAX_WRITE_RETRIES) {
> >>>> +					igt_warn("igt_write: Exceeded retry limit\n");
> >>>
> >>> Do not use test prints in lib, especially those used by runner.
> >>
> >> Oh, I will. kmemleak is about writing to disk. If that fails, I will tell the user.
> >> Why does this restriction applies only to me? See:
> >>
> >> ---
> >> psennats at friendship7-home:~/UPSTREAM/igt-gpu-tools/lib$ git grep igt_warn|wc -l
> >> 164
> >> psennats at friendship7-home:~/UPSTREAM/igt-gpu-tools/lib$ cd ..
> >> psennats at friendship7-home:~/UPSTREAM/igt-gpu-tools$ git grep igt_warn|wc -l
> >> 324
> >> ---
> >>
> >> About 50% of all references to igt_warn are in lib, what is the problem with my code?
> > 
> > I'm looking at the code and I have impression this should be part of
> > the runner, not igt library. Do I understand correctly that your
> > code will be used by the runner only and not by the tests?
> 
> It is difficult to know for now. We have one test that will use the lib
> as is: `tests/amdgpu/amd_mem_leak.c`. But that is not the main potential
> use case.
> 
> As I describe in the cover letter, transient leaks are kind of tricky,
> and we may require to request a scan from a test itself for better
> coverage. So in short I honestly do not know if we will have test
> integration in the future or not.

Fundamental question is - where do you want to scan/clear - on the runner
level or on the test? What will happen if you start to clear-scan/clear
both from runner (-k) and from the test itself. I've intentionally
added clear first to limit catched leaks to period where test was
executed.

On the runner level we would collect for all tests, on the test level
I imagine command line argument passed to the test like -k which would
enable the scan on the very beginning. But this would require to rewrite
all tests which are using pure main() instead of igt_main.*().
If really there's leak and someone wants to catch it, he can always
simply wrap around script execution by kmemleak clear-scan/clear.

Resume - I would stick to the runner level, I see no judgement to change
igt_core / tests to use this library directly.

> 
> > 
> > Regarding your above question, I think runner should use as much as
> > minimal from lib/ directory, because this is collection of functions
> > designed for tests, not for runner. I think generic code like data
> > structures (lists) are fine, but not the others. Imagine you'll
> > incidentally try to use igt_require() - follow the call and see
> > the pitfall there.
> 
> Currently the list of lib includes in the runner directory is not
> small (I used a dumb regex that looks for includes starting with igt_):
> 
> #include "igt_aux.h"
> #include "igt_core.h"
> #include "igt_facts.h"
> #include "igt_hook.h"
> #include "igt_list.h"
> #include "igt_taints.h"
> #include "igt_vec.h"
> 
> 
> > 
> > So if my understanding is correct and this code should be part of the
> > runner only your test might be moved to the runner_tests.c.
> 
> It is difficult to know at this stage. At least the amd test I mentioned
> earlier will use it.

I've explored runner code even if it is not in the center of my
interests. I've seen there're few places where it is tainted by igt
library code which might call logging from igt domain like igt_warn()
[igt_gettime()], or igt_assert() [igt_vec_elem()]. I've experimented
how runner code behaves with those calls, igt_warn() just logs but
output format differs than executor.c __logf__(). I think we should
be consistent in this case. Asserting __igt_fail_assert() unfortunately
uses igt logging but if we ever assert in the runner this will have
fatal consequences. IGT library contains a lot of asserts, so functions
from them should be meticulously reviewed before using in the runner.

--
Zbigniew

> 
> > 
> > BTW please rebase on top of current master, there were changes in
> > the runner/settings.c file that affects this series.
> 
> As soon as I am confident this will be merged, I surely will.
> 
> Thanks!
> 
> > 
> > --
> > Zbigniew
> > 
> >>
> >>
> >>>
> >>>> +					return false;
> >>>> +				}
> >>>> +				continue;
> >>>> +			} else {
> >>>> +				/* Log unrecoverable error */
> >>>> +				igt_warn("igt_write: unrecoverable write error");
> >>>
> >>> Same here.
> >> We should not remove this one.
> >>
> >>>
> >>>> +				return false;
> >>>> +			}
> >>>> +		} else if (written == 0) {
> >>>> +			if (++retries > MAX_WRITE_RETRIES) {
> >>>> +				igt_warn("igt_write: Exceeded retry limit\n");
> >>>
> >>> Same here.
> >> We should not remove this one.
> >>
> >> [...]
> >>
> >>>> +	FILE *fp;
> >>>
> >>> Why FILE* and fopen? Could you just use open/read?
> >>
> >> Why not?
> >>
> >> ---
> >> psennats at friendship7-home:~/UPSTREAM/igt-gpu-tools$ git grep fopen|wc -l
> >> 97
> >> ---
> >>
> >>
> >> [...]
> >>
> >>>> +bool igt_kmemleak_init(const char *unit_test_kmemleak_file)
> >>>> +{
> >>>> +	FILE *fp;
> >>>
> >>> Same here, open/read.
> >>
> >> Same here, why not?
> >> ---
> >> psennats at friendship7-home:~/UPSTREAM/igt-gpu-tools$ git grep fopen|wc -l
> >> 97
> >> ---
> >>
> >> [...]
> >>
> >>>> +#define KMEMLEAKRESFILENAME "kmemleak.txt"
> >>>
> >>> imho better: IGT_KMEMLEAK_RESFILENAME
> >>
> >> It reads better indeed. Thanks.
> >>
> >> [...]
>