[PATCH i-g-t 1/1] tests/intel/kms_ccs: add hiberbate test
Gupta, Anshuman
anshuman.gupta at intel.com
Thu Nov 28 06:19:17 UTC 2024
> -----Original Message-----
> From: igt-dev <igt-dev-bounces at lists.freedesktop.org> On Behalf Of Juha-
> Pekka Heikkilä
> Sent: Wednesday, November 27, 2024 6:51 PM
> To: Vivi, Rodrigo <rodrigo.vivi at intel.com>
> Cc: Development mailing list for IGT GPU Tools <igt-dev at lists.freedesktop.org>
> Subject: Re: [PATCH i-g-t 1/1] tests/intel/kms_ccs: add hiberbate test
>
> On Tue, Nov 26, 2024 at 7:12 PM Juha-Pekka Heikkilä
> <juhapekka.heikkila at gmail.com> wrote:
> >
> > On Tue, Nov 26, 2024 at 3:00 PM Rodrigo Vivi <rodrigo.vivi at intel.com>
> wrote:
> > >
> > > On Tue, Nov 26, 2024 at 10:07:37AM +0200, Juha-Pekka Heikkilä wrote:
> > > > Seems I accidentally replied only to Rodrigo so I'll add igt-dev back here.
> > > >
> > > > On Mon, Nov 25, 2024 at 10:03 PM Juha-Pekka Heikkilä
> > > > <juhapekka.heikkila at gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 25, 2024 at 8:20 PM Rodrigo Vivi
> <rodrigo.vivi at intel.com> wrote:
> > > > > >
> > > > > > On Mon, Nov 25, 2024 at 06:19:58PM +0200, Juha-Pekka Heikkila
> wrote:
> > > > > > > Add hibernate test which bring entire system down for short
> > > > > > > hibernate. This mode is added to suspend tests to be run
> > > > > > > manually with '-r' flag because this is not ci friendly
> > > > > > > test, on hibernate ci would lose connection to the hibernated box.
> > > > > >
> > > > > > I know that nowadays it is a beast to get hibernate to work in
> > > > > > the distros, but afaik our CI is prepared for that, otherwise
> > > > > > we wouldn't be able to get these:
> > > > > >
> > > > > > https://intel-gfx-ci.01.org/tree/intel-xe/index.html?testfilte
> > > > > > r=s4
> > > > >
> > > > > If you go see those test machine bootlogs you see there is no
> > > > > resume configured. I think all these current s4 tests work with
> > > > > pm_test hence also original problem where tests didn't reproduce
> > > > > that broken ccs after resume. If you try running on your own box
> > > > > any of these s4 tests you'll see they don't go fully down. I
> > > > > wanted to have a test that does the same as real hibernate.
> > >
> > > hmmm... something indeed changed because of CI.
Network driver does not resume properly and take time during actual hibernation,
AFAIU the problem might br igt tests are forked by igt_runner and igt_runner is a child of bash ssh instance.
Ssh connection are prone to disconnect on loss of network that will kill the bash and igt runner and igt test will be zombie.
If we can run igt runner as a background process or as native on console, and should make igt_runner to wait for network resume.
Thanks,
Anshuman.
> > >
> > > I originally implemented them without SUSPEND_TEST_DEVICES, in a way
> > > that the machine was going fully down in my ADL+DG2 machine.
> > >
> > > But commit 4b767566bbc ("tests/intel/xe_pm: S4 to go up to devices
> > > only") modified that to make it work for CI.
> > >
> > > So, perhaps we can do both, have the manual setup with -r and no
> > > TEST and have the one with TEST but not fully down.
> > >
> > > Have you reproduced the CSS reported bug? Then have you tried the
> > > solution with the TEST?
> >
> > I suspect I earlier got lot of frozen test machines, with these test
> > it always affect testing of new test. This test I now have here does
> > reproduce that ccs bug with kernel where it's not fixed.
> >
> > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > For this test to work kernel resume point need to be set
> > > > > > > using swapfile, from kernel command line is checked if there
> > > > > > > is found something along the lines of
> > > > > > > "resume=/dev/nvme0n1p2 resume_offset=73527296" or so to
> > > > > > > verify hibernate will be successfull.
> > > > > >
> > > > > > Indeed painful nowadays... I can't even believe this gap came
> > > > > > from user report... is anyone really still using hibernation
> > > > > > nowadays?! :)
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Juha-Pekka Heikkila
> > > > > > > <juhapekka.heikkila at gmail.com>
> > > > > > > ---
> > > > > > > tests/intel/kms_ccs.c | 162
> > > > > > > +++++++++++++++++++++++++++++++++++++++++-
> > > > > > > 1 file changed, 159 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/tests/intel/kms_ccs.c b/tests/intel/kms_ccs.c
> > > > > > > index 3e9a57863..fd2fe9d3d 100644
> > > > > > > --- a/tests/intel/kms_ccs.c
> > > > > > > +++ b/tests/intel/kms_ccs.c
> > > > > > > @@ -190,6 +190,7 @@ typedef struct {
> > > > > > > bool user_seed;
> > > > > > > enum igt_commit_style commit;
> > > > > > > int fb_list_length;
> > > > > > > + bool do_hibernate;
> > > > > > > struct {
> > > > > > > struct igt_fb fb;
> > > > > > > int width, height; @@ -271,6 +272,154 @@
> > > > > > > static const struct {
> > > > > > > */
> > > > > > > #define MAX_SPRITE_PLANE_WIDTH 2000
> > > > > > >
> > > > > > > +/* constants for hibenate test */ #define RTC_WAKE_CMD
> > > > > > > +"rtcwake -m no -s %d"
> > > > > >
> > > > > > Why are you calling rtcwake directly instead of using
> > > > > > igt_system_suspend_autoresume(SUSPEND_STATE_DISK...) ?!
> > > > > >
> > > > > > Please check lib/igt_aux lib/igt_pm and tests/xe/xe_pm
> > > > >
> > > > > I wanted rtc only to wake up the machine. On
> > > > > igt_system_suspend_autoresume(..) I'd get pm_test handling which
> > > > > I want to avoid here.
> > >
> > > you can do that.... just pass SUSPEND_TEST_NONE instead of
> > > SUSPEND_TEST_DEVICES as the second argument. So, a lot of the logic
> > > here can be simplified.
> > >
> > > And as I told perhaps we can have 3 cases:
> > >
> > > hibernate-manual:
> > > igt_system_suspend_autoresume(SUSPEND_STATE_DISK,
> > > SUSPEND_TEST_NONE);
> > > hibernate-auto:
> > > igt_system_suspend_autoresume(SUSPEND_STATE_DISK,
> > > SUSPEND_TEST_DEVICES);
> > > suspend:
> > > igt_system_suspend_autoresume(SUSPEND_STATE_MEM,
> > > SUSPEND_TEST_NONE);
> >
> > I'll try that, if that reproduce the bug then I can cut out those
> > hibernate_system(..) and set_rtc_wake_timer(..) and move
> > check_hibernation_support(..) to libigt. For reproducing I'll anyway
> > need to go to lab to see the machine, when I try these on ril system
> > it seems to want to restore my machine when it first time stay down
> > instead of resuming.
>
> There is something fishy going on with
> igt_system_suspend_autoresume(..). When I try to use SUSPEND_STATE_DISK
> and SUSPEND_TEST_NONE, lnl boxes I've tried never successfully hibernated.
> These boxes start to go down but don't power off and I see same code always
> stuck on board led display. I'm not able to pinpoint the issue since I'm not so
> familiar with pm code. On same boxes, the hibernate code I have on this patch
> did work 5/5. As above mentioned difference is I use rtc to wake the system
> but I on my own go write /sys/power/state. While it should do the same
> thing, something is different..no idea could it be just some firmware issue vs
> timing but result is anyway different. I had quick try with mtl and
> i915 igt_system_suspend_autoresume(..) where things did work as expected.
>
> >
> > >
> > >
> > > > >
> > > > > >
> > > > > > > +#define PROC_CMDLINE "/proc/cmdline"
> > > > > > > +#define SYS_POWER_STATE "/sys/power/state"
> > > > > > > +#define GRUB_CFG_PATH "/boot/grub/grub.cfg"
> > > > > > > +
> > > > > > > +static void check_hibernation_support(void) {
> > > > > > > + int fd;
> > > > > > > + char buffer[2048];
> > > > > > > + ssize_t bytes_read;
> > > > > > > + FILE *cmdline;
> > > > > > > +
> > > > > > > + /* Check if hibernation is supported in /sys/power/state */
> > > > > > > + fd = open(SYS_POWER_STATE, O_RDONLY);
> > > > > > > + igt_require_f(fd > 0, "Failed to open /sys/power/state");
> > > > > > > + bytes_read = read(fd, buffer, sizeof(buffer) - 1);
> > > > > > > + close(fd);
> > > > > > > +
> > > > > > > + igt_require_f(bytes_read > 0, "Failed to read
> > > > > > > + /sys/power/state");
> > > > > > > +
> > > > > > > + buffer[bytes_read] = '\0';
> > > > > > > + igt_require_f(strstr(buffer, "disk") != NULL,
> > > > > > > + "Hibernation (suspend to disk) is not
> > > > > > > + supported on this system.\n");
> > > > > > > +
> > > > > > > + /* Check if resume is configured in kernel command line */
> > > > > > > + cmdline = fopen(PROC_CMDLINE, "r");
> > > > > > > + igt_require_f(cmdline, "Failed to open /proc/cmdline");
> > > > > > > + fread(buffer, 1, sizeof(buffer) - 1, cmdline);
> > > > > > > + fclose(cmdline);
> > > > > > > +
> > > > > > > + igt_require_f(strstr(buffer, "resume="),
> > > > > > > + "Kernel does not have 'resume' parameter
> > > > > > > +configured for hibernation.\n"); }
> > > > > >
> > > > > > we should probably have this in the igt_pm lib to be used in
> > > > > > other places like xe_pm. I remember we had discussions when s4
> > > > > > started failing in CI, but apparently the solution was to make
> > > > > > CI to work with it instead of protecting our s4 tests...
> > > > >
> > > > > I was last week talking with Jari Tahvanainen about this. Our ci
> > > > > is not configured for the machines really hibernating, he
> > > > > suggested we first start with one test, start to run it in
> > > > > special box, and we go on from there .. so, here we are :) I
> > > > > agree final destination for this function should be in lib/ like
> > > > > in igt_pm or so.
> > > > >
> > > > > >
> > > > > > > +
> > > > > > > +static void set_rtc_wake_timer(int seconds) {
> > > > > > > + char command[256];
> > > > > > > +
> > > > > > > + snprintf(command, sizeof(command), RTC_WAKE_CMD,
> > > > > > > + seconds);
> > > > > > > +
> > > > > > > + igt_require_f(system(command) == 0,
> > > > > > > + "Failed to set RTC wake timer.\n"); }
> > > > > > > +
> > > > > > > +static void hibernate_system(void) {
> > > > > > > + int fd;
> > > > > > > +
> > > > > > > + fd = open(SYS_POWER_STATE, O_WRONLY);
> > > > > > > + igt_require_f(fd > 0, "Failed to open
> > > > > > > + /sys/power/state");
> > > > > > > +
> > > > > > > + if (write(fd, "disk", 4) != 4) {
> > > > > > > + close(fd);
> > > > > > > + igt_require_f(0, "Failed to write 'disk' to /sys/power/state");
> > > > > > > + }
> > > > > > > + close(fd);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void ensure_grub_boots_same_kernel(void)
> > > > > >
> > > > > > I don't believe that this should be to the test case to ensure.
> > > > > > This should be ensured by the infra to support these s4 testcases.
> > > > >
> > > > > This is bit so-so. If guys would run this test on their own
> > > > > boxes to do some debugging, forgetting to always set the correct
> > > > > kernel for the next in-test reboot would cause incorrect results.
> > > > >
> > > > > >
> > > > > > > +{
> > > > > > > + char cmdline[1024];
> > > > > > > + char current_kernel[256];
> > > > > > > + char last_menuentry[512] = "";
> > > > > > > + char grub_entry[512];
> > > > > > > + char command[1024];
> > > > > > > + FILE *cmdline_file, *grub_cfg;
> > > > > > > + char line[1024];
> > > > > > > + bool kernel_found = false;
> > > > > > > + char *kernel_arg;
> > > > > > > + char *kernel_end;
> > > > > > > +
> > > > > > > + /* Read /proc/cmdline to get the current kernel image */
> > > > > > > + cmdline_file = fopen(PROC_CMDLINE, "r");
> > > > > > > + igt_require_f(cmdline_file, "Failed to open
> > > > > > > + /proc/cmdline");
> > > > > > > +
> > > > > > > + if (!fgets(cmdline, sizeof(cmdline), cmdline_file)) {
> > > > > > > + fclose(cmdline_file);
> > > > > > > + igt_require_f(0, "Failed to read /proc/cmdline");
> > > > > > > + }
> > > > > > > + fclose(cmdline_file);
> > > > > > > +
> > > > > > > + /* Parse the kernel image from cmdline */
> > > > > > > + kernel_arg = strstr(cmdline, "BOOT_IMAGE=");
> > > > > > > + igt_require_f(kernel_arg, "BOOT_IMAGE= not found in
> > > > > > > + /proc/cmdline\n");
> > > > > > > +
> > > > > > > + kernel_arg += strlen("BOOT_IMAGE=");
> > > > > > > + kernel_end = strchr(kernel_arg, ' ');
> > > > > > > +
> > > > > > > + if (!kernel_end)
> > > > > > > + kernel_end = kernel_arg + strlen(kernel_arg);
> > > > > > > +
> > > > > > > + snprintf(current_kernel, sizeof(current_kernel), "%.*s",
> > > > > > > + (int)(kernel_end - kernel_arg), kernel_arg);
> > > > > > > + igt_debug("Current kernel image: %s\n",
> > > > > > > + current_kernel);
> > > > > > > +
> > > > > > > + /* Open GRUB config file to find matching entry */
> > > > > > > + grub_cfg = fopen(GRUB_CFG_PATH, "r");
> > > > > > > + igt_require_f(grub_cfg, "Failed to open GRUB
> > > > > > > + configuration file");
> > > > > > > +
> > > > > > > +
> > > > > > > + while (fgets(line, sizeof(line), grub_cfg)) {
> > > > > > > + /* Check if the line contains a menuentry */
> > > > > > > + if (strstr(line, "menuentry")) {
> > > > > > > + /* Store the menuentry line */
> > > > > > > + char *start = strchr(line, '\'');
> > > > > > > + char *end = start ? strchr(start + 1,
> > > > > > > + '\'') : NULL;
> > > > > > > +
> > > > > > > + if (start && end) {
> > > > > > > + snprintf(last_menuentry,
> > > > > > > + sizeof(last_menuentry),
> > > > > > > + "%.*s", (int)(end - start - 1),
> > > > > > > + start + 1);
> > > > > > > + }
> > > > > > > + }
> > > > > > > +
> > > > > > > + /* Check if the current line contains the kernel */
> > > > > > > + if (strstr(line, current_kernel)) {
> > > > > > > + /* Use the last seen menuentry as the match */
> > > > > > > + snprintf(grub_entry, sizeof(grub_entry), "%s",
> > > > > > > + last_menuentry);
> > > > > > > + kernel_found = true;
> > > > > > > + break;
> > > > > > > + }
> > > > > > > + }
> > > > > > > +
> > > > > > > + fclose(grub_cfg);
> > > > > > > +
> > > > > > > + igt_require_f(kernel_found, "Failed to find matching GRUB entry
> for kernel: %s\n",
> > > > > > > + current_kernel);
> > > > > > > +
> > > > > > > + /* Set the GRUB boot target using grub-reboot */
> > > > > > > + snprintf(command, sizeof(command), "grub-reboot \"%s\"",
> grub_entry);
> > > > > > > + igt_require_f(system(command) == 0, "Failed to set GRUB boot
> target to: %s\n",
> > > > > > > + grub_entry);
> > > > > > > +
> > > > > > > + igt_debug("Set GRUB to boot kernel: %s (GRUB entry: %s)\n",
> > > > > > > + current_kernel, grub_entry); }
> > > > > > > +
> > > > > > > +static void hibernate_autoresume(int resume_delay) {
> > > > > > > + check_hibernation_support();
> > > > > > > + set_rtc_wake_timer(resume_delay);
> > > > > > > + ensure_grub_boots_same_kernel();
> > > > > > > + hibernate_system();
> > > > > > > +}
> > > > > > > +
> > > > > > > static void addfb_init(struct igt_fb *fb, struct
> > > > > > > drm_mode_fb_cmd2 *f) {
> > > > > > > int i;
> > > > > > > @@ -839,8 +988,11 @@ static bool try_config(data_t *data,
> > > > > > > enum test_fb_flags fb_flags,
> > > > > > >
> > > > > > > if (ret == 0 && !(fb_flags & TEST_BAD_ROTATION_90) && crc) {
> > > > > > > if (data->flags & TEST_SUSPEND && fb_flags &
> FB_COMPRESSED) {
> > > > > > > -
> igt_system_suspend_autoresume(SUSPEND_STATE_MEM,
> > > > > > > - SUSPEND_TEST_NONE);
> > > > > > > + if (data->do_hibernate)
> > > > > > > + hibernate_autoresume(30);
> > > > > > > + else
> > > > > > > +
> igt_system_suspend_autoresume(SUSPEND_STATE_MEM,
> > > > > > > +
> > > > > > > + SUSPEND_TEST_NONE);
> > > > > > >
> > > > > > > /* on resume check flat ccs is still compressed */
> > > > > > > if (is_xe_device(data->drm_fd) && @@
> > > > > > > -1044,6 +1196,9 @@ static int opt_handler(int opt, int opt_index,
> void *opt_data)
> > > > > > > data->user_seed = true;
> > > > > > > data->seed = strtoul(optarg, NULL, 0);
> > > > > > > break;
> > > > > > > + case 'r':
> > > > > > > + data->do_hibernate = true;
> > > > > > > + break;
> > > > > > > default:
> > > > > > > return IGT_OPT_HANDLER_ERROR;
> > > > > > > }
> > > > > > > @@ -1056,9 +1211,10 @@ static data_t data; static const
> > > > > > > char *help_str = " -c\t\tCheck the presence of compression
> > > > > > > meta-data\n"
> > > > > > > " -s <seed>\tSeed for random number generator\n"
> > > > > > > +" -r\t\tOn suspend test do full hibernate with reboot\n"
> > > > > > > ;
> > > > > > >
> > > > > > > -igt_main_args("cs:", NULL, help_str, opt_handler, &data)
> > > > > > > +igt_main_args("csr:", NULL, help_str, opt_handler, &data)
> > > > > > > {
> > > > > > > igt_fixture {
> > > > > > > data.drm_fd =
> > > > > > > drm_open_driver_master(DRIVER_INTEL | DRIVER_XE);
> > > > > > > --
> > > > > > > 2.45.2
> > > > > > >
More information about the igt-dev
mailing list