[PATCH i-g-t 1/1] tests/intel/kms_ccs: add hiberbate test

Rodrigo Vivi rodrigo.vivi at intel.com
Thu Nov 28 12:44:47 UTC 2024


On Wed, Nov 27, 2024 at 03:21:17PM +0200, Juha-Pekka Heikkilä wrote:
> On Tue, Nov 26, 2024 at 7:12 PM Juha-Pekka Heikkilä
> <juhapekka.heikkila at gmail.com> wrote:
> >
> > On Tue, Nov 26, 2024 at 3:00 PM Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
> > >
> > > On Tue, Nov 26, 2024 at 10:07:37AM +0200, Juha-Pekka Heikkilä wrote:
> > > > Seems I accidentally replied only to Rodrigo so I'll add igt-dev back here.
> > > >
> > > > On Mon, Nov 25, 2024 at 10:03 PM Juha-Pekka Heikkilä
> > > > <juhapekka.heikkila at gmail.com> wrote:
> > > > >
> > > > > On Mon, Nov 25, 2024 at 8:20 PM Rodrigo Vivi <rodrigo.vivi at intel.com> wrote:
> > > > > >
> > > > > > On Mon, Nov 25, 2024 at 06:19:58PM +0200, Juha-Pekka Heikkila wrote:
> > > > > > > Add hibernate test which bring entire system down for short
> > > > > > > hibernate. This mode is added to suspend tests to be run
> > > > > > > manually with '-r' flag because this is not ci friendly test,
> > > > > > > on hibernate ci would lose connection to the hibernated box.
> > > > > >
> > > > > > I know that nowadays it is a beast to get hibernate to work in the distros,
> > > > > > but afaik our CI is prepared for that, otherwise we wouldn't be able to
> > > > > > get these:
> > > > > >
> > > > > > https://intel-gfx-ci.01.org/tree/intel-xe/index.html?testfilter=s4
> > > > >
> > > > > If you go see those test machine bootlogs you see there is no resume
> > > > > configured. I think all these current s4 tests work with pm_test hence
> > > > > also original problem where tests didn't reproduce that broken ccs
> > > > > after resume. If you try running on your own box any of these s4 tests
> > > > > you'll see they don't go fully down. I wanted to have a test that does
> > > > > the same as real hibernate.
> > >
> > > hmmm... something indeed changed because of CI.
> > >
> > > I originally implemented them without SUSPEND_TEST_DEVICES, in a way that
> > > the machine was going fully down in my ADL+DG2 machine.
> > >
> > > But commit 4b767566bbc ("tests/intel/xe_pm: S4 to go up to devices only")
> > > modified that to make it work for CI.
> > >
> > > So, perhaps we can do both, have the manual setup with -r and no TEST
> > > and have the one with TEST but not fully down.
> > >
> > > Have you reproduced the CSS reported bug? Then have you tried the solution
> > > with the TEST?
> >
> > I suspect I earlier got lot of frozen test machines, with these test
> > it always affect testing of new test. This test I now have here does
> > reproduce that ccs bug with kernel where it's not fixed.
> >
> > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > For this test to work kernel resume point need to be set using swapfile, from
> > > > > > > kernel command line is checked if there is found something along the lines of
> > > > > > > "resume=/dev/nvme0n1p2 resume_offset=73527296" or so to verify hibernate
> > > > > > > will be successfull.
> > > > > >
> > > > > > Indeed painful nowadays... I can't even believe this gap came from user
> > > > > > report... is anyone really still using hibernation nowadays?! :)
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Juha-Pekka Heikkila <juhapekka.heikkila at gmail.com>
> > > > > > > ---
> > > > > > >  tests/intel/kms_ccs.c | 162 +++++++++++++++++++++++++++++++++++++++++-
> > > > > > >  1 file changed, 159 insertions(+), 3 deletions(-)
> > > > > > >
> > > > > > > diff --git a/tests/intel/kms_ccs.c b/tests/intel/kms_ccs.c
> > > > > > > index 3e9a57863..fd2fe9d3d 100644
> > > > > > > --- a/tests/intel/kms_ccs.c
> > > > > > > +++ b/tests/intel/kms_ccs.c
> > > > > > > @@ -190,6 +190,7 @@ typedef struct {
> > > > > > >       bool user_seed;
> > > > > > >       enum igt_commit_style commit;
> > > > > > >       int fb_list_length;
> > > > > > > +     bool do_hibernate;
> > > > > > >       struct {
> > > > > > >               struct igt_fb fb;
> > > > > > >               int width, height;
> > > > > > > @@ -271,6 +272,154 @@ static const struct {
> > > > > > >   */
> > > > > > >  #define MAX_SPRITE_PLANE_WIDTH 2000
> > > > > > >
> > > > > > > +/* constants for hibenate test */
> > > > > > > +#define RTC_WAKE_CMD "rtcwake -m no -s %d"
> > > > > >
> > > > > > Why are you calling rtcwake directly instead of using
> > > > > > igt_system_suspend_autoresume(SUSPEND_STATE_DISK...) ?!
> > > > > >
> > > > > > Please check lib/igt_aux lib/igt_pm and tests/xe/xe_pm
> > > > >
> > > > > I wanted rtc only to wake up the machine. On
> > > > > igt_system_suspend_autoresume(..) I'd get pm_test handling which I
> > > > > want to avoid here.
> > >
> > > you can do that.... just pass SUSPEND_TEST_NONE instead of
> > > SUSPEND_TEST_DEVICES as the second argument. So, a lot of the logic
> > > here can be simplified.
> > >
> > > And as I told perhaps we can have 3 cases:
> > >
> > > hibernate-manual:
> > >  igt_system_suspend_autoresume(SUSPEND_STATE_DISK,
> > >                                SUSPEND_TEST_NONE);
> > > hibernate-auto:
> > > igt_system_suspend_autoresume(SUSPEND_STATE_DISK,
> > >                               SUSPEND_TEST_DEVICES);
> > > suspend:
> > > igt_system_suspend_autoresume(SUSPEND_STATE_MEM,
> > >                               SUSPEND_TEST_NONE);
> >
> > I'll try that, if that reproduce the bug then I can cut out those
> > hibernate_system(..) and set_rtc_wake_timer(..) and move
> > check_hibernation_support(..) to libigt. For reproducing I'll anyway
> > need to go to lab to see the machine, when I try these on ril system
> > it seems to want to restore my machine when it first time stay down
> > instead of resuming.
> 
> There is something fishy going on with
> igt_system_suspend_autoresume(..). When I try to use
> SUSPEND_STATE_DISK and SUSPEND_TEST_NONE, lnl boxes I've tried never
> successfully hibernated. These boxes start to go down but don't power
> off and I see same code always stuck on board led display. I'm not
> able to pinpoint the issue since I'm not so familiar with pm code. On
> same boxes, the hibernate code I have on this patch did work 5/5. As
> above mentioned difference is I use rtc to wake the system but I on my
> own go write  /sys/power/state. While it should do the same thing,
> something is different..no idea could it be just some firmware issue
> vs timing but result is anyway different. I had quick try with mtl and
> i915 igt_system_suspend_autoresume(..) where things did work as
> expected.

hmm that's indeed interesting... the autoresume uses rtcwake anyway no?
perhaps some different option passed to rtcwake?

well, to be honest, I have seen many cases where our autoresume gets
stuck in different platforms with differents levels of suspend :/

Maybe we could try to improve our autoresume to ensure it is taking
your safer sequence here? or at least have your safer sequence as
an alternative inside the lib/igt_pm as well?

Thanks a lot for all the experiments

> 
> >
> > >
> > >
> > > > >
> > > > > >
> > > > > > > +#define PROC_CMDLINE "/proc/cmdline"
> > > > > > > +#define SYS_POWER_STATE "/sys/power/state"
> > > > > > > +#define GRUB_CFG_PATH "/boot/grub/grub.cfg"
> > > > > > > +
> > > > > > > +static void check_hibernation_support(void)
> > > > > > > +{
> > > > > > > +     int fd;
> > > > > > > +     char buffer[2048];
> > > > > > > +     ssize_t bytes_read;
> > > > > > > +     FILE *cmdline;
> > > > > > > +
> > > > > > > +     /* Check if hibernation is supported in /sys/power/state */
> > > > > > > +     fd = open(SYS_POWER_STATE, O_RDONLY);
> > > > > > > +     igt_require_f(fd > 0, "Failed to open /sys/power/state");
> > > > > > > +     bytes_read = read(fd, buffer, sizeof(buffer) - 1);
> > > > > > > +     close(fd);
> > > > > > > +
> > > > > > > +     igt_require_f(bytes_read > 0, "Failed to read /sys/power/state");
> > > > > > > +
> > > > > > > +     buffer[bytes_read] = '\0';
> > > > > > > +     igt_require_f(strstr(buffer, "disk") != NULL,
> > > > > > > +                   "Hibernation (suspend to disk) is not supported on this system.\n");
> > > > > > > +
> > > > > > > +     /* Check if resume is configured in kernel command line */
> > > > > > > +     cmdline = fopen(PROC_CMDLINE, "r");
> > > > > > > +     igt_require_f(cmdline, "Failed to open /proc/cmdline");
> > > > > > > +     fread(buffer, 1, sizeof(buffer) - 1, cmdline);
> > > > > > > +     fclose(cmdline);
> > > > > > > +
> > > > > > > +     igt_require_f(strstr(buffer, "resume="),
> > > > > > > +                   "Kernel does not have 'resume' parameter configured for hibernation.\n");
> > > > > > > +}
> > > > > >
> > > > > > we should probably have this in the igt_pm lib to be used in other places
> > > > > > like xe_pm. I remember we had discussions when s4 started failing in CI,
> > > > > > but apparently the solution was to make CI to work with it instead of
> > > > > > protecting our s4 tests...
> > > > >
> > > > > I was last week talking with Jari Tahvanainen about this. Our ci is
> > > > > not configured for the machines really hibernating, he suggested we
> > > > > first start with one test, start to run it in special box, and we go
> > > > > on from there .. so, here we are :)
> > > > > I agree final destination for this function should be in lib/ like in
> > > > > igt_pm or so.
> > > > >
> > > > > >
> > > > > > > +
> > > > > > > +static void set_rtc_wake_timer(int seconds)
> > > > > > > +{
> > > > > > > +     char command[256];
> > > > > > > +
> > > > > > > +     snprintf(command, sizeof(command), RTC_WAKE_CMD, seconds);
> > > > > > > +
> > > > > > > +     igt_require_f(system(command) == 0,
> > > > > > > +                   "Failed to set RTC wake timer.\n");
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void hibernate_system(void)
> > > > > > > +{
> > > > > > > +     int fd;
> > > > > > > +
> > > > > > > +     fd = open(SYS_POWER_STATE, O_WRONLY);
> > > > > > > +     igt_require_f(fd > 0, "Failed to open /sys/power/state");
> > > > > > > +
> > > > > > > +     if (write(fd, "disk", 4) != 4) {
> > > > > > > +             close(fd);
> > > > > > > +             igt_require_f(0, "Failed to write 'disk' to /sys/power/state");
> > > > > > > +     }
> > > > > > > +     close(fd);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void ensure_grub_boots_same_kernel(void)
> > > > > >
> > > > > > I don't believe that this should be to the test case to ensure.
> > > > > > This should be ensured by the infra to support these s4 testcases.
> > > > >
> > > > > This is bit so-so. If guys would run this test on their own boxes to
> > > > > do some debugging, forgetting to always set the correct kernel for the
> > > > > next in-test reboot would cause incorrect results.
> > > > >
> > > > > >
> > > > > > > +{
> > > > > > > +     char cmdline[1024];
> > > > > > > +     char current_kernel[256];
> > > > > > > +     char last_menuentry[512] = "";
> > > > > > > +     char grub_entry[512];
> > > > > > > +     char command[1024];
> > > > > > > +     FILE *cmdline_file, *grub_cfg;
> > > > > > > +     char line[1024];
> > > > > > > +     bool kernel_found = false;
> > > > > > > +     char *kernel_arg;
> > > > > > > +     char *kernel_end;
> > > > > > > +
> > > > > > > +     /* Read /proc/cmdline to get the current kernel image */
> > > > > > > +     cmdline_file = fopen(PROC_CMDLINE, "r");
> > > > > > > +     igt_require_f(cmdline_file, "Failed to open /proc/cmdline");
> > > > > > > +
> > > > > > > +     if (!fgets(cmdline, sizeof(cmdline), cmdline_file)) {
> > > > > > > +             fclose(cmdline_file);
> > > > > > > +             igt_require_f(0, "Failed to read /proc/cmdline");
> > > > > > > +     }
> > > > > > > +     fclose(cmdline_file);
> > > > > > > +
> > > > > > > +     /* Parse the kernel image from cmdline */
> > > > > > > +     kernel_arg = strstr(cmdline, "BOOT_IMAGE=");
> > > > > > > +     igt_require_f(kernel_arg, "BOOT_IMAGE= not found in /proc/cmdline\n");
> > > > > > > +
> > > > > > > +     kernel_arg += strlen("BOOT_IMAGE=");
> > > > > > > +     kernel_end = strchr(kernel_arg, ' ');
> > > > > > > +
> > > > > > > +     if (!kernel_end)
> > > > > > > +             kernel_end = kernel_arg + strlen(kernel_arg);
> > > > > > > +
> > > > > > > +     snprintf(current_kernel, sizeof(current_kernel), "%.*s",
> > > > > > > +              (int)(kernel_end - kernel_arg), kernel_arg);
> > > > > > > +     igt_debug("Current kernel image: %s\n", current_kernel);
> > > > > > > +
> > > > > > > +     /* Open GRUB config file to find matching entry */
> > > > > > > +     grub_cfg = fopen(GRUB_CFG_PATH, "r");
> > > > > > > +     igt_require_f(grub_cfg, "Failed to open GRUB configuration file");
> > > > > > > +
> > > > > > > +
> > > > > > > +     while (fgets(line, sizeof(line), grub_cfg)) {
> > > > > > > +             /* Check if the line contains a menuentry */
> > > > > > > +             if (strstr(line, "menuentry")) {
> > > > > > > +             /* Store the menuentry line */
> > > > > > > +                     char *start = strchr(line, '\'');
> > > > > > > +                     char *end = start ? strchr(start + 1, '\'') : NULL;
> > > > > > > +
> > > > > > > +                     if (start && end) {
> > > > > > > +                             snprintf(last_menuentry,
> > > > > > > +                                      sizeof(last_menuentry),
> > > > > > > +                                      "%.*s", (int)(end - start - 1),
> > > > > > > +                                      start + 1);
> > > > > > > +                     }
> > > > > > > +             }
> > > > > > > +
> > > > > > > +             /* Check if the current line contains the kernel */
> > > > > > > +             if (strstr(line, current_kernel)) {
> > > > > > > +                     /* Use the last seen menuentry as the match */
> > > > > > > +                     snprintf(grub_entry, sizeof(grub_entry), "%s",
> > > > > > > +                              last_menuentry);
> > > > > > > +                     kernel_found = true;
> > > > > > > +                     break;
> > > > > > > +             }
> > > > > > > +     }
> > > > > > > +
> > > > > > > +     fclose(grub_cfg);
> > > > > > > +
> > > > > > > +     igt_require_f(kernel_found, "Failed to find matching GRUB entry for kernel: %s\n",
> > > > > > > +                   current_kernel);
> > > > > > > +
> > > > > > > +     /* Set the GRUB boot target using grub-reboot */
> > > > > > > +     snprintf(command, sizeof(command), "grub-reboot \"%s\"", grub_entry);
> > > > > > > +     igt_require_f(system(command) == 0, "Failed to set GRUB boot target to: %s\n",
> > > > > > > +                   grub_entry);
> > > > > > > +
> > > > > > > +     igt_debug("Set GRUB to boot kernel: %s (GRUB entry: %s)\n",
> > > > > > > +               current_kernel, grub_entry);
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void hibernate_autoresume(int resume_delay)
> > > > > > > +{
> > > > > > > +     check_hibernation_support();
> > > > > > > +     set_rtc_wake_timer(resume_delay);
> > > > > > > +     ensure_grub_boots_same_kernel();
> > > > > > > +     hibernate_system();
> > > > > > > +}
> > > > > > > +
> > > > > > >  static void addfb_init(struct igt_fb *fb, struct drm_mode_fb_cmd2 *f)
> > > > > > >  {
> > > > > > >       int i;
> > > > > > > @@ -839,8 +988,11 @@ static bool try_config(data_t *data, enum test_fb_flags fb_flags,
> > > > > > >
> > > > > > >       if (ret == 0 && !(fb_flags & TEST_BAD_ROTATION_90) && crc) {
> > > > > > >               if (data->flags & TEST_SUSPEND && fb_flags & FB_COMPRESSED) {
> > > > > > > -                     igt_system_suspend_autoresume(SUSPEND_STATE_MEM,
> > > > > > > -                                                   SUSPEND_TEST_NONE);
> > > > > > > +                     if (data->do_hibernate)
> > > > > > > +                             hibernate_autoresume(30);
> > > > > > > +                     else
> > > > > > > +                             igt_system_suspend_autoresume(SUSPEND_STATE_MEM,
> > > > > > > +                                                           SUSPEND_TEST_NONE);
> > > > > > >
> > > > > > >                       /* on resume check flat ccs is still compressed */
> > > > > > >                       if (is_xe_device(data->drm_fd) &&
> > > > > > > @@ -1044,6 +1196,9 @@ static int opt_handler(int opt, int opt_index, void *opt_data)
> > > > > > >               data->user_seed = true;
> > > > > > >               data->seed = strtoul(optarg, NULL, 0);
> > > > > > >               break;
> > > > > > > +     case 'r':
> > > > > > > +             data->do_hibernate = true;
> > > > > > > +             break;
> > > > > > >       default:
> > > > > > >               return IGT_OPT_HANDLER_ERROR;
> > > > > > >       }
> > > > > > > @@ -1056,9 +1211,10 @@ static data_t data;
> > > > > > >  static const char *help_str =
> > > > > > >  "  -c\t\tCheck the presence of compression meta-data\n"
> > > > > > >  "  -s <seed>\tSeed for random number generator\n"
> > > > > > > +"  -r\t\tOn suspend test do full hibernate with reboot\n"
> > > > > > >  ;
> > > > > > >
> > > > > > > -igt_main_args("cs:", NULL, help_str, opt_handler, &data)
> > > > > > > +igt_main_args("csr:", NULL, help_str, opt_handler, &data)
> > > > > > >  {
> > > > > > >       igt_fixture {
> > > > > > >               data.drm_fd = drm_open_driver_master(DRIVER_INTEL | DRIVER_XE);
> > > > > > > --
> > > > > > > 2.45.2
> > > > > > >


More information about the igt-dev mailing list