[Intel-gfx] [igt-dev] [PATCH i-g-t] tests/perf_pmu: Handle CPU hotplug failures better
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri Feb 23 14:20:59 UTC 2018
On 23/02/2018 11:58, Petri Latvala wrote:
> On Fri, Feb 23, 2018 at 11:34:53AM +0000, Tvrtko Ursulin wrote:
>> From: Chris Wilson <chris at chris-wilson.co.uk>
>>
>> CPU hotplug, especially CPU0, can be flaky on commodity hardware.
>>
>> To improve test reliability and reponse times when testing larger runs we
>> need to handle those cases better.
>>
>> Handle failures to off-line a CPU by immediately skipping the test, and
>> failures to on-line a CPU by immediately rebooting the machine.
>>
>> This patch includes igt_sysrq_reboot implementation from Chris Wilson.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>> ---
>> lib/Makefile.sources | 2 ++
>> lib/igt_sysrq.c | 22 ++++++++++++++++++++++
>> lib/igt_sysrq.h | 30 ++++++++++++++++++++++++++++++
>> lib/meson.build | 2 ++
>> tests/perf_pmu.c | 42 ++++++++++++++++++++++++++++++++++--------
>> 5 files changed, 90 insertions(+), 8 deletions(-)
>> create mode 100644 lib/igt_sysrq.c
>> create mode 100644 lib/igt_sysrq.h
>>
>> diff --git a/lib/Makefile.sources b/lib/Makefile.sources
>> index 5b13ef8896c0..3d37ef1d1984 100644
>> --- a/lib/Makefile.sources
>> +++ b/lib/Makefile.sources
>> @@ -35,6 +35,8 @@ lib_source_list = \
>> igt_stats.h \
>> igt_sysfs.c \
>> igt_sysfs.h \
>> + igt_sysrq.c \
>> + igt_sysrq.h \
>> igt_x86.h \
>> igt_x86.c \
>> igt_vgem.c \
>> diff --git a/lib/igt_sysrq.c b/lib/igt_sysrq.c
>> new file mode 100644
>> index 000000000000..fe3d2e344ff1
>> --- /dev/null
>> +++ b/lib/igt_sysrq.c
>> @@ -0,0 +1,22 @@
>> +#include <unistd.h>
>> +#include <fcntl.h>
>> +#include <stdlib.h>
>> +#include <sys/reboot.h>
>> +
>> +#include "igt_core.h"
>> +
>> +#include "igt_sysrq.h"
>> +
>> +void igt_sysrq_reboot(void)
>> +{
>> + sync();
>> +
>> + /* Try to be nice at first, and if that fails pull the trigger */
>> + if (reboot(RB_AUTOBOOT)) {
>> + int fd = open("/proc/sysrq-trigger", O_WRONLY);
>> + igt_ignore_warn(write(fd, "b", 2));
>> + close(fd);
>> + }
>> +
>> + abort();
>> +}
>
>
> While the cause for taking this action might be dire, rebooting
> people's machines can be kind of a dick move, even considering they're
> running tests that can be fatal to the machine in other ways.
>
> We have IGT_HANG and IGT_HANG_WITHOUT_RESET so the users can opt
> in/out of some fatal behaviour already. I'm fine with auto-rebooting,
> even as the default, if users can opt out of it with
> IGT_NO_REBOOT_PRETTY_PLEASE or so.
I am fine with something like that. Just lets define how to call the env
variable and what the default should be?
Do we have a return code from a test which stops the test runner?
I am thinking that the best approach would be not to reboot but to halt
testing, unless this environment option is set.
But then it is up to CI people to say if they want to be setting this
option across all systems, or would actually prefer to reboot by default.
Regards,
Tvrtko
More information about the Intel-gfx
mailing list