[RFC v4 i-g-t] tests/intel/xe_sec_exec_queue_timeslice: Timeslice Abuse on Exec Queues
Michał Winiarski
michal.winiarski at intel.com
Wed May 14 21:26:28 UTC 2025
On Thu, May 08, 2025 at 10:51:43AM +0200, Peter Senna Tschudin wrote:
> The objective is to test the behavior of the GPU scheduler under
> conditions where one execution queue ("attacker") is configured with an
> abnormally large timeslice, potentially disrupting the normal execution
> of another queue ("attacked"). The explicit attack point is the ioctl
> DRM_XE_EXEC_QUEUE_EXTENSION_SET_PROPERTY.
>
> This disables drm and xe logging to prevent it from slowing down and
> serializing the tests using igt_drm_debug_level_update() that installs
> an exit handler to restore the logging to its previous value.
>
> This RFC implements the first step that is reading timeslice values for
> each engine and run the following tests on timeslice limits:
> - Boundary Value Analysis: Focuses on testing at the boundaries of
> input values with values such as max - 1, max, max + 1.
That should already be covered by the xe_exec_queue_property test, right?
> - Equivalence Partitioning: Divide values into partitions where all
> values behave similarly and test one value from each partition.
I don't understand this one, but looking at the actual implementation, it
looks pretty similar to boundary value analysis (except with min - 1 and
min).
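To illustrate why the two look alike to me, here is a minimal sketch of the test vectors both approaches end up producing. The struct and function names are hypothetical (they are not taken from the patch or from IGT), and the range is assumed to be inclusive with min <= max:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test vector: a timeslice value and whether it should be accepted. */
struct timeslice_case {
	uint64_t value;
	bool expect_valid;
};

/* Assumption: the valid range is inclusive, i.e. [min, max]. */
static bool timeslice_is_valid(uint64_t value, uint64_t min, uint64_t max)
{
	return value >= min && value <= max;
}

/*
 * Fill out[] with min - 1, min, max and max + 1, skipping the values that
 * would wrap around. Returns the number of cases written (at most 4).
 */
static size_t timeslice_boundary_cases(uint64_t min, uint64_t max,
				       struct timeslice_case out[4])
{
	size_t n = 0;

	if (min > 0)
		out[n++] = (struct timeslice_case){ min - 1, false };
	out[n++] = (struct timeslice_case){ min, true };
	out[n++] = (struct timeslice_case){ max, true };
	if (max < UINT64_MAX)
		out[n++] = (struct timeslice_case){ max + 1, false };

	return n;
}
```

Picking one representative per partition (below min, inside the range, above max) yields essentially the same handful of values, which is why a single table like this could serve both.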
> - Fuzz Testing: Basic fuzz tester for
> - Large numbers
> - Empty value
> - '\0'
But those are all constants, and the empty value is just 0 :)
I fail to see what we're "fuzzing" by trying the same constants multiple
times (again, this looks more like testing invalid values, similar to the
boundary value checks).
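For contrast, a minimal sketch of what I would call fuzzing: values come from a seeded generator and each result is compared against the expected range check. The generator is splitmix64. Everything else is hypothetical, in particular fake_set_timeslice(), which only stands in for whatever would issue the real ioctl, and the FAKE_* limits, which are made-up numbers:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up limits for the stand-in below, not values read from any device. */
#define FAKE_MIN_US 1ULL
#define FAKE_MAX_US 10000000ULL

/* splitmix64: small deterministic PRNG, so a failing seed can be replayed. */
static uint64_t splitmix64_next(uint64_t *state)
{
	uint64_t z = (*state += 0x9E3779B97F4A7C15ULL);

	z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
	z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
	return z ^ (z >> 31);
}

/* Hypothetical stand-in for the code under test: true means "accepted". */
static bool fake_set_timeslice(uint64_t value)
{
	return value >= FAKE_MIN_US && value <= FAKE_MAX_US;
}

typedef bool (*timeslice_setter)(uint64_t value);

/*
 * Feed `iterations` pseudo-random values to `set` and count how often the
 * result disagrees with the expected inclusive range [min, max].
 */
static size_t fuzz_timeslice(timeslice_setter set, uint64_t min, uint64_t max,
			     uint64_t seed, size_t iterations)
{
	size_t mismatches = 0;

	for (size_t i = 0; i < iterations; i++) {
		uint64_t value = splitmix64_next(&seed);
		bool expected = value >= min && value <= max;

		if (set(value) != expected)
			mismatches++;
	}

	return mismatches;
}
```

Calling fuzz_timeslice(fake_set_timeslice, FAKE_MIN_US, FAKE_MAX_US, seed, n) should return 0 for any seed. Logging the seed is what makes a non-zero result reproducible, which repeated constants cannot offer.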
> - 1M u64 random values
> - Fuzz stress test: Create 50k threads for the fuzzing test with 500
> random numbers each taking about 30 seconds to run
Perhaps we can extend the xe_exec_queue_property test if we want to go
beyond boundary testing? We could then hit other properties as well.
>
> The proposed steps are (this patch goes until 2a):
>
> 1. Determine the values for the following parameters:
> - `timeslice_duration_us`: Default timeslice duration in
> microseconds.
> - `timeslice_duration_min`: Minimum allowable timeslice duration.
> - `timeslice_duration_max`: Maximum allowable timeslice duration.
>
> 2. Create two execution queues with the following configurations:
> - `attacked`: Queue with standard/default settings.
> - `attacker`: Queue configured with an extended timeslice
> duration. The goal is to:
> a) Try to set timeslice to invalid values. This is
> expected to fail.
>
> b) Create the attacker queue setting the timeslice to
> `timeslice_duration_max`.
>
> 3. Submit tasks to both queues:
> - Submit a workload to the `attacked` queue with normal
> operations.
> - Submit a workload to the `attacker` queue designed to
> maximize its timeslice and potentially disrupt the GPU
> scheduler.
>
> 4. Verify the behavior of the `attacked` queue:
> - Ensure that tasks in the `attacked` queue execute within the
> expected time constraints and are not delayed or blocked due
> to the extended timeslice of the `attacker` queue.
> - Specifically, confirm that tasks in the `attacked` queue do
> not exceed `timeslice_duration_max` in terms of execution
> delays or interruptions.
I think we should separate functional timeslice testing (which verifies
whether the Xe driver indeed respects what the user set as a valid
timeslice, and is a proposed future extension of this test) from checking
uAPI abuse attempts (which is what this test currently does).
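As a rough sketch of what the functional half could boil down to once it is split out, assuming the pass criterion from step 4 (the extra delay seen by the `attacked` queue stays within timeslice_duration_max): the helper names and the notion of a baseline runtime measured without the attacker are my assumptions, not something the patch defines.

```c
#include <stdbool.h>
#include <stdint.h>

#define NS_PER_US 1000ULL

/*
 * Extra time the attacked workload needed compared to running alone.
 * Returns 0 if it was not slower than the baseline.
 */
static uint64_t attacked_delay_ns(uint64_t observed_ns, uint64_t baseline_ns)
{
	return observed_ns > baseline_ns ? observed_ns - baseline_ns : 0;
}

/*
 * Assumed pass criterion: the delay does not exceed timeslice_max_us.
 * timeslice_max_us is assumed small enough that the conversion to
 * nanoseconds does not overflow 64 bits.
 */
static bool attacked_delay_ok(uint64_t observed_ns, uint64_t baseline_ns,
			      uint64_t timeslice_max_us)
{
	return attacked_delay_ns(observed_ns, baseline_ns) <=
	       timeslice_max_us * NS_PER_US;
}
```

None of this touches the property ioctl with invalid input, which is the point: it can live in its own test and evolve independently of the uAPI abuse checks.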
Thanks,
-Michał