[PATCH 3/3] drm/v3d: Add job timeout module param

Yukimasa Sugizaki ysugi at idein.jp
Fri Feb 5 12:28:19 UTC 2021


On 05/02/2021, Eric Anholt <eric at anholt.net> wrote:
> On Thu, Feb 4, 2021 at 10:09 AM Chema Casanova <jmcasanova at igalia.com>
> wrote:
>>
>> On 3/9/20 18:48, Yukimasa Sugizaki wrote:
>> > From: Yukimasa Sugizaki <ysugi at idein.jp>
>> >
>> > The default timeout is 500 ms which is too short for some workloads
>> > including Piglit.  Adding this parameter will help users to run heavier
>> > tasks.
>> >
>> > Signed-off-by: Yukimasa Sugizaki <ysugi at idein.jp>
>> > ---
>> >   drivers/gpu/drm/v3d/v3d_sched.c | 24 +++++++++++++-----------
>> >   1 file changed, 13 insertions(+), 11 deletions(-)
>> >
>> > diff --git a/drivers/gpu/drm/v3d/v3d_sched.c
>> > b/drivers/gpu/drm/v3d/v3d_sched.c
>> > index feef0c749e68..983efb018560 100644
>> > --- a/drivers/gpu/drm/v3d/v3d_sched.c
>> > +++ b/drivers/gpu/drm/v3d/v3d_sched.c
>> > @@ -19,11 +19,17 @@
>> >    */
>> >
>> >   #include <linux/kthread.h>
>> > +#include <linux/moduleparam.h>
>> >
>> >   #include "v3d_drv.h"
>> >   #include "v3d_regs.h"
>> >   #include "v3d_trace.h"
>> >
>> > +static uint timeout = 500;
>> > +module_param(timeout, uint, 0444);
>> > +MODULE_PARM_DESC(timeout,
>> > +     "Timeout for a job in ms (0 means infinity and default is 500
>> > ms)");
>> > +
>>
>> I think that  parameter definition could be included at v3d_drv.c as
>> other drivers do. Then we would have all future parameters together. In
>> that case we would need also to include the extern definition at
>> v3d_drv.h for the driver variable.
>>
>> The param name could be more descriptive like "sched_timeout_ms" in the
>> lima driver.
>>
>> I wonder if it would make sense to have different timeouts parameters
>> for different type of jobs we have. At least this series come from the
>> need additional time to complete compute jobs. This is what amdgpu does,
>> but we probably don't need something such complex.
>>
>> I think it would also make sense to increase our default value for the
>> compute jobs.
>>
>> What do you think?
>>
>> In any case I think that having this general scheduling timeout
>> parameter as other drivers do is a good idea.

I agree with your observations.
I'm going to add bin_timeout_ms, render_timeout_ms, tfu_timeout_ms,
v3d_timeout_ms, and cache_clean_timeout_ms parameters to v3d_drv.c
instead of the sole timeout parameter.
Though not sophisticated, this should be enough.

> Having a param for being able to test if the job completes eventually
> is a good idea, but if tests are triggering timeouts, then you
> probably need to investigate what's going on in the driver -- it's not
> like you want .5second unpreemptible jobs to be easy to produce.
>
> Also, for CS, I wonder if we have another reg besides CSD_CURRENT_CFG4
> we could be looking at for "we're making progress on the job".  Half a
> second with no discernible progress sounds like a bug.

If binning/render/TFU/cache_clean jobs keep running over 0.5 seconds,
then it should be a bug because they normally complete processing
within the period.
However, a CSD job can do anything.
For example, SaschaWillems's ray tracing example requires about 0.7
seconds to compose a frame with compute shader with Igalia's Vulkan
driver.
Another example is our matrix computation library for QPU:
https://github.com/Idein/qmkl6
It requires about 1.1 seconds to multiply two 2048x2048 matrices.
Because in general we do not know what is the expected behavior of a
QPU program and what is not, we also cannot detect a progress the
program is making.
This is why we want to have the timeout parameters to be able to be modified.

Regards,
Y. Sugizaki.


More information about the dri-devel mailing list