[Intel-gfx] [PATCH 1/3] drm/i915: Limit C-states when waiting for the active request

Fri Aug 3 10:48:44 UTC 2018

On 30/07/2018 16:25, Chris Wilson wrote:
> If we are waiting for the currently executing request, we have a good
> idea that it will be completed in the very near future and so want to
> cap the CPU_DMA_LATENCY to ensure that we wake up the client quickly.

Maybe, but I have a feeling we shouldn't assume what the userspace 
wants. On the other hand "seqno - 1" guard alleviates some of my 
concerns, just not sure if all.

> v2: Not allowed to block in kmalloc after setting TASK_INTERRUPTIBLE.
> v3: Avoid the blocking notifier as well for TASK_INTERRUPTIBLE
> v4: Beautification?
> v5: And ignore the preemptibility of queue_work before schedule.
> 
> Testcase: igt/gem_sync/store-default
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Eero Tamminen <eero.t.tamminen at intel.com>
> Cc: Francisco Jerez <currojerez at riseup.net>
> Cc: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 52 +++++++++++++++++++++++++++++
>   1 file changed, 52 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5c2c93cbab12..f3ff8dbe363d 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1258,6 +1258,51 @@ static bool __i915_wait_request_check_and_reset(struct i915_request *request)
>   	return true;
>   }
>   
> +struct wait_dma_qos {
> +	struct pm_qos_request req;
> +	struct work_struct add, del;
> +};
> +
> +static void __wait_dma_qos_add(struct work_struct *work)
> +{
> +	struct wait_dma_qos *qos = container_of(work, typeof(*qos), add);
> +
> +	pm_qos_add_request(&qos->req, PM_QOS_CPU_DMA_LATENCY, 50);
> +}
> +
> +static void __wait_dma_qos_del(struct work_struct *work)
> +{
> +	struct wait_dma_qos *qos = container_of(work, typeof(*qos), del);
> +
> +	if (!cancel_work_sync(&qos->add))
> +		pm_qos_remove_request(&qos->req);
> +
> +	kfree(qos);
> +}
> +
> +static struct wait_dma_qos *wait_dma_qos_add(void)
> +{
> +	struct wait_dma_qos *qos;
> +
> +	/* Called under TASK_INTERRUPTIBLE, so not allowed to sleep/block. */
> +	qos = kzalloc(sizeof(*qos), GFP_NOWAIT | __GFP_NOWARN);

Is it too big to put on the stack in i915_request_wait? Looks like that 
would be simpler.

> +	if (!qos)
> +		return NULL;
> +
> +	INIT_WORK(&qos->add, __wait_dma_qos_add);
> +	INIT_WORK(&qos->del, __wait_dma_qos_del);
> +	schedule_work_on(raw_smp_processor_id(), &qos->add);

Do we want to use the highpri wq? But in any case we do have a worker 
latency here, which may completely defeat the 50us QoS request. :(

Also, do you need to specify the CPU manually or is that in fact 
detrimental to the worker running ASAP? AFAIU this makes the worker only 
be able to start once we go to sleep, with potentially other stuff in 
there preceding our work item.

> +
> +	return qos;
> +}
> +
> +static void wait_dma_qos_del(struct wait_dma_qos *qos)
> +{
> +	/* Defer to worker so not incur extra latency for our woken client. */
> +	if (qos)
> +		schedule_work(&qos->del);
> +}
> +
>   /**
>    * i915_request_wait - wait until execution of request has finished
>    * @rq: the request to wait upon
> @@ -1286,6 +1331,7 @@ long i915_request_wait(struct i915_request *rq,
>   	wait_queue_head_t *errq = &rq->i915->gpu_error.wait_queue;
>   	DEFINE_WAIT_FUNC(reset, default_wake_function);
>   	DEFINE_WAIT_FUNC(exec, default_wake_function);
> +	struct wait_dma_qos *qos = NULL;
>   	struct intel_wait wait;
>   
>   	might_sleep();
> @@ -1363,6 +1409,11 @@ long i915_request_wait(struct i915_request *rq,
>   			break;
>   		}
>   
> +		if (!qos &&
> +		    i915_seqno_passed(intel_engine_get_seqno(rq->engine),
> +				      wait.seqno - 1))
> +			qos = wait_dma_qos_add();
> +
>   		timeout = io_schedule_timeout(timeout);
>   
>   		if (intel_wait_complete(&wait) &&
> @@ -1412,6 +1463,7 @@ long i915_request_wait(struct i915_request *rq,
>   	if (flags & I915_WAIT_LOCKED)
>   		remove_wait_queue(errq, &reset);
>   	remove_wait_queue(&rq->execute, &exec);
> +	wait_dma_qos_del(qos);
>   	trace_i915_request_wait_end(rq);
>   
>   	return timeout;
> 

Another thing we talked about on IRC is a potential to introduce an 
explicit low-latency flag to gem_wait ioctl. That would punt the 
responsibility to userspace to know if it cares, but on the other hand 
if some benchmark benefit from implicit setting that could be tempting. 
You said media-bench likes it so I'll try it.

Explicit request would also simplify the code by removing the need for 
workers. And remove the worker latency from the request which may be the 
most attractive benefit.

And also could apply the QoS request to before the seqno assignment. 
Currently I think there is a small window where wait can race with it, 
and fall into high-latency sleep, even if later it would chose to 
request low-latency.

Regards,

Tvrtko