[Intel-gfx] [PATCH v7 5/9] drm/i915: vgpu context submission pv optimization

Mon Jul 8 10:40:51 UTC 2019

Quoting Xiaolin Zhang (2019-07-08 02:35:18)
> +static void pv_submit(struct intel_engine_cs *engine,
> +              struct i915_request **out,
> +              struct i915_request **end)
> +{
> +       struct intel_engine_execlists * const execlists = &engine->execlists;
> +       struct i915_virtual_gpu_pv *pv = engine->i915->vgpu.pv;
> +       struct pv_submission *pv_elsp = pv->pv_elsp[engine->hw_id];
> +       struct i915_request *rq;
> +
> +       u64 descs[2];
> +       int n, err;
> +
> +       memset(descs, 0, sizeof(descs));
> +       n = 0;
> +       do {
> +               rq = *out++;
> +               descs[n++] = execlists_update_context(rq);
> +       } while (out != end);
> +
> +       for (n = 0; n < execlists_num_ports(execlists); n++)
> +               pv_elsp->descs[n] = descs[n];

You can polish this a bit, minor nit.

> +       writel(PV_ACTION_ELSP_SUBMISSION, execlists->submit_reg);
> +
> +#define done (READ_ONCE(pv_elsp->submitted) == true)
> +       err = wait_for_us(done, 1);
> +       if (err)
> +               err = wait_for(done, 1);

Strictly, you need to use wait_for_atomic_us() [under a spinlock here],
and there's no need for a second pass since you are not allowed to sleep
anyway. So just set the timeout to 1000us.

> +#undef done
> +
> +       if (unlikely(err))
> +               DRM_ERROR("PV (%s) workload submission failed\n", engine->name);
> +
> +       pv_elsp->submitted = false;

However, that looks solid wrt to serialisation of this engine with its
pv host, without cross-interference (at least in the comms channel).

If you want to get fancy, you should be able to simply not dequeue until
!pv_elsp->submitted so the wait-for-ack occurs naturally. So long as the
pv host plays nicely, we should always see submitted acked before the
request is signaled. Give or take problems with preemption and the pv
host being a black box that may allow requests to complete and so our
submission be a no-op (and so not generate an interrupt to allow further
submission). Indeed, I would strongly recommend you use the delayed ack
plus jiffie timer to avoid the no-op submission problem.

If you want to prove this in a bunch of mocked up selftests that provide
the pv channel ontop of the local driver....
-Chris