[Intel-gfx] [PATCH 31/32] drm/i915/execlists: Virtual engine bonding

Thu Apr 18 09:50:30 UTC 2019

On 18/04/2019 10:13, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-04-18 09:57:43)
>>
>> On 18/04/2019 07:57, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2019-04-18 07:47:51)
>>>>
>>>> On 17/04/2019 08:56, Chris Wilson wrote:
>>>>> +static void
>>>>> +virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
>>>>> +{
>>>>> +     struct virtual_engine *ve = to_virtual_engine(rq->engine);
>>>>> +     struct ve_bond *bond;
>>>>> +
>>>>> +     bond = virtual_find_bond(ve, to_request(signal)->engine);
>>>>> +     if (bond) {
>>>>> +             intel_engine_mask_t old, new, cmp;
>>>>> +
>>>>> +             cmp = READ_ONCE(rq->execution_mask);
>>>>> +             do {
>>>>> +                     old = cmp;
>>>>> +                     new = cmp & bond->sibling_mask;
>>>>> +             } while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old);
>>>>
>>>> Loop implies someone else might be modifying the rq->execution_mask in
>>>> parallel?
>>>
>>> There's nothing that prevents there being multiple bonds being
>>> executed simultaneously (other than practicality). There's also nothing
>>> that says this should be the only way to modify rq->execution_mask in
>>> the future.
>>
>> But request is one, how can it be submitted multiple times simultaneously?
> 
> You mean "How can it be signaled multiple times simultaneously?"

Okay yes, signaled. You could give same submit fence to multiple slaves, 
but you can't have same slave request receive notification from multiple 
masters.

Or you can if you build a composite fence and pass that in? Is this the 
story about signal-on-any vs signal-on-all?

>>
>>>>> +static int
>>>>> +set_engines__bond(struct i915_user_extension __user *base, void *data)
>>>>> +{
>>>>> +     struct i915_context_engines_bond __user *ext =
>>>>> +             container_of_user(base, typeof(*ext), base);
>>>>> +     const struct set_engines *set = data;
>>>>> +     struct intel_engine_cs *virtual;
>>>>> +     struct intel_engine_cs *master;
>>>>> +     u16 class, instance;
>>>>> +     u16 idx, num_bonds;
>>>>> +     int err, n;
>>>>> +
>>>>> +     if (get_user(idx, &ext->virtual_index))
>>>>> +             return -EFAULT;
>>>>> +
>>>>> +     if (idx >= set->engines->num_engines) {
>>>>> +             DRM_DEBUG("Invalid index for virtual engine: %d >= %d\n",
>>>>> +                       idx, set->engines->num_engines);
>>>>> +             return -EINVAL;
>>>>> +     }
>>>>> +
>>>>> +     idx = array_index_nospec(idx, set->engines->num_engines);
>>>>> +     if (!set->engines->engines[idx]) {
>>>>> +             DRM_DEBUG("Invalid engine at %d\n", idx);
>>>>> +             return -EINVAL;
>>>>> +     }
>>>>> +
>>>>> +     /*
>>>>> +      * A non-virtual engine has 0 siblings to choose between; and submit
>>>>> +      * fence will always be directed to the one engine.
>>>>> +      */
>>>>> +     virtual = set->engines->engines[idx]->engine;
>>>>> +     if (!intel_engine_is_virtual(virtual))
>>>>> +             return 0;
>>>>
>>>> Hmm wouldn't we strictly speaking need to distinguish between uAPI
>>>> errors and auto-magic-single-veng-replacement? Latter is OK to return
>>>> success, but former should be reported as -EINVAL I think.
>>>
>>> Is it a uAPI error if it works? :)
>>
>> It works but what is the practical use? It more signals userspace got
>> it's configuration wrong and if we silently accept it gets more
>> difficult to figure out.
> 
> At that point, I was being facetious. Memory says it was simpler to just
> stick the virtual check at the start than have to insert it later. But
> it's trivial to move later, so it's done.
> 
>>>>> +
>>>>> +     err = check_user_mbz(&ext->flags);
>>>>> +     if (err)
>>>>> +             return err;
>>>>> +
>>>>> +     for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
>>>>> +             err = check_user_mbz(&ext->mbz64[n]);
>>>>> +             if (err)
>>>>> +                     return err;
>>>>> +     }
>>>>> +
>>>>> +     if (get_user(class, &ext->master_class))
>>>>> +             return -EFAULT;
>>>>> +
>>>>> +     if (get_user(instance, &ext->master_instance))
>>>>> +             return -EFAULT;
>>>>> +
>>>>> +     master = intel_engine_lookup_user(set->ctx->i915, class, instance);
>>>>> +     if (!master) {
>>>>> +             DRM_DEBUG("Unrecognised master engine: { class:%d, instance:%d }\n",
>>>>> +                       class, instance);
>>>>> +             return -EINVAL;
>>>>> +     }
>>>>> +
>>>>> +     if (get_user(num_bonds, &ext->num_bonds))
>>>>> +             return -EFAULT;
>>>>
>>>> Should num_bonds > virtual->num_siblings be an error?
>>>
>>> They could specify the same bond multiple times for whatever reason (and
>>> probably should allow skipping NONE?), if the target doesn't exist that's
>>> definitely an error.
>>
>> So which bond we pick if they specify multiple ones? Just the first one
>> found. Hm actually I was thinking about making sure each master is only
>> specified once, not siblings. For siblings we indeed do not care.
> 
> No, it's a mask of if parent executes on master, use this set of
> children.
> 
> I was reasonably happy to use a cumulative mask if master is specified
> by more than one bond ext; but maybe it should be an intersection. Hmm.

Do you see a realistic and making sense use case for specifying the same 
master in multiple bonds? If not I'd just disallow it and then we don't 
have a question of union vs intersection policy.

Regards,

Tvrtko