[Intel-gfx] [PATCH 09/21] drm/i915/gem: Disallow creating contexts with too many engines

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Fri Apr 30 11:40:01 UTC 2021


On 29/04/2021 20:16, Jason Ekstrand wrote:
> On Thu, Apr 29, 2021 at 3:01 AM Tvrtko Ursulin
> <tvrtko.ursulin at linux.intel.com> wrote:
>> On 28/04/2021 18:09, Jason Ekstrand wrote:
>>> On Wed, Apr 28, 2021 at 9:26 AM Tvrtko Ursulin
>>> <tvrtko.ursulin at linux.intel.com> wrote:
>>>> On 28/04/2021 15:02, Daniel Vetter wrote:
>>>>> On Wed, Apr 28, 2021 at 11:42:31AM +0100, Tvrtko Ursulin wrote:
>>>>>>
>>>>>> On 28/04/2021 11:16, Daniel Vetter wrote:
>>>>>>> On Fri, Apr 23, 2021 at 05:31:19PM -0500, Jason Ekstrand wrote:
>>>>>>>> There's no sense in allowing userspace to create more engines than it
>>>>>>>> can possibly access via execbuf.
>>>>>>>>
>>>>>>>> Signed-off-by: Jason Ekstrand <jason at jlekstrand.net>
>>>>>>>> ---
>>>>>>>>      drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++----
>>>>>>>>      1 file changed, 3 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>>>>> index 5f8d0faf783aa..ecb3bf5369857 100644
>>>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
>>>>>>>> @@ -1640,11 +1640,10 @@ set_engines(struct i915_gem_context *ctx,
>>>>>>>>                      return -EINVAL;
>>>>>>>>              }
>>>>>>>> -  /*
>>>>>>>> -   * Note that I915_EXEC_RING_MASK limits execbuf to only using the
>>>>>>>> -   * first 64 engines defined here.
>>>>>>>> -   */
>>>>>>>>              num_engines = (args->size - sizeof(*user)) / sizeof(*user->engines);
>>>>>>>
>>>>>>> Maybe add a comment like /* RING_MASK has no shift, so can be used
>>>>>>> directly here */ since I had to check that :-)
>>>>>>>
>>>>>>> Same story about igt testcases needed, just to be sure.
>>>>>>>
>>>>>>> Reviewed-by: Daniel Vetter <daniel.vetter at ffwll.ch>
>>>>>>
>>>>>> I am not sure about the churn vs benefit ratio here. There are also patches
>>>>>> which extend the engine selection field in execbuf2 over the unused
>>>>>> constants bits (with an explicit flag). So churn upstream and churn in
>>>>>> internal (if interesting) for not much benefit.
>>>>>
>>>>> This isn't churn.
>>>>>
>>>>> This is "lock down uapi properly".
>>>
>>> Pretty much.
>>
>> Still haven't heard what concrete problems it solves.
>>
>>>> IMO it is a "meh" patch. Doesn't fix any problems and will create work
>>>> for other people and man hours spent which no one will ever properly
>>>> account against.
>>>>
>>>> Number of contexts in the engine map should not really be tied to
>>>> execbuf2. As is demonstrated by the incoming work to address more than
>>>> 63 engines, either as an extension to execbuf2 or future execbuf3.
>>>
>>> Which userspace driver has requested more than 64 engines in a single context?
>>
>> No need to artificially limit hardware capabilities in the uapi by
>> implementing a policy in the kernel. Which will need to be
>> removed/changed shortly anyway. This particular patch is work and
>> creates more work (which other people who will get to fix the fallout
>> will spend man hours to figure out what and why broke) for no benefit.
>> Or you are yet to explain what the benefit is in concrete terms.
> 
> You keep complaining about how much work it takes and yet I've spent
> more time replying to your e-mails on this patch than I spent writing
> the patch and the IGT test.  Also, if it takes so much time to add a
> restriction, then why are we spending time figuring out how to modify
> the uAPI to allow you to execbuf on a context with more than 64
> engines?  If we're worried about engineering man-hours, then limiting
> to 64 IS the pragmatic solution.

a)

The question of what problem the patch fixes is still unanswered.

b)

You miss the point. I'll continue in the next paragraph..

> 
>> Why don't you limit it to number of physical engines then? Why don't you
>> filter out duplicates? Why not limit the number of buffer objects per
>> client or global based on available RAM + swap relative to minimum
>> object size? Reductio ad absurdum yes, but illustrating the, in this
>> case, thin line between "locking down uapi" and adding too much policy
>> where it is not appropriate.
> 
> All this patch does is say that you're not allowed to create a
> context with more engines than the execbuf API will let you use.  We
> already have an artificial limit.  All this does is push the error
> handling further up the stack.  If someone comes up with a mechanism
> to execbuf on engine 65 (they'd better have an open-source user if it
> involves changing API), I'm very happy for them to bump this limit at
> the same time.  It'll take them 5 minutes and it'll be something they
> find while writing the IGT test.
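
For readers following the thread, the restriction under discussion can be sketched roughly as below. This is a standalone userspace illustration, not the code from the patch: the helper name check_num_engines() and its parameters are hypothetical, and EXEC_RING_MASK is a local stand-in that mirrors the 0x3f value of I915_EXEC_RING_MASK, which is why execbuf can address at most 64 engine slots.

```c
#include <errno.h>
#include <stddef.h>

/* Local stand-in mirroring the uapi I915_EXEC_RING_MASK value (assumption:
 * redefined here only so that the sketch builds on its own). */
#define EXEC_RING_MASK 0x3f

/*
 * Hypothetical helper, not the kernel's set_engines(): derive the number of
 * engine entries from the size of the user-supplied blob and reject maps
 * that execbuf could never index in full.
 */
static int check_num_engines(size_t args_size, size_t header_size,
			     size_t entry_size)
{
	size_t num_engines;

	/* The blob must at least hold the header, and entries must have a size. */
	if (entry_size == 0 || args_size < header_size)
		return -EINVAL;

	/* The payload must be a whole number of entries. */
	if ((args_size - header_size) % entry_size)
		return -EINVAL;

	num_engines = (args_size - header_size) / entry_size;

	/* The mask has no shift, so mask + 1 is the number of usable slots. */
	if (num_engines > EXEC_RING_MASK + 1)
		return -EINVAL;

	return 0;
}
```

With an 8-byte header and 8-byte entries (sizes chosen only for illustration), a map of 64 entries passes and a map of 65 entries is rejected with -EINVAL. Whether failing at context creation is preferable to leaving the extra slots unreachable is exactly the point being argued in this thread.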

.. no it won't take five minutes.

If I need to spell everything out - you will put this patch in, which 
fixes nothing, and it will propagate to the internal kernel at some 
point. Then a bunch of tests will start failing in a strange manner. 
Which will result in people triaging them, then assigning them, then 
reserving machines, setting them up, running the repro, then digging 
into the code, and eventually figuring out what happened.

It will take hours, not five minutes. And there will likely be multiple
bug reports which most likely won't be joined, so multiple people will be
doing multi-hour debugging. All for nothing. So it is rather uninteresting
how small the change is. Interesting part is how much pointless effort 
it will create across the organisation.

Of course you may not care that much about that side of things, or you 
are just not familiar with how it works in practice since you haven't been
involved in the past years. I don't know really, but I have to raise the 
point it makes no sense to do this. Cost vs benefit is simply not nearly 
there.

>>> Also, for execbuf3, I'd like to get rid of contexts entirely and have
>>> engines be their own userspace-visible object.  If we go this
>>> direction, you can have UINT32_MAX of them.  Problem solved.
>>
>> Not the problem I am pointing at though.
> 
> You listed two ways that accessing engine 65 can happen: Extending
> execbuf2 and adding a new execbuf3.  When/if execbuf3 happens, as I
> pointed out above, it'll hopefully be a non-issue.  If someone extends
> execbuf2 to support more than 64 engines and does not have a userspace
> customer that wants said new API change, I will NAK the patch.  If
> you've got a 3rd way that someone can get at engine 65 such that this
> is a problem, I'd love to hear about it.

It's ever so easy to take a black and white stance but the world is more 
like shades of grey. I too am totally perplexed why we have to spend 
time arguing over an inconsequential patch.

Context create is not called "create execbuf2 context" so why be so 
wedded to adding execbuf2 restrictions into it I have no idea. If you 
were fixing some vulnerability or something I'd understand but all I've 
heard so far is along the lines of "This is proper locking down of uapi 
- end of". And endless waste of time discussion follows. We don't have 
to agree on everything anyway and I have raised my concern enough times 
now. Up to you guys to re-figure out the cost benefit on your own then.

Regards,

Tvrtko

