[igt-dev] [RFC PATCH v10 2/6] lib/i915: add gem_engine_topology library

Thu Mar 7 16:25:35 UTC 2019

On 07/03/2019 14:59, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-07 14:16:13)
>>
>> On 07/03/2019 13:42, Andi Shyti wrote:
>>> Hi Tvrtko,
>>>
>>>>> +#include "drmtest.h"
>>>>> +#include "igt_gt.h"
>>>>> +#include "ioctl_wrappers.h"
>>>>> +
>>>>> +#include "i915/gem_engine_topology.h"
>>>>> +
>>>>> +#define SIZEOF_CTX_PARAM   offsetof(struct i915_context_param_engines, \
>>>>> +                                   class_instance[I915_EXEC_RING_MASK + 1])
>>>>> +#define SIZEOF_QUERY               offsetof(struct drm_i915_query_engine_info, \
>>>>> +                                   engines[I915_EXEC_RING_MASK + 1])
>>>>
>>>> There is no relationship between I915_EXEC_RING_MASK and the number of
>>>> engines we can query. I'll suggest how I think it should work a bit later.
>>>
>>> I915_EXEC_RING_MASK is the upper limit checked in the driver,
>>> that's why I used that one. I could have hardcoded it to 64, but
>>> I wanted to keep it consistent to the driver.
>>
>> I'll argue with Chris to remove this limit from the engine map (come eb3
>> might not be relevant any more) - nevertheless the query does not have
>> this limit so I think it would be best to decouple the code from the start.
> 
> At the moment it fits in a u64 and I don't have to worry about flexible
> bitmaps. It's a bridge to cross later, we've plenty of space in the enum
> api to version a new interface.

Load balancing mask? Or engine->mask? Is the latter relevant when not 
load balancing?

>>
>>>>> +
>>>>> +static struct intel_execution_engine2 *intel_active_engines2;
>>>>> +
>>>>> +int __gem_query(int fd, struct drm_i915_query *q)
>>>>> +{
>>>>> +   int err = 0;
>>>>> +
>>>>> +   if (igt_ioctl(fd, DRM_IOCTL_I915_QUERY, q))
>>>>> +           err = -errno;
>>>>> +
>>>>> +   errno = 0;
>>>>> +   return err;
>>>>> +}
>>>>> +
>>>>> +void gem_query(int fd, struct drm_i915_query *q)
>>>>> +{
>>>>> +   igt_assert_eq(__gem_query(fd, q), 0);
>>>>> +}
>>>>
>>>> I would keep these two static if there are no external callers for now.
>>>
>>> Yes, I can make them static, I thought someone might need them in
>>> the future.
>>
>> Could be but it will be easy to export them at that point. I wanted to
>> keep the patch smallest possible for now so I can focus on reviewing the
>> important bits.
>>
>>>
>>>>> +
>>>>> +static void query_engines(int fd,
>>>>> +                     struct drm_i915_query_engine_info *query_engines)
>>>>> +{
>>>>> +   struct drm_i915_query_item item = { };
>>>>> +   struct drm_i915_query query = { };
>>>>> +
>>>>> +   item.query_id = DRM_I915_QUERY_ENGINE_INFO;
>>>>> +   query.items_ptr = to_user_pointer(&item);
>>>>> +   query.num_items = 1;
>>>>> +   item.length = SIZEOF_QUERY;
>>>>
>>>> I think making this function take buffer size and return the queried size
>>>> would be simplest.
>>>
>>> I haven't really understood, do you mean doing something like:
>>>
>>> static int query_engines(...., int size)
>>> {
>>>      [...]
>>>      return size;
>>> }
>>>
>>> or calling the gem_query twice, first time for getting the size
>>> and second time for allocating the necessary resources.
>>>
>>> Because in the first case, we have the size in query_engines, is
>>> that right?
>>>
>>> While if you mean the second case, this is how I did it at the
>>> very beginning, but then with Chris we agreed that 64
>>> (I915_EXEC_RING_MASK) is "big enough". If you are strong with it,
>>> I can revert it back.
>>
>> Hm.. what happened to the old Chris!? :)) I am not objecting that
>> strongly. Especially if you already changed it once then just leave it.
>> It can be changed later if someone gets the desire to tidy..
> 
> If I recall, my position is that you can have a fixed stack and then
> fallback to a heap. But for these interfaces where we have to return a
> heap pointer, we might as well use heaps and be flexible.
> 
>>
>>>>> +
>>>>> +   item.data_ptr = to_user_pointer(query_engines);
>>>>> +
>>>>> +   gem_query(fd, &query);
>>>>> +}
>>>>> +
>>>>> +bool gem_has_engine_topology(void)
>>>>> +{
>>>>> +   return intel_active_engines2 != intel_execution_engines2;
>>>>
>>>> A problem for the exported API if init_engine_list hasn't been called yet?
>>>> Maybe:
>>>>
>>>>       if (!intel_active_engines2)
>>>>               init_engine_list(fd);
>>>>
>>>>       return intel_active_engines2 != intel_execution_engines2;
>>>>
>>>> ?
>>>
>>> Yes, it's indeed prone to be misused, I would need to make a more
>>> generic way, because the two lines
>>>
>>>      if (!intel_active_engines2)
>>>             init_engine_list(fd);
>>>
>>> are called so many times that I'm having an overdose of them :)
>>
>> Yes the call chains and responsibilities do feel a bit unobvious.
> 
> I really don't want init_engine_list(fd) at all :) Ideally the query
> interface is cheap enough that it doesn't matter (other than debug noise
> in strace) and if we can't make it that cheap, it is cached in sensible
> scopes.

Maybe it would indeed be more future proof to enumerate every time 
before iterating, but we also don't necessarily have to do that right 
now. For instance I think the problem of test enumeration will likely 
come back to trouble us as long as we are not allowed to enumerate 
dynamically.

> That may be an igt_device class (i915_device subclass).

For something that needs to be delivered next week we have to be 
pragmatical and keep the wish list in check. :)

>>
>>>
>>>>> +}
>>>>> +
>>>>> +static void set_ctx_param_engines(int fd, uint32_t ctx_id)
>>>>> +{
>>>>> +   struct i915_context_param_engines *ctx_engine;
>>>>> +   struct drm_i915_gem_context_param ctx_param;
>>>>> +   const struct intel_execution_engine2 *e2;
>>>>> +   uint8_t buff[SIZEOF_CTX_PARAM] = { };
>>>>> +   int i;
>>>>> +
>>>>> +   if (!gem_has_engine_topology())
>>>>> +           return;
>>>>
>>>> AFAICS this can be assert here.
>>>
>>> I think this check is redundant as it's never going to happen.
>>> This function is 'static' and the calling function checks that
>>> already. I will remove it.
>>
>> Better to have it as assert but as you wish. I guess even without an
>> assert it would segfault when it tries to iterate intel_active_engines2.
>>
>>>
>>>>> +
>>>>> +   ctx_engine = (struct i915_context_param_engines *) buff;
>>>>> +
>>>>> +   ctx_param.ctx_id = ctx_id;
>>>>> +   ctx_param.param = I915_CONTEXT_PARAM_ENGINES;
>>>>> +
>>>>> +   ctx_engine->extensions = 0;
>>>>> +   for (i = 0, e2 = intel_active_engines2; e2->name; i++, e2++) {
>>>>
>>>> Here I think is the place where we check that the total number of engines
>>>> does not exceed the engine map execbuf index.
>>>>
>>>>       igt_require(i <= I915_EXEC_RING_MASK);
>>>>
>>>> ?
>>>
>>> I can add that, but it would be redundant as the list would have
>>> never been created otherwise... anyway, better be safe than sorry :)
>>
>> Yes I know, I wrote that because I was suggesting to remove the 64
>> engine limit from the query and just have it in the engine map.
>>
>> But as said if you already discussed that point then leave it.
>>
>>> BTW, given my natural confusion between 'require' and 'assert',
>>> you meant 'assert', right?
>>
>> I meant require. But I don't know either, it's so hard to choose
>> sometimes. :)
>>
>>>>> +           ctx_engine->class_instance[i].engine_class = e2->class;
>>>>> +           ctx_engine->class_instance[i].engine_instance = e2->instance;
>>>>> +   }
>>>>> +
>>>>> +   ctx_param.size = offsetof(typeof(*ctx_engine), class_instance[i + 1]);
>>>>> +   ctx_param.value = to_user_pointer(ctx_engine);
>>>>> +
>>>>> +   gem_context_set_param(fd, &ctx_param);
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Initializes the list of engines.
>>>>> + *
>>>>> + * Returns:
>>>>> + *
>>>>> + *  - 0 in case of success and the get/set_param ioctls are implemented
>>>>> + *  - -ENODEV in case of success but I915_CONTEXT_PARAM_ENGINES is not
>>>>> + *    implemented in DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM ioctl.
>>>>> + */
>>>>> +static int init_engine_list(int fd)
>>>>> +{
>>>>> +   const char *class_names[] = { "rcs", "bcs", "vcs", "vecs" };
>>>>> +   struct drm_i915_query_engine_info *query_engine;
>>>>> +   unsigned char query_buffer[SIZEOF_QUERY] = { };
>>>>> +   struct drm_i915_gem_context_param ctx_param = {
>>>>> +           .param = I915_CONTEXT_PARAM_ENGINES,
>>>>> +   };
>>>>> +   int i;
>>>>> +
>>>>> +   /* the list is already initialized */
>>>>> +   if (intel_active_engines2)
>>>>> +           return gem_has_engine_topology() ? 0 : -ENODEV;
>>>>> +
>>>>> +   /*
>>>>> +    * We check first whether the I915_CONTEXT_PARAM_ENGINES parameter is
>>>>> +    * supported by the running kernel. If not, __gem_context_get_param()
>>>>> +    * will return -EINVAL which, at this point, is not necessarily a
>>>>> +    * failure but it means that we need to fall beck to polling the engines
>>>>> +    * directly from intel_execution_engines2[].
>>>>> +    *
>>>>> +    * We will return -ENODEV with the meaning of missing interface
>>>>> +    */
>>>>> +   if (__gem_context_get_param(fd, &ctx_param)) {
>>>>> +           intel_active_engines2 = intel_execution_engines2;
>>>>> +           return -ENODEV;
>>>>> +   }
>>>>> +
>>>>> +   query_engine = (struct drm_i915_query_engine_info *) query_buffer;
>>>>> +   query_engines(fd, query_engine);
>>>>
>>>> Do a two stage query here - first ask for required buffer size, then
>>>> allocate it and query:
>>>>
>>>>       size = query_engines(fd, NULL, 0);
>>>>       query_engine = malloc(size);
>>>>       query_engines(fd, query_engine, size);
>>>
>>> As above :) this is how I did it at the beginning and with Chris
>>> we agreed that 64 is big enough, and then we had some review
>>> rounds on agreeing on buffer size, name, calculation and so
>>> on.
>>>
>>> Shall I move back? Honestly I like the double call, as well.
>>
>> Leave it at least for now then.
>>
>>>>> +
>>>>> +   igt_assert(query_engine->num_engines < I915_EXEC_RING_MASK + 1);
>>>>
>>>> Remove this since we don't need to limit the size of the array during the
>>>> query. It is only the engine map that has the limitation.
>>>
>>> This came after one of the 10 rounds of review as well :)
>>> It's indeed a bit redundant, It's mainly for the malloc,
>>> to avoid crazy allocation in case something goes wrong.
>>>
>>>>> +
>>>>> +   intel_active_engines2 = malloc((query_engine->num_engines + 1) *
>>>>> +                                  sizeof(*intel_active_engines2));
>>>>> +   igt_assert(intel_active_engines2);
>>>>> +
>>>>> +   for (i = 0; i < query_engine->num_engines; i++) {
>>>>> +           char *name;
>>>>> +           int class = query_engine->engines[i].engine_class;
>>>>> +           int instance = query_engine->engines[i].engine_instance;
>>>>
>>>> Could use __u16 to match uapi.
>>>
>>> Sure!
>>>
>>>>> +
>>>>> +           intel_active_engines2[i].class = class;
>>>>> +           intel_active_engines2[i].instance = instance;
>>>>> +
>>>>> +           /* if we don't recognise the class, then we mark it as "unk" */
>>>>> +           if (class >= ARRAY_SIZE(class_names) || !class_names[class])
>>>>
>>>> Second condition seems to be not needed at the moment.
>>>
>>> Yes, of course, it must be a leftover.
>>>
>>>>> +struct intel_execution_engine2 *get_active_engines(int fd)
>>>>
>>>> Unexport this one?
>>>
>>> I thought someone might need it, I will do it static and rework
>>> it with the bool gem_has_engine_topology() as they would become
>>> very similar.
>>>
>>>>> +{
>>>>> +   if (init_engine_list(fd)) {
>>>>> +           igt_info("using pre-allocated engine list\n");
>>>>> +           return NULL;
>>>>> +   }
>>>>> +
>>>>> +   return intel_active_engines2;
>>>>> +}
>>>>> +
>>>>> +struct intel_execution_engine2 *gem_set_context_get_engines(int fd,
>>>>> +                                                       uint32_t ctx_id)
>>>>
>>>> I find the name of this function really confusing (set get what?). Can I
>>>> suggest __gem_prepare_context_engines or something like that?
>>>
>>> The name means "set context and get engines", as it's a "get_"
>>> kind of function. But as it came out, I have a very limited
>>> imagination for names choice. I will call it
>>> __gem_prepare_context_engines(), then.
>>>
>>>>> +{
>>>>> +   struct intel_execution_engine2 *active_engines = get_active_engines(fd);
>>>>> +
>>>>> +   if (!active_engines)
>>>>> +           return intel_execution_engines2;
>>>>
>>>> So without engine discovery the function will not configure the engine map.
>>>> How will the code inside for_each_engine2 be able to work then?
>>>
>>> It's this part of the for_each_engine2 that would make it work:
>>>
>>>    for_if (gem_has_engine_topology() || \
>>>            gem_has_engine(fd, e2__->class, e2__->instance))
>>>
>>> We had quite some discussions about backward compatibility and
>>> the whole thing is designed to be backward compatible, otherwise,
>>> I would have removed quite some code from the above.
>>
>> Yes, I comment on that in the other reply.
>>
>>>> I guess it's okay since we don't expect to see a kernel with ctx engine map
>>>> support and no engine discovery but it feels a bit unnatural.
>>>>
>>>> But I think it effectively means you should have igt_require_gem_engine_list
>>>> in here since it is otherwise not usable from the for_each_engine2 iterator.
>>>>
>>>>> +
>>>>> +   set_ctx_param_engines(fd, ctx_id);
>>>>
>>>> You need to handle the use case where the context already has a map
>>>> configured instead of overriding it. Should be easy.
>>>
>>> Yes, this is relevant to your comment above, about the
>>> igt_require(<new_api>). This part is checked in
>>> 'get_active_engines(fd)', that returns NULL if we do not have the
>>> new uapi and therefore it returns the old list of engines.
>>>
>>> adding the igt_require() (asset?) would be redundant.
>>> Am I missing any use case?
>>
>> AFAIR Chris wanted it to support the use case where he configures the
>> engine map and passes the context to for_each_engine2. Maybe he changed
>> his mind?
> 
> I want to change engine maps on the fly, have multiple contexts with
> different engines, and have multiple devices, and have it all work.
>   
>> It would be tricky to do since then there is no intel_active_engine
>> array to walk.. Could only define the iterator that the pointer to
>> engine is sometimes not valid (must be checked) and added index used to
>> submit. Not sure, depends if Chris still wants this feature.
> 
> Just construct the engine array in the iterator. DO NOT USE A STATIC
> MAP! :)

Multiple devices I think we really need to punt for later.

In place engine discovery has some value to try if Andi wants to try.

Regards,

Tvrtko