[igt-dev] [RFC PATCH v10 2/6] lib/i915: add gem_engine_topology library

Thu Mar 7 14:59:27 UTC 2019

Quoting Tvrtko Ursulin (2019-03-07 14:16:13)
> 
> On 07/03/2019 13:42, Andi Shyti wrote:
> > Hi Tvrtko,
> > 
> >>> +#include "drmtest.h"
> >>> +#include "igt_gt.h"
> >>> +#include "ioctl_wrappers.h"
> >>> +
> >>> +#include "i915/gem_engine_topology.h"
> >>> +
> >>> +#define SIZEOF_CTX_PARAM   offsetof(struct i915_context_param_engines, \
> >>> +                                   class_instance[I915_EXEC_RING_MASK + 1])
> >>> +#define SIZEOF_QUERY               offsetof(struct drm_i915_query_engine_info, \
> >>> +                                   engines[I915_EXEC_RING_MASK + 1])
> >>
> >> There is no relationship between I915_EXEC_RING_MASK and the number of
> >> engines we can query. I'll suggest how I think it should work a bit later.
> > 
> > I915_EXEC_RING_MASK is the upper limit checked in the driver,
> > that's why I used that one. I could have hardcoded it to 64, but
> > I wanted to keep it consistent to the driver.
> 
> I'll argue with Chris to remove this limit from the engine map (come eb3 
> might not be relevant any more) - nevertheless the query does not have 
> this limit so I think it would be best to decouple the code from the start.

At the moment it fits in a u64 and I don't have to worry about flexible
bitmaps. It's a bridge to cross later, we've plenty of space in the enum
api to version a new interface.
> 
> >>> +
> >>> +static struct intel_execution_engine2 *intel_active_engines2;
> >>> +
> >>> +int __gem_query(int fd, struct drm_i915_query *q)
> >>> +{
> >>> +   int err = 0;
> >>> +
> >>> +   if (igt_ioctl(fd, DRM_IOCTL_I915_QUERY, q))
> >>> +           err = -errno;
> >>> +
> >>> +   errno = 0;
> >>> +   return err;
> >>> +}
> >>> +
> >>> +void gem_query(int fd, struct drm_i915_query *q)
> >>> +{
> >>> +   igt_assert_eq(__gem_query(fd, q), 0);
> >>> +}
> >>
> >> I would keep these two static if there are no external callers for now.
> > 
> > Yes, I can make them static, I thought someone might need them in
> > the future.
> 
> Could be but it will be easy to export them at that point. I wanted to 
> keep the patch smallest possible for now so I can focus on reviewing the 
> important bits.
> 
> > 
> >>> +
> >>> +static void query_engines(int fd,
> >>> +                     struct drm_i915_query_engine_info *query_engines)
> >>> +{
> >>> +   struct drm_i915_query_item item = { };
> >>> +   struct drm_i915_query query = { };
> >>> +
> >>> +   item.query_id = DRM_I915_QUERY_ENGINE_INFO;
> >>> +   query.items_ptr = to_user_pointer(&item);
> >>> +   query.num_items = 1;
> >>> +   item.length = SIZEOF_QUERY;
> >>
> >> I think making this function take buffer size and return the queried size
> >> would be simplest.
> > 
> > I haven't really understood, do you mean doing something like:
> > 
> > static int query_engines(...., int size)
> > {
> >     [...]
> >     return size;
> > }
> > 
> > or calling the gem_query twice, first time for getting the size
> > and second time for allocating the necessary resources.
> > 
> > Because in the first case, we have the size in query_engines, is
> > that right?
> > 
> > While if you mean the second case, this is how I did it at the
> > very beginning, but then with Chris we agreed that 64
> > (I915_EXEC_RING_MASK) is "big enough". If you are strong with it,
> > I can revert it back.
> 
> Hm.. what happened to the old Chris!? :)) I am not objecting that 
> strongly. Especially if you already changed it once then just leave it. 
> It can be changed later if someone gets the desire to tidy..

If I recall, my position is that you can have a fixed stack and then
fallback to a heap. But for these interfaces where we have to return a
heap pointer, we might as well use heaps and be flexible.

> 
> >>> +
> >>> +   item.data_ptr = to_user_pointer(query_engines);
> >>> +
> >>> +   gem_query(fd, &query);
> >>> +}
> >>> +
> >>> +bool gem_has_engine_topology(void)
> >>> +{
> >>> +   return intel_active_engines2 != intel_execution_engines2;
> >>
> >> A problem for the exported API if init_engine_list hasn't been called yet?
> >> Maybe:
> >>
> >>      if (!intel_active_engines2)
> >>              init_engine_list(fd);
> >>
> >>      return intel_active_engines2 != intel_execution_engines2;
> >>
> >> ?
> > 
> > Yes, it's indeed prone to be misused, I would need to make a more
> > generic way, because the two lines
> > 
> >     if (!intel_active_engines2)
> >            init_engine_list(fd);
> > 
> > are called so many times that I'm having an overdose of them :)
> 
> Yes the call chains and responsibilities do feel a bit unobvious.

I really don't want init_engine_list(fd) at all :) Ideally the query
interface is cheap enough that it doesn't matter (other than debug noise
in strace) and if we can't make it that cheap, it is cached in sensible
scopes.

That may be an igt_device class (i915_device subclass).
> 
> > 
> >>> +}
> >>> +
> >>> +static void set_ctx_param_engines(int fd, uint32_t ctx_id)
> >>> +{
> >>> +   struct i915_context_param_engines *ctx_engine;
> >>> +   struct drm_i915_gem_context_param ctx_param;
> >>> +   const struct intel_execution_engine2 *e2;
> >>> +   uint8_t buff[SIZEOF_CTX_PARAM] = { };
> >>> +   int i;
> >>> +
> >>> +   if (!gem_has_engine_topology())
> >>> +           return;
> >>
> >> AFAICS this can be assert here.
> > 
> > I think this check is redundant as it's never going to happen.
> > This function is 'static' and the calling function checks that
> > already. I will remove it.
> 
> Better to have it as assert but as you wish. I guess even without an 
> assert it would segfault when it tries to iterate intel_active_engines2.
> 
> > 
> >>> +
> >>> +   ctx_engine = (struct i915_context_param_engines *) buff;
> >>> +
> >>> +   ctx_param.ctx_id = ctx_id;
> >>> +   ctx_param.param = I915_CONTEXT_PARAM_ENGINES;
> >>> +
> >>> +   ctx_engine->extensions = 0;
> >>> +   for (i = 0, e2 = intel_active_engines2; e2->name; i++, e2++) {
> >>
> >> Here I think is the place where we check that the total number of engines
> >> does not exceed the engine map execbuf index.
> >>
> >>      igt_require(i <= I915_EXEC_RING_MASK);
> >>
> >> ?
> > 
> > I can add that, but it would be redundant as the list would have
> > never been created otherwise... anyway, better be safe than sorry :)
> 
> Yes I know, I wrote that because I was suggesting to remove the 64 
> engine limit from the query and just have it in the engine map.
> 
> But as said if you already discussed that point then leave it.
> 
> > BTW, given my natural confusion between 'require' and 'assert',
> > you meant 'assert', right?
> 
> I meant require. But I don't know either, it's so hard to choose 
> sometimes. :)
> 
> >>> +           ctx_engine->class_instance[i].engine_class = e2->class;
> >>> +           ctx_engine->class_instance[i].engine_instance = e2->instance;
> >>> +   }
> >>> +
> >>> +   ctx_param.size = offsetof(typeof(*ctx_engine), class_instance[i + 1]);
> >>> +   ctx_param.value = to_user_pointer(ctx_engine);
> >>> +
> >>> +   gem_context_set_param(fd, &ctx_param);
> >>> +}
> >>> +
> >>> +/*
> >>> + * Initializes the list of engines.
> >>> + *
> >>> + * Returns:
> >>> + *
> >>> + *  - 0 in case of success and the get/set_param ioctls are implemented
> >>> + *  - -ENODEV in case of success but I915_CONTEXT_PARAM_ENGINES is not
> >>> + *    implemented in DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM ioctl.
> >>> + */
> >>> +static int init_engine_list(int fd)
> >>> +{
> >>> +   const char *class_names[] = { "rcs", "bcs", "vcs", "vecs" };
> >>> +   struct drm_i915_query_engine_info *query_engine;
> >>> +   unsigned char query_buffer[SIZEOF_QUERY] = { };
> >>> +   struct drm_i915_gem_context_param ctx_param = {
> >>> +           .param = I915_CONTEXT_PARAM_ENGINES,
> >>> +   };
> >>> +   int i;
> >>> +
> >>> +   /* the list is already initialized */
> >>> +   if (intel_active_engines2)
> >>> +           return gem_has_engine_topology() ? 0 : -ENODEV;
> >>> +
> >>> +   /*
> >>> +    * We check first whether the I915_CONTEXT_PARAM_ENGINES parameter is
> >>> +    * supported by the running kernel. If not, __gem_context_get_param()
> >>> +    * will return -EINVAL which, at this point, is not necessarily a
> >>> +    * failure but it means that we need to fall beck to polling the engines
> >>> +    * directly from intel_execution_engines2[].
> >>> +    *
> >>> +    * We will return -ENODEV with the meaning of missing interface
> >>> +    */
> >>> +   if (__gem_context_get_param(fd, &ctx_param)) {
> >>> +           intel_active_engines2 = intel_execution_engines2;
> >>> +           return -ENODEV;
> >>> +   }
> >>> +
> >>> +   query_engine = (struct drm_i915_query_engine_info *) query_buffer;
> >>> +   query_engines(fd, query_engine);
> >>
> >> Do a two stage query here - first ask for required buffer size, then
> >> allocate it and query:
> >>
> >>      size = query_engines(fd, NULL, 0);
> >>      query_engine = malloc(size);
> >>      query_engines(fd, query_engine, size);
> > 
> > As above :) this is how I did it at the beginning and with Chris
> > we agreed that 64 is big enough, and then we had some review
> > rounds on agreeing on buffer size, name, calculation and so
> > on.
> > 
> > Shall I move back? Honestly I like the double call, as well.
> 
> Leave it at least for now then.
> 
> >>> +
> >>> +   igt_assert(query_engine->num_engines < I915_EXEC_RING_MASK + 1);
> >>
> >> Remove this since we don't need to limit the size of the array during the
> >> query. It is only the engine map that has the limitation.
> > 
> > This came after one of the 10 rounds of review as well :)
> > It's indeed a bit redundant, It's mainly for the malloc,
> > to avoid crazy allocation in case something goes wrong.
> > 
> >>> +
> >>> +   intel_active_engines2 = malloc((query_engine->num_engines + 1) *
> >>> +                                  sizeof(*intel_active_engines2));
> >>> +   igt_assert(intel_active_engines2);
> >>> +
> >>> +   for (i = 0; i < query_engine->num_engines; i++) {
> >>> +           char *name;
> >>> +           int class = query_engine->engines[i].engine_class;
> >>> +           int instance = query_engine->engines[i].engine_instance;
> >>
> >> Could use __u16 to match uapi.
> > 
> > Sure!
> > 
> >>> +
> >>> +           intel_active_engines2[i].class = class;
> >>> +           intel_active_engines2[i].instance = instance;
> >>> +
> >>> +           /* if we don't recognise the class, then we mark it as "unk" */
> >>> +           if (class >= ARRAY_SIZE(class_names) || !class_names[class])
> >>
> >> Second condition seems to be not needed at the moment.
> > 
> > Yes, of course, it must be a leftover.
> > 
> >>> +struct intel_execution_engine2 *get_active_engines(int fd)
> >>
> >> Unexport this one?
> > 
> > I thought someone might need it, I will do it static and rework
> > it with the bool gem_has_engine_topology() as they would become
> > very similar.
> > 
> >>> +{
> >>> +   if (init_engine_list(fd)) {
> >>> +           igt_info("using pre-allocated engine list\n");
> >>> +           return NULL;
> >>> +   }
> >>> +
> >>> +   return intel_active_engines2;
> >>> +}
> >>> +
> >>> +struct intel_execution_engine2 *gem_set_context_get_engines(int fd,
> >>> +                                                       uint32_t ctx_id)
> >>
> >> I find the name of this function really confusing (set get what?). Can I
> >> suggest __gem_prepare_context_engines or something like that?
> > 
> > The name means "set context and get engines", as it's a "get_"
> > kind of function. But as it came out, I have a very limited
> > imagination for names choice. I will call it
> > __gem_prepare_context_engines(), then.
> > 
> >>> +{
> >>> +   struct intel_execution_engine2 *active_engines = get_active_engines(fd);
> >>> +
> >>> +   if (!active_engines)
> >>> +           return intel_execution_engines2;
> >>
> >> So without engine discovery the function will not configure the engine map.
> >> How will the code inside for_each_engine2 be able to work then?
> > 
> > It's this part of the for_each_engine2 that would make it work:
> > 
> >   for_if (gem_has_engine_topology() || \
> >           gem_has_engine(fd, e2__->class, e2__->instance))
> > 
> > We had quite some discussions about backward compatibility and
> > the whole thing is designed to be backward compatible, otherwise,
> > I would have removed quite some code from the above.
> 
> Yes, I comment on that in the other reply.
> 
> >> I guess it's okay since we don't expect to see a kernel with ctx engine map
> >> support and no engine discovery but it feels a bit unnatural.
> >>
> >> But I think it effectively means you should have igt_require_gem_engine_list
> >> in here since it is otherwise not usable from the for_each_engine2 iterator.
> >>
> >>> +
> >>> +   set_ctx_param_engines(fd, ctx_id);
> >>
> >> You need to handle the use case where the context already has a map
> >> configured instead of overriding it. Should be easy.
> > 
> > Yes, this is relevant to your comment above, about the
> > igt_require(<new_api>). This part is checked in
> > 'get_active_engines(fd)', that returns NULL if we do not have the
> > new uapi and therefore it returns the old list of engines.
> > 
> > adding the igt_require() (asset?) would be redundant.
> > Am I missing any use case?
> 
> AFAIR Chris wanted it to support the use case where he configures the 
> engine map and passes the context to for_each_engine2. Maybe he changed 
> his mind?

I want to change engine maps on the fly, have multiple contexts with
different engines, and have multiple devices, and have it all work.

> It would be tricky to do since then there is no intel_active_engine 
> array to walk.. Could only define the iterator that the pointer to 
> engine is sometimes not valid (must be checked) and added index used to 
> submit. Not sure, depends if Chris still wants this feature.

Just construct the engine array in the iterator. DO NOT USE A STATIC
MAP! :)
-Chris