[Intel-gfx] [PATCH] drm/i915/selftests: Add request throughput measurement to perf

Thu Apr 23 14:59:37 UTC 2020

Quoting Andi Shyti (2020-04-23 15:18:47)
> Hi Chris,
> 
> > > > +static int s_many(void *arg)
> > > > +{
> > > > +     struct perf_series *ps = arg;
> > > 
> > > why do we need to go through void... all functions are taking a
> > > perf_series structure.
> > 
> > The kthread API defines the function pointer as int (*fn)(void *arg);
> 
> In the parallel tes, not in the serial test. right? But, OK,
> it's not a big deal.

Imagine I copy-paste the kthread_run as well :)

> > > > +                     p->time = ktime_sub(ktime_get(), p->time);
> > > > +                     if (p->busy) {
> > > > +                             p->busy = ktime_sub(intel_engine_get_busy_time(p->engine),
> > > > +                                                 p->busy - 1);
> > > > +                             intel_disable_engine_stats(p->engine);
> > > > +                     }
> > > > +
> > > > +                     err = switch_to_kernel_sync(ce, err);
> > > 
> > > how about err?
> > 
> > It's a case of where we need to flush all the engines regardless of the
> > error, so we wait until the end.
> 
> so you put it there to remove also the warning :)

The idea was that it would keep the error until the end, trying to flush
as it went.

> > > > +                     p->runtime += intel_context_get_total_runtime_ns(ce);
> > > 
> > > assigning a negative value to an unsigned so that you can add it
> > > later? looks nice but odd :)
> > > 
> > > It's easier to understand if we do
> > > 
> > > p->runtime = intel_context_get_total_runtime_ns(ce) - p->runtime;
> > > 
> > > if you like it, no need to change, though.
> > 
> > I've done both. I'm fond of the unsigned addition of a negative number.
> > It's a few instructions less. :-p
> 
> Yes, nevertheless, I like it, even though it's somehow an abuse
> of the concept of unsigned.

It's not negative, just a large positive number!

> > > > +     count = 0;
> > > > +     p->time = ktime_get();
> > > > +     do {
> > > > +             struct i915_request *rq;
> > > > +
> > > > +             rq = i915_request_create(ce);
> > > > +             if (IS_ERR(rq)) {
> > > > +                     err = PTR_ERR(rq);
> > > > +                     break;
> > > > +             }
> > > > +
> > > > +             i915_request_add(rq);
> > > 
> > > do we need a request_put as well?
> > 
> > No. Nasty API, add consumes the reference. It's on the list to update
> > now that we have a ~majority of cases that want to keep a reference to
> > their request.
> 
> arrghh!
> 
> > > > +             for_each_uabi_engine(engine, i915) {
> > > > +                     int status;
> > > > +
> > > > +                     if (IS_ERR(engines[idx].tsk))
> > > > +                             break;
> > > 
> > > if we break here aren't we missing engine_pm_put and put_task?
> > 
> > No. We broke in the earlier loop as well, so the ERR_PTR is the
> > sentinel, indicating the end of the assignments.
> 
> Oh yes, true!
> 
> Reviewed-by: Andi Shyti <andi.shyti at intel.com>
> 
> Now finally, after a month it has been lingering around, you can
> finally push it.

And get back to the real question of how we can improve the backend so
that we can saturate the GPU with the least number of CPUs.

And if we are happy with the kernel, there's the issue of where all the
throughput goes between the kernel and userspace. (The purpose of this
test was to ensure we had no silly bugs inside the kernel that was
causing the sub-par userspace performance. We don't seem to, but it's
also not perfect, plenty of room for improvement.)
-Chris