[Intel-gfx] [PATCH 2/2] drm/i915/pmu: Fix CPU hotplug with multiple GPUs
Chris Wilson
chris at chris-wilson.co.uk
Tue Oct 20 12:10:02 UTC 2020
Quoting Chris Wilson (2020-10-20 12:59:57)
> Quoting Tvrtko Ursulin (2020-10-20 11:08:22)
> > From: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> >
> > Since we keep a driver-global mask of online CPUs and use it to decide
> > whether a PMU needs to be migrated, we need to make sure the migration
> > is done for all registered PMUs (that is, for all GPUs).
> >
> > To do this we need to track the current CPU for each PMU and base the
> > decision whether to migrate on a comparison between the global and
> > per-PMU state.
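
For reference, the per-PMU bookkeeping this implies presumably looks
roughly like the below (abbreviated; the i915_pmu.h hunk is not quoted
here):

	/* in struct i915_pmu */
	struct {
		/* Instance node on the shared cpuhp state. */
		struct hlist_node node;
		/* CPU the events for this PMU currently live on. */
		unsigned int cpu;
	} cpuhp;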
> >
> > At the same time, since dynamic CPU hotplug notification slots are a
> > scarce resource and given that we already register a multi-instance type
> > state, we can and should add multiple instances of the i915 PMU to this
> > same state rather than allocate a new slot for every GPU.
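
Presumably the per-device side then reduces to something like this
sketch (exact call sites not quoted here):

	/* i915_pmu_register(), once per GPU */
	ret = cpuhp_state_add_instance(cpuhp_slot, &pmu->cpuhp.node);

	/* i915_pmu_unregister() */
	cpuhp_state_remove_instance(cpuhp_slot, &pmu->cpuhp.node);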
> >
> > v2:
> > * Use pr_notice. (Chris)
> >
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Suggested-by: Daniel Vetter <daniel.vetter at intel.com> # dynamic slot optimisation
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>
> > ---
> > drivers/gpu/drm/i915/i915_pci.c | 7 ++++-
> > drivers/gpu/drm/i915/i915_pmu.c | 50 ++++++++++++++++++++-------------
> > drivers/gpu/drm/i915/i915_pmu.h | 6 +++-
> > 3 files changed, 41 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> > index 27964ac0638a..a384f51c91c1 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -1150,9 +1150,13 @@ static int __init i915_init(void)
> > return 0;
> > }
> >
> > + i915_pmu_init();
> > +
> > err = pci_register_driver(&i915_pci_driver);
> > - if (err)
> > + if (err) {
> > + i915_pmu_exit();
> > return err;
> > + }
> >
> > i915_perf_sysctl_register();
> > return 0;
> > @@ -1166,6 +1170,7 @@ static void __exit i915_exit(void)
> > i915_perf_sysctl_unregister();
> > pci_unregister_driver(&i915_pci_driver);
> > i915_globals_exit();
> > + i915_pmu_exit();
> > }
> >
> > module_init(i915_init);
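
The new i915_pmu_init()/i915_pmu_exit() helpers themselves are not
quoted here; presumably they are along these lines, with the v2
pr_notice on the setup failure path:

	static enum cpuhp_state cpuhp_slot = CPUHP_INVALID;

	void i915_pmu_init(void)
	{
		int ret;

		/* One multi-instance dynamic slot shared by all GPUs. */
		ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
					      "perf/x86/intel/i915:online",
					      i915_pmu_cpu_online,
					      i915_pmu_cpu_offline);
		if (ret < 0)
			pr_notice("Failed to setup cpuhp state for i915 PMU! (%d)\n",
				  ret);
		else
			cpuhp_slot = ret;
	}

	void i915_pmu_exit(void)
	{
		if (cpuhp_slot != CPUHP_INVALID)
			cpuhp_remove_multi_state(cpuhp_slot);
	}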
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > index 51ed7d0efcdc..0d6c0945621e 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > @@ -30,6 +30,7 @@
> > #define ENGINE_SAMPLE_BITS (1 << I915_PMU_SAMPLE_BITS)
> >
> > static cpumask_t i915_pmu_cpumask;
> > +static unsigned int i915_pmu_target_cpu = -1;
> >
> > static u8 engine_config_sample(u64 config)
> > {
> > @@ -1049,25 +1050,32 @@ static int i915_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> > static int i915_pmu_cpu_offline(unsigned int cpu, struct hlist_node *node)
> > {
> > struct i915_pmu *pmu = hlist_entry_safe(node, typeof(*pmu), cpuhp.node);
> > - unsigned int target;
> > + unsigned int target = i915_pmu_target_cpu;
>
> So we still have multiple callback invocations, one per pmu. But each
> instance is now stored in a list hanging off the single cpuhp_slot
> instead of each pmu having its own slot.
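
Right - for a multi-instance state the cpuhp core conceptually does
something like:

	/* kernel/cpu.c, cpuhp_invoke_callback(), roughly */
	hlist_for_each(node, &step->list)
		ret = cbm(cpu, node);

i.e. one shared slot, N instance nodes, N invocations of the callback.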
>
> >
> > GEM_BUG_ON(!pmu->base.event_init);
> >
> > if (cpumask_test_and_clear_cpu(cpu, &i915_pmu_cpumask)) {
>
> On first callback...
>
> > target = cpumask_any_but(topology_sibling_cpumask(cpu), cpu);
>
> Pick any other cpu from the dying cpu's sibling mask.
>
> > +
> > /* Migrate events if there is a valid target */
> > if (target < nr_cpu_ids) {
> > cpumask_set_cpu(target, &i915_pmu_cpumask);
> > - perf_pmu_migrate_context(&pmu->base, cpu, target);
> > + i915_pmu_target_cpu = target;
>
> Store target for all callbacks.
>
> > }
> > }
> >
> > + if (target < nr_cpu_ids && target != pmu->cpuhp.cpu) {
>
> If global [i915_pmu_target_cpu] target has changed, update perf.
>
> > + perf_pmu_migrate_context(&pmu->base, cpu, target);
> > + pmu->cpuhp.cpu = target;
>
> It is claimed that cpuhp_state_remove_instance() will call the offline
> callback for all online cpus... Do we need a pmu->base.state != STOPPED
> guard?
s/claimed/it definitely does :)/
Or rather pmu->closed.
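
i.e. a sketch of the guard at the top of i915_pmu_cpu_offline():

	/*
	 * Unregistering an instance generates a CPU offline event
	 * which we must ignore to avoid incorrectly modifying the
	 * shared i915_pmu_cpumask.
	 */
	if (pmu->closed)
		return 0;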
-Chris