[Intel-gfx] [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture

Bloomfield, Jon jon.bloomfield at intel.com
Wed Dec 6 17:25:37 UTC 2017


> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces at lists.freedesktop.org] On Behalf
> Of Bloomfield, Jon
> Sent: Wednesday, December 6, 2017 9:01 AM
> To: Chris Wilson <chris at chris-wilson.co.uk>; intel-gfx at lists.freedesktop.org
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Subject: Re: [Intel-gfx] [PATCH v2] drm/i915: Prevent machine hang from
> Broxton's vtd w/a and error capture
> 
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > Sent: Wednesday, December 6, 2017 7:38 AM
> > To: intel-gfx at lists.freedesktop.org
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>; Bloomfield, Jon
> > <jon.bloomfield at intel.com>; Harrison, John C
> <john.c.harrison at intel.com>;
> > Ursulin, Tvrtko <tvrtko.ursulin at intel.com>; Joonas Lahtinen
> > <joonas.lahtinen at linux.intel.com>; Daniel Vetter <daniel.vetter at ffwll.ch>
> > Subject: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd
> w/a
> > and error capture
> >
> > Since capturing the error state requires fiddling around with the GGTT
> > to read arbitrary buffers and is itself run under stop_machine(), it
> > deadlocks the machine (effectively a hard hang) when run in conjunction
> > with Broxton's VTd workaround to serialize GGTT access.
> >
> > v2: Store the ERR_PTR in first_error so that the error can be reported
> > to the user via sysfs.
> >
> > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Jon Bloomfield <jon.bloomfield at intel.com>
> > Cc: John Harrison <john.C.Harrison at intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> 
> It's  a real shame to lose error capture on BXT. Can we wrap stop_machine to
> make it recursive ?
> 
> Something like...
> 
> static cpumask_t sm_mask;
> 
> struct sm_args {
>         cpu_stop_fn_t *fn;
>         void *data;
> };
> 
> void do_recursive_stop(void *sm_arg_data)
> {
>         struct sm_arg *args = sm_arg_data;
> 
>         /* We're stopped - flag the fact to prevent recursion */
>         cpumask_set_cpu(smp_processor_id(), &sm_mask);
> 
>         args->fn(args->data);
> 
>         /* Re-enable recursion */
>         cpumask_clear_cpu(smp_processor_id(), &sm_mask);
> }
> 
> void recursive_stop_machine(cpu_stop_fn_t fn, void *data)
> {
>         if (cpumask_test_cpu(smp_processor_id(), &sm_mask)) {
>                 /* We were already stopped, so can just call directly */
>                 fn(data);
>         }
>         else {
>                 /* Our CPU is not currently stopped */
>                 struct sm_args *args = {fn, data};
>                 stop_machine(do_recursive_stop, args, NULL);
>         }
> }

... I think a single bool is sufficient in place of the cpumask, since it is set and cleared
within stop_machine - I started out trying to set/clear outside.


More information about the Intel-gfx mailing list