[Intel-gfx] [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
Bloomfield, Jon
jon.bloomfield at intel.com
Wed Dec 6 17:25:37 UTC 2017
> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces at lists.freedesktop.org] On Behalf
> Of Bloomfield, Jon
> Sent: Wednesday, December 6, 2017 9:01 AM
> To: Chris Wilson <chris at chris-wilson.co.uk>; intel-gfx at lists.freedesktop.org
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> Subject: Re: [Intel-gfx] [PATCH v2] drm/i915: Prevent machine hang from
> Broxton's vtd w/a and error capture
>
> > -----Original Message-----
> > From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> > Sent: Wednesday, December 6, 2017 7:38 AM
> > To: intel-gfx at lists.freedesktop.org
> > Cc: Chris Wilson <chris at chris-wilson.co.uk>; Bloomfield, Jon
> > <jon.bloomfield at intel.com>; Harrison, John C
> <john.c.harrison at intel.com>;
> > Ursulin, Tvrtko <tvrtko.ursulin at intel.com>; Joonas Lahtinen
> > <joonas.lahtinen at linux.intel.com>; Daniel Vetter <daniel.vetter at ffwll.ch>
> > Subject: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd
> w/a
> > and error capture
> >
> > Since capturing the error state requires fiddling around with the GGTT
> > to read arbitrary buffers and is itself run under stop_machine(), it
> > deadlocks the machine (effectively a hard hang) when run in conjunction
> > with Broxton's VTd workaround to serialize GGTT access.
> >
> > v2: Store the ERR_PTR in first_error so that the error can be reported
> > to the user via sysfs.
> >
> > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > Cc: Jon Bloomfield <jon.bloomfield at intel.com>
> > Cc: John Harrison <john.C.Harrison at intel.com>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
>
> It's a real shame to lose error capture on BXT. Can we wrap stop_machine to
> make it recursive ?
>
> Something like...
>
> static cpumask_t sm_mask;
>
> struct sm_args {
> cpu_stop_fn_t *fn;
> void *data;
> };
>
> void do_recursive_stop(void *sm_arg_data)
> {
> struct sm_arg *args = sm_arg_data;
>
> /* We're stopped - flag the fact to prevent recursion */
> cpumask_set_cpu(smp_processor_id(), &sm_mask);
>
> args->fn(args->data);
>
> /* Re-enable recursion */
> cpumask_clear_cpu(smp_processor_id(), &sm_mask);
> }
>
> void recursive_stop_machine(cpu_stop_fn_t fn, void *data)
> {
> if (cpumask_test_cpu(smp_processor_id(), &sm_mask)) {
> /* We were already stopped, so can just call directly */
> fn(data);
> }
> else {
> /* Our CPU is not currently stopped */
> struct sm_args *args = {fn, data};
> stop_machine(do_recursive_stop, args, NULL);
> }
> }
... I think a single bool is sufficient in place of the cpumask, since it is set and cleared
within stop_machine - I started out trying to set/clear outside.
More information about the Intel-gfx
mailing list