[Intel-gfx] [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture

Bloomfield, Jon jon.bloomfield at intel.com
Wed Dec 6 17:01:18 UTC 2017


> -----Original Message-----
> From: Chris Wilson [mailto:chris at chris-wilson.co.uk]
> Sent: Wednesday, December 6, 2017 7:38 AM
> To: intel-gfx at lists.freedesktop.org
> Cc: Chris Wilson <chris at chris-wilson.co.uk>; Bloomfield, Jon
> <jon.bloomfield at intel.com>; Harrison, John C <john.c.harrison at intel.com>;
> Ursulin, Tvrtko <tvrtko.ursulin at intel.com>; Joonas Lahtinen
> <joonas.lahtinen at linux.intel.com>; Daniel Vetter <daniel.vetter at ffwll.ch>
> Subject: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a
> and error capture
> 
> Since capturing the error state requires fiddling around with the GGTT
> to read arbitrary buffers and is itself run under stop_machine(), it
> deadlocks the machine (effectively a hard hang) when run in conjunction
> with Broxton's VTd workaround to serialize GGTT access.
> 
> v2: Store the ERR_PTR in first_error so that the error can be reported
> to the user via sysfs.
> 
> Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT")
> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> Cc: Jon Bloomfield <jon.bloomfield at intel.com>
> Cc: John Harrison <john.C.Harrison at intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen at linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter at ffwll.ch>

It's a real shame to lose error capture on BXT. Can we wrap stop_machine() to make it recursive?

Something like...

static cpumask_t sm_mask;

struct sm_args {
        cpu_stop_fn_t fn;       /* cpu_stop_fn_t is already a pointer type */
        void *data;
};

/* Must match cpu_stop_fn_t, so it returns int and passes fn's result back */
static int do_recursive_stop(void *sm_arg_data)
{
        struct sm_args *args = sm_arg_data;
        int ret;

        /* We're stopped - flag the fact to prevent recursion */
        cpumask_set_cpu(smp_processor_id(), &sm_mask);

        ret = args->fn(args->data);

        /* Re-enable recursion */
        cpumask_clear_cpu(smp_processor_id(), &sm_mask);

        return ret;
}

int recursive_stop_machine(cpu_stop_fn_t fn, void *data)
{
        if (cpumask_test_cpu(smp_processor_id(), &sm_mask)) {
                /* We were already stopped, so can just call directly */
                return fn(data);
        } else {
                /* Our CPU is not currently stopped */
                struct sm_args args = { .fn = fn, .data = data };

                return stop_machine(do_recursive_stop, &args, NULL);
        }
}
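
For illustration only, the same guard can be exercised in user space. This is a minimal single-threaded sketch, not kernel code: mock_stop_machine(), stop_fn_t, sm_flag, recursive_stop(), outer() and inner() are all hypothetical stand-ins (a plain bool replaces the per-CPU bit in sm_mask, and the mock asserts instead of deadlocking on a nested stop). It only shows that a callback which re-enters the wrapper runs directly rather than stopping the machine a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*stop_fn_t)(void *arg);    /* stand-in for cpu_stop_fn_t */

static bool in_stop;     /* mock state: true while the "machine" is stopped */
static int stop_calls;   /* number of times the mock stop was really entered */
static bool sm_flag;     /* stand-in for this CPU's bit in sm_mask */

/* Mock of stop_machine(): a nested call models the deadlock, so trap it. */
static int mock_stop_machine(stop_fn_t fn, void *data)
{
        int ret;

        assert(!in_stop);
        in_stop = true;
        stop_calls++;
        ret = fn(data);
        in_stop = false;
        return ret;
}

struct sm_args {
        stop_fn_t fn;
        void *data;
};

static int do_recursive_stop(void *sm_arg_data)
{
        struct sm_args *args = sm_arg_data;
        int ret;

        sm_flag = true;                  /* stopped: later calls go direct */
        ret = args->fn(args->data);
        sm_flag = false;                 /* re-enable the real stop path */
        return ret;
}

static int recursive_stop(stop_fn_t fn, void *data)
{
        struct sm_args args = { .fn = fn, .data = data };

        if (sm_flag)
                return fn(data);         /* already stopped, call directly */
        return mock_stop_machine(do_recursive_stop, &args);
}

/* inner() plays the error capture, outer() the serialized GGTT access. */
static int inner(void *data)
{
        ++*(int *)data;
        return 0;
}

static int outer(void *data)
{
        ++*(int *)data;
        return recursive_stop(inner, data);   /* re-entrant call */
}
```

Calling recursive_stop(outer, &counter) runs both callbacks while entering the mock stop only once. The sketch says nothing about whether a shared, non-atomic mask is safe against concurrent callers on other CPUs in the real kernel.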

