[PATCH 1/3] drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL
Matthew Brost
matthew.brost at intel.com
Wed Jan 19 20:47:22 UTC 2022
On Tue, Jan 18, 2022 at 05:29:54PM -0800, John Harrison wrote:
> On 1/18/2022 13:43, Matthew Brost wrote:
> > Allocate intel_engine_coredump_alloc with ALLOW_FAIL rather than
> > GFP_KERNEL do fully decouple the error capture from fence signalling.
> s/do/to/
>
Yep.
> >
> > Fixes: 8b91cdd4f8649 ("drm/i915: Use __GFP_KSWAPD_RECLAIM in the capture code")
> Does this really count as a bug fix over that patch? Isn't it more of a
> changing in policy now that we do DRM fence signalling and that other
> changes related to error capture behaviour have been implemented.
>
That patch was supposed to allow signalling annotations to be added,
without this change I think these annotations would be broken. So I
think the Fixes is correct.
> >
> > Signed-off-by: Matthew Brost <matthew.brost at intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index 67f3515f07e7a..aee42eae4729f 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -1516,7 +1516,7 @@ capture_engine(struct intel_engine_cs *engine,
> > struct i915_request *rq = NULL;
> > unsigned long flags;
> > - ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
> > + ee = intel_engine_coredump_alloc(engine, ALLOW_FAIL);
> This still makes me nervous that we will fail to allocate engine captures in
> stress test scenarios, which are exactly the kind of situations where we
> need valid error captures.
>
Me too, but this whole file has been changed to the ALLOW_FAIL. Thomas
and Daniel seem to think this is correct. For what it's worth this
allocation is less than a page, so it should be pretty safe to do with
ALLOW_FAIL.
> There is also still a GFP_KERNEL in __i915_error_grow(). Doesn't that need
> updating as well?
>
Probably just should be deleted. If look it tries with ALLOW_FAIL first,
then falls back to GFP_KERNEL. I didn't want to make that update in this
series yet but that is something to keep an eye on.
Matt
> John.
>
> > if (!ee)
> > return NULL;
>
More information about the dri-devel
mailing list