[Intel-gfx] [PATCH 1/3] drm/i915: Allocate intel_engine_coredump_alloc with ALLOW_FAIL

John Harrison john.c.harrison at intel.com
Wed Jan 19 20:56:11 UTC 2022


On 1/19/2022 12:47, Matthew Brost wrote:
> On Tue, Jan 18, 2022 at 05:29:54PM -0800, John Harrison wrote:
>> On 1/18/2022 13:43, Matthew Brost wrote:
>>> Allocate intel_engine_coredump_alloc with ALLOW_FAIL rather than
>>> GFP_KERNEL do fully decouple the error capture from fence signalling.
>> s/do/to/
>>
> Yep.
>
>>> Fixes: 8b91cdd4f8649 ("drm/i915: Use __GFP_KSWAPD_RECLAIM in the capture code")
>> Does this really count as a bug fix over that patch? Isn't it more of a
>> changing in policy now that we do DRM fence signalling and that other
>> changes related to error capture behaviour have been implemented.
>>
> That patch was supposed to allow signalling annotations to be added,
> without this change I think these annotations would be broken. So I
> think the Fixes is correct.
>   
>>> Signed-off-by: Matthew Brost <matthew.brost at intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gpu_error.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> index 67f3515f07e7a..aee42eae4729f 100644
>>> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
>>> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
>>> @@ -1516,7 +1516,7 @@ capture_engine(struct intel_engine_cs *engine,
>>>    	struct i915_request *rq = NULL;
>>>    	unsigned long flags;
>>> -	ee = intel_engine_coredump_alloc(engine, GFP_KERNEL);
>>> +	ee = intel_engine_coredump_alloc(engine, ALLOW_FAIL);
>> This still makes me nervous that we will fail to allocate engine captures in
>> stress test scenarios, which are exactly the kind of situations where we
>> need valid error captures.
>>
> Me too, but this whole file has been changed to the ALLOW_FAIL. Thomas
> and Daniel seem to think this is correct. For what it's worth this
> allocation is less than a page, so it should be pretty safe to do with
> ALLOW_FAIL.
>
>> There is also still a GFP_KERNEL in __i915_error_grow(). Doesn't that need
>> updating as well?
>>
> Probably just should be deleted. If look it tries with ALLOW_FAIL first,
> then falls back to GFP_KERNEL. I didn't want to make that update in this
> series yet but that is something to keep an eye on.
>
> Matt
>   
Okay. Makes sense. With the description typo fixed:
Reviewed-by: John Harrison <John.C.Harrison at Intel.com>

>> John.
>>
>>>    	if (!ee)
>>>    		return NULL;



More information about the Intel-gfx mailing list