[Intel-gfx] [PATCH v3 2/3] drm/i915: Update error capture code to avoid using the current vma state
Thomas Hellström
thomas.hellstrom at linux.intel.com
Fri Oct 29 06:31:26 UTC 2021
On 10/29/21 00:55, Matthew Brost wrote:
> On Thu, Oct 28, 2021 at 02:01:27PM +0200, Thomas Hellström wrote:
>> With asynchronous migrations, the vma state may be several migrations
>> ahead of the state that matches the request we're capturing.
>> Address that by introducing an i915_vma_snapshot structure that
>> can be used to snapshot relevant state at request submission.
>> In order to make sure we access the correct memory, the snapshots take
>> references on relevant sg-tables and memory regions.
>>
>> Also move the capture list allocation out of the fence signaling
>> critical path and use the CONFIG_DRM_I915_CAPTURE_ERROR define to
>> avoid compiling in members and functions used for error capture
>> when they're not used.
>>
>> Finally, correct lockdep annotation would reveal that error capture is
>> typically done in the fence signalling critical path. Alter the
>> error capture memory allocation mode accordingly.
>>
> I've seen this as well:
> https://patchwork.freedesktop.org/patch/451415/?series=93704&rev=5
>
> John Harrison's and Daniele's feeling was that if a NOWAIT memory
> allocation context is used while the system is under any amount of memory
> pressure, the error capture is likely to fail due to the size of the
> objects being allocated. Daniel Vetter has proposed another solution:
> basically allocate a page at the NOWAIT context, which is a larger rework.
>
> We have a Jira for this. I'll dig it up and send it over off the list if
> you want to join that discussion.
>
> Matt
>
Please do. I basically agree with John and Daniele that error capture may
fail under memory pressure, but I couldn't see how we could avoid that
short of exposing ourselves to dma-fence deadlocks.
I figure we'd basically have to pin all vmas, reset, retire the request
and *then* do the allocating parts of the capture.
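
The ordering described above (pin, reset, retire, then allocate) can be sketched roughly in plain userspace C. Everything here is a hypothetical stand-in and not i915 code: fake_vma, fake_request, capture_prepare() and capture_copy() are made-up names, and malloc() only stands in for an allocation that is allowed to block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a vma; not the real i915 structure. */
struct fake_vma {
	int pin_count;		/* > 0: backing store must not move */
	const char *data;	/* pretend contents to be captured */
	size_t size;
};

/* Hypothetical stand-in for a request and its capture list. */
struct fake_request {
	struct fake_vma **vmas;
	size_t nr_vmas;
	int retired;		/* 1 once the request's fence has signalled */
};

/*
 * Phase 1: would run inside the fence signalling critical path,
 * so it must not allocate. It only pins and retires.
 */
static void capture_prepare(struct fake_request *rq)
{
	for (size_t i = 0; i < rq->nr_vmas; i++)
		rq->vmas[i]->pin_count++;

	/* A real implementation would reset the engine here. */
	rq->retired = 1;
}

/*
 * Phase 2: runs only after the request is retired, so a blocking
 * allocation can no longer stall fence signalling. Returns a
 * malloc()ed copy of all pinned contents, or NULL on failure.
 * The pins are dropped in both cases.
 */
static char *capture_copy(struct fake_request *rq, size_t *out_len)
{
	size_t total = 0, off = 0;
	char *buf;

	assert(rq->retired);

	for (size_t i = 0; i < rq->nr_vmas; i++)
		total += rq->vmas[i]->size;

	buf = malloc(total ? total : 1);
	for (size_t i = 0; i < rq->nr_vmas; i++) {
		struct fake_vma *vma = rq->vmas[i];

		if (buf) {
			memcpy(buf + off, vma->data, vma->size);
			off += vma->size;
		}
		vma->pin_count--;	/* unpin whether or not we copied */
	}

	*out_len = buf ? total : 0;
	return buf;
}
```

A caller would run capture_prepare() from the reset path and defer capture_copy() to a context where sleeping is fine. The cost of this split is that the vmas stay pinned until the deferred copy has run.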
Meanwhile, I'll ping Daniel about the best course of action for the above
series.
/Thomas