amdgpu: ratelimit on vm faults to avoid spamming console

Tue Nov 8 13:53:02 UTC 2016

I had a brief discussion about this issue the other day on the IRC
channel. Essentially, yes: we would want to map from the offending
context back to the pid and, I suppose, kill it to restore a 'good state'.
I don't yet know how to go about implementing such a mechanism, so any
suggestions would be really helpful to me!

I also think that, when the driver does wind up in this kind of state, it
would be a nice addition to give the user something meaningful to attach
to a bug report, much like the Intel driver does.

In any case, the point of these patches was just to give some
breathing room: to keep machines responsive enough for further debugging
and to keep a working console accessible to the user.
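
To make the idea concrete, here is a minimal user-space sketch of
interval/burst rate limiting, similar in spirit to what the kernel's
__ratelimit() helper provides. It is not the code from the patches; the
struct and function names (fault_ratelimit, fault_ratelimit_init,
fault_ratelimit_allow) are hypothetical, and the caller passes in the
timestamp so the behaviour is easy to reason about.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: allow at most `burst` messages per `interval_ms`. */
struct fault_ratelimit {
	uint64_t interval_ms;   /* length of one window in milliseconds */
	unsigned int burst;     /* messages allowed per window */
	uint64_t window_start;  /* timestamp at which the current window began */
	unsigned int printed;   /* messages allowed so far in this window */
	unsigned int missed;    /* messages suppressed so far in this window */
};

static void fault_ratelimit_init(struct fault_ratelimit *rl,
				 uint64_t interval_ms, unsigned int burst)
{
	rl->interval_ms = interval_ms;
	rl->burst = burst;
	rl->window_start = 0;
	rl->printed = 0;
	rl->missed = 0;
}

/* Returns true if a fault reported at now_ms may be logged. */
static bool fault_ratelimit_allow(struct fault_ratelimit *rl, uint64_t now_ms)
{
	/* Start a fresh window once the interval has elapsed. */
	if (now_ms - rl->window_start >= rl->interval_ms) {
		rl->window_start = now_ms;
		rl->printed = 0;
		rl->missed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Over budget: count the drop so it can be summarised later. */
	rl->missed++;
	return false;
}
```

A fault handler would call fault_ratelimit_allow() before printing and
simply skip the message when it returns false, so a flood of faults
costs a counter increment rather than a console write.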

Cheers,
Edward.

On 11/08/2016 09:10 PM, Christian König wrote:
> I would rather say that we need to add this to the per-context lockup
> information.
> 
> Marek had a rather good start on this, but from the VM fault handler it
> is actually rather tricky to figure out reliably who caused it.
> 
> Because of this, we didn't follow this approach to the end.
> 
> Christian.
> 
> On 07.11.2016 at 19:42, StDenis, Tom wrote:
>>
>> Could we provide fault information through a ring buffer and a debugfs
>> or drm ioctl interface?
>>
>>
>> Tom
>>
>>
>>
>> ------------------------------------------------------------------------
>> *From:* amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on behalf of
>> Alex Deucher <alexdeucher at gmail.com>
>> *Sent:* Monday, November 7, 2016 13:35
>> *To:* Edward O'Callaghan; Nicolai Hähnle; Marek Olsak
>> *Cc:* Michel Dänzer; amd-gfx list
>> *Subject:* Re: amdgpu: ratelimit on vm faults to avoid spamming console
>>  
>> On Sun, Nov 6, 2016 at 11:35 PM, Edward O'Callaghan
>> <funfunctor at folklore1984.net> wrote:
>> > These are rather minor; however, they should help stop some folks'
>> > machines grinding to a halt when a userspace application somehow
>> > gets the GPU into some horrible state, causing the console to fill
>> > very quickly. Applies on top of master.
>>
>> I'm generally ok with the idea.  My only concern would be if this
>> would adversely affect the radeon GPUVM debugging options in mesa and
>> piglit. Nicolai? Marek?
>>
>> Alex
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
