[RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

Christian König ckoenig.leichtzumerken at gmail.com
Mon Jan 17 19:17:29 UTC 2022


On 17.01.22 at 20:14, Andrey Grodzovsky wrote:
>
> Ping on the question
>

Oh, my! That was already more than a week ago and is completely swapped 
out of my head again.

> Andrey
>
> On 2022-01-05 1:11 p.m., Andrey Grodzovsky wrote:
>>>> Also, what about having the reset_active or in_reset flag in the 
>>>> reset_domain itself?
>>>
>>> Offhand, that sounds like a good idea.
>>
>>
>> What, then, about the adev->reset_sem semaphore? Should we also move
>> it to reset_domain? Both moves have functional implications only
>> for the XGMI case, because there will be contention over accessing
>> those single-instance variables from multiple devices, whereas now
>> each device has its own copy.

Since this is an rw semaphore, that should be unproblematic, I think. The
only cost could be that the lock's cache line then ping-pongs between the
CPU cores.

>>
>> What benefit does centralizing this in reset_domain give? Is it, for
>> example, to prevent one device in a hive from accessing another
>> device's VRAM (shared FB memory) through MMIO while that device goes
>> through reset?

I think that this is the killer argument for a centralized lock, yes.

Christian.

>>
>> Andrey 


More information about the amd-gfx mailing list