[RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs

Andrey Grodzovsky andrey.grodzovsky at amd.com
Wed Jan 26 15:52:00 UTC 2022


JingWen - could you maybe give these patches a try on an SRIOV XGMI system?
If you see issues, maybe you could let me connect and debug. The SRIOV
XGMI system which Shayun kindly arranged for me does not load the
driver with my drm-misc-next branch, even without my patches.

Andrey

On 2022-01-17 14:21, Andrey Grodzovsky wrote:
>
>
> On 2022-01-17 2:17 p.m., Christian König wrote:
>> Am 17.01.22 um 20:14 schrieb Andrey Grodzovsky:
>>>
>>> Ping on the question
>>>
>>
>> Oh, my! That was already more than a week ago and is completely 
>> swapped out of my head again.
>>
>>> Andrey
>>>
>>> On 2022-01-05 1:11 p.m., Andrey Grodzovsky wrote:
>>>>>> Also, what about having the reset_active or in_reset flag in the 
>>>>>> reset_domain itself?
>>>>>
>>>>> Offhand that sounds like a good idea.
>>>>
>>>>
>>>> What then about the adev->reset_sem semaphore? Should we also move
>>>> this to reset_domain? Both of the moves have functional
>>>> implications only for the XGMI case, because there will be contention
>>>> over accessing those single-instance variables from multiple devices,
>>>> while now each device has its own copy.
>>
>> Since this is a rw semaphore that should be unproblematic I think. It 
>> could just be that the cache line of the lock then plays ping/pong 
>> between the CPU cores.
>>
>>>>
>>>> What benefit does the centralization into reset_domain give? Is it, for
>>>> example, to prevent one device in a hive from trying to access through
>>>> MMIO another one's VRAM (shared FB memory) while the other one goes
>>>> through reset?
>>
>> I think that this is the killer argument for a centralized lock, yes.
>
>
> np, I will add a patch centralizing both in the reset domain
> and resend.
>
> Andrey
>
>
>>
>> Christian.
>>
>>>>
>>>> Andrey 
>>
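
The centralization discussed above (one in_reset flag and one rw lock per
reset domain instead of one per device) can be sketched roughly as follows.
This is only a userspace model under stated assumptions, not amdgpu code:
every struct and function name here is hypothetical, and a try-lock style
reader/writer gate built from C11 atomics stands in for the kernel rw
semaphore. It shows only the property raised in the thread: while one
device in the hive resets, accesses issued through any other device in the
same domain are refused.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are amdgpu code. */
struct reset_domain {
    atomic_int holders;   /* stand-in for the rw semaphore: -1 = reset (writer), n >= 0 = n readers */
    atomic_bool in_reset; /* stand-in for the in_reset flag discussed in the thread */
};

struct device {
    struct reset_domain *domain; /* one instance shared by every device in the hive */
    int reg;                     /* pretend MMIO-visible state */
};

static void reset_domain_init(struct reset_domain *d)
{
    atomic_init(&d->holders, 0);
    atomic_init(&d->in_reset, false);
}

/* Reader side: succeeds unless a reset currently holds the domain. */
static bool domain_try_read_lock(struct reset_domain *d)
{
    int n = atomic_load(&d->holders);

    while (n >= 0) {
        /* On failure, n is refreshed with the current value and rechecked. */
        if (atomic_compare_exchange_weak(&d->holders, &n, n + 1))
            return true;
    }
    return false;
}

static void domain_read_unlock(struct reset_domain *d)
{
    atomic_fetch_sub(&d->holders, 1);
}

/* Writer side: succeeds only with no readers and no other reset active. */
static bool domain_try_begin_reset(struct reset_domain *d)
{
    int expected = 0;

    if (!atomic_compare_exchange_strong(&d->holders, &expected, -1))
        return false;
    atomic_store(&d->in_reset, true);
    return true;
}

static void domain_end_reset(struct reset_domain *d)
{
    assert(atomic_load(&d->in_reset)); /* must pair with a successful begin */
    atomic_store(&d->in_reset, false);
    atomic_store(&d->holders, 0);
}

/* Access through any device is refused while the shared domain is in reset. */
static bool device_try_write_reg(struct device *dev, int value)
{
    if (!domain_try_read_lock(dev->domain))
        return false;
    dev->reg = value;
    domain_read_unlock(dev->domain);
    return true;
}
```

With per-device copies of the lock, a reset started on one device would not
block device_try_write_reg() on its hive neighbour; sharing one domain is
what makes the refusal hive-wide. The cost is the one Christian mentions:
all devices now touch the same lock word, so its cache line can bounce
between CPU cores.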