[RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs
JingWen Chen
jingwech at amd.com
Mon Feb 7 02:41:14 UTC 2022
Hi Andrey,
I don't have any XGMI machines here; maybe you can reach out to Shaoyun for help.
On 2022/1/29 12:57 AM, Grodzovsky, Andrey wrote:
> Just a gentle ping.
>
> Andrey
> --------------------------------
> *From:* Grodzovsky, Andrey
> *Sent:* 26 January 2022 10:52
> *To:* Christian König <ckoenig.leichtzumerken at gmail.com>; Koenig, Christian <Christian.Koenig at amd.com>; Lazar, Lijo <Lijo.Lazar at amd.com>; dri-devel at lists.freedesktop.org <dri-devel at lists.freedesktop.org>; amd-gfx at lists.freedesktop.org <amd-gfx at lists.freedesktop.org>; Chen, JingWen <JingWen.Chen2 at amd.com>
> *Cc:* Chen, Horace <Horace.Chen at amd.com>; Liu, Monk <Monk.Liu at amd.com>
> *Subject:* Re: [RFC v2 4/8] drm/amdgpu: Serialize non TDR gpu recovery with TDRs
>
>
> JingWen, could you maybe give those patches a try on an SRIOV XGMI system? If you see issues, maybe you could let me connect and debug. My SRIOV XGMI system, which Shaoyun kindly arranged for me, is not loading the driver with my drm-misc-next branch even without my patches.
>
> Andrey
>
> On 2022-01-17 14:21, Andrey Grodzovsky wrote:
>>
>>
>> On 2022-01-17 2:17 p.m., Christian König wrote:
>>> On 17.01.22 at 20:14, Andrey Grodzovsky wrote:
>>>>
>>>> Ping on the question
>>>>
>>>
>>> Oh, my! That was already more than a week ago and is completely swapped out of my head again.
>>>
>>>> Andrey
>>>>
>>>> On 2022-01-05 1:11 p.m., Andrey Grodzovsky wrote:
>>>>>>> Also, what about having the reset_active or in_reset flag in the reset_domain itself?
>>>>>>
>>>>>> Offhand that sounds like a good idea.
>>>>>
>>>>>
>>>>> What then about the adev->reset_sem semaphore? Should we also move it to reset_domain? Both moves have functional
>>>>> implications only for the XGMI case, because there will be contention over accessing those single-instance variables from multiple devices,
>>>>> whereas now each device has its own copy.
>>>
>>> Since this is a rw semaphore that should be unproblematic I think. It could just be that the cache line of the lock then plays ping/pong between the CPU cores.
>>>
>>>>>
>>>>> What benefit does centralizing into reset_domain give? Is it, for example, to prevent one device in a hive from accessing another device's
>>>>> VRAM (shared FB memory) through MMIO while the other one goes through reset?
>>>
>>> I think that this is the killer argument for a centralized lock, yes.
>>
>>
>> np, I will add a patch centralizing both the flag and the semaphore into the reset domain and resend.
>>
>> Andrey
>>
>>
>>>
>>> Christian.
>>>
>>>>>
>>>>> Andrey
>>>
More information about the amd-gfx mailing list