[PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.

Andrey Grodzovsky andrey.grodzovsky at amd.com
Thu May 12 13:44:11 UTC 2022


Sure, I will investigate that. What about the ticket Lijo raised, 
which was basically doing 8 resets instead of one? Lijo, can this 
ticket wait until I come up with the new design for the amdgpu reset 
function, or do you need a quick solution now, in which case we can use 
the already existing patch temporarily?

Andrey

On 2022-05-12 09:15, Christian König wrote:
>> I am not sure why the hive is the object we should work with. The hive 
>> is one use case, a single device is another, and then Lijo described 
>> something called a partition, which is what? A particular pipe within 
>> the GPU? What they all share in common, IMHO, is that all of them use 
>> a reset domain when they want a recovery operation, so maybe GPU reset 
>> should be oriented to work with the reset domain?
>
> Yes, exactly that's the idea.
>
> Basically the reset domain knows which amdgpu devices it needs to 
> reset together.
>
> Whether you represent that by always having a hive, even when you 
> only have one device in it, or by putting an array of devices which 
> need to be reset together into the reset domain, doesn't matter.
>
> Maybe go for the latter approach, that is probably a bit cleaner and 
> less code to change.
>
> Christian. 
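
The "array of devices in the reset domain" idea discussed above could be sketched roughly as follows. This is only a user-space illustration under stated assumptions, not the amdgpu implementation: every name here (sketch_device, sketch_reset_domain, sketch_domain_add, sketch_domain_reset) is hypothetical, and the generation counter used to collapse duplicate requests is an assumed mechanism, not something proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_DEVICES 8

/* Hypothetical stand-in for a GPU device; not the real struct amdgpu_device. */
struct sketch_device {
	int id;
	int reset_count;	/* how many times this device has been reset */
};

/* Hypothetical reset domain: the devices that must be reset together. */
struct sketch_reset_domain {
	struct sketch_device *devices[SKETCH_MAX_DEVICES];
	size_t num_devices;
	unsigned int generation;	/* bumped once per completed reset (assumption) */
};

/* Add a device to the domain. Returns 0 on success, -1 if the array is full. */
static int sketch_domain_add(struct sketch_reset_domain *dom,
			     struct sketch_device *dev)
{
	assert(dom && dev);
	if (dom->num_devices >= SKETCH_MAX_DEVICES)
		return -1;
	dom->devices[dom->num_devices++] = dev;
	return 0;
}

/*
 * Reset all devices in the domain, once per fault.
 * seen_generation is the domain generation the caller observed when it
 * detected the fault. If the domain has moved on since then, another
 * requester already performed the reset and this request is skipped.
 * Returns 1 if a reset was performed, 0 if it was skipped.
 */
static int sketch_domain_reset(struct sketch_reset_domain *dom,
			       unsigned int seen_generation)
{
	size_t i;

	assert(dom);
	if (seen_generation != dom->generation)
		return 0;

	for (i = 0; i < dom->num_devices; i++)
		dom->devices[i]->reset_count++;

	dom->generation++;
	return 1;
}
```

With this shape a single device is just a domain holding one entry, and a hive is a domain holding several. If every device in the domain reports the same fault with the same observed generation, only the first call resets anything and the rest return 0. Locking is left out on purpose; a real version would need to serialize requesters.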