[PATCH] drm/amdgpu: Fix multiple GPU resets in XGMI hive.
Andrey Grodzovsky
andrey.grodzovsky at amd.com
Thu May 12 13:44:11 UTC 2022
Sure, I will investigate that. What about the ticket Lijo raised, in
which we were basically doing 8 resets instead of one? Lijo, can this
ticket wait until I come up with the new design for the amdgpu reset
function, or do you need a quick solution now, in which case we can use
the already existing patch temporarily?
Andrey
On 2022-05-12 09:15, Christian König wrote:
>> I am not sure why the hive is the object we should work with. A hive is
>> one use case, a single device is another, and then Lijo described
>> something called a partition, which is what? A particular pipe within
>> the GPU? IMHO what they all share in common is that all of them use a
>> reset domain when they want a recovery operation, so maybe GPU reset
>> should be oriented to work with the reset domain?
>
> Yes, exactly that's the idea.
>
> Basically the reset domain knows which amdgpu devices it needs to
> reset together.
>
> Whether you then represent that so that you always have a hive, even
> when you only have one device in it, or you put an array of devices
> which need to be reset together into the reset domain, doesn't matter.
>
> Maybe go for the latter approach; that is probably a bit cleaner and
> less code to change.
>
> Christian.
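
To make the "array of devices in the reset domain" idea concrete, here is a
minimal userspace sketch. It is not amdgpu code: every name in it
(sketch_device, sketch_reset_domain, the sketch_domain_* helpers and
MAX_DOMAIN_DEVICES) is hypothetical, and locking and the real recovery
sequence are left out. It only illustrates a domain that owns the list of
devices to reset together and collapses repeated reset requests into one
reset pass, under the assumption that the duplicate resets in Lijo's ticket
come from several requests targeting the same domain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DOMAIN_DEVICES 8 /* hypothetical limit, for this sketch only */

/* Hypothetical stand-in for a device; NOT the real struct amdgpu_device. */
struct sketch_device {
	int id;
	unsigned int reset_count; /* how many times this device was reset */
};

/* Hypothetical reset domain: the devices that must be reset together. */
struct sketch_reset_domain {
	struct sketch_device *devices[MAX_DOMAIN_DEVICES];
	size_t num_devices;
	bool reset_pending; /* a reset was requested and has not run yet */
};

/* Add a device to the domain; returns false if the domain is full. */
static bool sketch_domain_add(struct sketch_reset_domain *domain,
			      struct sketch_device *dev)
{
	assert(domain && dev);
	if (domain->num_devices >= MAX_DOMAIN_DEVICES)
		return false;
	domain->devices[domain->num_devices++] = dev;
	return true;
}

/*
 * Request a reset of the whole domain. Returns true if this call queued
 * the reset, false if one was already pending (the request is absorbed).
 */
static bool sketch_domain_request_reset(struct sketch_reset_domain *domain)
{
	assert(domain);
	if (domain->reset_pending)
		return false;
	domain->reset_pending = true;
	return true;
}

/*
 * Run the pending reset once for every device in the domain.
 * Returns the number of devices reset (0 if nothing was pending).
 */
static size_t sketch_domain_run_reset(struct sketch_reset_domain *domain)
{
	size_t i;

	assert(domain);
	if (!domain->reset_pending)
		return 0;
	for (i = 0; i < domain->num_devices; i++)
		domain->devices[i]->reset_count++;
	domain->reset_pending = false;
	return domain->num_devices;
}
```

With this shape a single device is just a domain with one entry, and a hive
is a domain with several, so the reset path does not need to know which
case it is handling.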