[PATCH] drm/amdkfd: handle errors from svm validate and map
Felix Kuehling
felix.kuehling at amd.com
Wed Sep 13 18:27:09 UTC 2023
On 2023-09-13 13:33, Philip Yang wrote:
>
>
> On 2023-09-13 12:14, Felix Kuehling wrote:
>> On 2023-09-13 11:16, Philip Yang wrote:
>>> If new range is added to update list, splited to multiple pranges with
>>> max_svm_range_pages alignment, and svm validate and map returns error
>>> for the first prange, then the caller retry should add pranges with
>>> prange->is_error_flag or prange without prange->mapped_to_gpu to the
>>> update list, to update GPU mapping for the entire range.
>>
>> It looks like the only new thing here is to remove the "same
>> attribute" optimization for ranges that are not mapped on the GPU. I
>> don't fully understand the scenario you're describing here, but it
>> feels like this change has a bigger impact than it needs to have.
>> Your description specifically talks about ranges split at
>> max_svm_range_pages boundaries. But your patch affects all ranges not
>> mapped on the GPU, even it prange->is_error_flag is not set.
>>
>> Maybe that's OK, because the expensive thing is updating existing
>> mappings unnecessarily. If there is no existing mapping yet, it's
>> probably not a big deal. I just don't understand the scenario that
>> requires a retry without the prange->is_error_flag being set. Maybe
>> a better fix would be to ensure that prange->is_error_flag gets set
>> in your scenario.
>
> Another way to fix the issue, set_attr continue to call
> svm_range_validate_and_map for all pranges of update_list after
> svm_range_validate_and_map return error, this change will have less
> side-effect.
>
Another option is to get rid of the is_error_flag completely. Instead
only use mapped_to_gpu and set it to false if validate_and_map fails.
Regards,
Felix
> Regards,
>
> Philip
>
>>
>> Regards,
>> Felix
>>
>>
>>>
>>> Fixes: c22b04407097 ("drm/amdkfd: flag added to handle errors from
>>> svm validate and map")
>>> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
>>> Tested-by: James Zhu <james.zhu at amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 +++--
>>> 1 file changed, 3 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> index 61dd66bddc3c..8871329e9cbd 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>>> @@ -825,7 +825,7 @@ svm_range_is_same_attrs(struct kfd_process *p,
>>> struct svm_range *prange,
>>> }
>>> }
>>> - return !prange->is_error_flag;
>>> + return true;
>>> }
>>> /**
>>> @@ -2228,7 +2228,8 @@ svm_range_add(struct kfd_process *p, uint64_t
>>> start, uint64_t size,
>>> next = interval_tree_iter_next(node, start, last);
>>> next_start = min(node->last, last) + 1;
>>> - if (svm_range_is_same_attrs(p, prange, nattr, attrs)) {
>>> + if (!prange->is_error_flag && prange->mapped_to_gpu &&
>>> + svm_range_is_same_attrs(p, prange, nattr, attrs)) {
>>> /* nothing to do */
>>> } else if (node->start < start || node->last > last) {
>>> /* node intersects the update range and its attributes
More information about the amd-gfx
mailing list