[PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout

James Zhu jamesz at amd.com
Wed Dec 13 16:55:15 UTC 2023


On 2023-12-13 11:23, Felix Kuehling wrote:
>
> On 2023-12-13 10:24, James Zhu wrote:
>> Ping ...
>>
>> On 2023-12-08 18:01, James Zhu wrote:
>>> When an application tries to allocate all of system memory, memory is
>>> swapped out and hmm_range_fault needs more time to validate the
>>> remaining pages for allocation. To be safe, increase the timeout to
>>> 1 second per 64MB of range.
>>>
>>> Signed-off-by: James Zhu <James.Zhu at amd.com>
>
> This is not the first time we're incrementing this timeout. Eventually 
> we should get rid of that and find a way to make this work reliably 
> without a timeout. There can always be situations where faults take 
> longer, and we should not fail randomly in those cases.
>
> There are also some FIXMEs in this code that should be addressed at 
> the same time.
>
> That said, as a short-term fix, this patch is
[JZ] Yes, it is just a short-term fix. The root cause is still under study.
>
> Acked-by: Felix Kuehling <Felix.Kuehling at amd.com>
>
>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>>> index 081267161d40..b24eb5821fd1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
>>> @@ -190,8 +190,8 @@ int amdgpu_hmm_range_get_pages(struct mmu_interval_notifier *notifier,
>>>           pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
>>>               hmm_range->start, hmm_range->end);
>>> -        /* Assuming 128MB takes maximum 1 second to fault page address */
>>> -        timeout = max((hmm_range->end - hmm_range->start) >> 27, 1UL);
>>> +        /* Assuming 64MB takes maximum 1 second to fault page address */
>>> +        timeout = max((hmm_range->end - hmm_range->start) >> 26, 1UL);
>>>           timeout *= HMM_RANGE_DEFAULT_TIMEOUT;
>>>           timeout = jiffies + msecs_to_jiffies(timeout);
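
For reference, the effect of changing the shift from 27 to 26 can be
sketched in plain userspace C. This is only an illustration of the
arithmetic in the hunk above, not the driver code: range_timeout_ms() is
a hypothetical helper, and the 1000 ms value is an assumption about
HMM_RANGE_DEFAULT_TIMEOUT based on the "1 second" wording in the comment.

```c
#include <assert.h>

/* Assumption: HMM_RANGE_DEFAULT_TIMEOUT is 1000 (milliseconds), which
 * matches the "1 second" wording in the patch comment. */
#define HMM_RANGE_DEFAULT_TIMEOUT_MS 1000UL

/*
 * Hypothetical helper (not part of amdgpu): relative timeout budget in
 * milliseconds for a range of `bytes`, granting one default timeout per
 * 2^shift bytes and never less than one. This mirrors
 *     timeout = max(size >> shift, 1UL) * HMM_RANGE_DEFAULT_TIMEOUT;
 * but leaves out the conversion to an absolute jiffies deadline.
 */
static unsigned long range_timeout_ms(unsigned long long bytes,
                                      unsigned int shift)
{
    unsigned long long units;

    assert(shift < 64);          /* shifting by >= 64 bits is undefined */

    units = bytes >> shift;      /* number of whole 2^shift-byte chunks */
    if (units < 1)
        units = 1;               /* same role as max(..., 1UL) */

    return (unsigned long)(units * HMM_RANGE_DEFAULT_TIMEOUT_MS);
}
```

Under these assumptions a 1GB range gets 8000 ms with the old shift of
27 and 16000 ms with the new shift of 26, so the budget doubles. Ranges
of 64MB or less are unchanged because both shifts clamp to one unit,
i.e. 1000 ms.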

