[PATCH] drm/amdgpu: use new HMM APIs and helpers

Christian König ckoenig.leichtzumerken at gmail.com
Mon Jun 3 11:23:14 UTC 2019


Am 03.06.19 um 12:17 schrieb Christian König:
> Am 01.06.19 um 00:01 schrieb Kuehling, Felix:
>> On 2019-05-31 5:32 p.m., Yang, Philip wrote:
>>> On 2019-05-31 3:42 p.m., Kuehling, Felix wrote:
>>>> On 2019-05-31 1:28 p.m., Yang, Philip wrote:
>>>>> On 2019-05-30 6:36 p.m., Kuehling, Felix wrote:
>>>>>>>               #if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
>>>>>>> -    if (gtt->ranges &&
>>>>>>> -        ttm->pages[0] == hmm_pfn_to_page(&gtt->ranges[0],
>>>>>>> -                         gtt->ranges[0].pfns[0]))
>>>>>>> +    if (gtt->range &&
>>>>>>> +        ttm->pages[0] == hmm_device_entry_to_page(gtt->range,
>>>>>>> + gtt->range->pfns[0]))
>>>>>> I think just checking gtt->range here is enough. If gtt->range is 
>>>>>> not
>>>>>> NULL here, we're leaking memory.
>>>>>>
>>>>> If just checking gtt->range, there is a false warning in amdgpu_test
>>>>> userptr case in amdgpu_cs_list_validate path. If userptr is 
>>>>> invalidated,
>>>>> then ttm->pages[0] is outdated pages, lobj->user_pages is new 
>>>>> pages, it
>>>>> goes to ttm_tt_unbind first to unpin old pages (this causes false
>>>>> warning) then call amdgpu_ttm_tt_set_user_pages.
>>>> But doesn't that mean we're leaking the gtt->range somewhere?
>>>>
>>> ttm_tt_unbind is called from ttm_tt_destroy and 
>>> amdgpu_cs_list_validate,
>>> the later one causes false warning. ttm_ttm_destory path is fine to 
>>> only
>>> check gtt->range.
>>>
>>> Double checked, amdgpu_ttm_tt_get_user_pages and
>>> amdgpu_ttm_tt_get_user_pages_done always match in both paths, so no 
>>> leak
>>> gtt->range.
>>>
>>> 1. amdgpu_gem_userptr_ioctl
>>>          amdgpu_ttm_tt_get_user_pages
>>>              ttm->pages for userptr pages
>>>          amdgpu_ttm_tt_get_user_pages_done
>>>
>>> 2. amdgpu_cs_ioctl
>>>          amdgpu_cs_parser_bos
>>>              amdgpu_ttm_tt_get_user_pages
>>>              if (userpage_invalidated)
>>>                  e->user_pages for new pages
>>>              amdgpu_cs_list_validate
>>>                  if (userpage_invalidated)
>>>                     ttm_tt_unbind ttm->pages // this causes warning
>>> amdgpu_ttm_tt_set_user_pages(ttm->pages, e->user_pages)
>> Hmm, I think amdgpu_cs is doing something weird there. It does some
>> double book-keeping of the user pages in the BO list and the TTM BO. We
>> did something similar for KFD and simplified it when moving to HMM. It
>> could probably be simplified for amdgpu_cs as well. But not in this
>> patch series.
>
> That actually sounds like a bug to me.
>
> It is most likely a leftover from the time when we couldn't get the 
> pages for a BO while the BO was reserved.

Mhm, at least it's not racy in the way I thought it would be. But it is 
certainly still overkill and should be simplified.

Philip are you taking a look or should I tackle this?

Thanks,
Christian.

>
>
> Going to take a closer look,
> Christian.
>
>>
>> I'll review your updated change.
>>
>> Thanks,
>>     Felix
>>
>>
>>>          amdgpu_cs_submit
>>>              amdgpu_ttm_tt_get_user_pages_done
>>>
>>>> Regards,
>>>>       Felix
>>>>
>>>>
>>>>> I will submit patch v2, to add retry if hmm_range_fault returns 
>>>>> -EAGAIN.
>>>>> use kzalloc to allocate small size range.
>>>>>
>>>>> Thanks,
>>>>> Philip
>>>>>
>>>>>> Regards,
>>>>>>         Felix
>>>>>>
>>>>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>



More information about the amd-gfx mailing list