[PATCH] drm/amdgpu: use new HMM APIs and helpers

Christian König ckoenig.leichtzumerken at gmail.com
Mon Jun 3 10:17:13 UTC 2019


Am 01.06.19 um 00:01 schrieb Kuehling, Felix:
> On 2019-05-31 5:32 p.m., Yang, Philip wrote:
>> On 2019-05-31 3:42 p.m., Kuehling, Felix wrote:
>>> On 2019-05-31 1:28 p.m., Yang, Philip wrote:
>>>> On 2019-05-30 6:36 p.m., Kuehling, Felix wrote:
>>>>>>        
>>>>>>        #if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
>>>>>> -	if (gtt->ranges &&
>>>>>> -	    ttm->pages[0] == hmm_pfn_to_page(&gtt->ranges[0],
>>>>>> -					     gtt->ranges[0].pfns[0]))
>>>>>> +	if (gtt->range &&
>>>>>> +	    ttm->pages[0] == hmm_device_entry_to_page(gtt->range,
>>>>>> +						      gtt->range->pfns[0]))
>>>>> I think just checking gtt->range here is enough. If gtt->range is not
>>>>> NULL here, we're leaking memory.
>>>>>
>>>> If just checking gtt->range, there is a false warning in amdgpu_test
>>>> userptr case in amdgpu_cs_list_validate path. If userptr is invalidated,
>>>> then ttm->pages[0] is outdated pages, lobj->user_pages is new pages, it
>>>> goes to ttm_tt_unbind first to unpin old pages (this causes false
>>>> warning) then call amdgpu_ttm_tt_set_user_pages.
>>> But doesn't that mean we're leaking the gtt->range somewhere?
>>>
>> ttm_tt_unbind is called from ttm_tt_destroy and amdgpu_cs_list_validate,
>> the later one causes false warning. ttm_ttm_destory path is fine to only
>> check gtt->range.
>>
>> Double checked, amdgpu_ttm_tt_get_user_pages and
>> amdgpu_ttm_tt_get_user_pages_done always match in both paths, so no leak
>> gtt->range.
>>
>> 1. amdgpu_gem_userptr_ioctl
>>          amdgpu_ttm_tt_get_user_pages
>>              ttm->pages for userptr pages
>>          amdgpu_ttm_tt_get_user_pages_done
>>
>> 2. amdgpu_cs_ioctl
>>          amdgpu_cs_parser_bos
>>              amdgpu_ttm_tt_get_user_pages
>>              if (userpage_invalidated)
>>                  e->user_pages for new pages
>>              amdgpu_cs_list_validate
>>                  if (userpage_invalidated)
>>                     ttm_tt_unbind ttm->pages // this causes warning
>>                     amdgpu_ttm_tt_set_user_pages(ttm->pages, e->user_pages)
> Hmm, I think amdgpu_cs is doing something weird there. It does some
> double book-keeping of the user pages in the BO list and the TTM BO. We
> did something similar for KFD and simplified it when moving to HMM. It
> could probably be simplified for amdgpu_cs as well. But not in this
> patch series.

That actually sounds like a bug to me.

It is most likely a leftover from the time when we couldn't get the 
pages for a BO while the BO was reserved.

Going to take a closer look,
Christian.

>
> I'll review your updated change.
>
> Thanks,
>     Felix
>
>
>>          amdgpu_cs_submit
>>              amdgpu_ttm_tt_get_user_pages_done
>>
>>> Regards,
>>>       Felix
>>>
>>>
>>>> I will submit patch v2, to add retry if hmm_range_fault returns -EAGAIN.
>>>> use kzalloc to allocate small size range.
>>>>
>>>> Thanks,
>>>> Philip
>>>>
>>>>> Regards,
>>>>>         Felix
>>>>>
>>>>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



More information about the amd-gfx mailing list