[PATCH 17/35] drm/amdkfd: register HMM device private zone

Thomas Hellström (Intel) thomas_os at shipmail.org
Mon Mar 1 09:30:12 UTC 2021


On 3/1/21 9:58 AM, Daniel Vetter wrote:
> On Mon, Mar 01, 2021 at 09:46:44AM +0100, Thomas Hellström (Intel) wrote:
>> On 3/1/21 9:32 AM, Daniel Vetter wrote:
>>> On Wed, Jan 06, 2021 at 10:01:09PM -0500, Felix Kuehling wrote:
>>>> From: Philip Yang <Philip.Yang at amd.com>
>>>>
>>>> Register vram memory as MEMORY_DEVICE_PRIVATE type resource, to
>>>> allocate vram backing pages for page migration.
>>>>
>>>> Signed-off-by: Philip Yang <Philip.Yang at amd.com>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>> So maybe I'm getting this all wrong, but I think that the current ttm
>>> fault code relies on devmap pte entries (especially for hugepte entries)
>>> to stop get_user_pages. But this only works if the pte happens to not
>>> point at a range with devmap pages.
>> I don't think that's in TTM yet, but the proposed fix does, yes (see the
>> email I just sent in another thread), but only for huge ptes.
>>
>>> This patch here changes that, and so probably breaks this devmap pte hack
>>> ttm is using?
>>>
>>> If I'm not wrong here then I think we need to first fix up the ttm code to
>>> not use the devmap hack anymore, before a ttm based driver can register a
>>> dev_pagemap. Also adding Thomas since that just came up in another
>>> discussion.
>> It doesn't break the ttm devmap hack per se, but it does indeed allow gup
>> to the registered range. This is where my understanding falls short: why
>> can't we allow gup-ing TTM ptes if there is indeed a backing struct page?
>> Because registering MEMORY_DEVICE_PRIVATE implies that, right?
> We need to keep supporting buffer based memory management for all the
> non-compute users. Because those require end-of-batch dma_fence semantics,
> which prevents us from using gpu page faults, which makes hmm not really
> work.
>
> And for buffer based memory manager we can't have gup pin random pages in
> there, that's not really how it works. Worst case ttm just assumes it can
> actually move buffers and reallocate them as it sees fit, and your gup
> mapping (for direct i/o or whatever) now points at a page of a buffer that
> you don't even own anymore. That's not good. Hence also all the
> discussions about preventing gup for bo mappings in general.
>
> Once we throw hmm into the mix we need to be really careful that the two
> worlds don't collide. Pure hmm is fine, pure bo managed memory is fine,
> mixing them is tricky.
> -Daniel

Hmm, OK. So registering MEMORY_DEVICE_PRIVATE means we can't set
pxx_devmap, because that would allow gup, which in turn means no huge
TTM ptes.
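
To spell out that reasoning, here is a toy user-space model in C. It is
not kernel code, and every name in it is made up for illustration. It
assumes that the gup path, on seeing an entry marked devmap, looks for a
registered pagemap covering the pfn and backs off if it finds none, which
is the behaviour the devmap trick discussed above appears to rely on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy user-space model, NOT kernel code. All names here are hypothetical. */

/* A pfn range [start_pfn, end_pfn) that a driver has registered a pagemap for. */
struct toy_pagemap_range {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* Stand-in for a pagemap lookup by pfn: true if a registered range covers it. */
static bool toy_pagemap_covers(const struct toy_pagemap_range *ranges,
                               size_t n, unsigned long pfn)
{
    assert(ranges != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pfn >= ranges[i].start_pfn && pfn < ranges[i].end_pfn)
            return true;
    }
    return false;
}

/*
 * Models only an entry that is already marked devmap (assumption: the
 * walker refuses such an entry unless a registered pagemap covers its pfn).
 */
static bool toy_gup_takes_devmap_entry(const struct toy_pagemap_range *ranges,
                                       size_t n, unsigned long pfn)
{
    return toy_pagemap_covers(ranges, n, pfn);
}
```

With no range registered, the model refuses every devmap-marked entry, so
the marker alone blocks gup. Once a range is registered, a devmap-marked
entry whose pfn falls inside it is accepted, so the marker no longer
blocks anything there.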

/Thomas


