i915 ttm_tt shmem backend

Christian König ckoenig.leichtzumerken at gmail.com
Fri Sep 10 08:51:26 UTC 2021


On 10.09.21 10:40, Thomas Hellström wrote:
> On Fri, 2021-09-10 at 10:25 +0200, Christian König wrote:
>>
>> On 10.09.21 10:08, Thomas Hellström wrote:
>>> Perhaps some background and goal is worth mentioning here.
>>>
>>>
>>> On Thu, 2021-09-09 at 17:56 +0100, Matthew Auld wrote:
>>>> On Thu, 9 Sept 2021 at 17:43, Koenig, Christian
>>>> <Christian.Koenig at amd.com> wrote:
>>>>> Hi Matthew,
>>>>>
>>>>> this doesn't work, I've already tried something similar.
>>>>>
>>>>> TTM uses the reverse lookup functionality when migrating BOs
>>>>> between system and device memory. And that doesn't seem to work
>>>>> with pages from a shmem file.
>>>> Hmm, what do you mean by reverse lookup functionality? Could you
>>>> please point out where that is in the TTM code?
>>> I think this is in unmap_mapping_range() where, if we use
>>> VM_MIXEDMAP, there is a reverse lookup on the PTEs that point to
>>> real pages. Now that we move over to VM_PFNMAP, that problem should
>>> go away, since the core vm never has a page to investigate. This is
>>> probably why things work on non-TTM i915 GEM.
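The VM_MIXEDMAP vs. VM_PFNMAP distinction above could be sketched as a fault-handler fragment (kernel-context pseudocode, not a buildable unit; the handler name is illustrative, while vmf_insert_mixed()/vmf_insert_pfn() are the real kernel helpers):

```c
/* Illustrative sketch only -- shows why the two VMA flavours behave
 * differently under unmap_mapping_range().
 */
static vm_fault_t my_bo_vm_fault(struct vm_fault *vmf)
{
	/* ... resolve the faulting offset to a pfn ... */

	/* VM_MIXEDMAP case: for pfns that do have a struct page, the
	 * PTE is entered as a normal page mapping, so the reverse
	 * lookup behind unmap_mapping_range() sees and handles that
	 * page -- which goes wrong for pages owned by a shmem file.
	 */
	return vmf_insert_mixed(vmf->vma, vmf->address, pfn);

	/* VM_PFNMAP case: the PTE is always a raw pfn mapping with no
	 * struct page behind it as far as the core vm is concerned,
	 * so there is never a page for the reverse lookup to find.
	 */
	return vmf_insert_pfn(vmf->vma, vmf->address, pfn_t_to_pfn(pfn));
}
```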
>> Yeah, that was really likely the root problem. I didn't keep
>> investigating after realizing that my approach wouldn't work.
>>
>>> @Christian: Some background here:
>>> First, I think that there might be things like the above that will
>>> pose problems, and we may or may not be able to overcome those. But
>>> more important is that we agree with you that *if* we make it work,
>>> it is something that you as a maintainer of TTM can accept from a
>>> design- and maintainability point of view.
>>>
>>> The approach would be similar to the buddy allocator: we adapt some
>>> driver code to TTM in a way that it may be reused with other
>>> drivers, and if other drivers are interested, we'd assist in moving
>>> it to core TTM. In essence it'd be a TTM shmem page pool with full
>>> shrinking ability, for cached pages only.
>>>
>>> What we're really after here is the ability to shrink that doesn't
>>> regress much with respect to the elaborate shrinker that's in i915
>>> today, which is power-management aware and is also able to start
>>> shmem writebacks, to avoid shmem just caching the pages instead of
>>> giving them back to the system (IIRC it was partly the lack of this
>>> that blocked earlier TTM shrinking efforts).
>>>
>>> And since it doesn't really matter whether the shrinker sits in
>>> core TTM or in a driver, I think a future goal might be a set of
>>> TTM shrinker helpers that make sure we shrink the right TTM object,
>>> plus a simple implementation that is typically used by simple
>>> drivers, which other drivers can build on for a more elaborate
>>> power-management-aware shrinker.
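The helper set mentioned above would presumably hang off the core shrinker interface. A rough shape, as a sketch only (kernel-context pseudocode; struct shrinker and register_shrinker() are the real kernel API of that era, everything prefixed my_ is hypothetical):

```c
/* Hypothetical TTM shrinker helper skeleton. */
static unsigned long my_ttm_shrink_count(struct shrinker *shrink,
					 struct shrink_control *sc)
{
	/* Report how many pages held by cached TTM objects could be
	 * released right now.
	 */
	return my_ttm_shrinkable_pages();
}

static unsigned long my_ttm_shrink_scan(struct shrinker *shrink,
					struct shrink_control *sc)
{
	/* Pick the right TTM objects (idle, cached, not pinned),
	 * unpopulate them and, for shmem-backed pages, start
	 * writeback so the memory actually goes back to the system.
	 */
	return my_ttm_shrink_objects(sc->nr_to_scan);
}

static struct shrinker my_ttm_shrinker = {
	.count_objects	= my_ttm_shrink_count,
	.scan_objects	= my_ttm_shrink_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* register_shrinker(&my_ttm_shrinker) at driver init. */
```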
>> That's understandable, but I think not necessarily what we should
>> aim for in the long term.
>>
>> First of all, I would really like to move more of the functionality
>> from ttm_pool.c into the core memory management, especially the
>> handling of uncached and write-combined memory.
>>
>> That's essentially completely architecture dependent and currently
>> implemented in an extremely awkward way. Either Daniel's suggestion
>> of having a GFP_WC or Christoph's approach of moving all this into
>> the DMA API is the way to go here.
>>
>> As long as i915 has no interest in USWC support, implementing their
>> own shmemfile backend sounds fine to me, but I have strong doubts
>> that this will be of use to anybody else.
> OK. Sounds fine. In situations where we use WC system memory we will
> use what's in TTM today. BTW, on the shrinking approach for WC pages,
> does Christoph's DMA API solution envision some kind of support for
> this?

Not Christoph's DMA API solution, but what I have in mind for the TTM 
shrinker should work.

Essentially a shmemfile per device should help in solving most of the 
issues we ran into.
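A per-device shmem file along those lines could be sketched as follows (kernel-context pseudocode; shmem_file_setup() and shmem_read_mapping_page() are the real kernel APIs, the surrounding structure and my_ names are illustrative assumptions):

```c
/* Sketch only: one shmem file backing all cached system-memory pages
 * of a device, so the existing shmem LRU/swap machinery can be reused
 * for shrinking.
 */
struct my_ttm_device {
	struct file *shmem_filp;	/* per-device shmem file */
};

static int my_ttm_device_init(struct my_ttm_device *mdev, loff_t size)
{
	mdev->shmem_filp = shmem_file_setup("ttm-backing", size,
					    VM_NORESERVE);
	if (IS_ERR(mdev->shmem_filp))
		return PTR_ERR(mdev->shmem_filp);
	return 0;
}

static struct page *my_ttm_get_page(struct my_ttm_device *mdev,
				    pgoff_t index)
{
	/* Pages come from the shmem mapping, so they stay on the shmem
	 * LRU and can be written back or swapped out under memory
	 * pressure without TTM having to track them itself.
	 */
	return shmem_read_mapping_page(
		file_inode(mdev->shmem_filp)->i_mapping, index);
}
```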

Christian.

>
> /Thomas
>
>> Christian.
>>
>>> /Thomas
>>>
>>>
>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> ________________________________
>>>>> From: Matthew Auld <matthew.william.auld at gmail.com>
>>>>> Sent: Thursday, 9 September 2021 16:56
>>>>> To: Christian König <ckoenig.leichtzumerken at gmail.com>; Koenig,
>>>>> Christian <Christian.Koenig at amd.com>
>>>>> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>; ML
>>>>> dri-devel <dri-devel at lists.freedesktop.org>
>>>>> Subject: i915 ttm_tt shmem backend
>>>>>
>>>>> Hi Christian,
>>>>>
>>>>> We are looking into using shmem as a ttm_tt backend in i915 for
>>>>> cached system memory objects. We would also like to make such
>>>>> objects visible to the i915-gem shrinker, so that they may be
>>>>> swapped out or discarded when under memory pressure.
>>>>>
>>>>> One idea for handling this is roughly something like:
>>>>> - Add a new TTM_PAGE_FLAG_SHMEM flag, or similar.
>>>>> - Skip the ttm_pages_allocated accounting on such objects,
>>>>>   similar to how FLAG_SG is already handled.
>>>>> - Skip all the page->mapping and page->index related bits, like
>>>>>   in tt_add_mapping, since it looks like these are set and used
>>>>>   by shmem. Not sure what this might break functionally, but it
>>>>>   looks like it's maybe only driver specific?
>>>>> - Skip calling into ttm_bo_swap_out/in and just have
>>>>>   ttm_populate/unpopulate handle this directly for such objects.
>>>>> - Make such objects visible to the i915-gem shrinker.
>>>>>
>>>>> Does this approach look acceptable?
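The special-casing proposed in the list above might look roughly like this in the populate path (kernel-context pseudocode; TTM_PAGE_FLAG_SHMEM is only the flag proposed in this mail, not an existing TTM flag, and the my_ helpers are hypothetical, while ttm_pool_alloc() is the real TTM allocator entry point):

```c
/* Rough sketch of the proposed shmem special case in populate. */
static int my_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm,
			  struct ttm_operation_ctx *ctx)
{
	if (ttm->page_flags & TTM_PAGE_FLAG_SHMEM) {
		/* Pages come from the object's shmem file: no
		 * ttm_pages_allocated accounting, no tt_add_mapping
		 * (shmem owns page->mapping/page->index), and swap-out
		 * is left to the shmem/i915-gem shrinker path instead
		 * of ttm_bo_swap_out().
		 */
		return my_tt_get_shmem_pages(ttm);
	}

	/* Default path for everything else. */
	return ttm_pool_alloc(&bdev->pool, ttm, ctx);
}
```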
>


