[Intel-gfx] [PATCH v9 2/8] drm/i915/ttm: add tt shmem backend

Tue Dec 7 13:10:13 UTC 2021

On 18/10/2021 10:10, Matthew Auld wrote:
> For cached objects we can allocate our pages directly in shmem. This
> should make it possible(in a later patch) to utilise the existing
> i915-gem shrinker code for such objects. For now this is still disabled.
> 
> v2(Thomas):
>    - Add optional try_to_writeback hook for objects. Importantly we need
>      to check if the object is even still shrinkable; in between us
>      dropping the shrinker LRU lock and acquiring the object lock it could for
>      example have been moved. Also we need to differentiate between
>      "lazy" shrinking and the immediate writeback mode. Also later we need to
>      handle objects which don't even have mm.pages, so bundling this into
>      put_pages() would require somehow handling that edge case, hence
>      just letting the ttm backend handle everything in try_to_writeback
>      doesn't seem too bad.
> v3(Thomas):
>    - Likely a bad idea to touch the object from the unpopulate hook,
>      since it's not possible to hold a reference, without also creating
>      circular dependency, so likely this is too fragile. For now just
>      ensure we at least mark the pages as dirty/accessed when called from the
>      shrinker on WILLNEED objects.
>    - s/try_to_writeback/shrinker_release_pages, since this can do more
>      than just writeback.
>    - Get rid of do_backup boolean and just set the SWAPPED flag prior to
>      calling unpopulate.
>    - Keep shmem_tt as lowest priority for the TTM LRU bo_swapout walk, since
>      these just get skipped anyway. We can try to come up with something
>      better later.
> v4(Thomas):
>    - s/PCI_DMA/DMA/. Also drop NO_KERNEL_MAPPING and NO_WARN, which
>      apparently doesn't do anything with streaming mappings.
>    - Just pass along the error for ->truncate, and assume nothing.
> 
> Signed-off-by: Matthew Auld <matthew.auld at intel.com>
> Cc: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> Cc: Christian König <christian.koenig at amd.com>
> Cc: Oak Zeng <oak.zeng at intel.com>
> Reviewed-by: Thomas Hellström <thomas.hellstrom at linux.intel.com>
> Acked-by: Oak Zeng <oak.zeng at intel.com>

[snip]

> -static void try_to_writeback(struct drm_i915_gem_object *obj,
> -			     unsigned int flags)
> +static int try_to_writeback(struct drm_i915_gem_object *obj, unsigned int flags)
>   {
> +	if (obj->ops->shrinker_release_pages)
> +		return obj->ops->shrinker_release_pages(obj,
> +							flags & I915_SHRINK_WRITEBACK);

I have a high level question about how does this new vfunc fit with the existing code.

Because I notice in the implementation (i915_ttm_shrinker_release_pages) it ends up doing:
...

        switch (obj->mm.madv) {
        case I915_MADV_DONTNEED:
                return i915_ttm_purge(obj);
        case __I915_MADV_PURGED:
                return 0;
        }

... and then...

        if (should_writeback)
                __shmem_writeback(obj->base.size, i915_tt->filp->f_mapping);

Which on a glance looks like a possible conceptual duplication of the concepts in this very function (try_to_writeback):

> +
>   	switch (obj->mm.madv) {
>   	case I915_MADV_DONTNEED:
>   		i915_gem_object_truncate(obj);
> -		return;
> +		return 0;
>   	case __I915_MADV_PURGED:
> -		return;
> +		return 0;
>   	}
>   
>   	if (flags & I915_SHRINK_WRITEBACK)
>   		i915_gem_object_writeback(obj);

So question is this the final design or some futher tidy is possible/planned?

Background to my question is that I will float a patch which proposes to remove writeback altogether. There are reports from the fields that it causes deadlocks and looking at 2d6692e642e7 ("drm/i915: Start writeback from the shrinker") and its history it seems like it was a known risk.

Apart from the code organisation questions, on the practical level - do you need writeback from the TTM backend or while I am proposing to remove it from the "legacy" paths, I can propose removing it from the TTM flow as well?

Regards,

Tvrtko