[Intel-gfx] [PATCH 10/10] drm/i915/gtt: Setup phys pages for 3lvl pdps

Fri Jun 14 17:36:55 UTC 2019

Quoting Mika Kuoppala (2019-06-14 17:43:50)
> If we setup backing phys page for 3lvl pdps, even they
> are not used, we lose 5 pages per ppgtt.
> 
> Trading this memory on bsw, we gain more common code paths for all
> gen8+ directory manipulation. And those paths are now void of checks
> for page directory type, making the hot paths faster.
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala at linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 112 +++++++++++++++++-----------
>  1 file changed, 68 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index ba2802c25d13..c76c92072d54 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -714,22 +714,14 @@ static struct i915_page_directory *alloc_pd(struct i915_address_space *vm)
>         return pd;
>  }
>  
> -static inline bool pd_has_phys_page(const struct i915_page_directory * const pd)
> -{
> -       return pd->base.page;
> -}
> -
>  static void free_pd(struct i915_address_space *vm,
>                     struct i915_page_directory *pd)
>  {
> -       if (likely(pd_has_phys_page(pd)))
> -               cleanup_page_dma(vm, &pd->base);
> -
> +       cleanup_page_dma(vm, &pd->base);
>         kfree(pd);
>  }
>  
>  #define init_pd(vm, pd, to) {                                  \
> -       GEM_DEBUG_BUG_ON(!pd_has_phys_page(pd));                \
>         fill_px((vm), (pd), gen8_pde_encode(px_dma(to), I915_CACHE_LLC)); \
>         memset_p((pd)->entry, (to), 512);                               \
>  }
> @@ -747,8 +739,7 @@ static void __set_pd_entry(struct i915_page_directory * const pd,
>  
>  #define set_pd_entry(pd, pde, to) ({                                   \
>         (pd)->entry[(pde)] = (to);                                      \
> -       if (likely(pd_has_phys_page(pd)))                               \
> -               __set_pd_entry((pd), (pde),                             \
> +       __set_pd_entry((pd), (pde),                                     \
>                        gen8_pde_encode(px_dma(to), I915_CACHE_LLC));    \
>  })

The macro seems like it is redundant now as well, since __set_pd_entry
should be able to set pd->entry[pde] = to. Or at least one step too
large.

Ok, I'm willing to make the sacrifice for simplicity. I think the extra
allocation will be in the noise, you need a lot of contexts with small
workloads for it to be a significant factor i.e. microbenchmarks. And
even GL microbenchmarks tend to do some GL so aren't that sensitive to
context allocation rates. OglDrvCtx springs to mind. That would be a
reasonable sanity check that this is indeed in the noise.

The only question is does this not become simpler if we do this earlier?
-Chris