[Intel-gfx] [PATCH v1] drm/i915/gem: Don't evict unmappable VMAs when pinning with PIN_MAPPABLE
Kasireddy, Vivek
vivek.kasireddy at intel.com
Fri Mar 18 07:39:20 UTC 2022
Hi Tvrtko,
>
> On 17/03/2022 07:23, Vivek Kasireddy wrote:
> > On platforms capable of allowing 8K (7680 x 4320) modes, pinning 2 or
> > more framebuffers/scanout buffers results in only one that is mappable/
> > fenceable. Therefore, pageflipping between these 2 FBs where only one
> > is mappable/fenceable creates latencies large enough to miss alternate
> > vblanks thereby producing less optimal framerate.
> >
> > This mainly happens because when i915_gem_object_pin_to_display_plane()
> > is called to pin one of the FB objs, the associated vma is identified
> > as misplaced -- because there is no space for it in the aperture --
> > and therefore i915_vma_unbind() is called which unbinds and evicts it.
> > This misplaced vma gets subseqently pinned only when
> > i915_gem_object_ggtt_pin_ww() is called without PIN_MAPPABLE. This whole
> > thing results in a latency of ~10ms and happens every other repaint cycle.
>
> Just out of curiosity - have you looked at where does this 10ms come
> from? Like is it simply clearing/writing PTEs so expensive, or there is
> more to it? Apologies if I asked this before..
[Kasireddy, Vivek] It appears the ~10ms latency seems to come from the
execution of gen8_ggtt_clear_range() as seen in the trace log:
weston-4124 [008] 161210.397563: funcgraph_entry: | __i915_vma_evict() {
weston-4124 [008] 161210.397564: funcgraph_entry: | ggtt_unbind_vma() {
weston-4124 [008] 161210.397564: funcgraph_entry: | gen8_ggtt_clear_range() {
weston-4124 [008] 161210.408012: funcgraph_exit: # 10416.281 us | }
weston-4124 [008] 161210.408012: funcgraph_exit: # 10448.740 us | }
weston-4124 [008] 161210.408012: funcgraph_entry: | __vma_put_pages() {
weston-4124 [008] 161210.408013: funcgraph_entry: 0.083 us | clear_pages();
weston-4124 [008] 161210.408013: funcgraph_exit: 0.578 us | }
weston-4124 [008] 161210.408013: funcgraph_exit: # 10449.622 us | }
And, for some reason, I can't get trace-cmd to capture more symbols to
gain more insights. However, gen8_ggtt_clear_range() seems to just do
this:
for (i = 0; i < num_entries; i++)
gen8_set_pte(>t_base[i], scratch_pte);
where,
vma's start = 182190080, length = 132710400, first = 44480, num_entries = 32400
and we have
void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
{
writeq(pte, addr);
}
Any idea why executing writeq 32400 times would result in a latency
of ~10ms?
>
> > Therefore, to fix this issue, we just ensure that the misplaced VMA
> > does not get evicted when we try to pin it with PIN_MAPPABLE -- by
> > returning early if the mappable/fenceable flag is not set.
> >
> > Testcase:
> > Running Weston and weston-simple-egl on an Alderlake_S (ADLS) platform
> > with a 8K at 60 mode results in only ~40 FPS (compared to ~59 FPS with
> > this patch). Since upstream Weston submits a frame ~7ms before the
> > next vblank, the latencies seen between atomic commit and flip event
> > are 7, 24 (7 + 16.66), 7, 24..... suggesting that it misses the
> > vblank every other frame.
> >
> > Here is the ftrace snippet that shows the source of the ~10ms latency:
> > i915_gem_object_pin_to_display_plane() {
> > 0.102 us | i915_gem_object_set_cache_level();
> > i915_gem_object_ggtt_pin_ww() {
> > 0.390 us | i915_vma_instance();
> > 0.178 us | i915_vma_misplaced();
> > i915_vma_unbind() {
> > __i915_active_wait() {
> > 0.082 us | i915_active_acquire_if_busy();
> > 0.475 us | }
> > intel_runtime_pm_get() {
> > 0.087 us | intel_runtime_pm_acquire();
> > 0.259 us | }
> > __i915_active_wait() {
> > 0.085 us | i915_active_acquire_if_busy();
> > 0.240 us | }
> > __i915_vma_evict() {
> > ggtt_unbind_vma() {
> > gen8_ggtt_clear_range() {
> > 10507.255 us | }
> > 10507.689 us | }
> > 10508.516 us | }
> >
> > Cc: Tvrtko Ursulin <tvrtko.ursulin at linux.intel.com>
> > Signed-off-by: Vivek Kasireddy <vivek.kasireddy at intel.com>
> > ---
> > drivers/gpu/drm/i915/i915_gem.c | 8 +++++++-
> > 1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 9747924cc57b..7307c5de1c58 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -939,8 +939,14 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object
> *obj,
> > if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))
> > return ERR_PTR(-ENOSPC);
> >
> > + /*
> > + * If this misplaced vma is too big (i.e, at-least
> > + * half the size of aperture) or just unmappable,
> > + * we would not be able to pin with PIN_MAPPABLE.
> > + */
>
> I would be tempted to describe the ping-pong issue in the comment. In
> short would do it, but just because git blame on a line of code tends to
> fail to lead to the correct commit message after a while.
>
> So maybe just say along the lines of "If the misplaced vma is too big
> ... or hasn't been pinned mappable before, we ignore the misplacement
> when PIN_NONBLOCK is set in order to avoid ping-pong of double (or more)
> -buffered frame buffers into the aperture and out on every vblank."
[Kasireddy, Vivek] Ok, will include more comments in v2.
>
> > if (flags & PIN_MAPPABLE &&
> > - vma->fence_size > ggtt->mappable_end / 2)
> > + (vma->fence_size > ggtt->mappable_end / 2 ||
> > + !i915_vma_is_map_and_fenceable(vma)))
> > return ERR_PTR(-ENOSPC);
> > }
> >
>
> With the expanded comment it looks good to me.
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
[Kasireddy, Vivek] Thank you for the review.
Thanks,
Vivek
>
> + Daniel if he wants to double check it.
>
> Regards,
>
> Tvrtko
More information about the Intel-gfx
mailing list