[Intel-gfx] [PATCH 07/24] drm/i915: Rely on spinlock protection for GPU error capture

Chris Wilson chris at chris-wilson.co.uk
Tue Jul 16 11:17:59 UTC 2019


Quoting Tvrtko Ursulin (2019-07-16 11:48:28)
> 
> On 15/07/2019 09:09, Chris Wilson wrote:
> > +/* single threaded page allocator with a reserved stash for emergencies */
> > +struct pool {
> > +     void *stash[31];
> 
> Why 31?

Random power-of-two: 31 stash slots plus the count field keep the struct at a
power-of-two size. I considered just using struct pagevec.
> 
> > +     unsigned int count;
> > +};

> >   static bool compress_init(struct compress *c)
> >   {
> > -     struct z_stream_s *zstream = memset(&c->zstream, 0, sizeof(c->zstream));
> > +     struct z_stream_s *zstream = &c->zstream;
> >   
> > -     zstream->workspace =
> > -             kmalloc(zlib_deflate_workspacesize(MAX_WBITS, MAX_MEM_LEVEL),
> > -                     GFP_ATOMIC | __GFP_NOWARN);
> > -     if (!zstream->workspace)
> > +     if (pool_init(&c->pool, ALLOW_FAIL))
> >               return false;
> >   
> > -     if (zlib_deflateInit(zstream, Z_DEFAULT_COMPRESSION) != Z_OK) {
> > -             kfree(zstream->workspace);
> > +     zstream->workspace =
> > +             kmalloc(zlib_deflate_workspacesize(MAX_WBITS, MAX_MEM_LEVEL),
> > +                     ALLOW_FAIL);
> > +     if (!zstream->workspace) {
> > +             pool_fini(&c->pool);
> >               return false;
> >       }
> >   
> >       c->tmp = NULL;
> >       if (i915_has_memcpy_from_wc())
> > -             c->tmp = (void *)__get_free_page(GFP_ATOMIC | __GFP_NOWARN);
> > +             c->tmp = pool_alloc(&c->pool, ALLOW_FAIL);
> 
> So 31 stashed pages will be enough that we don't need atomic allocations,
> or have I missed a point?

No. It's just a backup in case we run out of atomic allocations. It's
modeled on mempool, except that mempool doesn't let you refill it
between atomic sections and instead relies on its elements being freed
back to it. Since we continually allocate and keep the pages essentially
forever, mempool doesn't help.
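
For illustration, the shape is roughly as below. This is a hedged sketch,
not the patch itself: only pool_init()/pool_alloc()/pool_fini() appear in
the hunks quoted above, so pool_refill()/pool_free() and the exact stash
handling here are my assumptions.

/*
 * Sketch of a single-threaded page pool with a reserved stash.
 * Refill with sleeping allocations while we are allowed to wait, then
 * fall back to the stash when an atomic-context allocation fails.
 */
struct pool {
	void *stash[31];	/* reserved pages for emergencies */
	unsigned int count;	/* pages currently held in the stash */
};

/* Top up the stash; only call this where the gfp flags may block. */
static int pool_refill(struct pool *p, gfp_t gfp)
{
	while (p->count < ARRAY_SIZE(p->stash)) {
		void *page = (void *)__get_free_page(gfp);

		if (!page)
			return -ENOMEM;

		p->stash[p->count++] = page;
	}

	return 0;
}

static int pool_init(struct pool *p, gfp_t gfp)
{
	p->count = 0;
	return pool_refill(p, gfp);
}

static void pool_fini(struct pool *p)
{
	while (p->count)
		free_page((unsigned long)p->stash[--p->count]);
}

/* Try a fresh page first; dip into the reserve if the allocator fails. */
static void *pool_alloc(struct pool *p, gfp_t gfp)
{
	void *page = (void *)__get_free_page(gfp);

	if (!page && p->count)
		page = p->stash[--p->count];

	return page;
}

static void pool_free(struct pool *p, void *page)
{
	/* Keep the page for reuse if there is room, else release it. */
	if (p->count < ARRAY_SIZE(p->stash))
		p->stash[p->count++] = page;
	else
		free_page((unsigned long)page);
}

The capture code can then top the stash up from any non-atomic stretch of
the capture and lean on it inside the spinlocked sections, which mempool's
free-based replenishment cannot offer here.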
 
> What determines that 31 is enough?

It's just a backup, so we have a bit more leeway than we do currently.
Right now there is no non-atomic phase in which to wait for allocations
at all, so we should fare much better and have fewer blank error states.

Most compressed error states are less than 128k, so I plucked a stash of
about that size (31 4KiB pages is roughly 124KiB) as being large enough to
capture a single compressed buffer if we run out of atomic allocations.

> > -static void print_error_buffers(struct drm_i915_error_state_buf *m,
> > -                             const char *name,
> > -                             struct drm_i915_error_buffer *err,
> > -                             int count)
> > -{
> > -     err_printf(m, "%s [%d]:\n", name, count);
> > -
> > -     while (count--) {
> > -             err_printf(m, "    %08x_%08x %8u %02x %02x",
> > -                        upper_32_bits(err->gtt_offset),
> > -                        lower_32_bits(err->gtt_offset),
> > -                        err->size,
> > -                        err->read_domains,
> > -                        err->write_domain);
> > -             err_puts(m, tiling_flag(err->tiling));
> > -             err_puts(m, dirty_flag(err->dirty));
> > -             err_puts(m, purgeable_flag(err->purgeable));
> > -             err_puts(m, err->userptr ? " userptr" : "");
> > -             err_puts(m, i915_cache_level_str(m->i915, err->cache_level));
> > -
> > -             if (err->name)
> > -                     err_printf(m, " (name: %d)", err->name);
> > -             if (err->fence_reg != I915_FENCE_REG_NONE)
> > -                     err_printf(m, " (fence: %d)", err->fence_reg);
> > -
> > -             err_puts(m, "\n");
> > -             err++;
> > -     }
> > -}
> 
> So the active and pinned bo lists are gone.
> 
> Can you put in the commit message what data is gone and what remains? And 
> some notes on how that affects the usefulness of the error state.

My purpose in including them was to cross-check relocations in the
batch. It's been a long time since I've had to do that, and these days
mesa tags everything important with EXEC_CAPTURE so those buffers are
included in their entirety.
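
For reference, userspace opts a buffer into the error dump by setting
EXEC_OBJECT_CAPTURE on the execbuf object. A minimal illustration (not
from this patch; the include path depends on how the uapi/libdrm headers
are installed):

#include <drm/i915_drm.h>

/* Mark one execbuf object so the kernel snapshots its full contents
 * into the GPU error state, as mesa does for its important buffers. */
static void mark_for_capture(struct drm_i915_gem_exec_object2 *obj)
{
	obj->flags |= EXEC_OBJECT_CAPTURE;
}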
-Chris

