[Intel-gfx] [PATCH v2 4/5] drm/i915/guc: rework guc_add_workqueue_item()

Fri May 6 08:55:49 UTC 2016

On 05/05/16 19:38, Dave Gordon wrote:
> On 29/04/2016 16:44, Tvrtko Ursulin wrote:
>>
>> On 27/04/16 19:03, Dave Gordon wrote:
>>> Mostly little optimisations; for instance, if the driver is correctly
>>> following the submission protocol, the "out of space" condition is
>>> impossible, so the previous runtime WARN_ON() is promoted to a
>>> GEM_BUG_ON() for a more dramatic effect in development and less impact
>>> in end-user systems.
>>>
>>> Similarly we can replace other WARN_ON() conditions that don't relate to
>>> the hardware state with either BUILD_BUG_ON() for compile-time-
>>> detectable issues, or GEM_BUG_ON() for logical "can't happen" errors.
>>>
>>> With those changes, we can convert it to void, as suggested by Chris
>>> Wilson, and update the calling code appropriately.
>>>
>>> Signed-off-by: Dave Gordon <david.s.gordon at intel.com>
>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>
>>> ---
>>>   drivers/gpu/drm/i915/i915_guc_submission.c | 69
>>> +++++++++++++++---------------
>>>   drivers/gpu/drm/i915/intel_guc.h           |  2 +-
>>>   drivers/gpu/drm/i915/intel_guc_fwif.h      |  3 +-
>>>   3 files changed, 37 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>>> index 6626eff..4d2ea84 100644
>>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>>> @@ -470,23 +470,28 @@ int i915_guc_wq_check_space(struct
>>> drm_i915_gem_request *request)
>>>       return -EAGAIN;
>>>   }
>>>
>>> -static int guc_add_workqueue_item(struct i915_guc_client *gc,
>>> -                  struct drm_i915_gem_request *rq)
>>> +static void guc_add_workqueue_item(struct i915_guc_client *gc,
>>> +                   struct drm_i915_gem_request *rq)
>>>   {
>>> +    /* wqi_len is in DWords, and does not include the one-word
>>> header */
>>> +    const size_t wqi_size = sizeof(struct guc_wq_item);
>>
>> Again, u32 is correct I think.
> Nope, it's a sizeof(), but the compiler will check that it fits in u32
> when we convert it to DWords below.

I already wrote in this same thread few days ago this was my oversight.

>>
>>> +    const u32 wqi_len = wqi_size/sizeof(u32) - 1;
>>>       struct guc_process_desc *desc;
>>>       struct guc_wq_item *wqi;
>>>       void *base;
>>> -    u32 tail, wq_len, wq_off, space;
>>> +    u32 space, tail, wq_off, wq_page;
>>>
>>>       desc = gc->client_base + gc->proc_desc_offset;
>>> +
>>> +    /* Space was checked earlier, in i915_guc_wq_check_space() above */
>>
>> It may be above in the file, but the two do not call one another so I
>> recommend saying exactly who called it.
> I don't really mind who called it, as long as it was called sometime
> earlier in the request submission protocol -- the callsite may get moved
> around a bit. Use cscope(1) or your favourite IDE to find it.

So what does this comment supposed to tell the reader? 
guc_add_workqueue_item does not call i915_guc_wq_check_space so "above" 
what? Above in the file? Or maybe *earlier* in the call sequence of 
what? Might as well put an useful comment in when you are adding one.

>>
>>>       space = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
>>> -    if (WARN_ON(space < sizeof(struct guc_wq_item)))
>>> -        return -ENOSPC; /* shouldn't happen */
>>> +    GEM_BUG_ON(space < wqi_size);
>>
>> It is impossible to hit this only because of the struct_mutex guarding
>> the whole time window from request creation to submission. If in the
>> future, near or far, that gets fixed, then this will need reworking.
>>
>> I don't have any better ideas though.
>>
>> But a WARN_ON and return would be almost as good. Everything is better
>> than a dead machine one can't ssh into...
>>
>> So I appeal to make this a WARN_ON and return. Nothing bad would
>> happen apart from software thinking GPU has hung.
> If the driver violates the sequencing of check+submit then it does
> indeed require reworking -- that would be a bug. Hence (GEM_)BUG_ON().

This also was clarified in this same thread few days ago and I conceded 
the point.

>>
>>>
>>> -    /* postincrement WQ tail for next time */
>>> -    wq_off = gc->wq_tail;
>>> -    gc->wq_tail += sizeof(struct guc_wq_item);
>>> -    gc->wq_tail &= gc->wq_size - 1;
>>> +    /* The GuC firmware wants the tail index in QWords, not bytes */
>>> +    tail = rq->tail;
>>
>> Used to be sampled from rq->ringbuf->tail - are those the same?
> It should always have been rq->tail; they're the same at present because
> it was copied from ringbuffer as part of the submission process, but it
> might not be the same with the scheduler & preemption. So let's get it
> right before it becomes a mysterious bug.
>>
>>> +    GEM_BUG_ON(tail & 7);
>>> +    tail >>= 3;
>>> +    GEM_BUG_ON(tail > WQ_RING_TAIL_MAX);
>>>
>>>       /* For now workqueue item is 4 DWs; workqueue buffer is 2
>>> pages. So we
>>>        * should not have the case where structure wqi is across page,
>>> neither
>>> @@ -495,19 +500,23 @@ static int guc_add_workqueue_item(struct
>>> i915_guc_client *gc,
>>>        * XXX: if not the case, we need save data to a temp wqi and
>>> copy it to
>>>        * workqueue buffer dw by dw.
>>>        */
>>> -    WARN_ON(sizeof(struct guc_wq_item) != 16);
>>> -    WARN_ON(wq_off & 3);
>>> +    BUILD_BUG_ON(wqi_size != 16);
>>>
>>> -    /* wq starts from the page after doorbell / process_desc */
>>> -    base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
>>> -            (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT));
>>> +    /* postincrement WQ tail for next time */
>>> +    wq_off = gc->wq_tail;
>>> +    gc->wq_tail += wqi_size;
>>> +    gc->wq_tail &= gc->wq_size - 1;
>>> +    GEM_BUG_ON(wq_off & (wqi_size - 1));
>>
>> Use to be wq_off & 3, now is wq_off & 15, which one is correct?
> The new one :) I made it more stringent, so we check for correct
> alignment of wq_tail to sizeof(queue item), not just DWord.

Worth mentioning in the commit message? Since it begins with "Mostly 
little optimisations;" readers/reviewers expect to see no functional 
changes then.

>>
>>> +
>>> +    /* WQ starts from the page after doorbell / process_desc */
>>> +    wq_page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
>>>       wq_off &= PAGE_SIZE - 1;
>>> +    base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
>>> wq_page));
>>>       wqi = (struct guc_wq_item *)((char *)base + wq_off);
>>>
>>> -    /* len does not include the header */
>>> -    wq_len = sizeof(struct guc_wq_item) / sizeof(u32) - 1;
>>> +    /* Now fill in the 4-word work queue item */
>>>       wqi->header = WQ_TYPE_INORDER |
>>> -            (wq_len << WQ_LEN_SHIFT) |
>>> +            (wqi_len << WQ_LEN_SHIFT) |
>>>               (rq->engine->guc_id << WQ_TARGET_SHIFT) |
>>>               WQ_NO_WCFLUSH_WAIT;
>>>
>>> @@ -515,14 +524,10 @@ static int guc_add_workqueue_item(struct
>>> i915_guc_client *gc,
>>>       wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx,
>>>                                    rq->engine);
>>>
>>> -    /* The GuC firmware wants the tail index in QWords, not bytes */
>>> -    tail = rq->ringbuf->tail >> 3;
>>>       wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
>>> -    wqi->fence_id = 0; /*XXX: what fence to be here */
>>> +    wqi->fence_id = rq->seqno;
>>
>> Not mentioned in the commit?
> Not actually used either, but I expect it was provided for a Windows
> fence ID, so our equivalent is a seqno.
> It may show up in a GuC log, in which case seqno provides more helpful
> information.

Best to mention it in the commit imho, again, not to confuse 
readers/reviewers.

>>
>>>
>>>       kunmap_atomic(base);
>>> -
>>> -    return 0;
>>>   }
>>>
>>>   /**
>>> @@ -537,26 +542,20 @@ int i915_guc_submit(struct drm_i915_gem_request
>>> *rq)
>>>       unsigned int engine_id = rq->engine->guc_id;
>>>       struct intel_guc *guc = &rq->i915->guc;
>>>       struct i915_guc_client *client = guc->execbuf_client;
>>> -    int q_ret, b_ret;
>>> +    int b_ret;
>>
>> What is b_ret out of interest, door*B*ell return code ?
>>
> Yes, originally q_ret & b_ret, for return from Queuing and Bellringing,
> respectively. Queuing can no longer return an error, but ringing the
> doorbell can, even if only in the event of hardware not behaving as
> specified.
>>>
>>> -    q_ret = guc_add_workqueue_item(client, rq);
>>> -    if (q_ret == 0)
>>> -        b_ret = guc_ring_doorbell(client);
>>> +    guc_add_workqueue_item(client, rq);
>>> +    b_ret = guc_ring_doorbell(client);
>>>
>>>       client->submissions[engine_id] += 1;
>>> -    if (q_ret) {
>>> -        client->q_fail += 1;
>>> -        client->retcode = q_ret;
>>> -    } else if (b_ret) {
>>> +    client->retcode = b_ret;
>>
>> I wanted to ask for what is this for but then found it is for debugfs.
>>
>>> +    if (b_ret)
>>>           client->b_fail += 1;
>>> -        client->retcode = q_ret = b_ret;
>>> -    } else {
>>> -        client->retcode = 0;
>>> -    }
>>> +
>>>       guc->submissions[engine_id] += 1;
>>>       guc->last_seqno[engine_id] = rq->seqno;
>>>
>>> -    return q_ret;
>>> +    return b_ret;
>>
>> Could also kill the return value from this one, the only caller,
>> intel_logical_ring_advance_and_submit does not use it. That function
>> itself does not look like it needs a return value at the moment. :)
> That's really only because we don't have an exit strategy for submission
> failure. I would rather see all parts of the submission process able to
> return errors (e.g. hardware not responding) and let the top-level code
> implement some sort of recovery strategy. Anyway, I'm leaving this here
> until someone changes the call signature to void.
>>
>>>   }
>>>
>>>   /*
>>> diff --git a/drivers/gpu/drm/i915/intel_guc.h
>>> b/drivers/gpu/drm/i915/intel_guc.h
>>> index 436f2d6..10e1d5e 100644
>>> --- a/drivers/gpu/drm/i915/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>>> @@ -74,7 +74,7 @@ struct i915_guc_client {
>>>       /* GuC submission statistics & status */
>>>       uint64_t submissions[GUC_MAX_ENGINES_NUM];
>>>       uint32_t no_wq_space;        /* Space pre-check failed */
>>> -    uint32_t q_fail;        /* Failed to queue (MBZ)    */
>>> +    uint32_t q_fail;        /* No longer used        */
>>
>> Get rid of it then?
> No, it keeps things aligned. It can be reused for the next thing we need
> to add.

AFAIR u32s need to be 4-byte aligned on x86_64 so I don't get it but 
whatever.

Regards,

Tvrtko