[Intel-gfx] [PATCH v2 4/5] drm/i915/guc: rework guc_add_workqueue_item()
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Fri May 6 08:55:49 UTC 2016
On 05/05/16 19:38, Dave Gordon wrote:
> On 29/04/2016 16:44, Tvrtko Ursulin wrote:
>>
>> On 27/04/16 19:03, Dave Gordon wrote:
>>> Mostly little optimisations; for instance, if the driver is correctly
>>> following the submission protocol, the "out of space" condition is
>>> impossible, so the previous runtime WARN_ON() is promoted to a
>>> GEM_BUG_ON() for a more dramatic effect in development and less impact
>>> in end-user systems.
>>>
>>> Similarly we can replace other WARN_ON() conditions that don't relate to
>>> the hardware state with either BUILD_BUG_ON() for compile-time-
>>> detectable issues, or GEM_BUG_ON() for logical "can't happen" errors.
>>>
>>> With those changes, we can convert it to void, as suggested by Chris
>>> Wilson, and update the calling code appropriately.
>>>
>>> Signed-off-by: Dave Gordon <david.s.gordon at intel.com>
>>> Cc: Chris Wilson <chris at chris-wilson.co.uk>
>>>
>>> ---
>>> drivers/gpu/drm/i915/i915_guc_submission.c | 69
>>> +++++++++++++++---------------
>>> drivers/gpu/drm/i915/intel_guc.h | 2 +-
>>> drivers/gpu/drm/i915/intel_guc_fwif.h | 3 +-
>>> 3 files changed, 37 insertions(+), 37 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>>> index 6626eff..4d2ea84 100644
>>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>>> @@ -470,23 +470,28 @@ int i915_guc_wq_check_space(struct
>>> drm_i915_gem_request *request)
>>> return -EAGAIN;
>>> }
>>>
>>> -static int guc_add_workqueue_item(struct i915_guc_client *gc,
>>> - struct drm_i915_gem_request *rq)
>>> +static void guc_add_workqueue_item(struct i915_guc_client *gc,
>>> + struct drm_i915_gem_request *rq)
>>> {
>>> + /* wqi_len is in DWords, and does not include the one-word
>>> header */
>>> + const size_t wqi_size = sizeof(struct guc_wq_item);
>>
>> Again, u32 is correct I think.
> Nope, it's a sizeof(), but the compiler will check that it fits in u32
> when we convert it to DWords below.
I already wrote in this same thread few days ago this was my oversight.
>>
>>> + const u32 wqi_len = wqi_size/sizeof(u32) - 1;
>>> struct guc_process_desc *desc;
>>> struct guc_wq_item *wqi;
>>> void *base;
>>> - u32 tail, wq_len, wq_off, space;
>>> + u32 space, tail, wq_off, wq_page;
>>>
>>> desc = gc->client_base + gc->proc_desc_offset;
>>> +
>>> + /* Space was checked earlier, in i915_guc_wq_check_space() above */
>>
>> It may be above in the file, but the two do not call one another so I
>> recommend saying exactly who called it.
> I don't really mind who called it, as long as it was called sometime
> earlier in the request submission protocol -- the callsite may get moved
> around a bit. Use cscope(1) or your favourite IDE to find it.
So what does this comment supposed to tell the reader?
guc_add_workqueue_item does not call i915_guc_wq_check_space so "above"
what? Above in the file? Or maybe *earlier* in the call sequence of
what? Might as well put an useful comment in when you are adding one.
>>
>>> space = CIRC_SPACE(gc->wq_tail, desc->head, gc->wq_size);
>>> - if (WARN_ON(space < sizeof(struct guc_wq_item)))
>>> - return -ENOSPC; /* shouldn't happen */
>>> + GEM_BUG_ON(space < wqi_size);
>>
>> It is impossible to hit this only because of the struct_mutex guarding
>> the whole time window from request creation to submission. If in the
>> future, near or far, that gets fixed, then this will need reworking.
>>
>> I don't have any better ideas though.
>>
>> But a WARN_ON and return would be almost as good. Everything is better
>> than a dead machine one can't ssh into...
>>
>> So I appeal to make this a WARN_ON and return. Nothing bad would
>> happen apart from software thinking GPU has hung.
> If the driver violates the sequencing of check+submit then it does
> indeed require reworking -- that would be a bug. Hence (GEM_)BUG_ON().
This also was clarified in this same thread few days ago and I conceded
the point.
>>
>>>
>>> - /* postincrement WQ tail for next time */
>>> - wq_off = gc->wq_tail;
>>> - gc->wq_tail += sizeof(struct guc_wq_item);
>>> - gc->wq_tail &= gc->wq_size - 1;
>>> + /* The GuC firmware wants the tail index in QWords, not bytes */
>>> + tail = rq->tail;
>>
>> Used to be sampled from rq->ringbuf->tail - are those the same?
> It should always have been rq->tail; they're the same at present because
> it was copied from ringbuffer as part of the submission process, but it
> might not be the same with the scheduler & preemption. So let's get it
> right before it becomes a mysterious bug.
>>
>>> + GEM_BUG_ON(tail & 7);
>>> + tail >>= 3;
>>> + GEM_BUG_ON(tail > WQ_RING_TAIL_MAX);
>>>
>>> /* For now workqueue item is 4 DWs; workqueue buffer is 2
>>> pages. So we
>>> * should not have the case where structure wqi is across page,
>>> neither
>>> @@ -495,19 +500,23 @@ static int guc_add_workqueue_item(struct
>>> i915_guc_client *gc,
>>> * XXX: if not the case, we need save data to a temp wqi and
>>> copy it to
>>> * workqueue buffer dw by dw.
>>> */
>>> - WARN_ON(sizeof(struct guc_wq_item) != 16);
>>> - WARN_ON(wq_off & 3);
>>> + BUILD_BUG_ON(wqi_size != 16);
>>>
>>> - /* wq starts from the page after doorbell / process_desc */
>>> - base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
>>> - (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT));
>>> + /* postincrement WQ tail for next time */
>>> + wq_off = gc->wq_tail;
>>> + gc->wq_tail += wqi_size;
>>> + gc->wq_tail &= gc->wq_size - 1;
>>> + GEM_BUG_ON(wq_off & (wqi_size - 1));
>>
>> Use to be wq_off & 3, now is wq_off & 15, which one is correct?
> The new one :) I made it more stringent, so we check for correct
> alignment of wq_tail to sizeof(queue item), not just DWord.
Worth mentioning in the commit message? Since it begins with "Mostly
little optimisations;" readers/reviewers expect to see no functional
changes then.
>>
>>> +
>>> + /* WQ starts from the page after doorbell / process_desc */
>>> + wq_page = (wq_off + GUC_DB_SIZE) >> PAGE_SHIFT;
>>> wq_off &= PAGE_SIZE - 1;
>>> + base = kmap_atomic(i915_gem_object_get_page(gc->client_obj,
>>> wq_page));
>>> wqi = (struct guc_wq_item *)((char *)base + wq_off);
>>>
>>> - /* len does not include the header */
>>> - wq_len = sizeof(struct guc_wq_item) / sizeof(u32) - 1;
>>> + /* Now fill in the 4-word work queue item */
>>> wqi->header = WQ_TYPE_INORDER |
>>> - (wq_len << WQ_LEN_SHIFT) |
>>> + (wqi_len << WQ_LEN_SHIFT) |
>>> (rq->engine->guc_id << WQ_TARGET_SHIFT) |
>>> WQ_NO_WCFLUSH_WAIT;
>>>
>>> @@ -515,14 +524,10 @@ static int guc_add_workqueue_item(struct
>>> i915_guc_client *gc,
>>> wqi->context_desc = (u32)intel_lr_context_descriptor(rq->ctx,
>>> rq->engine);
>>>
>>> - /* The GuC firmware wants the tail index in QWords, not bytes */
>>> - tail = rq->ringbuf->tail >> 3;
>>> wqi->ring_tail = tail << WQ_RING_TAIL_SHIFT;
>>> - wqi->fence_id = 0; /*XXX: what fence to be here */
>>> + wqi->fence_id = rq->seqno;
>>
>> Not mentioned in the commit?
> Not actually used either, but I expect it was provided for a Windows
> fence ID, so our equivalent is a seqno.
> It may show up in a GuC log, in which case seqno provides more helpful
> information.
Best to mention it in the commit imho, again, not to confuse
readers/reviewers.
>>
>>>
>>> kunmap_atomic(base);
>>> -
>>> - return 0;
>>> }
>>>
>>> /**
>>> @@ -537,26 +542,20 @@ int i915_guc_submit(struct drm_i915_gem_request
>>> *rq)
>>> unsigned int engine_id = rq->engine->guc_id;
>>> struct intel_guc *guc = &rq->i915->guc;
>>> struct i915_guc_client *client = guc->execbuf_client;
>>> - int q_ret, b_ret;
>>> + int b_ret;
>>
>> What is b_ret out of interest, door*B*ell return code ?
>>
> Yes, originally q_ret & b_ret, for return from Queuing and Bellringing,
> respectively. Queuing can no longer return an error, but ringing the
> doorbell can, even if only in the event of hardware not behaving as
> specified.
>>>
>>> - q_ret = guc_add_workqueue_item(client, rq);
>>> - if (q_ret == 0)
>>> - b_ret = guc_ring_doorbell(client);
>>> + guc_add_workqueue_item(client, rq);
>>> + b_ret = guc_ring_doorbell(client);
>>>
>>> client->submissions[engine_id] += 1;
>>> - if (q_ret) {
>>> - client->q_fail += 1;
>>> - client->retcode = q_ret;
>>> - } else if (b_ret) {
>>> + client->retcode = b_ret;
>>
>> I wanted to ask for what is this for but then found it is for debugfs.
>>
>>> + if (b_ret)
>>> client->b_fail += 1;
>>> - client->retcode = q_ret = b_ret;
>>> - } else {
>>> - client->retcode = 0;
>>> - }
>>> +
>>> guc->submissions[engine_id] += 1;
>>> guc->last_seqno[engine_id] = rq->seqno;
>>>
>>> - return q_ret;
>>> + return b_ret;
>>
>> Could also kill the return value from this one, the only caller,
>> intel_logical_ring_advance_and_submit does not use it. That function
>> itself does not look like it needs a return value at the moment. :)
> That's really only because we don't have an exit strategy for submission
> failure. I would rather see all parts of the submission process able to
> return errors (e.g. hardware not responding) and let the top-level code
> implement some sort of recovery strategy. Anyway, I'm leaving this here
> until someone changes the call signature to void.
>>
>>> }
>>>
>>> /*
>>> diff --git a/drivers/gpu/drm/i915/intel_guc.h
>>> b/drivers/gpu/drm/i915/intel_guc.h
>>> index 436f2d6..10e1d5e 100644
>>> --- a/drivers/gpu/drm/i915/intel_guc.h
>>> +++ b/drivers/gpu/drm/i915/intel_guc.h
>>> @@ -74,7 +74,7 @@ struct i915_guc_client {
>>> /* GuC submission statistics & status */
>>> uint64_t submissions[GUC_MAX_ENGINES_NUM];
>>> uint32_t no_wq_space; /* Space pre-check failed */
>>> - uint32_t q_fail; /* Failed to queue (MBZ) */
>>> + uint32_t q_fail; /* No longer used */
>>
>> Get rid of it then?
> No, it keeps things aligned. It can be reused for the next thing we need
> to add.
AFAIR u32s need to be 4-byte aligned on x86_64 so I don't get it but
whatever.
Regards,
Tvrtko
More information about the Intel-gfx
mailing list