[Intel-gfx] [PATCH 3/3] drm/i915/execlists: Read the context-status HEAD from the HWSP

Thu Jul 13 00:38:18 UTC 2017

On 7/12/2017 4:41 PM, Daniele Ceraolo Spurio wrote:
> 
> 
> On 12/07/17 15:58, Chris Wilson wrote:
>> The engine provides also provides mirror of the CSB write pointer in the
>> HWSP, but not of our read pointer. To take advantage of this we need to
>> remember where we read up to on the last interrupt and continue off from
>> there. This poses a problem following a reset, as we don't know where
>> the hw will start writing from, and due to the use of power contexts we
>> cannot perform that query during the reset itself. So we continue the
>> current modus operandi of delaying the first read of the context-status
>> read/write pointers until after the first interrupt. With this we should
>> now have eliminated all uncached mmio reads in handling the
>> context-status interrupt, though we still have the uncached mmio writes
>> for submitting new work, and many uncached mmio reads in the global
>> interrupt handler itself. Still a step in the right direction towards
>> reducing our resubmit latency, although it appears lost in the noise!
>>
>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
>> Cc: Michel Thierry <michel.thierry at intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin at intel.com>
>> Cc: Mika Kuoppala <mika.kuoppala at intel.com>
>> ---
>>    drivers/gpu/drm/i915/intel_lrc.c        | 20 +++++++++++++++-----
>>    drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
>>    2 files changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index e413465a552b..db750abb905e 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -562,9 +562,15 @@ static void intel_lrc_irq_handler(unsigned long data)
>>                 * is set and we do a new loop.
>>                 */
>>                __clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>> -             head = readl(csb_mmio);
>> -             tail = GEN8_CSB_WRITE_PTR(head);
>> -             head = GEN8_CSB_READ_PTR(head);
>> +             if (unlikely(engine->csb_head == -1)) { /* following a reset */
>> +                     head = readl(csb_mmio);
>> +                     tail = GEN8_CSB_WRITE_PTR(head);
>> +                     head = GEN8_CSB_READ_PTR(head);
>> +                     engine->csb_head = head;
>> +             } else {
>> +                     head = engine->csb_head;
>> +                     tail = buf[0xf];
> 
> In CNL the tail moves to offset 0x2f of the HWSP (i.e. buf[0x1f]), might
>    be worth considering it immediately since CNL support is being merged.
> 
> -Daniele
> 

Also right now it is implicitly doing csb_head_dword - csb_start_dword 
(0x1f - 0x10). I would vote to have them as defines.

And my only other comment, wouldn't the same changes should apply to the 
i915_engine_info debugfs? For example: https://hastebin.com/tahurezeri

-Michel

>> +             }
>>                while (head != tail) {
>>                        struct drm_i915_gem_request *rq;
>>                        unsigned int status;
>> @@ -618,8 +624,11 @@ static void intel_lrc_irq_handler(unsigned long data)
>>                                   !(status & GEN8_CTX_STATUS_ACTIVE_IDLE));
>>                }
>>
>> -             writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
>> -                    csb_mmio);
>> +             if (head != engine->csb_head) {
>> +                     engine->csb_head = head;
>> +                     writel(_MASKED_FIELD(GEN8_CSB_READ_PTR_MASK, head << 8),
>> +                            csb_mmio);
>> +             }
>>        }
>>
>>        if (execlists_elsp_ready(engine))
>> @@ -1246,6 +1255,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *engine)
>>
>>        /* After a GPU reset, we may have requests to replay */
>>        clear_bit(ENGINE_IRQ_EXECLIST, &engine->irq_posted);
>> +     engine->csb_head = -1;
>>
>>        submit = false;
>>        for (n = 0; n < ARRAY_SIZE(engine->execlist_port); n++) {
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index d33c93444c0d..56751413e40c 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -391,6 +391,7 @@ struct intel_engine_cs {
>>        struct rb_root execlist_queue;
>>        struct rb_node *execlist_first;
>>        unsigned int fw_domains;
>> +     unsigned int csb_head;
>>
>>        /* Contexts are pinned whilst they are active on the GPU. The last
>>         * context executed remains active whilst the GPU is idle - the
>>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
>