[Intel-gfx] [PATCH 06/20] drm/i915: Handle log buffer flush interrupt event from GuC

Fri Aug 12 16:17:54 UTC 2016


On 8/12/2016 7:37 PM, Tvrtko Ursulin wrote:
>
> On 12/08/16 14:45, Goel, Akash wrote:
>>
>>
>> On 8/12/2016 6:47 PM, Tvrtko Ursulin wrote:
>>>
>>> On 12/08/16 07:25, akash.goel at intel.com wrote:
>>>> From: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>>>>
>>>> GuC ukernel sends an interrupt to Host to flush the log buffer
>>>> and expects Host to correspondingly update the read pointer
>>>> information in the state structure, once it has consumed the
>>>> log buffer contents by copying them to a file or buffer.
>>>> Even if Host couldn't copy the contents, it can still update the
>>>> read pointer so that logging state is not disturbed on GuC side.
>>>>
>>>> v2:
>>>> - Use a dedicated workqueue for handling flush interrupt. (Tvrtko)
>>>> - Reduce the overall log buffer copying time by skipping the copy of
>>>>    crash buffer area for regular cases and copying only the state
>>>>    structure data in first page.
>>>>
>>>> v3:
>>>>   - Create a vmalloc mapping of log buffer. (Chris)
>>>>   - Cover the flush acknowledgment under rpm get & put.(Chris)
>>>>   - Revert the change of skipping the copy of crash dump area, as
>>>>     not really needed, will be covered by subsequent patch.
>>>>
>>>> v4:
>>>>   - Destroy the wq under the same condition in which it was created,
>>>>     pass dev_piv pointer instead of dev to newly added GuC function,
>>>>     add more comments & rename variable for clarity. (Tvrtko)
>>>>
>>>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>>>> Signed-off-by: Akash Goel <akash.goel at intel.com>
>>>> ---
>>>>   drivers/gpu/drm/i915/i915_drv.c            |  14 +++
>>>>   drivers/gpu/drm/i915/i915_guc_submission.c | 150
>>>> +++++++++++++++++++++++++++++
>>>>   drivers/gpu/drm/i915/i915_irq.c            |   5 +-
>>>>   drivers/gpu/drm/i915/intel_guc.h           |   3 +
>>>>   4 files changed, 170 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.c
>>>> b/drivers/gpu/drm/i915/i915_drv.c
>>>> index 0fcd1c0..fc2da32 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.c
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>>>> @@ -770,8 +770,20 @@ static int i915_workqueues_init(struct
>>>> drm_i915_private *dev_priv)
>>>>       if (dev_priv->hotplug.dp_wq == NULL)
>>>>           goto out_free_wq;
>>>>
>>>> +    if (HAS_GUC_SCHED(dev_priv)) {
>>>
>>> This just reminded me that a previous patch had:
>>>
>>> +    if (HAS_GUC_UCODE(dev))
>>> +        dev_priv->pm_guc_events = GEN9_GUC_TO_HOST_INT_EVENT;
>>>
>>> In the interrupt setup. I don't think there is a bug right now, but
>>> there is a disagreement between the two which would be good to resolve.
>>>
>>> This HAS_GUC_UCODE in the other patch should probably be HAS_GUC_SCHED
>>> for correctness. I think.
>>
>> Sorry for inconsistency, Will use HAS_GUC_SCHED in the previous patch.
>>
>> As per Chris's comments will move the wq init/destroy to the GuC logging
>> setup/teardown routines (guc_create_log_extras, guc_log_cleanup)
>> You are fine with that ?.
>
> Yes thats OK I think.
>
>>>
>>>> +        /* Need a dedicated wq to process log buffer flush interrupts
>>>> +         * from GuC without much delay so as to avoid any loss of
>>>> logs.
>>>> +         */
>>>> +        dev_priv->guc.log.wq =
>>>> +            alloc_ordered_workqueue("i915-guc_log", 0);
>>>> +        if (dev_priv->guc.log.wq == NULL)
>>>> +            goto out_free_hotplug_dp_wq;
>>>> +    }
>>>> +
>>>>       return 0;
>>>>
>>>> +out_free_hotplug_dp_wq:
>>>> +    destroy_workqueue(dev_priv->hotplug.dp_wq);
>>>>   out_free_wq:
>>>>       destroy_workqueue(dev_priv->wq);
>>>>   out_err:
>>>> @@ -782,6 +794,8 @@ out_err:
>>>>
>>>>   static void i915_workqueues_cleanup(struct drm_i915_private
>>>> *dev_priv)
>>>>   {
>>>> +    if (HAS_GUC_SCHED(dev_priv))
>>>> +        destroy_workqueue(dev_priv->guc.log.wq);
>>>>       destroy_workqueue(dev_priv->hotplug.dp_wq);
>>>>       destroy_workqueue(dev_priv->wq);
>>>>   }
>>>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> index c7c679f..2635b67 100644
>>>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>>>> @@ -172,6 +172,15 @@ static int host2guc_sample_forcewake(struct
>>>> intel_guc *guc,
>>>>       return host2guc_action(guc, data, ARRAY_SIZE(data));
>>>>   }
>>>>
>>>> +static int host2guc_logbuffer_flush_complete(struct intel_guc *guc)
>>>> +{
>>>> +    u32 data[1];
>>>> +
>>>> +    data[0] = HOST2GUC_ACTION_LOG_BUFFER_FILE_FLUSH_COMPLETE;
>>>> +
>>>> +    return host2guc_action(guc, data, 1);
>>>> +}
>>>> +
>>>>   /*
>>>>    * Initialise, update, or clear doorbell data shared with the GuC
>>>>    *
>>>> @@ -840,6 +849,127 @@ err:
>>>>       return NULL;
>>>>   }
>>>>
>>>> +static void guc_move_to_next_buf(struct intel_guc *guc)
>>>> +{
>>>> +    return;
>>>> +}
>>>> +
>>>> +static void* guc_get_write_buffer(struct intel_guc *guc)
>>>> +{
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +static void guc_read_update_log_buffer(struct intel_guc *guc)
>>>> +{
>>>> +    struct guc_log_buffer_state *log_buffer_state,
>>>> *log_buffer_snapshot_state;
>>>> +    struct guc_log_buffer_state log_buffer_state_local;
>>>> +    void *src_data_ptr, *dst_data_ptr;
>>>> +    u32 i, buffer_size;
>>>
>>> unsigned int i if you can be bothered.
>>
>> Fine will do that for both i & buffer_size.
>
> buffer_size can match the type of log_buffer_state_local.size or use
> something else if more appropriate.
>
>> But I remember earlier in one of the patch, you suggested to use u32 as
>> a type for some variables.
>> Please could you share the guideline.
>> Should u32, u64 be used we are exactly sure of the range of the
>> variable, like for variables containing the register values ?
>
> Depends what the variable is. "i" in this case is just a standard
> counter so not appropriate to use u32. It is not that wrong on x86, just
> looks a bit out of place.
>
> grep "u32 i;" has no hits in i915. :)
>
> They can/should be used for variables tied with hardware or protocols.
> Or in some cases where you really want to save space by using a small type.
>
Thanks will be mindful of this.

>>
>>>
>>>> +
>>>> +    if (!guc->log.buf_addr)
>>>> +        return;
>>>
>>> Can it hit this? If yes, I think better disable GuC logging when pin map
>>> on the object fails rather than let it generate interrupts in vain.
>>>
Ideally it should not, as logging itself is disabled (interrupts also 
disabled) on log buffer allocation failure.

>>>> +
>>>> +    /* Get the pointer to shared GuC log buffer */
>>>> +    log_buffer_state = src_data_ptr = guc->log.buf_addr;
>>>> +
>>>> +    /* Get the pointer to local buffer to store the logs */
>>>> +    dst_data_ptr = log_buffer_snapshot_state =
>>>> guc_get_write_buffer(guc);
>>>> +
>>>> +    /* Actual logs are present from the 2nd page */
>>>> +    src_data_ptr += PAGE_SIZE;
>>>> +    dst_data_ptr += PAGE_SIZE;
>>>> +
>>>> +    for (i = 0; i < GUC_MAX_LOG_BUFFER; i++) {
>>>> +        /* Make a copy of the state structure in GuC log buffer (which
>>>> +         * is uncached mapped) on the stack to avoid reading from it
>>>> +         * multiple times.
>>>> +         */
>>>> +        memcpy(&log_buffer_state_local, log_buffer_state,
>>>> +                sizeof(struct guc_log_buffer_state));
>>>> +        buffer_size = log_buffer_state_local.size;
>>>
>>> Needs checking (and logging) that a bad firmware or some other error
>>> does not report a dangerous size (too big) which would then overwrite
>>> memory pointed by dst_data_ptr memory. (And/or read from random memory.)
>>>
>> Have done the range checking for the read/write offset values, which are
>> updated repeatedly by GuC firmware.
>> The buffer size is updated only at init time by GuC firmware, hence less
>> vulnerable.
>>
>> But nevertheless will add the checks.
>
> Ok good.
>
>>>> +
>>>> +        if (log_buffer_snapshot_state) {
>>>> +            /* First copy the state structure in local buffer */
>>>
>>> Maybe "destination buffer" would be clearer?
>>
>> Sorry my bad, well spotted.
>>
>>>
>>>> +            memcpy(log_buffer_snapshot_state, &log_buffer_state_local,
>>>> +                    sizeof(struct guc_log_buffer_state));
>>>> +
>>>> +            /* The write pointer could have been updated by the GuC
>>>> +             * firmware, after sending the flush interrupt to Host,
>>>> +             * for consistency set the write pointer value to same
>>>> +             * value of sampled_write_ptr in the snapshot buffer.
>>>> +             */
>>>> +            log_buffer_snapshot_state->write_ptr =
>>>> +                log_buffer_snapshot_state->sampled_write_ptr;
>>>> +
>>>> +            log_buffer_snapshot_state++;
>>>> +
>>>> +            /* Now copy the actual logs */
>>>> +            memcpy(dst_data_ptr, src_data_ptr, buffer_size);
>>>> +
>>>> +            src_data_ptr += buffer_size;
>>>> +            dst_data_ptr += buffer_size;
>>>> +        }
>>>> +
>>>> +        /* FIXME: invalidate/flush for log buffer needed */
>>>
>>> Yes no maybe? :)
>>
>> Will like to keep it, if you allow.
>
> I think you need a really good justification to get r-b on patches with
> FIXMEs, especially like this one. Do you maybe handle it or remove it in
> a following patch or something?
>
Have removed this FIXME in the later patch
"[PATCH 17/20] drm/i915: Use uncached(WC) mapping for acessing the GuC 
log buffer"

Best regards
Akash

> Regards,
>
> Tvrtko