[Intel-gfx] [PATCH 06/20] drm/i915: Handle log buffer flush interrupt event from GuC

Fri Aug 12 14:07:19 UTC 2016

On 12/08/16 14:45, Goel, Akash wrote:
>
>
> On 8/12/2016 6:47 PM, Tvrtko Ursulin wrote:
>>
>> On 12/08/16 07:25, akash.goel at intel.com wrote:
>>> From: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>>>
>>> GuC ukernel sends an interrupt to Host to flush the log buffer
>>> and expects Host to correspondingly update the read pointer
>>> information in the state structure, once it has consumed the
>>> log buffer contents by copying them to a file or buffer.
>>> Even if Host couldn't copy the contents, it can still update the
>>> read pointer so that logging state is not disturbed on GuC side.
>>>
>>> v2:
>>> - Use a dedicated workqueue for handling flush interrupt. (Tvrtko)
>>> - Reduce the overall log buffer copying time by skipping the copy of
>>>    crash buffer area for regular cases and copying only the state
>>>    structure data in first page.
>>>
>>> v3:
>>>   - Create a vmalloc mapping of log buffer. (Chris)
>>>   - Cover the flush acknowledgment under rpm get & put.(Chris)
>>>   - Revert the change of skipping the copy of crash dump area, as
>>>     not really needed, will be covered by subsequent patch.
>>>
>>> v4:
>>>   - Destroy the wq under the same condition in which it was created,
>>>     pass dev_piv pointer instead of dev to newly added GuC function,
>>>     add more comments & rename variable for clarity. (Tvrtko)
>>>
>>> Signed-off-by: Sagar Arun Kamble <sagar.a.kamble at intel.com>
>>> Signed-off-by: Akash Goel <akash.goel at intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.c            |  14 +++
>>>   drivers/gpu/drm/i915/i915_guc_submission.c | 150
>>> +++++++++++++++++++++++++++++
>>>   drivers/gpu/drm/i915/i915_irq.c            |   5 +-
>>>   drivers/gpu/drm/i915/intel_guc.h           |   3 +
>>>   4 files changed, 170 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.c
>>> b/drivers/gpu/drm/i915/i915_drv.c
>>> index 0fcd1c0..fc2da32 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.c
>>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>>> @@ -770,8 +770,20 @@ static int i915_workqueues_init(struct
>>> drm_i915_private *dev_priv)
>>>       if (dev_priv->hotplug.dp_wq == NULL)
>>>           goto out_free_wq;
>>>
>>> +    if (HAS_GUC_SCHED(dev_priv)) {
>>
>> This just reminded me that a previous patch had:
>>
>> +    if (HAS_GUC_UCODE(dev))
>> +        dev_priv->pm_guc_events = GEN9_GUC_TO_HOST_INT_EVENT;
>>
>> In the interrupt setup. I don't think there is a bug right now, but
>> there is a disagreement between the two which would be good to resolve.
>>
>> This HAS_GUC_UCODE in the other patch should probably be HAS_GUC_SCHED
>> for correctness. I think.
>
> Sorry for inconsistency, Will use HAS_GUC_SCHED in the previous patch.
>
> As per Chris's comments will move the wq init/destroy to the GuC logging
> setup/teardown routines (guc_create_log_extras, guc_log_cleanup)
> You are fine with that ?.

Yes thats OK I think.

>>
>>> +        /* Need a dedicated wq to process log buffer flush interrupts
>>> +         * from GuC without much delay so as to avoid any loss of logs.
>>> +         */
>>> +        dev_priv->guc.log.wq =
>>> +            alloc_ordered_workqueue("i915-guc_log", 0);
>>> +        if (dev_priv->guc.log.wq == NULL)
>>> +            goto out_free_hotplug_dp_wq;
>>> +    }
>>> +
>>>       return 0;
>>>
>>> +out_free_hotplug_dp_wq:
>>> +    destroy_workqueue(dev_priv->hotplug.dp_wq);
>>>   out_free_wq:
>>>       destroy_workqueue(dev_priv->wq);
>>>   out_err:
>>> @@ -782,6 +794,8 @@ out_err:
>>>
>>>   static void i915_workqueues_cleanup(struct drm_i915_private *dev_priv)
>>>   {
>>> +    if (HAS_GUC_SCHED(dev_priv))
>>> +        destroy_workqueue(dev_priv->guc.log.wq);
>>>       destroy_workqueue(dev_priv->hotplug.dp_wq);
>>>       destroy_workqueue(dev_priv->wq);
>>>   }
>>> diff --git a/drivers/gpu/drm/i915/i915_guc_submission.c
>>> b/drivers/gpu/drm/i915/i915_guc_submission.c
>>> index c7c679f..2635b67 100644
>>> --- a/drivers/gpu/drm/i915/i915_guc_submission.c
>>> +++ b/drivers/gpu/drm/i915/i915_guc_submission.c
>>> @@ -172,6 +172,15 @@ static int host2guc_sample_forcewake(struct
>>> intel_guc *guc,
>>>       return host2guc_action(guc, data, ARRAY_SIZE(data));
>>>   }
>>>
>>> +static int host2guc_logbuffer_flush_complete(struct intel_guc *guc)
>>> +{
>>> +    u32 data[1];
>>> +
>>> +    data[0] = HOST2GUC_ACTION_LOG_BUFFER_FILE_FLUSH_COMPLETE;
>>> +
>>> +    return host2guc_action(guc, data, 1);
>>> +}
>>> +
>>>   /*
>>>    * Initialise, update, or clear doorbell data shared with the GuC
>>>    *
>>> @@ -840,6 +849,127 @@ err:
>>>       return NULL;
>>>   }
>>>
>>> +static void guc_move_to_next_buf(struct intel_guc *guc)
>>> +{
>>> +    return;
>>> +}
>>> +
>>> +static void* guc_get_write_buffer(struct intel_guc *guc)
>>> +{
>>> +    return NULL;
>>> +}
>>> +
>>> +static void guc_read_update_log_buffer(struct intel_guc *guc)
>>> +{
>>> +    struct guc_log_buffer_state *log_buffer_state,
>>> *log_buffer_snapshot_state;
>>> +    struct guc_log_buffer_state log_buffer_state_local;
>>> +    void *src_data_ptr, *dst_data_ptr;
>>> +    u32 i, buffer_size;
>>
>> unsigned int i if you can be bothered.
>
> Fine will do that for both i & buffer_size.

buffer_size can match the type of log_buffer_state_local.size or use 
something else if more appropriate.

> But I remember earlier in one of the patch, you suggested to use u32 as
> a type for some variables.
> Please could you share the guideline.
> Should u32, u64 be used we are exactly sure of the range of the
> variable, like for variables containing the register values ?

Depends what the variable is. "i" in this case is just a standard 
counter so not appropriate to use u32. It is not that wrong on x86, just 
looks a bit out of place.

grep "u32 i;" has no hits in i915. :)

They can/should be used for variables tied with hardware or protocols. 
Or in some cases where you really want to save space by using a small type.

>
>>
>>> +
>>> +    if (!guc->log.buf_addr)
>>> +        return;
>>
>> Can it hit this? If yes, I think better disable GuC logging when pin map
>> on the object fails rather than let it generate interrupts in vain.
>>
>>> +
>>> +    /* Get the pointer to shared GuC log buffer */
>>> +    log_buffer_state = src_data_ptr = guc->log.buf_addr;
>>> +
>>> +    /* Get the pointer to local buffer to store the logs */
>>> +    dst_data_ptr = log_buffer_snapshot_state =
>>> guc_get_write_buffer(guc);
>>> +
>>> +    /* Actual logs are present from the 2nd page */
>>> +    src_data_ptr += PAGE_SIZE;
>>> +    dst_data_ptr += PAGE_SIZE;
>>> +
>>> +    for (i = 0; i < GUC_MAX_LOG_BUFFER; i++) {
>>> +        /* Make a copy of the state structure in GuC log buffer (which
>>> +         * is uncached mapped) on the stack to avoid reading from it
>>> +         * multiple times.
>>> +         */
>>> +        memcpy(&log_buffer_state_local, log_buffer_state,
>>> +                sizeof(struct guc_log_buffer_state));
>>> +        buffer_size = log_buffer_state_local.size;
>>
>> Needs checking (and logging) that a bad firmware or some other error
>> does not report a dangerous size (too big) which would then overwrite
>> memory pointed by dst_data_ptr memory. (And/or read from random memory.)
>>
> Have done the range checking for the read/write offset values, which are
> updated repeatedly by GuC firmware.
> The buffer size is updated only at init time by GuC firmware, hence less
> vulnerable.
>
> But nevertheless will add the checks.

Ok good.

>>> +
>>> +        if (log_buffer_snapshot_state) {
>>> +            /* First copy the state structure in local buffer */
>>
>> Maybe "destination buffer" would be clearer?
>
> Sorry my bad, well spotted.
>
>>
>>> +            memcpy(log_buffer_snapshot_state, &log_buffer_state_local,
>>> +                    sizeof(struct guc_log_buffer_state));
>>> +
>>> +            /* The write pointer could have been updated by the GuC
>>> +             * firmware, after sending the flush interrupt to Host,
>>> +             * for consistency set the write pointer value to same
>>> +             * value of sampled_write_ptr in the snapshot buffer.
>>> +             */
>>> +            log_buffer_snapshot_state->write_ptr =
>>> +                log_buffer_snapshot_state->sampled_write_ptr;
>>> +
>>> +            log_buffer_snapshot_state++;
>>> +
>>> +            /* Now copy the actual logs */
>>> +            memcpy(dst_data_ptr, src_data_ptr, buffer_size);
>>> +
>>> +            src_data_ptr += buffer_size;
>>> +            dst_data_ptr += buffer_size;
>>> +        }
>>> +
>>> +        /* FIXME: invalidate/flush for log buffer needed */
>>
>> Yes no maybe? :)
>
> Will like to keep it, if you allow.

I think you need a really good justification to get r-b on patches with 
FIXMEs, especially like this one. Do you maybe handle it or remove it in 
a following patch or something?

Regards,

Tvrtko