[Intel-gfx] [PATCH 04/11] drm/i915: Support for GuC interrupts

Fri Jul 1 08:47:50 UTC 2016

On 01/07/16 07:16, Goel, Akash wrote:

[snip]

>>>>
>>>>> +            /* Process all the GuC to Host events in bottom half */
>>>>> +            gen6_disable_pm_irq(dev_priv,
>>>>> +                GEN9_GUC_TO_HOST_INT_EVENT);
>>>>
>>>> Why it is important to disable the interrupt here? Not for the queue
>>>> work I think.
>>>
>>> We want to & can handle one interrupt at a time, unless the queued work
>>> item is executed we can't process the next interrupt, so better to keep
>>> the interrupt masked.
>>> Sorry this is what is my understanding.
>>
>> So it is queued in hardware and will get asserted when unmasked?
> As per my understanding, if the interrupt is masked (IMR), it won't be
> queued, will be ignored & so will not be asserted on unmasking.
>
> If the interrupt wasn't masked, but was disabled (in IER) then it
> will be asserted (in IIR) when its enabled.
>
>>
>>>
>>>>
>>>> Also, is it safe with regards to potentially losing the interrupt?
>>>>
>>> Particularly for the FLUSH_LOG_BUFFER case, GuC won't send a new flush
>>> interrupt unless its gets an acknowledgement (flush signal) of the
>>> previous one from Host.
>>
>> Ah so the previous comment is really impossible? I mean the need to mask?
>
> Sorry my comments were not fully correct. GuC can send a new flush
> interrupt, even if the previous one is pending, but that will be for a
> different log buffer type (3 types of log buffer ISR, DPC, CRASH).
> For the same buffer type, GuC won't send a new flush interrupt unless
> its gets an acknowledgement of the previous one from Host.
>
> But as you said the workqueue is ordered and furthermore there is a
> single instance of work item, so the serialization will be provided
> implicitly and there is no real need to mask the interrupt.
>
> As mentioned above, a new flush interrupt can come while the previous
> one is being processed on Host but due to a single instance of work item
> either that new interrupt will not do anything effectively if work
> item was in a pending state or will re queue the work item if it was
> getting executed at that time.
>
> Also the state of all 3 log buffer types are being parsed irrespective
> for which one the interrupt actually came, and the whole buffer is being
> captured (this is how it has been recommended to handle the flush
> interrupts from Host side). So if a new interrupt comes while the work
> item was in a pending state, then effectively work of this new interrupt
> will also be done when work item is executed later.
>
> So will remove the masking then ?

I think so, because if I understood what you wrote, masking can lose us 
an interrupt.

>>
>> Possibly just put a comment up there explaining that.
>>
>>>
>>>>> +            queue_work(dev_priv->wq, &dev_priv->guc.events_work);
>>>>
>>>> Because dev_priv->wq is a one a time in order wq so if something
>>>> else is
>>>> running on it and taking time, can that also be a cause of dropping an
>>>> interrupt or being late with sending the flush signal to the guc and so
>>>> losing some logs?
>>>
>>> Its a Driver's private workqueue and Turbo work item is also queued
>>> inside this workqueue which too needs to be executed without much delay.
>>> But yes the flush work item can get substantially delayed in case if
>>> there are other work items queued before it, especially the
>>> mm.retire_work (but generally executes every ~1 second).
>>>
>>> Best would be if the log buffer (44KB data) can be sampled in IRQ
>>> context (or Tasklet context) itself.
>>
>> I was just trying to understand if you perhaps need a dedicated wq. I
>> don't have a feel at all on how much data guc logging generates per
>> second. If the interrupt is low frequency even with a lot of cmd
>> submission happening it could be fine like it is.
>>
> Actually with maximum verbosity level, I am seeing flush interrupt every
> ms, with 'gem_exec_nop' IGT, as there are lot of submissions being done.
> But such may not happen in real life scenario.
>
> I think, if needed, later on we can either have a dedicated high
> priority work queue for logging work or use the tasklet context to do
> the processing.

Hm, do you need to add some DRM_ERROR or something if wq starts lagging 
behind the flush interrupts? How many missed flush interrupts can we 
afford before the logging buffer starts getting overwritten?

Regards,

Tvrtko