[Intel-gfx] [PATCH 04/11] drm/i915: Support for GuC interrupts
Goel, Akash
akash.goel at intel.com
Fri Jul 1 09:57:31 UTC 2016
On 7/1/2016 2:17 PM, Tvrtko Ursulin wrote:
>
>
> On 01/07/16 07:16, Goel, Akash wrote:
>
> [snip]
>
>>>>>
>>>>>> + /* Process all the GuC to Host events in bottom half */
>>>>>> + gen6_disable_pm_irq(dev_priv,
>>>>>> + GEN9_GUC_TO_HOST_INT_EVENT);
>>>>>
>>>>> Why is it important to disable the interrupt here? Not for the
>>>>> queue work I think.
>>>>
>>>> We want to, and can, handle only one interrupt at a time; until the
>>>> queued work item has executed we can't process the next interrupt,
>>>> so it is better to keep the interrupt masked.
>>>> Sorry, this is just my understanding.
>>>
>>> So it is queued in hardware and will get asserted when unmasked?
>> As per my understanding, if the interrupt is masked (in IMR), it won't
>> be queued; it will be ignored and so will not be asserted on unmasking.
>>
>> If the interrupt wasn't masked but was disabled (in IER), then it
>> will be asserted (in IIR) when it is enabled.
>>
>>>
>>>>
>>>>>
>>>>> Also, is it safe with regards to potentially losing the interrupt?
>>>>>
>>>> Particularly for the FLUSH_LOG_BUFFER case, GuC won't send a new flush
>>>> interrupt until it gets an acknowledgement (flush signal) of the
>>>> previous one from the Host.
>>>
>>> Ah so the previous comment is really impossible? I mean the need to
>>> mask?
>>
>> Sorry, my comments were not fully correct. GuC can send a new flush
>> interrupt even if the previous one is pending, but only for a
>> different log buffer type (there are 3 types of log buffer: ISR, DPC,
>> CRASH). For the same buffer type, GuC won't send a new flush interrupt
>> until it gets an acknowledgement of the previous one from the Host.
>>
>> But as you said, the workqueue is ordered and, furthermore, there is a
>> single instance of the work item, so serialization is provided
>> implicitly and there is no real need to mask the interrupt.
>>
>> As mentioned above, a new flush interrupt can come while the previous
>> one is being processed on the Host, but because there is a single
>> instance of the work item, that new interrupt will either do nothing
>> effectively (if the work item was still pending) or re-queue the work
>> item (if it was executing at that time).
>>
>> Also, the state of all 3 log buffer types is parsed irrespective of
>> which one the interrupt actually came for, and the whole buffer is
>> captured (this is how it has been recommended to handle the flush
>> interrupts on the Host side). So if a new interrupt comes while the
>> work item is still pending, the work of this new interrupt will
>> effectively also be done when the work item executes later.
>>
>> So shall I remove the masking then?
>
> I think so, because if I understood what you wrote, masking can lose us
> an interrupt.
>
Yes, if a new flush interrupt comes while the work item is executing and
the interrupt is masked, we could lose an opportunity to sample the log
buffer. I will not mask the interrupt.
Thanks for persisting on this.
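
For reference, a minimal sketch of what the handler then reduces to
(the function name gen9_guc_irq_handler is just a placeholder of mine;
the event bit and the work item are the ones from this patch):

    /* Sketch only: all GuC to Host event processing stays in the bottom
     * half. With no gen6_disable_pm_irq() here, a flush interrupt that
     * arrives while the work item is executing simply re-queues it, so
     * no sampling opportunity is lost.
     */
    static void gen9_guc_irq_handler(struct drm_i915_private *dev_priv,
                                     u32 gt_iir)
    {
        if (gt_iir & GEN9_GUC_TO_HOST_INT_EVENT)
            queue_work(dev_priv->wq, &dev_priv->guc.events_work);
    }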
>>>
>>> Possibly just put a comment up there explaining that.
>>>
>>>>
>>>>>> + queue_work(dev_priv->wq, &dev_priv->guc.events_work);
>>>>>
>>>>> Because dev_priv->wq is a one-at-a-time, in-order wq, if something
>>>>> else is running on it and taking time, can that also be a cause of
>>>>> dropping an interrupt or being late with sending the flush signal
>>>>> to the guc, and so losing some logs?
>>>>
>>>> It's the driver's private workqueue, and the turbo work item is also
>>>> queued on it, which too needs to be executed without much delay.
>>>> But yes, the flush work item can get substantially delayed if there
>>>> are other work items queued before it, especially mm.retire_work
>>>> (though that generally executes only every ~1 second).
>>>>
>>>> It would be best if the log buffer (44KB of data) could be sampled
>>>> in IRQ context (or tasklet context) itself.
>>>
>>> I was just trying to understand if you perhaps need a dedicated wq. I
>>> don't have a feel at all for how much data guc logging generates per
>>> second. If the interrupt is low frequency even with a lot of cmd
>>> submission happening, it could be fine as it is.
>>>
>> Actually, with the maximum verbosity level I am seeing a flush
>> interrupt every ms with the 'gem_exec_nop' IGT, as there are a lot of
>> submissions being done. But that may not happen in a real-life scenario.
>>
>> I think, if needed, we can later either have a dedicated high-priority
>> workqueue for the logging work or use the tasklet context to do the
>> processing.
>
> Hm, do you need to add some DRM_ERROR or something if the wq starts
> lagging behind the flush interrupts? How many missed flush interrupts
> can we afford before the logging buffer starts getting overwritten?
>
Actually, if GuC is producing logs at such a fast rate then we can't
afford to miss even a single interrupt, if we don't want to lose any logs.
When the log buffer becomes half full, GuC sends a flush interrupt.
The GuC firmware expects that, while it is writing to the 2nd half of
the buffer, the first half will be consumed by the Host, which then
sends a flush-completed acknowledgement, so that the firmware never has
to overwrite data and cause loss of logs.
There is a buffer_full_cnt field in the state structure which the GuC
firmware increments every time it detects a potential log buffer
overflow. Probably this can be exposed via debugfs?
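
If exposing it via debugfs, something along these lines could work (a
sketch only; i915_guc_log_info and the guc.log.buffer_full_cnt field,
mirroring the firmware counter, are names I am assuming here):

    /* Sketch only: report how often the GuC detected a potential log
     * buffer overflow, i.e. how many flushes we were too late for.
     */
    static int i915_guc_log_info(struct seq_file *m, void *data)
    {
        struct drm_info_node *node = m->private;
        struct drm_i915_private *dev_priv = to_i915(node->minor->dev);

        seq_printf(m, "GuC log buffer overflow count: %u\n",
                   dev_priv->guc.log.buffer_full_cnt);

        return 0;
    }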
The return value of queue_work() can be checked in the interrupt handler:
if it is zero, the work item is still in the queue, so the previous flush
hasn't been processed yet; this can give an inkling of lag.
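
Roughly like this (the missed-flush counter and the rate-limited error
message are my assumptions, not part of the patch):

    /* Sketch only: queue_work() returns false if the work item is
     * already queued, i.e. the previous flush is still unprocessed,
     * which means the wq is lagging behind the flush interrupts.
     */
    if (!queue_work(dev_priv->wq, &dev_priv->guc.events_work)) {
        dev_priv->guc.missed_flush_count++; /* hypothetical counter */
        DRM_ERROR_RATELIMITED("GuC log flush work still pending\n");
    }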
I had initially thought that creating a dedicated workqueue for the
flush work might be considered overkill.
I will proceed as per your suggestion.
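
If a dedicated workqueue does become necessary, I would expect it to look
something like this (a sketch; the "i915-guc_log" name and the
guc.log.flush_wq field are assumptions):

    /* Sketch only: an ordered, high-priority workqueue dedicated to the
     * log flush work, so it cannot be starved by retire_work etc. on
     * dev_priv->wq.
     */
    dev_priv->guc.log.flush_wq = alloc_ordered_workqueue("i915-guc_log",
                                                         WQ_HIGHPRI);
    if (!dev_priv->guc.log.flush_wq)
        return -ENOMEM;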
Best regards
Akash
> Regards,
>
> Tvrtko
>
>
>