[Intel-gfx] [PATCH 5/7] drm/i915: handle interrupts from the OA unit

Thu Jan 17 11:43:31 UTC 2019

On 16/01/2019 18:04, Lionel Landwerlin wrote:
> On 16/01/2019 16:31, Chris Wilson wrote:
>> Quoting Lionel Landwerlin (2019-01-16 16:25:26)
>>> On 16/01/2019 16:05, Chris Wilson wrote:
>>>> Quoting Lionel Landwerlin (2019-01-16 15:58:00)
>>>>> On 16/01/2019 15:52, Chris Wilson wrote:
>>>>>> Quoting Lionel Landwerlin (2019-01-16 15:36:20)
>>>>>>> @@ -1877,6 +1883,21 @@ struct drm_i915_private {
>>>>>>>                            wait_queue_head_t poll_wq;
>>>>>>>                            bool pollin;
>>>>>>>     +                       /**
>>>>>>> +                        * Atomic counter incremented by the 
>>>>>>> interrupt
>>>>>>> +                        * handling code for each OA half full 
>>>>>>> interrupt
>>>>>>> +                        * received.
>>>>>>> +                        */
>>>>>>> +                       atomic64_t half_full_count;
>>>>>>> +
>>>>>>> +                       /**
>>>>>>> +                        * Copy of the atomic half_full_count 
>>>>>>> that was last
>>>>>>> +                        * processed in the i915-perf driver. If 
>>>>>>> both counters
>>>>>>> +                        * differ, there is data available to 
>>>>>>> read in the OA
>>>>>>> +                        * buffer.
>>>>>>> +                        */
>>>>>>> +                       u64 half_full_count_last;
>>>>>> Eh? But why a relatively expensive atomic64. You only need one 
>>>>>> bit, and
>>>>>> reading the tail pointer from WB memory should just be cheap. You 
>>>>>> should
>>>>>> be able to sample the perf ringbuffer pointers very cheaply... 
>>>>>> What am I
>>>>>> missing?
>>>>>> -Chris
>>>>>>
>>>>> Fair comment.
>>>>>
>>>>> The thing is with this series there are 2 mechanism that notify 
>>>>> the poll_wq.
>>>>>
>>>>> One is the hrtimer that kicks in at regular interval and polls the
>>>>> register with the workaround.
>>>>>
>>>>> The other is the interrupt which doesn't read the registers and 
>>>>> workaround.
>>>> What's the complication with the workaround?
>>>
>>> It's a bit more than just looking at registers, we actually have to 
>>> look
>>> at the content of the buffer to figure out what landed in memory.
>>>
>>> The register values are not synchronized with the memory writes...
>> I don't want to look at registers at all for polling, and you shouldn't
>> need to since communication is via a ringbuf.
>>> There is a comment in the code (i915_perf_poll_locked) about not
>>> checking the register after each wakeup because that may be a very 
>>> hot path.
>>>
>>> The atomic64 sounded like a lesser evil.
>> I'm clearly not understanding something here...
>>
>> Does the hardware not do:
>>     update ringbuf data;
>>     wmb() (or post to global observation point in their parlance)
>>     update ringbuf tail
>
>
> As far as I understand, the OA unit :
>
>     sends its memory write requests to the memory controller
>
>     immediately updates the ringbuf tail, without waiting for the 
> previous requests to complete
>

By experimentation, I've haven't seen a delta between what is available 
in memory and what the OA tail register indicate larger than 768 bytes, 
which is about 3 OA reports at the largest size.
There is probably a maximum number of write requests the OA unit can 
queue before blocking.

So again maybe you would prefer a 2 stage mechanism :

OA interrupt -----> head/tail pointer worker  -----> wake up userspace

                     hrtimer head/tail pointer --|

>
>
>>
>> Then we just need to sample the ringbuf tail and compare against how far
>> we read last time?
>> -Chris
>>
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/intel-gfx/attachments/20190117/06b456b1/attachment.html>