[Intel-gfx] [PATCH 0/7] Per client engine busyness
Tvrtko Ursulin
tvrtko.ursulin at linux.intel.com
Thu May 20 08:35:13 UTC 2021
On 19/05/2021 19:23, Daniel Vetter wrote:
> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
> <tvrtko.ursulin at linux.intel.com> wrote:
>>
>>
>> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
>>>
>>> On 18/05/2021 10:16, Daniel Stone wrote:
>>>> Hi,
>>>>
>>>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
>>>> <tvrtko.ursulin at linux.intel.com> wrote:
>>>>> I was just wondering if stat(2) and a chrdev major check would be a
>>>>> solid criterion to detect drm files more efficiently (compared to
>>>>> parsing the text content) while walking procfs.
>>>>
>>>> Maybe I'm missing something, but is the per-PID walk actually a
>>>> measurable performance issue rather than just a bit unpleasant?
>>>
>>> Per pid and per each open fd.
>>>
>>> As said in the other thread what bothers me a bit in this scheme is that
>>> the cost of obtaining GPU usage scales based on non-GPU criteria.
>>>
>>> For the use case of a top-like tool which shows all processes this is a
>>> smaller additional cost, but for a gpu-top like tool it is somewhat
>>> higher.
>>
>> To expand further, not only would the cost scale with the number of pids
>> multiplied by the open fds of each, but to detect which of the fds are
>> DRM I see these three options:
>>
>> 1) Open and parse fdinfo.
>> 2) Name-based matching, i.e. /dev/dri/.. something.
>> 3) Stat the symlink target and check for DRM major.
>
> stat with symlink following should be plenty fast.
Maybe. I don't think my point about keeping the dentry cache needlessly
hot is getting through at all. On my lightly loaded desktop:
$ sudo lsof | wc -l
599551
$ sudo lsof | grep "/dev/dri/" | wc -l
1965
It's going to look up ~600k pointless dentries in every iteration. Just
to find a handful of DRM ones. Hard to say if that is better or worse
than just parsing fdinfo text for all files. Will see.
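For reference, option 3 could look roughly like the sketch below. It is only an illustration, not code from the series: the helper names are made up, and it relies on DRM device nodes being character devices with major 226. Note it still issues one stat(2) per open fd of every process, which is exactly the cost I am worried about.

```python
import os
import stat

DRM_MAJOR = 226  # character-device major used by DRM nodes on Linux


def is_drm_node(st_mode, st_rdev):
    """True if the stat fields describe a DRM character device."""
    return stat.S_ISCHR(st_mode) and os.major(st_rdev) == DRM_MAJOR


def find_drm_fds(proc_root="/proc"):
    """Return (pid, fd) pairs whose symlink target is a DRM device node.

    Hypothetical helper: walks every pid and every open fd under
    proc_root, so the work done scales with the total number of open
    files, not with the number of DRM clients.
    """
    found = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as "meminfo"
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or we lack permission
        for fd in fds:
            try:
                # os.stat() follows the /proc/<pid>/fd/<n> symlink
                st = os.stat(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd was closed in the meantime
            if is_drm_node(st.st_mode, st.st_rdev):
                found.append((int(pid), int(fd)))
    return found
```

Run as root, find_drm_fds() would touch every one of the ~600k entries above just to return the handful that matter.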
>> All sound quite sub-optimal to me.
>>
>> Name-based matching is probably the least evil on system resource usage
>> (Keeping the dentry cache too hot? Too many syscalls?), even though
>> fundamentally I don't think it is the right approach.
>>
>> What happens with dup(2) is another question.
>
> We need benchmark numbers showing that on anything remotely realistic
> it's an actual problem. Until we've demonstrated it's a real problem
> we don't need to solve it.
Point about dup(2) is whether it is possible to distinguish the
duplicated fds in fdinfo. If a DRM client dupes, and we found two
fdinfos each saying client is using 20% GPU, we don't want to add it up
to 40%.
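To illustrate what the tool would need, here is a minimal sketch. It assumes each fdinfo record carries some identifier that is unique per DRM client and shared by dup'ed fds; the "client_id" and "busy_percent" field names are hypothetical, since whether fdinfo can expose such a thing is precisely the open question.

```python
def total_gpu_busy(records):
    """Sum busy percentages, counting each DRM client only once.

    records: iterable of dicts with the hypothetical keys
    "client_id" (shared by dup'ed fds) and "busy_percent".
    """
    per_client = {}
    for rec in records:
        # Keep the first value seen for a client; duplicates are ignored.
        per_client.setdefault(rec["client_id"], rec["busy_percent"])
    return sum(per_client.values())


# Two fds referring to the same client plus one unrelated client.
records = [
    {"client_id": 7, "busy_percent": 20.0},
    {"client_id": 7, "busy_percent": 20.0},  # dup(2) of the first fd
    {"client_id": 9, "busy_percent": 5.0},
]
print(total_gpu_busy(records))
```

Without such an identifier the two records for client 7 are indistinguishable from two separate clients.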
> E.g. top with any sorting enabled also parses way more than it
> displays on every update. It seems to be doing Just Fine (tm).
Ha, perceptions differ. I see it using 4-5% while building the kernel on
a Xeon server which I find quite a lot. :)
>> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?
>
> When we know we have a problem to solve we can take a look at solutions.
Yes, I don't think adding a better solution later would be a problem,
so I am happy to try fdinfo first. I am simply pointing out a fundamental
design inefficiency. Even if machines are getting faster and faster I
don't think that should be an excuse to waste more and more under the
hood, when a more efficient solution can be designed from the start.
Regards,
Tvrtko