[Intel-gfx] [PATCH 0/7] Per client engine busyness

Tvrtko Ursulin tvrtko.ursulin at linux.intel.com
Mon May 24 10:48:00 UTC 2021


On 20/05/2021 09:35, Tvrtko Ursulin wrote:
> On 19/05/2021 19:23, Daniel Vetter wrote:
>> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
>> <tvrtko.ursulin at linux.intel.com> wrote:
>>>
>>>
>>> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
>>>>
>>>> On 18/05/2021 10:16, Daniel Stone wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
>>>>> <tvrtko.ursulin at linux.intel.com> wrote:
>>>>>> I was just wondering if stat(2) and a chrdev major check would be a
>>>>>> solid criteria to more efficiently (compared to parsing the text
>>>>>> content) detect drm files while walking procfs.
>>>>>
>>>>> Maybe I'm missing something, but is the per-PID walk actually a
>>>>> measurable performance issue rather than just a bit unpleasant?
>>>>
>>>> Per pid and per each open fd.
>>>>
>>>> As said in the other thread what bothers me a bit in this scheme is 
>>>> that
>>>> the cost of obtaining GPU usage scales based on non-GPU criteria.
>>>>
>>>> For use case of a top-like tool which shows all processes this is a
>>>> smaller additional cost, but then for a gpu-top like tool it is 
>>>> somewhat
>>>> higher.
>>>
>>> To further expand, not only cost would scale per pid multiplies per open
>>> fd, but to detect which of the fds are DRM I see these three options:
>>>
>>> 1) Open and parse fdinfo.
>>> 2) Name based matching ie /dev/dri/.. something.
>>> 3) Stat the symlink target and check for DRM major.
>>
>> stat with symlink following should be plenty fast.
> 
> Maybe. I don't think my point about keeping the dentry cache needlessly 
> hot is getting through at all. On my lightly loaded desktop:
> 
>   $ sudo lsof | wc -l
>   599551
> 
>   $ sudo lsof | grep "/dev/dri/" | wc -l
>   1965
> 
> It's going to look up ~600k pointless dentries in every iteration. Just 
> to find a handful of DRM ones. Hard to say if that is better or worse 
> than just parsing fdinfo text for all files. Will see.

CPU usage looks passable under a production kernel (non-debug). Once a 
second refresh period, on a not really that loaded system (115 running 
processes, 3096 open file descriptors as reported by lsof, none of which 
are DRM), results in a system call heavy load:

real    0m55.348s
user    0m0.100s
sys     0m0.319s

Once per second loop is essentially along the lines of:

   for each pid in /proc/<pid>:
     for each fd in /proc/<pid>/fdinfo:
       if fstatat(fd) is drm major:
         read fdinfo text in one sweep and parse it

I'll post the quick intel_gpu_top patch for reference but string parsing 
in C leaves a few things to be desired there.

Regards,

Tvrtko


More information about the Intel-gfx mailing list