[PATCH 00/17] Add OA functionality to Xe

Souza, Jose jose.souza at intel.com
Tue May 21 21:00:21 UTC 2024


On Tue, 2024-05-21 at 13:24 -0700, Dixit, Ashutosh wrote:
> On Tue, 21 May 2024 11:48:01 -0700, Souza, Jose wrote:
> > 
> 
> Hi Jose,
> 
> > On Tue, 2024-05-21 at 11:02 -0700, Dixit, Ashutosh wrote:
> > > On Tue, 21 May 2024 10:39:17 -0700, Souza, Jose wrote:
> > > > 
> > > > On Tue, 2024-05-21 at 09:43 -0700, Dixit, Ashutosh wrote:
> > > > > On Tue, 21 May 2024 09:29:51 -0700, Souza, Jose wrote:
> > > > > > 
> > > > > > On Tue, 2024-05-21 at 09:10 -0700, Dixit, Ashutosh wrote:
> > > > > > > On Tue, 21 May 2024 07:47:58 -0700, Souza, Jose wrote:
> > > > > > > 
> > > > > > > Hi Jose,
> > > > > > > 
> > > > > > > > > Other ask, can you remove this 'Failed to remove unknown OA config'
> > > > > > > > > debug message from xe_oa_remove_config_ioctl()?
> > > > > > > > 
> > > > > > > > Missed 'Insufficient privileges to remove xe OA config', that need to be
> > > > > > > > removed too from xe_oa_remove_config_ioctl().
> > > > > > > > 
> > > > > > > > > Mesa will be using DRM_XE_PERF_OP_REMOVE_CONFIG with config id set to
> > > > > > > > > UINT64_MAX to detect if Xe KMD supports OA counters and if application
> > > > > > > > > has enough permissions to use it.  So it causes dmesg to be flooded
> > > > > > > > > with 'xe 0000:00:02.0: [drm:xe_oa_remove_config_ioctl [xe]] Failed to
> > > > > > > > > remove unknown OA config' messages when running tests suites.
> > > > > > > > > 
> > > > > > > > > Or do you have other suggestion of uAPI that I can use.
> > > > > > > 
> > > > > > > OK, so you are relying on ENODEV and EACCES errno's from
> > > > > > > DRM_XE_PERF_OP_REMOVE_CONFIG to find out (a) if OA is present and (b) if
> > > > > > > you need to be root (actually CAP_PERFMON or CAP_SYS_ADMIN).
> > > > > > 
> > > > > > yep
> > > > > > 
> > > > > > > 
> > > > > > > This logic in Xe should be close to what we have in i915? What does Mesa do
> > > > > > > for i915, or what doesn't work in Xe?
> > > > > > > 
> > > > > > > Here are some pointers:
> > > > > > > 
> > > > > > > * You can execute DRM_XE_DEVICE_QUERY_OA_UNITS to see if OA is present
> > > > > > > 
> > > > > > > * Add/remove OA configs and using the global OAG buffer (time based
> > > > > > >   sampling or DRM_XE_OA_PROPERTY_SAMPLE_OA set) are priviliged operations
> > > > > > >   (need root). Operations which only need OAR/OAC (OA queries, without
> > > > > > >   DRM_XE_OA_PROPERTY_SAMPLE_OA) can be executed by non-root.
> > > > > > > 
> > > > > > > * If "/proc/sys/dev/xe/perf_stream_paranoid" is 0, all operations can be
> > > > > > >   executed by non-root users. Otherwise, as I described in the previous
> > > > > > >   point.
> > > > > > 
> > > > > > It is possible that process not started by root has CAP_PERFMON:
> > > > > 
> > > > > Yes, correct.
> > > > > 
> > > > > Above I am using "root" loosely as "CAP_PERFMON or CAP_SYS_ADMIN".
> > > > > 
> > > > > So if root sets 'perf_stream_paranoid' to 0, normal users can do OA
> > > > > priviliged operations (as described above).
> > > > > 
> > > > > If root sets 'perf_stream_paranoid' to 1, only CAP_PERFMON or CAP_SYS_ADMIN
> > > > > users can do OA priviliged operations.
> > > > 
> > > > oh okay so perf_stream_paranoid has a good usage but it do not covers
> > > > CAP_PERFMON, see below.
> > > > 
> > > > > 
> > > > > > "Unprivileged processes with enabled CAP_PERFMON capability are treated
> > > > > > as privileged processes with respect to perf_events performance
> > > > > > monitoring and observability operations,..."
> > > > > > 
> > > > > > And from what I understood only root can write to perf_stream_paranoid,
> > > > > > so I don't see a point in having this file...
> > > > > 
> > > > > So I don't know how this statement follows?
> > > > > 
> > > > > root or superuser is the one which gives the permission to non-CAP_PERFMON
> > > > > and non-CAP_SYS_ADMIN users to be able to do priviliged OA operations.
> > > > 
> > > > so if I'm running a process with CAP_PERFMON and read
> > > > perf_stream_paranoid and it returns 0
> > > 
> > > 0 if fine, everyone can use all OA calls. The issue is with 1.
> > > 
> > > > there is no way for UMD to know
> > > > that process is allowed to use Xe KMD OA feature without do a uAPI call
> > > > that checks for CAP_PERFMON.
> > > > 
> > > > That is why I think is better just do a single
> > > > DRM_XE_PERF_OP_REMOVE_CONFIG to detect if feature is present and if
> > > > process is allowed to use it. But it generates a bunch of debug messages.
> > > 
> > > A process should be able to find out on its own, without help from Xe
> > > driver, what it's capabilities (CAP_PERFMON or CAP_SYS_ADMIN or neither)
> > > are:
> > > 
> > > https://www.google.com/search?q=linux+get+process+capabilities+in+c
> > > https://man7.org/linux/man-pages/man3/libcap.3.html
> > > 
> > 
> > I don't think this is much portable, I don't think BSD has this sysfs
> > with PID capabilities of process.
> 
> Not sure why we are bringing BSD here since we have a Linux KMD, not a BSD
> KMD. In any case, every OS has system calls for a process to query its
> properties.

BSD wraps Linux drm drivers into a translation layer.

https://www.freshports.org/graphics/drm-kmod/

> 
> So if these are not portable, you don't have to use these library calls,
> you should be able to use direct calls into the OS to find stuff out.

That is the problem, Linux exposes process capabilities in sysfs while BSD don't even has concept of capabilities, it has privileges...

> 
> > Also if later Xe KMD adds or removes process capabilities that has access
> > to OA it will break UMDs.
> 
> That would be breaking the uapi, so once it is merged it cannot be changed.
> 
> > https://wiki.freebsd.org/Graphics/Intel-GPU-Matrix
> > 
> > I think would be better to use uAPIs to figure out permissions.
> 
> As I said, the correct way to do this is to use OS calls. In any case, I
> don't think we can remove the debug messages because they are useful for
> debugging (that's why they are there).

Most of Xe uAPI don't have this level debug message.
EACCES is pretty clear what is the issue and it don't need a debug message in my opinion. 

> 
> Also, looking at intel_perf_init_metrics()->oa_metrics_available(),
> currently it just calls geteuid() and doesn't worry about CAP_PERFMON. Of
> course you can add that, but then you will need to figure out the correct
> way to do it. The current implementation of oa_metrics_available() seems
> mostly ok to me and doesn't call remove_config() etc. so IMO we should
> continue that approach for Xe too.

i915 is missing CAP_PERFMON handling because it was added ( https://gitlab.freedesktop.org/drm/xe/kernel/-/commit/980737282232b7 ) after i915 OA uAPI.
For i915 I don't care at this point but for Xe KMD it is important to support CAP_PERFMON and CAP_SYS_ADMIN.

> 
> Thanks.
> --
> Ashutosh
> 
> 
> 
> > 
> > > And therefore, along with perf_stream_paranoid, determine which OA calls it
> > > can use.
> > > 
> > > > 
> > > > > 
> > > > > > > So basically I think you just need to check for the perf_stream_paranoid
> > > > > > > file above. It will tell you both (a) if OA is present (because we are
> > > > > > > going to merge the code which creates this file together with OA) and (b)
> > > > > > > if you need to be root for particular operations.
> > > > > > > 
> > > > > > > Thanks.
> > > > > > > --
> > > > > > > Ashutosh
> > > > > > 
> > > > 
> > 



More information about the Intel-xe mailing list