[PATCH 00/17] Add OA functionality to Xe

Souza, Jose jose.souza at intel.com
Wed May 22 16:08:19 UTC 2024


On Tue, 2024-05-21 at 19:28 -0700, Dixit, Ashutosh wrote:
> On Tue, 21 May 2024 14:00:21 -0700, Souza, Jose wrote:
> > 
> > On Tue, 2024-05-21 at 13:24 -0700, Dixit, Ashutosh wrote:
> > > On Tue, 21 May 2024 11:48:01 -0700, Souza, Jose wrote:
> > > > 
> > > 
> > > Hi Jose,
> > > 
> > > > On Tue, 2024-05-21 at 11:02 -0700, Dixit, Ashutosh wrote:
> > > > > On Tue, 21 May 2024 10:39:17 -0700, Souza, Jose wrote:
> > > > > > 
> > > > > > On Tue, 2024-05-21 at 09:43 -0700, Dixit, Ashutosh wrote:
> > > > > > > On Tue, 21 May 2024 09:29:51 -0700, Souza, Jose wrote:
> > > > > > > > 
> > > > > > > > On Tue, 2024-05-21 at 09:10 -0700, Dixit, Ashutosh wrote:
> > > > > > > > > On Tue, 21 May 2024 07:47:58 -0700, Souza, Jose wrote:
> > > > > > > > > 
> > > > > > > > > Hi Jose,
> > > > > > > > > 
> > > > > > > > > > > Other ask, can you remove this 'Failed to remove unknown OA config'
> > > > > > > > > > > debug message from xe_oa_remove_config_ioctl()?
> > > > > > > > > > 
> > > > > > > > > > Missed 'Insufficient privileges to remove xe OA config', that need to be
> > > > > > > > > > removed too from xe_oa_remove_config_ioctl().
> > > > > > > > > > 
> > > > > > > > > > > Mesa will be using DRM_XE_PERF_OP_REMOVE_CONFIG with config id set to
> > > > > > > > > > > UINT64_MAX to detect if Xe KMD supports OA counters and if application
> > > > > > > > > > > has enough permissions to use it.  So it causes dmesg to be flooded
> > > > > > > > > > > with 'xe 0000:00:02.0: [drm:xe_oa_remove_config_ioctl [xe]] Failed to
> > > > > > > > > > > remove unknown OA config' messages when running tests suites.
> > > > > > > > > > > 
> > > > > > > > > > > Or do you have other suggestion of uAPI that I can use.
> > > > > > > > > 
> > > > > > > > > OK, so you are relying on ENODEV and EACCES errno's from
> > > > > > > > > DRM_XE_PERF_OP_REMOVE_CONFIG to find out (a) if OA is present and (b) if
> > > > > > > > > you need to be root (actually CAP_PERFMON or CAP_SYS_ADMIN).
> > > > > > > > 
> > > > > > > > yep
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > This logic in Xe should be close to what we have in i915? What does Mesa do
> > > > > > > > > for i915, or what doesn't work in Xe?
> > > > > > > > > 
> > > > > > > > > Here are some pointers:
> > > > > > > > > 
> > > > > > > > > * You can execute DRM_XE_DEVICE_QUERY_OA_UNITS to see if OA is present
> > > > > > > > > 
> > > > > > > > > * Add/remove OA configs and using the global OAG buffer (time based
> > > > > > > > >   sampling or DRM_XE_OA_PROPERTY_SAMPLE_OA set) are priviliged operations
> > > > > > > > >   (need root). Operations which only need OAR/OAC (OA queries, without
> > > > > > > > >   DRM_XE_OA_PROPERTY_SAMPLE_OA) can be executed by non-root.
> > > > > > > > > 
> > > > > > > > > * If "/proc/sys/dev/xe/perf_stream_paranoid" is 0, all operations can be
> > > > > > > > >   executed by non-root users. Otherwise, as I described in the previous
> > > > > > > > >   point.
> > > > > > > > 
> > > > > > > > It is possible that process not started by root has CAP_PERFMON:
> > > > > > > 
> > > > > > > Yes, correct.
> > > > > > > 
> > > > > > > Above I am using "root" loosely as "CAP_PERFMON or CAP_SYS_ADMIN".
> > > > > > > 
> > > > > > > So if root sets 'perf_stream_paranoid' to 0, normal users can do OA
> > > > > > > priviliged operations (as described above).
> > > > > > > 
> > > > > > > If root sets 'perf_stream_paranoid' to 1, only CAP_PERFMON or CAP_SYS_ADMIN
> > > > > > > users can do OA priviliged operations.
> > > > > > 
> > > > > > oh okay so perf_stream_paranoid has a good usage but it do not covers
> > > > > > CAP_PERFMON, see below.
> > > > > > 
> > > > > > > 
> > > > > > > > "Unprivileged processes with enabled CAP_PERFMON capability are treated
> > > > > > > > as privileged processes with respect to perf_events performance
> > > > > > > > monitoring and observability operations,..."
> > > > > > > > 
> > > > > > > > And from what I understood only root can write to perf_stream_paranoid,
> > > > > > > > so I don't see a point in having this file...
> > > > > > > 
> > > > > > > So I don't know how this statement follows?
> > > > > > > 
> > > > > > > root or superuser is the one which gives the permission to non-CAP_PERFMON
> > > > > > > and non-CAP_SYS_ADMIN users to be able to do priviliged OA operations.
> > > > > > 
> > > > > > so if I'm running a process with CAP_PERFMON and read
> > > > > > perf_stream_paranoid and it returns 0
> > > > > 
> > > > > 0 if fine, everyone can use all OA calls. The issue is with 1.
> > > > > 
> > > > > > there is no way for UMD to know
> > > > > > that process is allowed to use Xe KMD OA feature without do a uAPI call
> > > > > > that checks for CAP_PERFMON.
> > > > > > 
> > > > > > That is why I think is better just do a single
> > > > > > DRM_XE_PERF_OP_REMOVE_CONFIG to detect if feature is present and if
> > > > > > process is allowed to use it. But it generates a bunch of debug messages.
> > > > > 
> > > > > A process should be able to find out on its own, without help from Xe
> > > > > driver, what it's capabilities (CAP_PERFMON or CAP_SYS_ADMIN or neither)
> > > > > are:
> > > > > 
> > > > > https://www.google.com/search?q=linux+get+process+capabilities+in+c
> > > > > https://man7.org/linux/man-pages/man3/libcap.3.html
> > > > > 
> > > > 
> > > > I don't think this is much portable, I don't think BSD has this sysfs
> > > > with PID capabilities of process.
> > > 
> > > Not sure why we are bringing BSD here since we have a Linux KMD, not a BSD
> > > KMD. In any case, every OS has system calls for a process to query its
> > > properties.
> > 
> > BSD wraps Linux drm drivers into a translation layer.
> > 
> > https://www.freshports.org/graphics/drm-kmod/
> > 
> > > 
> > > So if these are not portable, you don't have to use these library calls,
> > > you should be able to use direct calls into the OS to find stuff out.
> > 
> > That is the problem, Linux exposes process capabilities in sysfs while
> > BSD don't even has concept of capabilities, it has privileges...
> 
> I guess then that the Linux to BSD translation layer will convert
> capabilities to priviliges.

It will, it will just not expose capabilities or privileges in sysfs like Linux does.

> 
> As I see it, Mesa will need to have Linux/BSD specific code. Similar to the
> Linux link above here is the BSD specific link:
> 
> https://www.google.com/search?q=freebsd+get+process+privileges+in+c

Again that is what I want to avoid.
We can avoid it by just removing 2 debug messages, don't look that hard.

> 
> > 
> > > 
> > > > Also if later Xe KMD adds or removes process capabilities that has access
> > > > to OA it will break UMDs.
> > > 
> > > That would be breaking the uapi, so once it is merged it cannot be changed.
> > > 
> > > > https://wiki.freebsd.org/Graphics/Intel-GPU-Matrix
> > > > 
> > > > I think would be better to use uAPIs to figure out permissions.
> > > 
> > > As I said, the correct way to do this is to use OS calls. In any case, I
> > > don't think we can remove the debug messages because they are useful for
> > > debugging (that's why they are there).
> > 
> > Most of Xe uAPI don't have this level debug message. EACCES is pretty
> > clear what is the issue and it don't need a debug message in my opinion.
> 
> The problem is we have all sorts of IGT tests which test these error code
> paths. When these fail in CI, all we have is dmesg to figure out what's
> going on.

IGT will also include the errno code, will not be much harder to figure the failure reason.

> 
> And anyway according to me, the right way to do this is to query the
> OS. Maybe you need to add dependencies on some of these other libraries?
> 
> And I am not sure if the approach you are suggesting will be considered
> acceptable in your Mesa code review.
> 
> Also, I don't know why you have oa_metrics_available() at all, since the
> next step in intel_perf_init_metrics() is to add the OA configs and if the
> process doesn't have sufficient priviliges, the kernel will reject adding
> these configs. So just the return code from the kernel will suffice.

if oa_metrics_available() returns false it will not try to add OA configs.

> 
> So you may be able to skip oa_metrics_available() for Xe with a little bit
> of code refactoring, no?
> 
> > 
> > > 
> > > Also, looking at intel_perf_init_metrics()->oa_metrics_available(),
> > > currently it just calls geteuid() and doesn't worry about CAP_PERFMON. Of
> > > course you can add that, but then you will need to figure out the correct
> > > way to do it. The current implementation of oa_metrics_available() seems
> > > mostly ok to me and doesn't call remove_config() etc. so IMO we should
> > > continue that approach for Xe too.
> > 
> > i915 is missing CAP_PERFMON handling because it was added (
> > https://gitlab.freedesktop.org/drm/xe/kernel/-/commit/980737282232b7 )
> > after i915 OA uAPI.  For i915 I don't care at this point but for Xe KMD
> > it is important to support CAP_PERFMON and CAP_SYS_ADMIN.
> 
> You might add this for i915 too while you're at it :)

i915 will also span dmesg.

> 
> > 
> > > 
> > > Thanks.
> > > --
> > > Ashutosh
> > > 
> > > 
> > > 
> > > > 
> > > > > And therefore, along with perf_stream_paranoid, determine which OA calls it
> > > > > can use.
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > > So basically I think you just need to check for the perf_stream_paranoid
> > > > > > > > > file above. It will tell you both (a) if OA is present (because we are
> > > > > > > > > going to merge the code which creates this file together with OA) and (b)
> > > > > > > > > if you need to be root for particular operations.
> > > > > > > > > 
> > > > > > > > > Thanks.
> > > > > > > > > --
> > > > > > > > > Ashutosh
> > > > > > > > 
> > > > > > 
> > > > 
> > 



More information about the Intel-xe mailing list