[PATCH 00/17] Add OA functionality to Xe
Dixit, Ashutosh
ashutosh.dixit at intel.com
Wed May 22 02:28:11 UTC 2024
On Tue, 21 May 2024 14:00:21 -0700, Souza, Jose wrote:
>
> On Tue, 2024-05-21 at 13:24 -0700, Dixit, Ashutosh wrote:
> > On Tue, 21 May 2024 11:48:01 -0700, Souza, Jose wrote:
> > >
> >
> > Hi Jose,
> >
> > > On Tue, 2024-05-21 at 11:02 -0700, Dixit, Ashutosh wrote:
> > > > On Tue, 21 May 2024 10:39:17 -0700, Souza, Jose wrote:
> > > > >
> > > > > On Tue, 2024-05-21 at 09:43 -0700, Dixit, Ashutosh wrote:
> > > > > > On Tue, 21 May 2024 09:29:51 -0700, Souza, Jose wrote:
> > > > > > >
> > > > > > > On Tue, 2024-05-21 at 09:10 -0700, Dixit, Ashutosh wrote:
> > > > > > > > On Tue, 21 May 2024 07:47:58 -0700, Souza, Jose wrote:
> > > > > > > >
> > > > > > > > Hi Jose,
> > > > > > > >
> > > > > > > > > > Other ask, can you remove this 'Failed to remove unknown OA config'
> > > > > > > > > > debug message from xe_oa_remove_config_ioctl()?
> > > > > > > > >
> > > > > > > > > Missed 'Insufficient privileges to remove xe OA config', that need to be
> > > > > > > > > removed too from xe_oa_remove_config_ioctl().
> > > > > > > > >
> > > > > > > > > > Mesa will be using DRM_XE_PERF_OP_REMOVE_CONFIG with config id set to
> > > > > > > > > > UINT64_MAX to detect if Xe KMD supports OA counters and if application
> > > > > > > > > > has enough permissions to use it. So it causes dmesg to be flooded
> > > > > > > > > > with 'xe 0000:00:02.0: [drm:xe_oa_remove_config_ioctl [xe]] Failed to
> > > > > > > > > > remove unknown OA config' messages when running tests suites.
> > > > > > > > > >
> > > > > > > > > > Or do you have other suggestion of uAPI that I can use.
> > > > > > > >
> > > > > > > > OK, so you are relying on ENODEV and EACCES errno's from
> > > > > > > > DRM_XE_PERF_OP_REMOVE_CONFIG to find out (a) if OA is present and (b) if
> > > > > > > > you need to be root (actually CAP_PERFMON or CAP_SYS_ADMIN).
> > > > > > >
> > > > > > > yep
> > > > > > >
> > > > > > > >
> > > > > > > > This logic in Xe should be close to what we have in i915? What does Mesa do
> > > > > > > > for i915, or what doesn't work in Xe?
> > > > > > > >
> > > > > > > > Here are some pointers:
> > > > > > > >
> > > > > > > > * You can execute DRM_XE_DEVICE_QUERY_OA_UNITS to see if OA is present
> > > > > > > >
> > > > > > > > * Add/remove OA configs and using the global OAG buffer (time based
> > > > > > > > sampling or DRM_XE_OA_PROPERTY_SAMPLE_OA set) are priviliged operations
> > > > > > > > (need root). Operations which only need OAR/OAC (OA queries, without
> > > > > > > > DRM_XE_OA_PROPERTY_SAMPLE_OA) can be executed by non-root.
> > > > > > > >
> > > > > > > > * If "/proc/sys/dev/xe/perf_stream_paranoid" is 0, all operations can be
> > > > > > > > executed by non-root users. Otherwise, as I described in the previous
> > > > > > > > point.
> > > > > > >
> > > > > > > It is possible that process not started by root has CAP_PERFMON:
> > > > > >
> > > > > > Yes, correct.
> > > > > >
> > > > > > Above I am using "root" loosely as "CAP_PERFMON or CAP_SYS_ADMIN".
> > > > > >
> > > > > > So if root sets 'perf_stream_paranoid' to 0, normal users can do OA
> > > > > > priviliged operations (as described above).
> > > > > >
> > > > > > If root sets 'perf_stream_paranoid' to 1, only CAP_PERFMON or CAP_SYS_ADMIN
> > > > > > users can do OA priviliged operations.
> > > > >
> > > > > oh okay so perf_stream_paranoid has a good usage but it do not covers
> > > > > CAP_PERFMON, see below.
> > > > >
> > > > > >
> > > > > > > "Unprivileged processes with enabled CAP_PERFMON capability are treated
> > > > > > > as privileged processes with respect to perf_events performance
> > > > > > > monitoring and observability operations,..."
> > > > > > >
> > > > > > > And from what I understood only root can write to perf_stream_paranoid,
> > > > > > > so I don't see a point in having this file...
> > > > > >
> > > > > > So I don't know how this statement follows?
> > > > > >
> > > > > > root or superuser is the one which gives the permission to non-CAP_PERFMON
> > > > > > and non-CAP_SYS_ADMIN users to be able to do priviliged OA operations.
> > > > >
> > > > > so if I'm running a process with CAP_PERFMON and read
> > > > > perf_stream_paranoid and it returns 0
> > > >
> > > > 0 if fine, everyone can use all OA calls. The issue is with 1.
> > > >
> > > > > there is no way for UMD to know
> > > > > that process is allowed to use Xe KMD OA feature without do a uAPI call
> > > > > that checks for CAP_PERFMON.
> > > > >
> > > > > That is why I think is better just do a single
> > > > > DRM_XE_PERF_OP_REMOVE_CONFIG to detect if feature is present and if
> > > > > process is allowed to use it. But it generates a bunch of debug messages.
> > > >
> > > > A process should be able to find out on its own, without help from Xe
> > > > driver, what it's capabilities (CAP_PERFMON or CAP_SYS_ADMIN or neither)
> > > > are:
> > > >
> > > > https://www.google.com/search?q=linux+get+process+capabilities+in+c
> > > > https://man7.org/linux/man-pages/man3/libcap.3.html
> > > >
> > >
> > > I don't think this is much portable, I don't think BSD has this sysfs
> > > with PID capabilities of process.
> >
> > Not sure why we are bringing BSD here since we have a Linux KMD, not a BSD
> > KMD. In any case, every OS has system calls for a process to query its
> > properties.
>
> BSD wraps Linux drm drivers into a translation layer.
>
> https://www.freshports.org/graphics/drm-kmod/
>
> >
> > So if these are not portable, you don't have to use these library calls,
> > you should be able to use direct calls into the OS to find stuff out.
>
> That is the problem, Linux exposes process capabilities in sysfs while
> BSD don't even has concept of capabilities, it has privileges...
I guess then that the Linux to BSD translation layer will convert
capabilities to priviliges.
As I see it, Mesa will need to have Linux/BSD specific code. Similar to the
Linux link above here is the BSD specific link:
https://www.google.com/search?q=freebsd+get+process+privileges+in+c
>
> >
> > > Also if later Xe KMD adds or removes process capabilities that has access
> > > to OA it will break UMDs.
> >
> > That would be breaking the uapi, so once it is merged it cannot be changed.
> >
> > > https://wiki.freebsd.org/Graphics/Intel-GPU-Matrix
> > >
> > > I think would be better to use uAPIs to figure out permissions.
> >
> > As I said, the correct way to do this is to use OS calls. In any case, I
> > don't think we can remove the debug messages because they are useful for
> > debugging (that's why they are there).
>
> Most of Xe uAPI don't have this level debug message. EACCES is pretty
> clear what is the issue and it don't need a debug message in my opinion.
The problem is we have all sorts of IGT tests which test these error code
paths. When these fail in CI, all we have is dmesg to figure out what's
going on.
And anyway according to me, the right way to do this is to query the
OS. Maybe you need to add dependencies on some of these other libraries?
And I am not sure if the approach you are suggesting will be considered
acceptable in your Mesa code review.
Also, I don't know why you have oa_metrics_available() at all, since the
next step in intel_perf_init_metrics() is to add the OA configs and if the
process doesn't have sufficient priviliges, the kernel will reject adding
these configs. So just the return code from the kernel will suffice.
So you may be able to skip oa_metrics_available() for Xe with a little bit
of code refactoring, no?
>
> >
> > Also, looking at intel_perf_init_metrics()->oa_metrics_available(),
> > currently it just calls geteuid() and doesn't worry about CAP_PERFMON. Of
> > course you can add that, but then you will need to figure out the correct
> > way to do it. The current implementation of oa_metrics_available() seems
> > mostly ok to me and doesn't call remove_config() etc. so IMO we should
> > continue that approach for Xe too.
>
> i915 is missing CAP_PERFMON handling because it was added (
> https://gitlab.freedesktop.org/drm/xe/kernel/-/commit/980737282232b7 )
> after i915 OA uAPI. For i915 I don't care at this point but for Xe KMD
> it is important to support CAP_PERFMON and CAP_SYS_ADMIN.
You might add this for i915 too while you're at it :)
>
> >
> > Thanks.
> > --
> > Ashutosh
> >
> >
> >
> > >
> > > > And therefore, along with perf_stream_paranoid, determine which OA calls it
> > > > can use.
> > > >
> > > > >
> > > > > >
> > > > > > > > So basically I think you just need to check for the perf_stream_paranoid
> > > > > > > > file above. It will tell you both (a) if OA is present (because we are
> > > > > > > > going to merge the code which creates this file together with OA) and (b)
> > > > > > > > if you need to be root for particular operations.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > > --
> > > > > > > > Ashutosh
> > > > > > >
> > > > >
> > >
>
More information about the Intel-xe
mailing list