[PATCH 00/17] Add OA functionality to Xe

Souza, Jose jose.souza at intel.com
Wed May 22 19:30:27 UTC 2024


On Wed, 2024-05-22 at 11:50 -0700, Dixit, Ashutosh wrote:
> On Wed, 22 May 2024 09:13:48 -0700, Souza, Jose wrote:
> > 
> > On Tue, 2024-05-21 at 21:42 -0700, Dixit, Ashutosh wrote:
> > > On Tue, 21 May 2024 09:29:51 -0700, Souza, Jose wrote:
> > > > 
> > > > On Tue, 2024-05-21 at 09:10 -0700, Dixit, Ashutosh wrote:
> > > > > On Tue, 21 May 2024 07:47:58 -0700, Souza, Jose wrote:
> > > > > 
> > > > > Hi Jose,
> > > > > 
> > > > > > > Other ask, can you remove this 'Failed to remove unknown OA config'
> > > > > > > debug message from xe_oa_remove_config_ioctl()?
> > > > > > 
> > > > > > > I missed 'Insufficient privileges to remove xe OA config'; that needs
> > > > > > > to be removed from xe_oa_remove_config_ioctl() too.
> > > > > > 
> > > > > > > Mesa will be using DRM_XE_PERF_OP_REMOVE_CONFIG with config id set to
> > > > > > > UINT64_MAX to detect if Xe KMD supports OA counters and if application
> > > > > > > has enough permissions to use it.  So it causes dmesg to be flooded
> > > > > > > with 'xe 0000:00:02.0: [drm:xe_oa_remove_config_ioctl [xe]] Failed to
> > > > > > > > remove unknown OA config' messages when running test suites.
> > > > > > > > 
> > > > > > > > Or do you have another uAPI suggestion that I can use?
> > > 
> > > Also, to return to the original issue, what exactly is the issue if dmesg
> > > is getting flooded when running tests? Maybe it's ok? Or if it is not, why
> > > don't you turn off particular debug messages using
> > > /sys/module/drm/parameters/debug?
> > 
> > KMD logs are also important for UMD debug.
> 
> What about the answer to the first question: "what exactly is the issue if
> dmesg is getting flooded when running tests"? How many lines are added per
> test? Why is it an issue?

Most tests print one line; others print two or more, depending on how many logical devices the test creates.

Just an example: I started running crucible, which has 1024 tests, at timestamp 6399.935243; see in the attachment how many times
'Failed to remove unknown OA config' gets printed.
For my testing I have set xe_perf_stream_paranoid to false on my Xe KMD, so in regular usage 'Insufficient privileges to remove xe OA config' would
be printed instead.
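
As a side note, the paranoid setting can be read from userspace. A minimal sketch in C, assuming the sysctl lives at /proc/sys/dev/xe/perf_stream_paranoid as quoted further down in this thread; read_paranoid() is a hypothetical helper name, not something from Mesa or the Xe uAPI:

```c
#include <stdio.h>

/*
 * Hypothetical helper (illustration only, not Mesa or Xe code).
 * Returns the integer stored in the file at `path`, or -1 when the
 * file is missing or does not start with an integer.
 */
int read_paranoid(const char *path)
{
    FILE *f = fopen(path, "r");
    int value = -1;

    if (!f)
        return -1;              /* file absent or unreadable */
    if (fscanf(f, "%d", &value) != 1)
        value = -1;             /* unexpected file content */
    fclose(f);
    return value;
}
```

Calling read_paranoid("/proc/sys/dev/xe/perf_stream_paranoid") would return 0 with my setting and -1 on a kernel that does not create the file.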

All those messages would cause developers to miss other important debug messages.
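
For reference, the probe discussed in this thread (DRM_XE_PERF_OP_REMOVE_CONFIG with the config id set to UINT64_MAX) boils down to interpreting the errno that comes back. A rough sketch in C of that interpretation only, with the ioctl call itself left out; the enum and function names are hypothetical, and treating every other errno as "available" is my assumption here, not something the uAPI guarantees:

```c
#include <errno.h>

/* Hypothetical names, for illustration only. */
enum oa_probe_result {
    OA_PROBE_NO_OA,            /* KMD has no OA support */
    OA_PROBE_NEEDS_PRIVILEGE,  /* OA present, process lacks privileges */
    OA_PROBE_AVAILABLE         /* OA present and usable by this process */
};

/*
 * `err` is the errno observed after the remove-config probe failed.
 * ENODEV and EACCES are the two values discussed in this thread; any
 * other value is assumed to mean the bogus config id was simply not
 * found, so OA is present and the process is allowed to use it.
 */
enum oa_probe_result classify_oa_probe(int err)
{
    switch (err) {
    case ENODEV:
        return OA_PROBE_NO_OA;
    case EACCES:
        return OA_PROBE_NEEDS_PRIVILEGE;
    default:
        return OA_PROBE_AVAILABLE;
    }
}
```

The point is that a single call answers both questions, which is why the two debug messages end up being hit on every device creation.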

> > > So basically I don't want to tell you what to do or how to implement your
> > > stuff (as long as you reciprocally don't ask us to make changes
> > > either). The Xe uapi is exposed and userspace is free to use it however
> > > they want.
> > > 
> > > So anyway, the discussion in this thread has come up with a few options,
> > > which I can quickly summarize here:
> > > 
> > > * Live with the debug messages
> > > * Turn debug messages off with /sys/module/drm/parameters/debug
> > > * Query the OS for process capabilities or privileges
> > > * Refactor the code to not need oa_metrics_available()
> > > * Anything else? Another idea e.g. is to eventually convert debug messages
> > >   into dynamic debug which can be controlled at lower granularity iirc (so
> > >   e.g. you can turn off OA debug messages only but this needs some work).
> > 
> > I don't think I'm asking much, I am just asking to remove 2 debug messages
> > to implement it in a Unix portable way that supports both capabilities.
> > 
> > > 
> > > So let's see where this goes :)
> > > 
> > > Thanks.
> > > --
> > > Ashutosh
> > > 
> > > 
> > > > > 
> > > > > OK, so you are relying on ENODEV and EACCES errno's from
> > > > > DRM_XE_PERF_OP_REMOVE_CONFIG to find out (a) if OA is present and (b) if
> > > > > you need to be root (actually CAP_PERFMON or CAP_SYS_ADMIN).
> > > > 
> > > > yep
> > > > 
> > > > > 
> > > > > This logic in Xe should be close to what we have in i915? What does Mesa do
> > > > > for i915, or what doesn't work in Xe?
> > > > > 
> > > > > Here are some pointers:
> > > > > 
> > > > > * You can execute DRM_XE_DEVICE_QUERY_OA_UNITS to see if OA is present
> > > > > 
> > > > > * Add/remove OA configs and using the global OAG buffer (time based
> > > > > >   sampling or DRM_XE_OA_PROPERTY_SAMPLE_OA set) are privileged operations
> > > > >   (need root). Operations which only need OAR/OAC (OA queries, without
> > > > >   DRM_XE_OA_PROPERTY_SAMPLE_OA) can be executed by non-root.
> > > > > 
> > > > > * If "/proc/sys/dev/xe/perf_stream_paranoid" is 0, all operations can be
> > > > >   executed by non-root users. Otherwise, as I described in the previous
> > > > >   point.
> > > > 
> > > > > It is possible that a process not started by root has CAP_PERFMON:
> > > > 
> > > > "Unprivileged processes with enabled CAP_PERFMON capability are treated
> > > > as privileged processes with respect to perf_events performance
> > > > monitoring and observability operations,..."
> > > > 
> > > > And from what I understood only root can write to perf_stream_paranoid, so I don't see a point in having this file...
> > > > 
> > > > > 
> > > > > So basically I think you just need to check for the perf_stream_paranoid
> > > > > file above. It will tell you both (a) if OA is present (because we are
> > > > > going to merge the code which creates this file together with OA) and (b)
> > > > > if you need to be root for particular operations.
> > > > > 
> > > > > Thanks.
> > > > > --
> > > > > Ashutosh
> > > > 
> > 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dmesg.txt
URL: <https://lists.freedesktop.org/archives/intel-xe/attachments/20240522/4d3249aa/attachment-0001.txt>