[PATCH 1/2] drm/xe/guc: Default log level to non-verbose

Matthew Brost matthew.brost at intel.com
Thu Jun 12 18:29:55 UTC 2025


On Thu, Jun 12, 2025 at 11:21:38AM -0700, John Harrison wrote:
> On 6/12/2025 11:05 AM, Lucas De Marchi wrote:
> > Currently xe sets the guc log level to a verbose level since it's useful
> > to debug hangs and general development. However the verbose level may
> > already be too much and affect performance.
> > 
> > Michal Mrozek did some tests with the L0 compute stack for submission
> > latency with ULLS disabled. Below are the normalized numbers with log
> > level 3 (the current default) as baseline for each test:
> > 
> >                            Test \ Log Level                        3      0      1      2
> >   ----------------------------------------------------------- ------ ------ ------ ------
> >    BestWalkerNthCommandListSubmission(CmdListCount=2)           1.00   0.63   0.63   0.96
> >    BestWalkerNthSubmission(KernelCount=2)                       1.00   0.62   0.63   0.96
> >    BestWalkerNthSubmissionImmediate(KernelCount=2)              1.00   0.58   0.58   0.85
> >    BestWalkerSubmission                                         1.00   0.62   0.62   0.96
> >    BestWalkerSubmissionImmediate                                1.00   0.63   0.62   0.96
> >    BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=2)   1.00   0.58   0.58   0.86
> >    BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=4)   1.00   0.70   0.70   0.83
> >    BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=8)   1.00   0.53   0.52   0.78
> > 
> > Log level 2 is the first "verbose level" for GuC, where the biggest
> > difference happens. Keep log level 3 for CONFIG_DRM_XE_DEBUG, but switch
> > to 1, i.e. GUC_LOG_LEVEL_NON_VERBOSE, for "normal" builds.
> Note that this performance impact is understood, although it was not realised
> quite how much of a hit it was on this benchmark. The impact comes from logging
> around context switches. The logging adds a few microseconds to the context
> switch time. In general, this is not noticeable as the context switch time
> is negligible compared to the runtime for the workload itself. However, I'm
> guessing from the name that this benchmark is specifically measuring context
> switch performance with empty workloads. Thus it is the pathological
> worst-case scenario with regard to the impact of the logging.
> 
> Anyway, not logging in release builds is generally a good idea and better
> benchmark scores are always good :).
> 

FWIW, a page fault benchmark from compute showed 15 us better latency
without GuC logging in my testing. That path involves context switches,
multiple H2G + G2H, and the page fault service. Page faults are certainly
a case we'd like to speed up.

Matt

> Reviewed-by: John Harrison <John.C.Harrison at Intel.com>
> 
> > 
> > Cc: Michal Mrozek <michal.mrozek at intel.com>
> > Cc: John Harrison <John.C.Harrison at Intel.com>
> > Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > ---
> >   drivers/gpu/drm/xe/xe_module.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> > index 1c4dfafbcd0bc..4809afa7ce3f9 100644
> > --- a/drivers/gpu/drm/xe/xe_module.c
> > +++ b/drivers/gpu/drm/xe/xe_module.c
> > @@ -20,7 +20,7 @@
> >   struct xe_modparam xe_modparam = {
> >   	.probe_display = true,
> > -	.guc_log_level = 3,
> > +	.guc_log_level = IS_ENABLED(CONFIG_DRM_XE_DEBUG) ? 3 : 1,
> >   	.force_probe = CONFIG_DRM_XE_FORCE_PROBE,
> >   #ifdef CONFIG_PCI_IOV
> >   	.max_vfs = IS_ENABLED(CONFIG_DRM_XE_DEBUG) ? ~0 : 0,
> > 
> 

