[PATCH 1/2] drm/xe/guc: Default log level to non-verbose
Matthew Brost
matthew.brost at intel.com
Thu Jun 12 18:29:55 UTC 2025
On Thu, Jun 12, 2025 at 11:21:38AM -0700, John Harrison wrote:
> On 6/12/2025 11:05 AM, Lucas De Marchi wrote:
> > Currently xe sets the guc log level to a verbose level since it's useful
> > to debug hangs and general development. However the verbose level may
> > already be too much and affect performance.
> >
> > Michal Mrozek did some tests with the L0 compute stack for submission
> > latency with ULLS disabled. Below are the normalized numbers with log
> > level 3 (the current default) as baseline for each test:
> >
> > Test \ Log Level                                                 3      0      1      2
> > ----------------------------------------------------------- ------ ------ ------ ------
> > BestWalkerNthCommandListSubmission(CmdListCount=2)            1.00   0.63   0.63   0.96
> > BestWalkerNthSubmission(KernelCount=2)                        1.00   0.62   0.63   0.96
> > BestWalkerNthSubmissionImmediate(KernelCount=2)               1.00   0.58   0.58   0.85
> > BestWalkerSubmission                                          1.00   0.62   0.62   0.96
> > BestWalkerSubmissionImmediate                                 1.00   0.63   0.62   0.96
> > BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=2)    1.00   0.58   0.58   0.86
> > BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=4)    1.00   0.70   0.70   0.83
> > BestWalkerSubmissionImmediateMultiCmdlists(cmdlistCount=8)    1.00   0.53   0.52   0.78
> >
> > Log level 2 is the first "verbose level" for GuC, where the biggest
> > difference happens. Keep log level 3 for CONFIG_DRM_XE_DEBUG, but switch
> > to 1, i.e. GUC_LOG_LEVEL_NON_VERBOSE, for "normal" builds.
> Note that this performance impact is understood, although it was not realised quite
> how much of a hit it was on this benchmark. The impact comes from logging
> around context switches. The logging adds a few microseconds to the context
> switch time. In general, this is not noticeable as the context switch time
> is negligible compared to the runtime for the workload itself. However, I'm
> guessing from the name that this benchmark is specifically measuring context
> switch performance with empty workloads. Thus it is the pathological worst
> case scenario with regard to the impact of the logging.
>
> Anyway, not logging in release builds is generally a good idea and better
> benchmark scores are always good :).
>
FWIW, a page fault benchmark from compute showed 15 us better latency
without GuC logging in my testing. That path involves context switches,
multiple H2G + G2H messages, and the page fault service. Page faults are
certainly a case we'd like to speed up.
Matt
> Reviewed-by: John Harrison <John.C.Harrison at Intel.com>
>
> >
> > Cc: Michal Mrozek <michal.mrozek at intel.com>
> > Cc: John Harrison <John.C.Harrison at Intel.com>
> > Signed-off-by: Lucas De Marchi <lucas.demarchi at intel.com>
> > ---
> > drivers/gpu/drm/xe/xe_module.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/xe/xe_module.c b/drivers/gpu/drm/xe/xe_module.c
> > index 1c4dfafbcd0bc..4809afa7ce3f9 100644
> > --- a/drivers/gpu/drm/xe/xe_module.c
> > +++ b/drivers/gpu/drm/xe/xe_module.c
> > @@ -20,7 +20,7 @@
> >  struct xe_modparam xe_modparam = {
> >  	.probe_display = true,
> > -	.guc_log_level = 3,
> > +	.guc_log_level = IS_ENABLED(CONFIG_DRM_XE_DEBUG) ? 3 : 1,
> >  	.force_probe = CONFIG_DRM_XE_FORCE_PROBE,
> >  #ifdef CONFIG_PCI_IOV
> >  	.max_vfs = IS_ENABLED(CONFIG_DRM_XE_DEBUG) ? ~0 : 0,
> >
>
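
For readers outside the kernel tree, the one-line change in the quoted diff
can be sketched as a standalone C fragment. This is only an illustration under
stated assumptions: SKETCH_XE_DEBUG, SKETCH_DEBUG_ENABLED, struct
sketch_modparam and the sketch_* helpers are hypothetical stand-ins, not xe
identifiers. The real driver uses IS_ENABLED(CONFIG_DRM_XE_DEBUG) from the
kernel's Kconfig machinery.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for IS_ENABLED(CONFIG_DRM_XE_DEBUG): evaluates to
 * 1 when the sketch is built with -DSKETCH_XE_DEBUG, otherwise to 0.
 */
#ifdef SKETCH_XE_DEBUG
#define SKETCH_DEBUG_ENABLED 1
#else
#define SKETCH_DEBUG_ENABLED 0
#endif

/* Levels named in the thread: 1 is non-verbose, 2 is the first verbose level. */
#define SKETCH_LOG_LEVEL_NON_VERBOSE   1
#define SKETCH_LOG_LEVEL_FIRST_VERBOSE 2
#define SKETCH_LOG_LEVEL_DEBUG_DEFAULT 3

/* Hypothetical, trimmed-down counterpart of struct xe_modparam. */
struct sketch_modparam {
	int guc_log_level;
};

/* Same shape as the patched initializer: the ternary is a constant expression. */
static struct sketch_modparam sketch_modparam = {
	.guc_log_level = SKETCH_DEBUG_ENABLED ? SKETCH_LOG_LEVEL_DEBUG_DEFAULT
					      : SKETCH_LOG_LEVEL_NON_VERBOSE,
};

/* Runtime form of the same selection, convenient for testing both branches. */
static int sketch_default_guc_log_level(int debug_build)
{
	return debug_build ? SKETCH_LOG_LEVEL_DEBUG_DEFAULT
			   : SKETCH_LOG_LEVEL_NON_VERBOSE;
}

/* Returns 1 when the level falls in the verbose range described in the thread. */
static int sketch_is_verbose(int level)
{
	assert(level >= 0);
	return level >= SKETCH_LOG_LEVEL_FIRST_VERBOSE;
}
```

Building the sketch with -DSKETCH_XE_DEBUG flips the compiled-in default to 3,
mirroring a CONFIG_DRM_XE_DEBUG build; without it the default is 1, which
stays below the first verbose level where the table shows the largest latency
difference.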