[PATCH i-g-t] lib/intel_device_info: Use per-thread device info cache

Matt Roper matthew.d.roper at intel.com
Wed Nov 20 22:35:46 UTC 2024


Relying on static variables to act as a cache inside the device info
lookup function is racy for multi-threaded tests since there's only one
copy of the static variables shared by all threads.  E.g.,

    Thread 1                Thread 2
    ========                ========
    cached_devid = devid;
                            if (cached_devid == devid)
                            return cache;
    cache = ...

Indeed, running xe_exec_threads repeatedly turns up cases where the test
fails with "Platform is missing PAT settings for uc/wt/wb" because a
racing thread wound up getting 'intel_generic_info' rather than a proper
device info structure (although it can take a couple thousand tries to
hit this specific race condition).  CI also sees this from time to time,
but the bugs often get closed as "cannot reproduce" because the failure
rate is so low.

We're probably going to be reworking this code completely in the
not-too-distant future since we really need to be looking things up from
the GMD_ID query rather than using hardcoded device ID table lookups,
but for now we can apply a quick fix of just making 'cached_devid' and
'cache' thread-local so that we're not racing on a  shared copy.  Simply
swapping the order of the assignment of 'cache' and 'cached_devid'
probably would have also worked around the specific race condition noted
above, but it would have left tests susceptible to other race conditions
in cases where a multi-threaded test is actually looking up different
device IDs / GPUs in its multiple threads.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1600
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/1535
References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3236
Signed-off-by: Matt Roper <matthew.d.roper at intel.com>
---
 lib/intel_device_info.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/intel_device_info.c b/lib/intel_device_info.c
index 546b9c65a..d38dc415d 100644
--- a/lib/intel_device_info.c
+++ b/lib/intel_device_info.c
@@ -658,8 +658,8 @@ static const struct pci_id_match intel_device_match[] = {
  */
 const struct intel_device_info *intel_get_device_info(uint16_t devid)
 {
-	static const struct intel_device_info *cache = &intel_generic_info;
-	static uint16_t cached_devid;
+	static __thread const struct intel_device_info *cache = &intel_generic_info;
+	static __thread uint16_t cached_devid;
 	int i;
 
 	if (cached_devid == devid)
-- 
2.47.0



More information about the igt-dev mailing list