[RFC 0/9] Fix xe_force_wake_get() failure handling
Himal Prasad Ghimiray
himal.prasad.ghimiray at intel.com
Fri Aug 30 05:23:17 UTC 2024
In case of xe_force_wake_get() call the refcount for domains
are incremented irrespective of failure or success, which can lead to
undefined behaviour:
[1] subsequent forcewake call by callee function assumes domains are
already awake, which might not be true. This shows perfectly balanced
xe_force_wake_get/_put can also cause problem.
[2] The functions are using xe_force_wake_get() and return on failures.
In this scenario only the first caller returns and other functions will
always assume domains to be awake.
[1] func_a() {
XE_WARN(xe_force_wake_get()) <---> fails but increments refcount
func_b();
XE_WARN(xe_force_wake_put());<---> decrements refcounts
}
func_b() {
if(xe_force_wake_get()) <---> succeeds due to refcount of caller
return;
does mmio_operations(); <---> Domain might not be awake
xe_force_wake_put(); <---> decrement refcount
}
[2] func_a() {
int err = xe_force_wake_get(); <---> returns if fails, inc refcount
if (err)
return;
mmio_operations();
xe_force_wake_put();
}
func_b() {
if (xe_force_wake_get()) <---> succeeds due to refcnt of func_a()
return;
mmio_operations(); <---> Invalid accesses
xe_force_wake_put(); <---> decrements refcount for func_b
}
This RFC proposes not incrementing the refcount in case of
xe_force_wake_get() failure and putting all domains awakened by the caller
to sleep and decrease the reference count for all requested domains.
This prevents xe_force_wake_get() from leaving an unhandled reference
count in case of failure.
With these changes in place caller needs to ensure xe_force_wake_put() is
called only if xe_force_wake_get() succeeds.
Patches breakdown:
1. Changes in xe_force_wake_get() and documentations
- Patches 1 to 2
2. Ensure xe_force_wake_put() is called when xe_force_wake_get() succeds
- Patches 3 to 7
3. Modify xe_force_wake_put to return void and warn for failure
- Patch 8
4. Debugfs handling for forcewake open
- Patch 9
Cc: Badal Nilawar <badal.nilawar at intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi at intel.com>
Cc: Lucas De Marchi <lucas.demarchi at intel.com>
Cc: Nirmoy Das <nirmoy.das at intel.com>
Cc: Matthew Brost <matthew.brost at intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio at intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit at intel.com>
Himal Prasad Ghimiray (9):
drm/xe: Error handling in xe_force_wake_get()
drm/xe: Ensure __must_check for xe_force_wake_get() return
drm/xe/gsc: call xe_force_wake_put() only if xe_force_wake_get()
succeeds
drm/xe/gt: call xe_force_wake_put() only if xe_force_wake_get()
succeeds
drm/xe/guc: call xe_force_wake_put() only if xe_force_wake_get()
succeeds
drm/xe/oa: Handle force_wake_get failure in xe_oa_stream_init()
drm/xe/gt_tlb_invalidation_ggtt: Call xe_force_wake_put if
xe_force_wake_get succeds
drm/xe: Change return type to void for xe_force_wake_put
drm/xe: forcewake debugfs open fails on xe_forcewake_get failure
drivers/gpu/drm/xe/xe_debugfs.c | 28 +++++++--
drivers/gpu/drm/xe/xe_device.c | 3 +-
drivers/gpu/drm/xe/xe_drm_client.c | 2 +-
drivers/gpu/drm/xe/xe_force_wake.c | 64 +++++++++++++++++----
drivers/gpu/drm/xe/xe_force_wake.h | 9 +--
drivers/gpu/drm/xe/xe_gsc.c | 14 +++--
drivers/gpu/drm/xe/xe_gt.c | 25 ++++----
drivers/gpu/drm/xe/xe_gt_debugfs.c | 2 +-
drivers/gpu/drm/xe/xe_gt_idle.c | 25 +++++---
drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 9 ++-
drivers/gpu/drm/xe/xe_guc.c | 9 ++-
drivers/gpu/drm/xe/xe_guc_pc.c | 15 +++--
drivers/gpu/drm/xe/xe_oa.c | 13 +++--
drivers/gpu/drm/xe/xe_pat.c | 10 ++--
drivers/gpu/drm/xe/xe_reg_sr.c | 6 +-
drivers/gpu/drm/xe/xe_vram.c | 3 +-
16 files changed, 165 insertions(+), 72 deletions(-)
--
2.34.1
More information about the Intel-xe
mailing list