[Intel-xe] Code review for Xe driver - Second batch of comments (1/6)

Oded Gabbay ogabbay at kernel.org
Thu Jun 1 13:44:44 UTC 2023


Hi All,
Here is the second batch of comments, grouped by filename:

General: Use of defines in the middle of functions is really making it
hard to read the code. And it is a very uncommon practice. I can
understand definitions of flags inside structure, and even definitions
before functions, but just in the middle of the function makes the
reader lose track of the code. I would like to request to modify only
those parts.

xe_vm_types.h:
- Line 175: Either XE_VM_FLAGS_64K should be XE_VM_FLAG_64K or all
other XE_VM_FLAG_* should be XE_VM_FLAGS_*
    I would also love to have some documentation on what each flag
means. It's not trivial to understand that from the code.

xe_bo.h:
- Line 23: Where is the STOLEN define being used? I only found code
that checks if that flag is set

xe_bo.c:
- __xe_bo_create_locked(): Are you sure the bo will always be
destroyed (kfree) if a failure happens during this function ? AFAICS,
the bo can either be destroyed by a direct call to xe_bo_free(), or
through callback from ttm called
   xe_ttm_bo_destroy(). However, xe_ttm_bo_destroy is assigned only
when calling ttm_bo_init_reserved(). However, you can exit
__xe_bo_create_locked in line 1009 (which is before
ttm_bo_init_reserved). In addition, what
   happens if the call to ttm_bo_init_reserved fails ?

- Line 1096: after returning from __xe_bo_create_locked, if there was
an error and the bo was allocated in xe_bo_create_locked_range(),
don't we need to make sure it is freed ?

xe_vm.c:
- Line 1182: I have no idea what this comment means... Could you
please explain what the migrate vm code is doing there and WHY do we
need it in case the migration flag is not set ?

- number_gts: This local variable appears in 3 different functions in
this file and each time is calculated in a different manner :) Why not
keep this number in the device data structure and set it once during
init ? (Maybe it also happens in other files, idk)

- Line 1888: what is this asid ? Is it related to PASID when using ATS
or is this something else ?

xe_drm.h:
- Line 356: You need to add a comment for the flags, in case there is
some dependency or limitation. e.g. afaics, you can't create two vms,
one with DRM_XE_VM_CREATE_FAULT_MODE and the other without. Either all
vms are in fault mode or not. This should be written here explicitly
in the uapi file.

xe_guc_ads.c:
- No need for guc_ads_regset_size function, use field directly.
- Use define instead of guc_ads_capture_size.
- calculate_golden_lrc_size() - you are calculating golden ring size
for all possible engine types but not taking into account if engines
have several hw instances. Is it ok ?
- Line 452: not all struct fields (which are used later) are initialized.

xe_guc_ct_types.h:
- Line 17: XE_GUC_CT_SELFTEST is always defined, why do we need it?
- mutex lock in xe_guc_ct: same lock used for G2H and H2G actions so
these CTs can't work in parallel. Is there a reason why this can't be
separated into two different locks ? If so, I think it is worth
commenting somewhere in the code.

xe_guc_pc.c
- pc_gucrc_disable(): disabling registers which were never enabled
(lines 748-750).
- Line 896: Add mutex destroy for pc->freq_lock.

xe_guc_ct.c
- Line 45, 50: Please use define for uninitialized fence.
- Line 139: need to add mutex_destroy for ct->lock.
- Line 142: It appears that ct->fence_context is initialized but never
used. If so, please remove
- Although it's explained at the top, please add defines for offsets
in the CT buffer:
  H2G_CTB_DESC_OFFSET, G2H_CTB_DESC_OFFSET, H2G_CTB_BUF_OFFSET and
G2H_CTB_BUF_OFFSET.
- parse_g2h_event(): add default handling to switch.
- parse_g2h_msg(): header variable is unused.
- Lines 927: can't you use CIRC_SPACE instead of doing this manually ?
- xe_guc_ct_fast_path(): problematic use of ct->fast_lock spinlock.
g2h_read() can take some time to execute. Same for dequeue_one_g2h
which also calls to g2h_read(). spinlock protected code should execute
fast. Must we have a spinlock here ?
- xe_guc_tlb_invalidation_done_handler(): it asserts that the ct mutex
is locked, but I didn't see anywhere in the call stack of this
function that the mutex is locked

xe_guc_submit.c
- Line 230: mutex_init(&guc->submission_state.lock); without mutex_destroy.
- Lines 387, 396(!) and 399: please add defines which hold the
calculations and use them in the structure definitions.
- wq_wait_for_space(), Line 526: no handling in case there is no
available space. Actually, it looks like this if statement is
redundant since you check it again two lines after. This code is very
confusing, I recommend refactoring this.
- wq_item_append(): Lines 571 - 572 - Please add define instead of 3.
Leaving it to the user to understand where this +3 came from is not
helpful. Also there is no indication on success/fail, need to add
return code.
- submit_engine(): There is no indication of success/failure, need to
add return code.
- guc_engine_run_job(): maybe it's possible to merge assertions? There
are 7 assertions before emitting the job. Maybe add
engine_operational() which takes care of these assertions?
  Also, if job is not submitted to the engine, I think this function
should return ERR_VALUE to the drm scheduler.

xe_pci.c:
- Line 462: Consider using pcim_enable_device ? I see other drm
drivers (drivers/gpu/drm/tiny/bochs.c for example) using it.

xe_force_wake.c
- Line 44: mutex_init(&fw->lock); without mutex_destroy.

xe_pcode.c:
- Line 278: mutex_init(&gt->pcode.lock); without mutex_destroy.

xe_ttm_vram_mgr.c:
- Line 342: mutex_init(&mgr->lock); without mutex_destroy.

Thanks,
Oded


More information about the Intel-xe mailing list