[Intel-gfx] possible struct_mutex race, gtt_space becomes invalid in execbuf under memory pressure

Siluvery, Arun arun.siluvery at intel.com
Mon Nov 18 16:51:30 CET 2013


Hi All,

I am running a repetitive test on HSW with the maximum available RAM
limited to 1GB (max TOLUD is 1GB), and it fails with a NULL pointer
dereference in the execbuf ioctl.

Debugging showed that batch_obj->gtt_space, which was valid earlier in
execbuf, becomes NULL before the batch is dispatched. To track this
down, I stored the batch_obj->gtt_space address in execbuf and compared
it against every obj that is freed in i915_gem_object_unbind(). The
unbind turns out to be triggered by i915_gem_fault(). The obj can be
freed there because it is not yet pinned.
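
A minimal userspace sketch of the address-watch check described above,
for illustration only. The types and functions here (struct model_obj,
watch_gtt_space(), model_unbind()) are hypothetical stand-ins, not the
actual i915 structures or the debug patch that was used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver structures; not the i915 types. */
struct model_node { int dummy; };

struct model_obj {
	struct model_node *gtt_space;	/* NULL when the object is unbound */
	int pin_count;			/* > 0 means "must not be unbound" */
};

/* Address recorded in the (modelled) execbuf path. */
static const struct model_node *watched_gtt_space;
/* Number of times the watched node was seen being released. */
static int watched_hits;

/* Called from the modelled execbuf path once the batch is bound. */
static void watch_gtt_space(const struct model_obj *obj)
{
	watched_gtt_space = obj->gtt_space;
}

/*
 * Modelled unbind: refuses pinned objects, and reports the caller
 * whenever the node being released is the watched one.
 */
static int model_unbind(struct model_obj *obj, const char *caller)
{
	if (obj->pin_count > 0)
		return -1;	/* pinned objects are left alone */

	if (obj->gtt_space != NULL && obj->gtt_space == watched_gtt_space) {
		watched_hits++;
		fprintf(stderr, "watched gtt_space %p unbound via %s\n",
			(const void *)obj->gtt_space, caller);
	}

	obj->gtt_space = NULL;
	return 0;
}
```

In the kernel, a WARN_ON() or dump_stack() at the point of the match
would print the call chain that led to the unbind.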

I artificially incremented the pin_count of this bo to see whether it
helps as a workaround, but now I am seeing a kernel panic with a
"general protection fault".

It is not clear to me how i915_gem_fault() is able to acquire
struct_mutex while it is held by the execbuf ioctl. It is released if
relocation takes the slow path, but that is not the case here.

There are page allocation failures of various orders during the test.
As the system runs out of memory, the low memory killer starts killing
processes and i915_gem_evict_everything() is called to free up space.
The system recovers from these, but the test still fails at random.

It looks as though struct_mutex is released somewhere during execbuf,
which lets the fault handler free the still-valid bo under memory
pressure.
Is there any way this can happen?
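
The suspected sequence can be modelled in userspace as follows. This is
only a sketch of the hypothesis, not i915 code: win_lock() stands in
for struct_mutex, win_fault_evict() for the fault/eviction path, and
the place where win_execbuf() drops the lock is an assumed window whose
real location (if it exists at all) is unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the i915 types. */
struct win_node { int dummy; };
struct win_obj {
	struct win_node *gtt_space;	/* NULL when unbound */
	int pin_count;
};

static bool lock_held;			/* models struct_mutex */

static void win_lock(void)   { assert(!lock_held); lock_held = true; }
static void win_unlock(void) { assert(lock_held);  lock_held = false; }

typedef void (*window_fn)(struct win_obj *);

/* Modelled fault/eviction path: takes the lock, unbinds if unpinned. */
static void win_fault_evict(struct win_obj *obj)
{
	win_lock();
	if (obj->pin_count == 0)
		obj->gtt_space = NULL;
	win_unlock();
}

/*
 * Modelled execbuf step. The lock is dropped once (the assumed window)
 * and `intruder` runs there. Returns true if the binding is still
 * valid after the lock has been re-acquired.
 */
static bool win_execbuf(struct win_obj *obj, bool pin_first,
			window_fn intruder)
{
	bool still_bound;

	win_lock();
	if (pin_first)
		obj->pin_count++;

	win_unlock();			/* assumed window */
	if (intruder)
		intruder(obj);
	win_lock();

	still_bound = obj->gtt_space != NULL;	/* revalidate */
	if (pin_first)
		obj->pin_count--;
	win_unlock();

	return still_bound;
}
```

In this model the binding survives when the object is pinned before the
window and is lost when it is not. The real driver evidently has more
going on, since bumping pin_count there led to a general protection
fault instead.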

I would really appreciate any suggestions on how to debug this further.

regards
Arun

