[Nouveau] [PATCH] drm/ttm: Fix race condition in ttm_bo_delayed_delete

Thu Jan 21 05:40:42 PST 2010

> At a first glance:
>
> 1) We probably *will* need a delayed destroyed workqueue to avoid wasting
> memory that otherwise should be freed to the system. At the very least, the
> delayed delete process should optionally be run by a system shrinker.

You are right. For VRAM we don't care, since we are the only user,
but system-backed memory will need some form of delayed destruction.

The logical extension of the scheme would be for the Linux page
allocator/swapper to check for TTM buffers to destroy at the point
where it would otherwise shrink caches, try to swap, and/or wait for
swap to complete.
I am not sure whether there are existing hooks for this, or where
exactly this code should hook in.
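
To make the idea concrete, here is a minimal user-space sketch in Python (not kernel code). The names Buffer, DelayedDestroyList and shrink() are all made up for illustration, and the memory-pressure hook is modelled as a plain method call; how it would actually be wired into the page allocator is exactly the open question above.

```python
class Buffer:
    """Hypothetical stand-in for a system-backed buffer object."""

    def __init__(self, name, pages, busy=True):
        self.name = name
        self.pages = pages
        self.busy = busy  # True while the GPU may still be using the buffer


class DelayedDestroyList:
    """Parks buffers that cannot be freed yet and reaps them under pressure."""

    def __init__(self):
        self._pending = []

    def destroy(self, buf):
        """Free an idle buffer at once; park a busy one. Returns pages freed."""
        if buf.busy:
            self._pending.append(buf)
            return 0
        return buf.pages

    def shrink(self, pages_wanted):
        """Assumed memory-pressure callback: free parked buffers that have
        become idle until at least pages_wanted pages are released.
        Returns the number of pages actually freed."""
        freed = 0
        still_pending = []
        for buf in self._pending:
            if freed < pages_wanted and not buf.busy:
                freed += buf.pages
            else:
                still_pending.append(buf)
        self._pending = still_pending
        return freed
```

The point of the sketch is only that nothing is scanned until someone actually asks for memory back, so no periodic workqueue is needed for correctness.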

> 2) Fences in TTM are currently not necessarily strictly ordered, and
> sequence numbers are hidden from the bo code. This means, for a given FIFO,
> fence sequence 3 may expire before fence sequence 2, depending on the usage
> of the buffer.

My definition of "channel" (I sometimes used FIFO incorrectly as a
synonym for it) is exactly that: a set of fences that are strictly
ordered.
If the card has multiple HW engines, each is considered a different
channel (so that a channel becomes a (fifo, engine) pair).

However, we may need to add the concept of a "sync domain": a set of
channels that support on-GPU synchronization against each other.
This would model hardware where channels on the same FIFO can be
synchronized with each other but channels on different FIFOs cannot,
and also multi-core GPUs where synchronization might be available only
within each core and not across cores.

To sum it up, a GPU consists of a set of sync domains, each consisting
of a set of channels, each consisting of a sequence of fences, with
the following rules:
1. Fences within the same channel expire in order.
2. If channels A and B belong to the same sync domain, it is possible
to emit a fence on A that is guaranteed to expire after any given
fence on B.
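
The two rules above can be sketched as a small Python model. This is an illustration only: SyncDomain, Channel, emit() and expire_next() are hypothetical names, not TTM or Nouveau interfaces, and fences are plain per-channel sequence numbers.

```python
class SyncDomain:
    """A set of channels that can synchronize against each other on the GPU."""

    def __init__(self, name):
        self.name = name


class Channel:
    """A strictly ordered sequence of fences, e.g. one (fifo, engine) pair."""

    def __init__(self, name, domain):
        self.name = name
        self.domain = domain
        self.last_emitted = 0  # highest sequence number handed out
        self.last_expired = 0  # highest sequence number already expired
        self._waits = {}       # seq -> (channel, seq) that must expire first

    def emit(self, after=None):
        """Emit a fence; 'after' is an optional (channel, seq) fence that the
        new fence must not expire before (rule 2)."""
        if after is not None and after[0].domain is not self.domain:
            raise ValueError("channels are in different sync domains")
        self.last_emitted += 1
        if after is not None:
            self._waits[self.last_emitted] = after
        return (self, self.last_emitted)

    def expired(self, seq):
        return seq <= self.last_expired

    def expire_next(self):
        """Expire the oldest outstanding fence if nothing blocks it.
        Fences only ever expire in sequence order (rule 1)."""
        seq = self.last_expired + 1
        if seq > self.last_emitted:
            return False
        dep = self._waits.get(seq)
        if dep is not None and not dep[0].expired(dep[1]):
            return False
        self.last_expired = seq
        return True
```

For example, with channels A and B in one domain, A.emit(after=B.emit()) yields a fence on A that cannot expire until B's fence has, while the same call across two domains raises ValueError.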

Whether channels have the same FIFO or not is essentially a driver
implementation detail; what TTM cares about is whether they are in the
same sync domain.

[I just made up "sync domain" here: is there a standard term?]

This assumes that the "synchronizability" graph is a disjoint union of
complete graphs (that is, if A can sync against B and B against C, then
A can sync against C). Is there any example where it is not so?
Also, does this model Poulsbo correctly, or am I wrong?
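
As a way to test that assumption against a given piece of hardware, here is a short Python sketch. The function name and the input format (a dict mapping each channel to the set of channels it can synchronize against) are my own invention. It returns the sync domains if the graph is a disjoint union of complete graphs, and None otherwise.

```python
def sync_domains(can_sync):
    """Partition channels into sync domains, or return None if the
    synchronizability graph is not a disjoint union of complete graphs.

    can_sync maps each channel to the set of other channels it can
    synchronize against on the GPU.
    """
    domains = []
    seen = set()
    for channel in can_sync:
        if channel in seen:
            continue
        # Candidate domain: the channel plus everything it can sync with.
        group = {channel} | set(can_sync[channel])
        # In a complete graph every member sees exactly the same group.
        for member in group:
            if {member} | set(can_sync.get(member, ())) != group:
                return None
        seen |= group
        domains.append(frozenset(group))
    return domains
```

A chain such as A-B, B-C without A-C makes the function return None, which is the case the two-level model cannot express.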

Note that we could use CPU mediation more than we currently do.
For instance, to do inter-channel synchronization Nouveau currently
just waits on the fence synchronously on the CPU. It could instead
queue the commands in software and, using an interrupt or delayed
mechanism, submit them to hardware once the fence being waited for has
expired.
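
A rough Python sketch of that deferred submission follows. It is a toy model, not the Nouveau code: Fence, Channel, submit() and signal() are hypothetical names, and signal() stands in for whatever interrupt handler or delayed work would notice that the fence has expired.

```python
from collections import deque


class Fence:
    """Hypothetical fence that runs callbacks once it expires."""

    def __init__(self):
        self.expired = False
        self._callbacks = []

    def on_expire(self, callback):
        if self.expired:
            callback()
        else:
            self._callbacks.append(callback)

    def signal(self):
        """Would be driven by an interrupt or delayed work in a real driver."""
        self.expired = True
        callbacks, self._callbacks = self._callbacks, []
        for callback in callbacks:
            callback()


class Channel:
    """Queues commands in software while a foreign fence is outstanding."""

    def __init__(self, name):
        self.name = name
        self.submitted = []      # commands already handed to the "hardware"
        self._pending = deque()  # (fence or None, commands), in program order

    def submit(self, commands, wait_for=None):
        """Never blocks: commands are queued and pushed out when possible."""
        self._pending.append((wait_for, list(commands)))
        if wait_for is not None and not wait_for.expired:
            wait_for.on_expire(self._flush)
        self._flush()

    def _flush(self):
        # Submit from the head only, so channel ordering is preserved.
        while self._pending:
            fence, commands = self._pending[0]
            if fence is not None and not fence.expired:
                break
            self._pending.popleft()
            self.submitted.extend(commands)
```

The caller returns immediately from submit(); commands queued behind an unexpired fence reach the hardware only after signal() runs, and later submissions on the same channel stay behind them.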