[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

Alexandre Courbot acourbot at nvidia.com
Tue Feb 25 14:11:07 UTC 2025

On Mon Feb 24, 2025 at 9:07 PM JST, Danilo Krummrich wrote:
> CC: Gary
> On Mon, Feb 24, 2025 at 10:40:00AM +0900, Alexandre Courbot wrote:
>> This inability to sleep while we are accessing registers seems very
>> constraining to me, if not dangerous. It is pretty common to have
>> functions intermingle hardware accesses with other operations that might
>> sleep, and this constraint means that in such cases the caller would
>> need to perform guard lifetime management manually:
>>   let bar_guard = bar.try_access()?;
>>   /* do something non-sleeping with bar_guard */
>>   drop(bar_guard);
>>   /* do something that might sleep */
>>   let bar_guard = bar.try_access()?;
>>   /* do something non-sleeping with bar_guard */
>>   drop(bar_guard);
>>   ...
>> Failure to drop the guard potentially introduces a race condition, which
>> will receive no compile-time warning and potentially not even a runtime
>> one unless lockdep is enabled. This problem does not exist with the
>> equivalent C code AFAICT, which makes the Rust version actually more
>> error-prone and dangerous, the opposite of what we are trying to achieve
>> with Rust. Or am I missing something?
> Generally you are right, but you have to see it from a different perspective.
> What you describe is not an issue that comes from the design of the API, but is
> a limitation of Rust in the kernel. People are aware of the issue and with klint
> [1] there are solutions for that in the pipeline, see also [2] and [3].
> [1] https://rust-for-linux.com/klint
> [2] https://github.com/Rust-for-Linux/klint/blob/trunk/doc/atomic_context.md
> [3] https://www.memorysafety.org/blog/gary-guo-klint-rust-tools/

Thanks, I wasn't aware of klint and it does look cool, even if not perfect by
its own admission. But even if we ignore the safety issue, the other one
(ergonomics) is still there.

Basically this way of accessing registers imposes quite a mental burden on its
users. It requires a very different (and harsher) discipline than when writing
the same code in C, and the correct granularity to use is unclear to me.

For instance, if I want to do the equivalent of Nouveau's nvkm_usec() to poll a
particular register in a busy loop, should I call try_access() once before the
loop? Or every time before accessing the register? I'm afraid having to check
that the resource is still alive before accessing any register is going to
become tedious very quickly.
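
To make the question concrete, here is a minimal userspace sketch of the two
granularities. MockBar, BarGuard, PollError and both poll functions are
hypothetical stand-ins written for this illustration, not the kernel API; the
mock only models the "is the resource still alive" check and says nothing
about the atomic-context rules that apply to the real guard.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a revocable BAR mapping (not the kernel type).
pub struct MockBar {
    alive: AtomicBool,
    reg: AtomicU32,
}

/// Guard showing that the mapping was alive when the guard was taken.
pub struct BarGuard<'a> {
    bar: &'a MockBar,
}

impl MockBar {
    pub fn new(value: u32) -> Self {
        MockBar { alive: AtomicBool::new(true), reg: AtomicU32::new(value) }
    }

    /// Simulates the resource being revoked (e.g. on device unbind).
    pub fn revoke(&self) {
        self.alive.store(false, Ordering::Release);
    }

    /// Returns a guard only while the mapping is still alive.
    pub fn try_access(&self) -> Option<BarGuard<'_>> {
        if self.alive.load(Ordering::Acquire) {
            Some(BarGuard { bar: self })
        } else {
            None
        }
    }
}

impl BarGuard<'_> {
    pub fn read32(&self) -> u32 {
        self.bar.reg.load(Ordering::Relaxed)
    }
}

#[derive(Debug, PartialEq)]
pub enum PollError {
    Gone,
    Timeout,
}

/// Option A: take the guard once and hold it for the whole busy loop.
pub fn poll_guard_once(
    bar: &MockBar, mask: u32, expected: u32, timeout: Duration,
) -> Result<(), PollError> {
    let guard = bar.try_access().ok_or(PollError::Gone)?;
    let deadline = Instant::now() + timeout;
    loop {
        if (guard.read32() & mask) == expected {
            return Ok(());
        }
        if Instant::now() >= deadline {
            return Err(PollError::Timeout);
        }
        std::hint::spin_loop();
    }
}

/// Option B: re-check liveness before every single register read.
pub fn poll_guard_each(
    bar: &MockBar, mask: u32, expected: u32, timeout: Duration,
) -> Result<(), PollError> {
    let deadline = Instant::now() + timeout;
    loop {
        let guard = bar.try_access().ok_or(PollError::Gone)?;
        if (guard.read32() & mask) == expected {
            return Ok(());
        }
        drop(guard); // guard released before the next iteration
        if Instant::now() >= deadline {
            return Err(PollError::Timeout);
        }
        std::hint::spin_loop();
    }
}
```

Option A keeps the guard alive for the full timeout, while option B pays for a
liveness check on every read. Neither is obviously the right one, which is the
granularity problem I am describing.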

I understand that we want to protect against accessing the IO region of an
unplugged device, but there is still no guarantee that the device won't be
unplugged in the middle of a critical section, however short. Thus the driver
code should be able to recognize that the device has fallen off the bus when it
e.g. gets a bunch of 0xff instead of a valid value. So do we really need this
extra protection that AFAICT isn't used in C?
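
As a rough illustration of the kind of check I have in mind, the sketch below
treats an all-ones read as "device possibly gone" and confirms it against a
second register. read_checked, DeviceGone and the id_offset parameter are
hypothetical names for this example, and it assumes the driver knows of a
register that never legitimately reads as all ones.

```rust
/// Error returned when the device appears to have fallen off the bus.
#[derive(Debug, PartialEq)]
pub struct DeviceGone;

/// Reads the register at `offset` through `read` (a stand-in for an MMIO
/// accessor). An all-ones value is only reported as `DeviceGone` if the
/// register at `id_offset`, assumed never to be all ones on a live device,
/// also reads as all ones.
pub fn read_checked(
    read: impl Fn(usize) -> u32,
    offset: usize,
    id_offset: usize,
) -> Result<u32, DeviceGone> {
    let value = read(offset);
    if value == u32::MAX && read(id_offset) == u32::MAX {
        return Err(DeviceGone);
    }
    Ok(value)
}
```

The second read is there because all ones can be a legitimate value for some
registers, so a single 0xffffffff is not proof that the device is gone.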
