[RFC PATCH 0/3] gpu: nova-core: add basic timer subdevice implementation

Alexandre Courbot acourbot at nvidia.com
Tue Feb 18 13:35:03 UTC 2025


Hi Danilo,

On Tue Feb 18, 2025 at 6:33 AM JST, Danilo Krummrich wrote:
> Hi Alex,
>
> On Mon, Feb 17, 2025 at 11:04:45PM +0900, Alexandre Courbot wrote:
>> Hi everyone,
>> 
>> This short RFC is based on top of Danilo's initial driver stub series
>> [1] and aims to initiate discussions, and hopefully some design
>> decisions, using the simplest subdevice of the GPU (the timer) as an
>> example, before implementing more subdevices that let the GPU
>> initialization sequence progress (Falcon being the logical next step,
>> so we can get the GSP rolling).
>> 
>> It is kept simple and short for that purpose, and to avoid hitting a
>> wall later, with much more device code built on assumptions that turn
>> out to be incorrect.
>> 
>> This is my first time trying to write Rust kernel code, and some of my
>> questions below are probably due to me not understanding yet how to use
>> the core kernel interfaces. So before going further I thought it would
>> make sense to raise the most obvious questions that came to my mind
>> while writing this draft:
>
> Thanks for sending this RFC, that makes a lot of sense.
>
> It's great to see you picking up work on Nova and Rust in the kernel in general!
>
> One nit: For the future, please make sure to copy in the folks listed under the
> RUST entry in the maintainers file explicitly.

Ack. I tend to get nervous as the list of recipients grows and to trim
it down to the people I believe are strictly interested, but I will
refrain from doing that in the future.

>
>> 
>> - Where and how to store subdevices. The timer device is currently a
>>   direct member of the GPU structure. This might work for GSP devices,
>>   which IIUC are supposed to have at least a few fixed devices required
>>   to bring the GSP up; but as a general rule this probably won't scale,
>>   as not all subdevices are present on all GPU variants, or in the same
>>   numbers. So we will probably need to find an equivalent to the
>>   `subdev` linked list in Nouveau.
>
> Hm...I think a Vec should probably do the job for this. Once we know the
> chipset, we know the exact topology of subdevices too.
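
That should work. To check my understanding, here is a rough sketch of
that shape (illustrative names only, and plain Rust for brevity; an
in-kernel version would use the kernel's fallible Vec with
push(.., GFP_KERNEL) rather than std's infallible one):

// Sketch: subdevices behind a common trait, stored in a Vec built once
// the chipset (and thus the subdevice topology) is known.
trait Subdevice {
    fn name(&self) -> &'static str;
}

struct Timer;

impl Subdevice for Timer {
    fn name(&self) -> &'static str {
        "timer"
    }
}

// Hypothetical chipset enum, for illustration only.
enum Chipset {
    GA102,
}

struct Gpu {
    // One entry per subdevice present on this chipset, in bring-up order.
    subdevs: Vec<Box<dyn Subdevice>>,
}

impl Gpu {
    fn new(chipset: Chipset) -> Self {
        // The exact set of subdevices depends on the chipset.
        let subdevs: Vec<Box<dyn Subdevice>> = match chipset {
            Chipset::GA102 => vec![Box::new(Timer)],
        };
        Gpu { subdevs }
    }
}
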
>
>> 
>> - BAR sharing between subdevices. Right now each subdevice gets access
>>   to the full BAR range. I am wondering whether we couldn't split it
>>   into the relevant slices for each subdevice, and transfer ownership of
>>   each slice to the device that is supposed to use it. That way each
>>   register would have a single owner, which is arguably safer - but
>>   maybe not as flexible as we will need down the road?
>
> I think for self-contained subdevices we can easily add an abstraction for
> pci_iomap_range() to pci::Device. I considered doing that from the get-go, but
> then decided to wait until we have some actual use for that.

Yeah, my comment was just to check whether this makes sense at all; we
can definitely live without it for now. It would be a nice safety
addition though, IMHO.
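
To make it concrete, here is the single-owner idea modeled in plain Rust
(nothing below is an existing kernel API; the names and offsets are
invented):

// The GPU would split its BAR into non-overlapping slices and move each
// slice into the subdevice that programs those registers, so every
// register has exactly one owner.
struct BarSlice {
    base: usize, // offset of this slice within the BAR
    len: usize,  // length of the slice
}

struct Timer {
    regs: BarSlice, // exclusively owned; no other subdevice maps this range
}

fn probe() {
    // A pci_iomap_range()-style helper would hand out such slices; the
    // offset and length here are made up.
    let timer_regs = BarSlice { base: 0x9000, len: 0x1000 };
    let _timer = Timer { regs: timer_regs }; // ownership moves: single owner
}
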

>
> For where we have to share a mapping of the same set of registers between
> multiple structures, I think we have to embed it into an Arc (unfortunately,
> we can't re-use the inner Arc of Devres for that).
>
> An alternative would be to request a whole new mapping, i.e. a Devres<pci::Bar>
> instance, but that includes an inner Arc anyway and, hence, is more costly.

Another way could be to make the owning subdevice itself implement the
required functionality through a method that other devices could call as
needed.
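
For instance, something along these lines (a sketch; the names and the
BarSlice stand-in are invented, as above):

// The owning subdevice keeps its mapping private and exposes the
// functionality others need as a method, instead of sharing the mapping
// itself behind an Arc.
struct BarSlice; // stand-in for the sliced mapping from the earlier sketch

struct Ptimer {
    regs: BarSlice, // sole owner of the timer register range
}

impl Ptimer {
    // Other subdevices call this instead of touching the timer registers
    // themselves.
    fn read_ns(&self) -> u64 {
        let _ = &self.regs; // would read the counter registers here
        0
    }
}
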

>
>> 
>> - On a related note, since the BAR is behind a Devres, its availability
>>   must first be secured before any hardware access using try_access().
>>   Doing this on a per-register or per-operation basis looks like
>>   overkill, so all methods that access the BAR take a reference to it,
>>   allowing try_access() to be called from the highest-level caller and
>>   thus reducing the number of times this needs to be done. Doing so
>>   comes at the cost of an extra argument to most subdevice methods; but
>>   also with the benefit that we don't need to put the BAR behind another
>>   Arc and share it across all subdevices. I don't know which design is
>>   better here, and input would be very welcome.
>
> I'm not sure I understand you correctly, because what you describe here seems
> to be two different things to me.
>
> 1. How to avoid unnecessary calls to try_access().
>
> This is why I made Boot0.read() take a &RevocableGuard<'_, Bar0> as argument. I
> think we can just call try_access() once and then propagate the guard through
> the callchain, where necessary.
>
> 2. Share the MMIO mapping between subdevices.
>
> This is where I can't follow. How does 1. help with that? How are 1. and 2.
> related?

The idea was that by having the Gpu device secure access to the Bar and
pass a reference to the guard (or to the Bar itself since the guard
implements Deref, as I mentioned in [1]), we can avoid the repeated
calls to try_access() AND "share" the Bar between all the subdevices
through an argument instead of e.g. wrapping it into another Arc that
each subdevice would store.
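
Concretely, the shape I had in mind was something like this (a sketch
only; identify() and chipset() are invented here, while Devres, Bar0 and
Boot0 are meant as in the stub series, so the exact signatures may
differ):

fn identify(bar: &Devres<Bar0>) -> Result<Chipset> {
    // Secure the BAR once; this fails if the device was unbound.
    let bar = bar.try_access().ok_or(ENXIO)?;
    // Callees borrow the guard instead of calling try_access() again.
    let boot0 = Boot0::read(&bar);
    boot0.chipset()
}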

But as Dave pointed out, it looks like this won't work in practice
anyway, so we can probably drop that idea...

[1] https://lore.kernel.org/nouveau/D7Q79WJZSFEK.R9BX1V85SV1Z@nvidia.com/

