[PATCH 11/16] gpu: nova-core: add falcon register definitions and base code

Joel Fernandes joelagnelf at nvidia.com
Wed Apr 30 23:08:54 UTC 2025


On 4/30/2025 2:16 PM, Danilo Krummrich wrote:
[...]
>>> It is also the one that makes use of methods to abstract things (vs.
>>> fixed parameters), so it is a natural candidate for using virtual
>>> methods. I am not a fan of having ever-growing boilerplate match
>>> statements for each method that needs to be abstracted, especially since
>>> this is that virtual methods do without requiring extra code, and for a
>>> runtime penalty that is completely negligible in our context and IMHO
>>> completely balanced by the smaller binary size that results from their
>>> use.
>>
>> Adding to what Alex said, note that the runtime cost is still there even without
>> using dyn. Because at runtime, the match conditionals need to route function
>> calls to the right place.
> 
> Honestly, I don't know how dynamic dispatch scales compared to static dispatch
> with conditionals.
> 
> OOC, I briefly looked for a benchmark and found [3], which doesn't look
> unreasonable at a first glance.
> 
> I modified it real quick to have more than 2 actions. [4]
> 
> 2 Actions
> ---------
> Dynamic Dispatch: time:   [2.0679 ns 2.0825 ns 2.0945 ns]
>  Static Dispatch: time:   [850.29 ps 851.05 ps 852.36 ps]
> 
> 20 Actions
> ----------
> Dynamic Dispatch: time:   [21.368 ns 21.827 ns 22.284 ns]
>  Static Dispatch: time:   [1.3623 ns 1.3703 ns 1.3793 ns]
> 
> 100 Actions
> -----------
> Dynamic Dispatch: time:   [103.72 ns 104.33 ns 105.13 ns]
>  Static Dispatch: time:   [4.5905 ns 4.6311 ns 4.6775 ns]
> 
> Absolutely take it with a grain of salt, I neither spend a lot of brain power
> nor time on this, which usually is not a great combination with benchmarking
> things. :)

Interesting, thanks for running the benchmark. I think this could be because of
function inlining during the static dispatch, so maybe at runtime there is no
overhead after all, even with long match statements. Just speculating, I have
not looked at codegen for this or anything.

But as you noted, the overhead still is not that much an issue (unless say the
method in concern is in an extremely hot path).

> 
> However, I think it's probably not too important here. Hence, feel free to go
> with dynamic dispatch for this.

Ok thanks, sounds good to me. It does seem the code is a lot more readable IMHO
as well, with dyn.

>> I am just not seeing the benefits of not using dyn for
>> this use case and only drawbacks. IMHO, we should try to not be doing the
>> compiler's job.
>>
>> Maybe the only benefit is you don't need an Arc or Kbox wrapper?
> 
> That's not a huge concern for me, it's only one single allocation per Engine,
> correct?

Yes, that's right. I was more referring to the fact that static dispatch as in
your example does not need Arc/Box, however even with Arc/Box IMHO the
readability of the code using dyn is more due to the lack of long match
statements on the access methods.

thanks,

- Joel



More information about the dri-devel mailing list