[RFC 04/29] nvkm/vgpu: set the VF partition count when NVIDIA vGPU is enabled

Tue Oct 15 15:19:33 UTC 2024

On 15/10/2024 15.20, Jason Gunthorpe wrote:
> On Sun, Oct 13, 2024 at 06:54:32PM +0000, Zhi Wang wrote:
>> On 27/09/2024 1.51, Jason Gunthorpe wrote:
>>> On Sun, Sep 22, 2024 at 05:49:26AM -0700, Zhi Wang wrote:
>>>> GSP firmware needs to know the maximum number of supported vGPUs at
>>>> initialization.
>>>>
>>>> The VF partition count field in the GSP WPR2 must be set according to
>>>> the maximum number of supported vGPUs.
>>>>
>>>> Set the VF partition count in the GSP WPR2 when NVKM loads the GSP
>>>> firmware and initializes the GSP WPR2, if vGPU is enabled.
>>>
>>> How/why is this different from the SRIOV num_vfs concept?
>>>
>>
>> 1) The VF is the HW interface of a vGPU exposed to the VMM/VM.
>>
>> 2) The number of VFs is not always equal to the max number of vGPUs
>> supported, which depends on a) the size of the metadata in the video
>> memory space allocated for the FW to manage the vGPUs, and b) how the
>> user divides the resources. E.g. if a card has 48GB of video memory and
>> the user creates two vGPUs with 24GB of video memory each, only two VFs
>> are usable even though SRIOV num_vfs can be larger than that.
> 
> But that can't be determined at driver load time, the profiling of the
> VFs must happen at run time when the orchestration determines what kind
> of VM instance type to run.
> 
> Which again gets back to the question of why do you need to specify
> the number of VFs at FW boot time? Why isn't it just fully dynamic and
> driven on the SRIOV enable?
> 

The FW needs to pre-calculate the video memory reserved for its own use,
which includes the metadata for the max-supported number of vGPUs. This
has to be decided at FW load time. We can always set it to the max
number; the trade-off is that we lose some usable video memory, around
(549-256)MB so far.
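
To make the two constraints above concrete, here is a minimal sketch in C.
It is not the GSP firmware's actual calculation: the helper names
fw_reserved_mb() and usable_vgpus(), the linear "base + per-vGPU metadata"
model, and all parameters are assumptions for illustration only. The second
helper follows the simplified 48GB/24GB example from earlier in the thread
and ignores the FW reservation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: the FW reserves a fixed base amount plus a fixed
 * amount of metadata per max-supported vGPU. The real layout is decided
 * by the GSP firmware and is not described in this thread.
 */
static uint32_t fw_reserved_mb(uint32_t base_mb, uint32_t per_vgpu_meta_mb,
                               uint32_t max_vgpus)
{
    return base_mb + per_vgpu_meta_mb * max_vgpus;
}

/*
 * Number of vGPUs that can actually be created: limited both by the
 * number of VFs and by how the user divides the video memory.
 * Simplification: the FW reservation is not subtracted here.
 */
static uint32_t usable_vgpus(uint32_t total_fb_mb, uint32_t per_vgpu_fb_mb,
                             uint32_t num_vfs)
{
    uint32_t by_memory;

    assert(per_vgpu_fb_mb > 0); /* avoid division by zero */

    by_memory = total_fb_mb / per_vgpu_fb_mb;
    return by_memory < num_vfs ? by_memory : num_vfs;
}
```

With a 48GB card split into 24GB vGPUs, usable_vgpus(48 * 1024, 24 * 1024, 16)
gives 2 regardless of num_vfs, while the reservation from fw_reserved_mb()
depends only on the max count fixed at FW load time.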

> Jason