[PATCH v3 00/15] CCS static load balance

Andi Shyti andi.shyti at linux.intel.com
Wed Aug 28 08:20:15 UTC 2024


Hi Sima,

first of all, thanks for looking into this series.

On Tue, Aug 27, 2024 at 07:31:21PM +0200, Daniel Vetter wrote:
> On Fri, Aug 23, 2024 at 03:08:40PM +0200, Andi Shyti wrote:
> > Hi,
> > 
> > This patch series introduces static load balancing for GPUs with
> > multiple compute engines. It's a lengthy series, and some
> > challenging aspects still need to be resolved.
> 
> Do we have an actual user for this, where just reloading the entire driver
> (or well-rebinding, if you only want to change the value for a specific
> device) with a new module option isn't enough?

Yes, we have users for this and this has been already agreed with
architects and maintainers.

Why are you saying that we are reloading/rebinding the driver?
I'm only removing the exposure of user engines, which is
basically a flag in the engines data structure.

> There's some really gnarly locking and lifetime fun in there, and it needs
> a corresponding justification.

What locking are you referring about?

I only added one single mutex that has a comment and a
justification. If you think that's not enough, I can of course
improve it (please note that the changes have a good amount of
comments and I tried to be aso more descriptive as I could).

When I change the engines configurations only for the compute
engines and only for DG2 platforms, I need to make sure that no
other user is affected by the change. Thus I need to make sure
that access to some of the strucures are properly serialized.

> Which needs to be enormous for this case,
> meaning actual customers willing to shout on dri-devel that they really,
> absolutely need this, or their machines will go up in flames.
> Otherwise this is a nack from me.

Would you please tell me why are you nacking the patch? So that I
address your comments for v4?

Thanks,
Andi


More information about the dri-devel mailing list