[PATCH 09/15] net: hbl_en: add habanalabs Ethernet driver

Wed Jun 19 16:13:48 UTC 2024

On Wed, Jun 19, 2024 at 07:16:20AM +0000, Omer Shpigelman wrote:
> On 6/18/24 17:19, Andrew Lunn wrote:
> >>>> +static u32 hbl_en_get_mtu(struct hbl_aux_dev *aux_dev, u32 port_idx)
> >>>> +{
> >>>> +     struct hbl_en_port *port = HBL_EN_PORT(aux_dev, port_idx);
> >>>> +     struct net_device *ndev = port->ndev;
> >>>> +     u32 mtu;
> >>>> +
> >>>> +     if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> >>>> +             netdev_err(ndev, "port is in reset, can't get MTU\n");
> >>>> +             return 0;
> >>>> +     }
> >>>> +
> >>>> +     mtu = ndev->mtu;
> >>>
> >>> I think you need a better error message. All this does is access
> >>> ndev->mtu. What does it matter if the port is in reset? You don't
> >>> access it.
> >>>
> >>
> >> This function is called from the CN driver to get the current MTU in order
> >> to configure it to the HW, for exmaple when configuring an IB QP. The MTU
> >> value might be changed by user while we execute this function.
> > 
> > Change of MTU will happen while holding RTNL. Why not simply hold RTNL
> > while programming the hardware? That is the normal pattern for MAC
> > drivers.
> >
> 
> I can hold the RTNL lock while configuring the HW but it seems like a big
> overhead. Configuring the HW might take some time due to QP draining or
> cache invalidation.

How often does the MTU change? Once, maybe twice on boot, and never
again? MTU change is not hot path. For slow path code, KISS is much
better, so it is likely to be correct. 

> To me it seems unnecessary but if that's the common way then I'll change
> it.
>  
> >>>> +static int hbl_en_change_mtu(struct net_device *netdev, int new_mtu)
> >>>> +{
> >>>> +     struct hbl_en_port *port = hbl_netdev_priv(netdev);
> >>>> +     int rc = 0;
> >>>> +
> >>>> +     if (atomic_cmpxchg(&port->in_reset, 0, 1)) {
> >>>> +             netdev_err(netdev, "port is in reset, can't change MTU\n");
> >>>> +             return -EBUSY;
> >>>> +     }
> >>>> +
> >>>> +     if (netif_running(port->ndev)) {
> >>>> +             hbl_en_port_close(port);
> >>>> +
> >>>> +             /* Sleep in order to let obsolete events to be dropped before re-opening the port */
> >>>> +             msleep(20);
> >>>> +
> >>>> +             netdev->mtu = new_mtu;
> >>>> +
> >>>> +             rc = hbl_en_port_open(port);
> >>>> +             if (rc)
> >>>> +                     netdev_err(netdev, "Failed to reinit port for MTU change, rc %d\n", rc);
> >>>
> >>> Does that mean the port is FUBAR?
> >>>
> >>> Most operations like this are expected to roll back to the previous
> >>> working configuration on failure. So if changing the MTU requires new
> >>> buffers in your ring, you should first allocate the new buffers, then
> >>> free the old buffers, so that if allocation fails, you still have
> >>> buffers, and the device can continue operating.
> >>>
> >>
> >> A failure in opening a port is a fatal error. It shouldn't happen. This is
> >> not something we wish to recover from.
> > 
> > What could cause open to fail? Is memory allocated?
> > 
> 
> Memory is allocated but it is freed in case of a failure.
> Port opening can fail due to other reasons as well like some HW timeout
> while configuring the ETH QP.

If the hardware timeout because the hardware is dead, there is nothing
you can do about it. Its dead.

But what about when the system is under memory pressure? You say it
allocates memory. What happens if those allocations fail. Does
changing the MTU take me from a working system to a dead system? It is
good practice to not kill a working system under situations like
memory pressure. You try to first allocate the memory you need to
handle the new MTU, and only if successful do you free existing memory
you no longer need. That means if you cannot allocate the needed
memory, you still have the old memory, you can keep the old MTU and
return -ENOMEM, and the system keeps running.

> I didn't check that prior to my submit. Regarding this "no new module
> parameters allowed" rule, is that documented anywhere?

Lots of emails that fly passed on the mailing list. Maybe once every
couple of months when a vendor tries to mainline a new driver without
reading the mailing list for a few months to know how mainline
actually works. I _guess_ Davem has been pushing back on module
parameters for 10 years? Maybe more.

> if not, is that the
> common practice? not to try to do something that was not done recently?
> how "recently" is defined?
> I just want to clarify this because it's hard to handle these submissions
> when we write some code based on existing examples but then we are
> rejected because "we don't do that here anymore".
> I want to avoid future cases of this mismatch.

My suggestion would be to spend 30 minutes every day reading patches
and review comment on the mailing list. Avoid making the same mistakes
others make, especially newbies to mainline, and see what others are
doing in the same niche as this device. 30 minutes might seem like a
lot, but how much time did you waste implementing polling mode, now
you are going to throw it away?

> >>>> +                     ethtool_link_ksettings_add_link_mode(cmd, lp_advertising, Autoneg);
> >>>
> >>> That looks odd. Care to explain?
> >>>
> >>
> >> The HW of all of our ports supports autoneg.
> >> But in addition, the ports are divided to two groups:
> >> internal: ports which are connected to other Gaudi2 ports in the same server.
> >> external: ports which are connected to an external switch.
> >> Only internal ports use autoneg.
> >> The ports mask which sets each port as internal/external is fetched from
> >> the FW on device load.
> > 
> > That is not what i meant. lc_advertising should indicate the link
> > modes the peer is advertising. If this was a copper link, it typically
> > would contain 10BaseT-Half, 10BaseT-Full, 100BaseT-Half,
> > 100BaseT-Full, 1000BaseT-Half. Setting the Autoneg bit is pointless,
> > since the peer must be advertising in order that lp_advertising has a
> > value!
> > 
> 
> Sorry, but I don't get this. The problem is the setting of the Autoneg bit
> in lp_advertising? is that redundant? I see that other vendors set it too
> in case that Autoneg was completed.

$ ethtool eth0
Settings for eth0:
	Supported ports: [ TP	 MII ]
	Supported link modes:   10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full

This is `supported`. The hardware can do these link modes.

	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes

It also support symmetric pause, and can do autoneg.

	Supported FEC modes: Not reported
	Advertised link modes:  10baseT/Half 10baseT/Full
	                        100baseT/Half 100baseT/Full
	                        1000baseT/Full
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Advertised FEC modes: Not reported

This is `advertising`, and is what this device is advertising to the
link partner. By default you copy supported into advertising, but the
user can use ethtool -s advertise N, where N is a list of link modes,
to change what is advertised to the link partner.

	Link partner advertised link modes:  10baseT/Half 10baseT/Full
	                                     100baseT/Half 100baseT/Full
	                                     1000baseT/Full
	Link partner advertised pause frame use: Symmetric
	Link partner advertised auto-negotiation: Yes
	Link partner advertised FEC modes: Not reported

This is `lp_advertising`, what the link partner is advertising to this
device. Once you have this, you mask lp_advertising with advertising,
and generally pick the link mode with the highest bandwidth:

	Speed: 1000Mb/s
	Duplex: Full

So autoneg resolved to 1000baseT/Full

	Andrew