[PATCH 11/15] RDMA/hbl: add habanalabs RDMA driver

Omer Shpigelman oshpigelman at habana.ai
Mon Jun 17 17:43:49 UTC 2024


On 6/13/24 22:18, Leon Romanovsky wrote:
> [Some people who received this message don't often get email from leon at kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> On Thu, Jun 13, 2024 at 11:22:04AM +0300, Omer Shpigelman wrote:
>> Add an RDMA driver of Gaudi ASICs family for AI scaling.
>> The driver itself is agnostic to the ASIC in action, it operates according
>> to the capabilities that were passed on device initialization.
>> The device is initialized by the hbl_cn driver via auxiliary bus.
>> The driver also supports QP resource tracking and port/device HW counters.
>>
>> Signed-off-by: Omer Shpigelman <oshpigelman at habana.ai>
>> Co-developed-by: Abhilash K V <kvabhilash at habana.ai>
>> Signed-off-by: Abhilash K V <kvabhilash at habana.ai>
>> Co-developed-by: Andrey Agranovich <aagranovich at habana.ai>
>> Signed-off-by: Andrey Agranovich <aagranovich at habana.ai>
>> Co-developed-by: Bharat Jauhari <bjauhari at habana.ai>
>> Signed-off-by: Bharat Jauhari <bjauhari at habana.ai>
>> Co-developed-by: David Meriin <dmeriin at habana.ai>
>> Signed-off-by: David Meriin <dmeriin at habana.ai>
>> Co-developed-by: Sagiv Ozeri <sozeri at habana.ai>
>> Signed-off-by: Sagiv Ozeri <sozeri at habana.ai>
>> Co-developed-by: Zvika Yehudai <zyehudai at habana.ai>
>> Signed-off-by: Zvika Yehudai <zyehudai at habana.ai>
> 
> I afraid that you misinterpreted the "Co-developed-by" tag. All these
> people are probably touch the code and not actually sit together at
> the same room and write the code together. So, please remove the
> extensive "Co-developed-by" tags.
> 
> It is not full review yet, but simple pass-by-comments.
> 

Actually except of two, all of the mentioned persons sat in the same room
and developed the code together.
The remaining two are located on a different site (but also together).
Isn't that what "Co-developed-by" tag for?
I wanted to give them credit for writing the code but I can remove if it's
not common.

>> ---
>>  MAINTAINERS                              |   10 +
>>  drivers/infiniband/Kconfig               |    1 +
>>  drivers/infiniband/hw/Makefile           |    1 +
>>  drivers/infiniband/hw/hbl/Kconfig        |   17 +
>>  drivers/infiniband/hw/hbl/Makefile       |    8 +
>>  drivers/infiniband/hw/hbl/hbl.h          |  326 +++
>>  drivers/infiniband/hw/hbl/hbl_main.c     |  478 ++++
>>  drivers/infiniband/hw/hbl/hbl_verbs.c    | 2686 ++++++++++++++++++++++
>>  include/uapi/rdma/hbl-abi.h              |  204 ++
>>  include/uapi/rdma/hbl_user_ioctl_cmds.h  |   66 +
>>  include/uapi/rdma/hbl_user_ioctl_verbs.h |  106 +
>>  include/uapi/rdma/ib_user_ioctl_verbs.h  |    1 +
>>  12 files changed, 3904 insertions(+)
>>  create mode 100644 drivers/infiniband/hw/hbl/Kconfig
>>  create mode 100644 drivers/infiniband/hw/hbl/Makefile
>>  create mode 100644 drivers/infiniband/hw/hbl/hbl.h
>>  create mode 100644 drivers/infiniband/hw/hbl/hbl_main.c
>>  create mode 100644 drivers/infiniband/hw/hbl/hbl_verbs.c
>>  create mode 100644 include/uapi/rdma/hbl-abi.h
>>  create mode 100644 include/uapi/rdma/hbl_user_ioctl_cmds.h
>>  create mode 100644 include/uapi/rdma/hbl_user_ioctl_verbs.h
> 
> <...>
> 
>> +#define hbl_ibdev_emerg(ibdev, format, ...)  ibdev_emerg(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_alert(ibdev, format, ...)  ibdev_alert(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_crit(ibdev, format, ...)   ibdev_crit(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_err(ibdev, format, ...)    ibdev_err(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_warn(ibdev, format, ...)   ibdev_warn(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_notice(ibdev, format, ...) ibdev_notice(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_info(ibdev, format, ...)   ibdev_info(ibdev, format, ##__VA_ARGS__)
>> +#define hbl_ibdev_dbg(ibdev, format, ...)    ibdev_dbg(ibdev, format, ##__VA_ARGS__)
>> +
>> +#define hbl_ibdev_emerg_ratelimited(ibdev, fmt, ...)         \
>> +     ibdev_emerg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_alert_ratelimited(ibdev, fmt, ...)         \
>> +     ibdev_alert_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_crit_ratelimited(ibdev, fmt, ...)          \
>> +     ibdev_crit_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_err_ratelimited(ibdev, fmt, ...)           \
>> +     ibdev_err_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_warn_ratelimited(ibdev, fmt, ...)          \
>> +     ibdev_warn_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_notice_ratelimited(ibdev, fmt, ...)                \
>> +     ibdev_notice_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_info_ratelimited(ibdev, fmt, ...)          \
>> +     ibdev_info_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +#define hbl_ibdev_dbg_ratelimited(ibdev, fmt, ...)           \
>> +     ibdev_dbg_ratelimited(ibdev, fmt, ##__VA_ARGS__)
>> +
> 
> Please don't redefine the existing macros. Just use the existing ones.
> 
> 
> <...>
> 

That's a leftover from some debug code. I'll remove.

>> +     if (hbl_ib_match_netdev(ibdev, netdev))
>> +             ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
>> +     else
>> +             return NOTIFY_DONE;
> 
> It is not kernel coding style. Please write:
> if (!hbl_ib_match_netdev(ibdev, netdev))
>     return NOTIFY_DONE;
> 
> ib_port = hbl_to_ib_port_num(hdev, netdev->dev_port);
> 

I'll fix the code, thanks.

>> +
> 
> <...>
> 
>> +static int hbl_ib_probe(struct auxiliary_device *adev, const struct auxiliary_device_id *id)
>> +{
>> +     struct hbl_aux_dev *aux_dev = container_of(adev, struct hbl_aux_dev, adev);
>> +     struct hbl_ib_aux_ops *aux_ops = aux_dev->aux_ops;
>> +     struct hbl_ib_device *hdev;
>> +     ktime_t timeout;
>> +     int rc;
>> +
>> +     rc = hdev_init(aux_dev);
>> +     if (rc) {
>> +             dev_err(&aux_dev->adev.dev, "Failed to init hdev\n");
>> +             return -EIO;
>> +     }
>> +
>> +     hdev = aux_dev->priv;
>> +
>> +     /* don't allow module unloading while it is attached */
>> +     if (!try_module_get(THIS_MODULE)) {
> 
> This part makes wonder, what are you trying to do here? What doesn't work for you
> in standard driver core and module load mechanism?
> 

Before auxiliary bus was introduced, we used EXPORT_SYMBOLs for inter
driver communication. That incremented the refcount of the used module so
it couldn't be removed while it is in use.
Auxiliary bus usage doesn't increment the used module refcount and hence
the used module can be removed while it is in use and that's something
we don't want to allow.
We could solve it by some global locking or in_use atomic but the most
simple and clean way is just to increment the used module refcount on
auxiliary device probe and decrement it on auxiliary device removal.

>> +             dev_err(hdev->dev, "Failed to increment %s module refcount\n",
>> +                     module_name(THIS_MODULE));
>> +             rc = -EIO;
>> +             goto module_get_err;
>> +     }
>> +
>> +     timeout = ktime_add_ms(ktime_get(), hdev->pending_reset_long_timeout * MSEC_PER_SEC);
>> +     while (1) {
>> +             aux_ops->hw_access_lock(aux_dev);
>> +
>> +             /* if the device is operational, proceed to actual init while holding the lock in
>> +              * order to prevent concurrent hard reset
>> +              */
>> +             if (aux_ops->device_operational(aux_dev))
>> +                     break;
>> +
>> +             aux_ops->hw_access_unlock(aux_dev);
>> +
>> +             if (ktime_compare(ktime_get(), timeout) > 0) {
>> +                     dev_err(hdev->dev, "Timeout while waiting for hard reset to finish\n");
>> +                     rc = -EBUSY;
>> +                     goto timeout_err;
>> +             }
>> +
>> +             dev_notice_once(hdev->dev, "Waiting for hard reset to finish before probing IB\n");
>> +
>> +             msleep_interruptible(MSEC_PER_SEC);
>> +     }
> 
> The code above is unexpected.
> 

We have no control on when the user insmod the IB driver. As a result it
is possible that the IB auxiliary device will be probed while the compute
device is under reset (due to some HW error). During device reset we must
not access the HW. We should wait for the compute device to finish the
reset flow before we continue with the IB device probe and block any
compute device reset flow meanwhile.

>> +
>> +     rc = hbl_ib_dev_init(hdev);
>> +     if (rc) {
>> +             dev_err(hdev->dev, "Failed to init ib device\n");
>> +             goto dev_init_err;
>> +     }
>> +
>> +     aux_ops->hw_access_unlock(aux_dev);
>> +
>> +     return 0;
>> +
>> +dev_init_err:
>> +     aux_ops->hw_access_unlock(aux_dev);
>> +timeout_err:
>> +     module_put(THIS_MODULE);
>> +module_get_err:
>> +     hdev_fini(aux_dev);
>> +
>> +     return rc;
>> +}
> 
> <...>
> 
>> +static int __init hbl_ib_init(void)
>> +{
>> +     pr_info("loading driver\n");
> 
> Please remove all these debug prints and leave only the necessary ones.
> 

Sure, will remove.

>> +
>> +     return auxiliary_driver_register(&hbl_ib_driver);
>> +}
>> +
>> +static void __exit hbl_ib_exit(void)
>> +{
>> +     auxiliary_driver_unregister(&hbl_ib_driver);
>> +
>> +     pr_info("driver removed\n");
>> +}
>> +
>> +module_init(hbl_ib_init);
>> +module_exit(hbl_ib_exit)
> 
> Thanks


More information about the dri-devel mailing list