[PATCH 04/15] net: hbl_cn: QP state machine
Omer Shpigelman
oshpigelman at habana.ai
Tue Jun 18 07:58:55 UTC 2024
On 6/18/24 10:08, Leon Romanovsky wrote:
> On Tue, Jun 18, 2024 at 05:50:15AM +0000, Omer Shpigelman wrote:
>> On 6/17/24 16:18, Leon Romanovsky wrote:
>>> [Some people who received this message don't often get email from leon at kernel.org. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>>>
>>> On Thu, Jun 13, 2024 at 11:21:57AM +0300, Omer Shpigelman wrote:
>>>> Add a common QP state machine which handles the moving for a QP from one
>>>> state to another including performing necessary checks, draining
>>>> in-flight transactions, invalidating caches and error reporting.
>>>>
>>>> Signed-off-by: Omer Shpigelman <oshpigelman at habana.ai>
>>>> Co-developed-by: Abhilash K V <kvabhilash at habana.ai>
>>>> Signed-off-by: Abhilash K V <kvabhilash at habana.ai>
>>>> Co-developed-by: Andrey Agranovich <aagranovich at habana.ai>
>>>> Signed-off-by: Andrey Agranovich <aagranovich at habana.ai>
>>>> Co-developed-by: Bharat Jauhari <bjauhari at habana.ai>
>>>> Signed-off-by: Bharat Jauhari <bjauhari at habana.ai>
>>>> Co-developed-by: David Meriin <dmeriin at habana.ai>
>>>> Signed-off-by: David Meriin <dmeriin at habana.ai>
>>>> Co-developed-by: Sagiv Ozeri <sozeri at habana.ai>
>>>> Signed-off-by: Sagiv Ozeri <sozeri at habana.ai>
>>>> Co-developed-by: Zvika Yehudai <zyehudai at habana.ai>
>>>> Signed-off-by: Zvika Yehudai <zyehudai at habana.ai>
>>>> ---
>>>> .../ethernet/intel/hbl_cn/common/hbl_cn_qp.c | 480 +++++++++++++++++-
>>>> 1 file changed, 479 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>>>> index 9ddc23bf8194..26ebdf448193 100644
>>>> --- a/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>>>> +++ b/drivers/net/ethernet/intel/hbl_cn/common/hbl_cn_qp.c
>>>> @@ -6,8 +6,486 @@
>>>
>>> <...>
>>>
>>>> +/* The following table represents the (valid) operations that can be performed on
>>>> + * a QP in order to move it from one state to another
>>>> + * For example: a QP in RTR state can be moved to RTS state using the CN_QP_OP_RTR_2RTS
>>>> + * operation.
>>>> + */
>>>> +static const enum hbl_cn_qp_state_op qp_valid_state_op[CN_QP_NUM_STATE][CN_QP_NUM_STATE] = {
>>>> + [CN_QP_STATE_RESET] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_INIT] = CN_QP_OP_RST_2INIT,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>>>> + },
>>>> + [CN_QP_STATE_INIT] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_INIT] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_RTR] = CN_QP_OP_INIT_2RTR,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>>>> + },
>>>> + [CN_QP_STATE_RTR] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_RTR] = CN_QP_OP_RTR_2RTR,
>>>> + [CN_QP_STATE_RTS] = CN_QP_OP_RTR_2RTS,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_RTR_2QPD,
>>>> + },
>>>> + [CN_QP_STATE_RTS] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_RTS] = CN_QP_OP_RTS_2RTS,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_RTS_2SQD,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_RTS_2QPD,
>>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_RTS_2SQERR,
>>>> + },
>>>> + [CN_QP_STATE_SQD] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_SQD_2SQD,
>>>> + [CN_QP_STATE_RTS] = CN_QP_OP_SQD_2RTS,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_SQD_2QPD,
>>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_SQD_2SQ_ERR,
>>>> + },
>>>> + [CN_QP_STATE_QPD] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_QPD] = CN_QP_OP_NOP,
>>>> + [CN_QP_STATE_RTR] = CN_QP_OP_QPD_2RTR,
>>>> + },
>>>> + [CN_QP_STATE_SQERR] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + [CN_QP_STATE_SQD] = CN_QP_OP_SQ_ERR_2SQD,
>>>> + [CN_QP_STATE_SQERR] = CN_QP_OP_NOP,
>>>> + },
>>>> + [CN_QP_STATE_ERR] = {
>>>> + [CN_QP_STATE_RESET] = CN_QP_OP_2RESET,
>>>> + [CN_QP_STATE_ERR] = CN_QP_OP_2ERR,
>>>> + }
>>>> +};
>>>
>>> I don't understand why IBTA QP state machine is declared in ETH driver
>>> and not in IB driver.
>>>
>>
>> Implementing the actual transitions between the states requires full
>> knowledge of the HW e.g. when to flush, cache invalidation, timeouts.
>> Our IB driver is agnostic to the ASIC type by design. Note that more ASIC
>> generations are planned to be added and the IB driver should not be aware
>> of these additional HWs.
>> Hence we implemeted the QP state machine in the CN driver which is aware
>> of the actual HW.
>
> Somehow ALL other IB drivers are able to implement this logic in the IB,
> while supporting multiple ASICs. I don't see a reason why you can't do
> the same.
>
If we are referring to this actual table, then I can move it to the IB
driver and the CN driver will fetch the needed opcode via a function
pointer.
Is that ok?
>>
>>>> +
>>>
>>> <...>
>>>
>>>> + /* Release lock while we wait before retry.
>>>> + * Note, we can assert that we are already locked.
>>>> + */
>>>> + port_funcs->cfg_unlock(cn_port);
>>>> +
>>>> + msleep(20);
>>>> +
>>>> + port_funcs->cfg_lock(cn_port);
>>>
>>> lock/unlock through ops pointer doesn't look like a good idea.
>>>
>>
>> More ASIC generations will be added once we merge the current Gaudi2 code.
>> On other ASICs the locking granularity is different because some of the HW
>> resources are shared between different logical ports.
>> Hence it is was logical for us to implement it with a function pointer so
>> each ASIC specific code can implemnet the locking properly.
>
> We are reviewing this code which is for the current ASIC, not for the
> unknown future ASICs. Please don't over engineer the first submission.
> You will always be able to improve/change the code once you decide to
> upstream your future ASICs.
>
I see. I'll refactor the code then.
> Thanks
More information about the dri-devel
mailing list