[PATCH v4 3/8] accel/qaic: Add MHI controller

Mon Mar 27 14:34:54 UTC 2023

On 3/27/2023 12:59 AM, Manivannan Sadhasivam wrote:
> On Fri, Mar 24, 2023 at 09:26:49AM -0600, Jeffrey Hugo wrote:
>> On 3/24/2023 4:26 AM, Manivannan Sadhasivam wrote:
>>> On Mon, Mar 20, 2023 at 09:11:09AM -0600, Jeffrey Hugo wrote:
>>>> An AIC100 device contains a MHI interface with a number of different
>>>> channels for controlling different aspects of the device. The MHI
>>>> controller works with the MHI bus to enable and drive that interface.
>>>>
>>>> AIC100 uses the BHI protocol in PBL to load SBL. The MHI controller
>>>> expects the SBL to be located at /lib/firmware/qcom/aic100/sbl.bin and
>>>> expects the MHI bus to manage the process of loading and sending SBL to
>>>> the device.
>>>>
>>>> Signed-off-by: Jeffrey Hugo <quic_jhugo at quicinc.com>
>>>> Reviewed-by: Carl Vanderlip <quic_carlv at quicinc.com>
>>>> Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy at quicinc.com>
>>>> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka at linux.intel.com>
>>>> ---
>>>>    drivers/accel/qaic/mhi_controller.c | 563 ++++++++++++++++++++++++++++++++++++
>>>>    drivers/accel/qaic/mhi_controller.h |  16 +
>>>>    2 files changed, 579 insertions(+)
>>>>    create mode 100644 drivers/accel/qaic/mhi_controller.c
>>>>    create mode 100644 drivers/accel/qaic/mhi_controller.h
>>>>
>>>> diff --git a/drivers/accel/qaic/mhi_controller.c b/drivers/accel/qaic/mhi_controller.c
>>>> new file mode 100644
>>>> index 0000000..777dfbe
>>>> --- /dev/null
>>>> +++ b/drivers/accel/qaic/mhi_controller.c
>>>> @@ -0,0 +1,563 @@
>>>> +// SPDX-License-Identifier: GPL-2.0-only
>>>> +
>>>> +/* Copyright (c) 2019-2021, The Linux Foundation. All rights reserved. */
>>>> +/* Copyright (c) 2021-2023 Qualcomm Innovation Center, Inc. All rights reserved. */
>>>> +
>>>> +#include <linux/delay.h>
>>>> +#include <linux/err.h>
>>>> +#include <linux/memblock.h>
>>>> +#include <linux/mhi.h>
>>>> +#include <linux/moduleparam.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/sizes.h>
>>>> +
>>>> +#include "mhi_controller.h"
>>>> +#include "qaic.h"
>>>> +
>>>> +#define MAX_RESET_TIME_SEC 25
>>>> +
>>>> +static unsigned int mhi_timeout_ms = 2000; /* 2 sec default */
>>>> +module_param(mhi_timeout_ms, uint, 0600);
>>>> +MODULE_PARM_DESC(mhi_timeout_ms, "MHI controller timeout value");
>>>> +
>>>> +static struct mhi_channel_config aic100_channels[] = {
>>>> +	{
>>>> +		.name = "QAIC_LOOPBACK",
>>>
>>> Why do you need QAIC_ prefix for channel names?
>>
>> To avoid existing and anticipated conflicts.
>>
>> As you are aware, the channel name becomes critical for the bus device and
>> is the key that the consumer driver will probe on.
>>
>> Sadly, that is rife for conflicts.  You can only have one driver for a
>> particular MHI device (channel).  Multiple drivers can register for it, but
>> only the first one will bind to the device.  This creates a race condition.
>> Whoever is able to register with the bus first, owns all instances of that
>> device.  That also means that particular driver on the bus also needs to be
>> able to handle all instances of that device.
>>
>> The WWAN subsystem already claims DIAG.  You and I both know from the WWAN
>> subsystem creation experience, the Net folks don't want a common framework
>> that can service multiple types of devices.  QAIC devices are not WWAN
>> devices, and were an argument for having a WWAN specific thing.  So, I can't
>> leverage WWAN, and frankly I shouldn't because my device is not a WWAN
>> device.  The WWAN userspace shouldn't try to use ACCEL/QAIC devices (one of
>> the reasons for having ACCEL instead of DRM).  Therefore DIAG devices are
>> WWAN exclusive, and I need to have a different device.  "DIAG2" seems like a
>> poor name.  If the QAIC DIAG device is going to be QAIC specific, having
>> QAIC in the name to isolate and identify it seems like the best option.
>>
>> I anticipate similar conflicts with
>> SAHARA/QDSS/DEBUG/TIMESYNC/LOGGING/LOOPBACK.  All of these are "common" with
>> other existing MHI devices.
>>
> 
> Hmm, this is something I anticipated to happen... :/
> 
>> I anticipate future conflicts with STATUS/RAS/TELEMETRY/CONTROL/SSR. These
>> are rather generic channel names.  It seems likely that a future WWAN device
>> or other MHI device would want a channel with the same name as one of these.
>> I'd like to leave that open as a possibility by not exclusivly claiming the
>> sole use to one of these names.
>>
>> Arguably this is an internal implementation detail with how the MHI bus
>> operates and could be fixed at first look.  However I don't think that is
>> the case because it looks like the WWAN subsystem is exposing these names to
>> userspace, which creates a uAPI that cannot be broken. Therefore I think we
>> are rather quite stuck with this situation and what I have proposed with
>> this patch is the best thing I've come up with to address the problem.  If
>> you have an alternate suggestion, I'm willing to discuss with you.
>>
> 
> I think what you have is the best for now. Only downside of this approach is
> the code duplication among the client drivers but I think we compromised this
> during the WWAN framework discussion.
> 
>>>
>>>> +		.num = 0,
>>>> +		.num_elements = 32,
>>>> +		.local_elements = 0,
>>>> +		.event_ring = 0,
>>>> +		.dir = DMA_TO_DEVICE,
>>>> +		.ee_mask = MHI_CH_EE_AMSS,
>>>> +		.pollcfg = 0,
>>>> +		.doorbell = MHI_DB_BRST_DISABLE,
>>>> +		.lpm_notify = false,
>>>> +		.offload_channel = false,
>>>> +		.doorbell_mode_switch = false,
>>>> +		.auto_queue = false,
>>>> +		.wake_capable = false,
>>>> +	},
>>>
>>> [...]
>>>
>>>> +static struct mhi_event_config aic100_events[] = {
>>>> +	{
>>>> +		.num_elements = 32,
>>>> +		.irq_moderation_ms = 0,
>>>> +		.irq = 0,
>>>> +		.channel = U32_MAX,
>>>> +		.priority = 1,
>>>> +		.mode = MHI_DB_BRST_DISABLE,
>>>> +		.data_type = MHI_ER_CTRL,
>>>> +		.hardware_event = false,
>>>> +		.client_managed = false,
>>>> +		.offload_channel = false,
>>>> +	},
>>>> +};
>>>> +
>>>
>>> It'd be nice to use macros for defining the channels and events as done in the
>>> pci_generic driver.
>>
>> I think the pci_generic driver has a usecase for using a macro in that it is
>> servicing multiple devices, with different configuration.  Right now, we
>> only have the one device with the one config.  I suspect that will change in
>> the future, but I don't have concrete information at the time to inform a
>> proper design.
>>
>> I feel this should be left until such time the multi-device scenario becomes
>> realized.
>>
> 
> Ok, this sounds reasonable to me.
> 
>>>
>>>> +static struct mhi_controller_config aic100_config = {
>>>> +	.max_channels = 128,
>>>> +	.timeout_ms = 0, /* controlled by mhi_timeout */
>>>> +	.buf_len = 0,
>>>> +	.num_channels = ARRAY_SIZE(aic100_channels),
>>>> +	.ch_cfg = aic100_channels,
>>>> +	.num_events = ARRAY_SIZE(aic100_events),
>>>> +	.event_cfg = aic100_events,
>>>> +	.use_bounce_buf = false,
>>>> +	.m2_no_db = false,
>>>> +};
>>>> +
>>>> +static int mhi_read_reg(struct mhi_controller *mhi_cntl, void __iomem *addr, u32 *out)
>>>> +{
>>>> +	u32 tmp = readl_relaxed(addr);
>>>> +
>>>> +	if (tmp == U32_MAX)
>>>> +		return -EIO;
>>>> +
>>>> +	*out = tmp;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static void mhi_write_reg(struct mhi_controller *mhi_cntl, void __iomem *addr, u32 val)
>>>> +{
>>>> +	writel_relaxed(val, addr);
>>>> +}
>>>> +
>>>> +static int mhi_runtime_get(struct mhi_controller *mhi_cntl)
>>>> +{
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static void mhi_runtime_put(struct mhi_controller *mhi_cntl)
>>>> +{
>>>> +}
>>>> +
>>>> +static void mhi_status_cb(struct mhi_controller *mhi_cntl, enum mhi_callback reason)
>>>> +{
>>>> +	struct qaic_device *qdev = pci_get_drvdata(to_pci_dev(mhi_cntl->cntrl_dev));
>>>> +
>>>> +	/* this event occurs in atomic context */
>>>> +	if (reason == MHI_CB_FATAL_ERROR)
>>>> +		pci_err(qdev->pdev, "Fatal error received from device. Attempting to recover\n");
>>>
>>> Why no dev_err()?
>>
>> pci_err is more specific than dev_err.  It is built upon dev_err. pci_err
>> seems to be preferred for pci devices, and also matches uses elsewhere in
>> the driver.
>>
> 
> Ok.
> 
>>>
>>>> +	/* this event occurs in non-atomic context */
>>>> +	if (reason == MHI_CB_SYS_ERROR)
>>>> +		qaic_dev_reset_clean_local_state(qdev, true);
>>>> +}
>>>> +
>>>> +static int mhi_reset_and_async_power_up(struct mhi_controller *mhi_cntl)
>>>> +{
>>>> +	char time_sec = 1;
>>>
>>> u8?
>>
>> Eh.  Ok I guess.  I usually reserve the size specific types for things where
>> that size is required, such as sending data over a network.
>>
>>>
>>>> +	int current_ee;
>>>> +	int ret;
>>>> +
>>>> +	/* Reset the device to bring the device in PBL EE */
>>>> +	mhi_soc_reset(mhi_cntl);
>>>> +
>>>> +	/*
>>>> +	 * Keep checking the execution environment(EE) after every 1 second
>>>> +	 * interval.
>>>> +	 */
>>>> +	do {
>>>> +		msleep(1000);
>>>> +		current_ee = mhi_get_exec_env(mhi_cntl);
>>>> +	} while (current_ee != MHI_EE_PBL && time_sec++ <= MAX_RESET_TIME_SEC);
>>>> +
>>>> +	/* If the device is in PBL EE retry power up */
>>>> +	if (current_ee == MHI_EE_PBL)
>>>> +		ret = mhi_async_power_up(mhi_cntl);
>>>> +	else
>>>> +		ret = -EIO;
>>>> +
>>>> +	return ret;
>>>> +}
>>>> +
>>>> +struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, void __iomem *mhi_bar,
>>>> +						    int mhi_irq)
>>>> +{
>>>> +	struct mhi_controller *mhi_cntl;
>>>
>>> Cosmetic change: We use "mhi_cntrl" in other controller drivers. So it is
>>> better to follow the same pattern here also.
>>
>> If you insist.  "cntl" is the more common abbreviation.  The MHI bus is the
>> first place I recall seeing "cntrl".
>>
> 
> For some reason, all MHI controller drivers have picked up this notation. So
> I'd like to keep it same.

Done.

> 
>>>
>>>> +	int ret;
>>>> +
>>>> +	mhi_cntl = devm_kzalloc(&pci_dev->dev, sizeof(*mhi_cntl), GFP_KERNEL);
>>>> +	if (!mhi_cntl)
>>>> +		return ERR_PTR(-ENOMEM);
>>>> +
>>>> +	mhi_cntl->cntrl_dev = &pci_dev->dev;
>>>> +
>>>> +	/*
>>>> +	 * Covers the entire possible physical ram region. Remote side is
>>>> +	 * going to calculate a size of this range, so subtract 1 to prevent
>>>> +	 * rollover.
>>>> +	 */
>>>> +	mhi_cntl->iova_start = 0;
>>>> +	mhi_cntl->iova_stop = PHYS_ADDR_MAX - 1;
>>>> +	mhi_cntl->status_cb = mhi_status_cb;
>>>> +	mhi_cntl->runtime_get = mhi_runtime_get;
>>>> +	mhi_cntl->runtime_put = mhi_runtime_put;
>>>> +	mhi_cntl->read_reg = mhi_read_reg;
>>>> +	mhi_cntl->write_reg = mhi_write_reg;
>>>> +	mhi_cntl->regs = mhi_bar;
>>>> +	mhi_cntl->reg_len = SZ_4K;
>>>
>>> Is this size fixed for all AIC100 revisions? I think you should get this value
>>> from pci_resource_len() to avoid issues later.
>>
>> Yes, this size is burned into the silicon with no provision for ever
>> changing it.
>>
> 
> Fine then!
> 
> Thanks,
> Mani
> 
>>>
>>> Thanks,
>>> Mani
>>>
>>>> +	mhi_cntl->nr_irqs = 1;
>>>> +	mhi_cntl->irq = devm_kmalloc(&pci_dev->dev, sizeof(*mhi_cntl->irq), GFP_KERNEL);
>>>> +
>>>> +	if (!mhi_cntl->irq)
>>>> +		return ERR_PTR(-ENOMEM);
>>>> +
>>>> +	mhi_cntl->irq[0] = mhi_irq;
>>>> +	mhi_cntl->fw_image = "qcom/aic100/sbl.bin";
>>>> +
>>>> +	/* use latest configured timeout */
>>>> +	aic100_config.timeout_ms = mhi_timeout_ms;
>>>> +	ret = mhi_register_controller(mhi_cntl, &aic100_config);
>>>> +	if (ret) {
>>>> +		pci_err(pci_dev, "mhi_register_controller failed %d\n", ret);
>>>> +		return ERR_PTR(ret);
>>>> +	}
>>>> +
>>>> +	ret = mhi_prepare_for_power_up(mhi_cntl);
>>>> +	if (ret) {
>>>> +		pci_err(pci_dev, "mhi_prepare_for_power_up failed %d\n", ret);
>>>> +		goto prepare_power_up_fail;
>>>> +	}
>>>> +
>>>> +	ret = mhi_async_power_up(mhi_cntl);
>>>> +	/*
>>>> +	 * If EIO is returned it is possible that device is in SBL EE, which is
>>>> +	 * undesired. SOC reset the device and try to power up again.
>>>> +	 */
>>>> +	if (ret == -EIO && MHI_EE_SBL == mhi_get_exec_env(mhi_cntl)) {
>>>> +		pci_err(pci_dev, "Found device in SBL at MHI init. Attempting a reset.\n");
>>>> +		ret = mhi_reset_and_async_power_up(mhi_cntl);
>>>> +	}
>>>> +
>>>> +	if (ret) {
>>>> +		pci_err(pci_dev, "mhi_async_power_up failed %d\n", ret);
>>>> +		goto power_up_fail;
>>>> +	}
>>>> +
>>>> +	return mhi_cntl;
>>>> +
>>>> +power_up_fail:
>>>> +	mhi_unprepare_after_power_down(mhi_cntl);
>>>> +prepare_power_up_fail:
>>>> +	mhi_unregister_controller(mhi_cntl);
>>>> +	return ERR_PTR(ret);
>>>> +}
>>>> +
>>>> +void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up)
>>>> +{
>>>> +	mhi_power_down(mhi_cntl, link_up);
>>>> +	mhi_unprepare_after_power_down(mhi_cntl);
>>>> +	mhi_unregister_controller(mhi_cntl);
>>>> +}
>>>> +
>>>> +void qaic_mhi_start_reset(struct mhi_controller *mhi_cntl)
>>>> +{
>>>> +	mhi_power_down(mhi_cntl, true);
>>>> +}
>>>> +
>>>> +void qaic_mhi_reset_done(struct mhi_controller *mhi_cntl)
>>>> +{
>>>> +	struct pci_dev *pci_dev = container_of(mhi_cntl->cntrl_dev, struct pci_dev, dev);
>>>> +	int ret;
>>>> +
>>>> +	ret = mhi_async_power_up(mhi_cntl);
>>>> +	if (ret)
>>>> +		pci_err(pci_dev, "mhi_async_power_up failed after reset %d\n", ret);
>>>> +}
>>>> diff --git a/drivers/accel/qaic/mhi_controller.h b/drivers/accel/qaic/mhi_controller.h
>>>> new file mode 100644
>>>> index 0000000..c105e93
>>>> --- /dev/null
>>>> +++ b/drivers/accel/qaic/mhi_controller.h
>>>> @@ -0,0 +1,16 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-only
>>>> + *
>>>> + * Copyright (c) 2019-2020, The Linux Foundation. All rights reserved.
>>>> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
>>>> + */
>>>> +
>>>> +#ifndef MHICONTROLLERQAIC_H_
>>>> +#define MHICONTROLLERQAIC_H_
>>>> +
>>>> +struct mhi_controller *qaic_mhi_register_controller(struct pci_dev *pci_dev, void __iomem *mhi_bar,
>>>> +						    int mhi_irq);
>>>> +void qaic_mhi_free_controller(struct mhi_controller *mhi_cntl, bool link_up);
>>>> +void qaic_mhi_start_reset(struct mhi_controller *mhi_cntl);
>>>> +void qaic_mhi_reset_done(struct mhi_controller *mhi_cntl);
>>>> +
>>>> +#endif /* MHICONTROLLERQAIC_H_ */
>>>> -- 
>>>> 2.7.4
>>>>
>>>
>>
>