[PATCH V2 01/10] accel/amdxdna: Add a new driver for AMD AI Engine

Jeffrey Hugo quic_jhugo at quicinc.com
Wed Aug 14 18:46:05 UTC 2024


On 8/14/2024 12:16 PM, Lizhi Hou wrote:
> 
> On 8/9/24 09:11, Jeffrey Hugo wrote:
>> On 8/5/2024 11:39 AM, Lizhi Hou wrote:
>>> diff --git a/drivers/accel/amdxdna/aie2_pci.c b/drivers/accel/amdxdna/aie2_pci.c
>>> new file mode 100644
>>> index 000000000000..3660967c00e6
>>> --- /dev/null
>>> +++ b/drivers/accel/amdxdna/aie2_pci.c
>>> @@ -0,0 +1,182 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * Copyright (C) 2023-2024, Advanced Micro Devices, Inc.
>>> + */
>>> +
>>> +#include <linux/amd-iommu.h>
>>> +#include <linux/errno.h>
>>> +#include <linux/firmware.h>
>>> +#include <linux/iommu.h>
>>
>> You are clearly missing linux/pci.h and I suspect many more.
> Other headers are indirectly included through "aie2_pci.h".

aie2_pci.h also does not directly include linux/pci.h
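
To be concrete, something like the below at the top of aie2_pci.c would 
make the dependencies explicit (just a sketch - the exact list depends 
on what the file actually uses):

#include <linux/amd-iommu.h>
#include <linux/errno.h>
#include <linux/firmware.h>
#include <linux/iommu.h>
#include <linux/pci.h>

#include "aie2_pci.h"

Relying on indirect includes tends to break the build later when the 
headers get reshuffled.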

>>> +
>>> +    ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
>>> +    if (ret) {
>>> +        XDNA_ERR(xdna, "Failed to set DMA mask: %d", ret);
>>> +        goto release_fw;
>>> +    }
>>> +
>>> +    nvec = pci_msix_vec_count(pdev);
>>
>> This feels weird.  Can your device advertise a variable number of 
>> MSI-X vectors?  It only works if all of the vectors are used?
> That is possible. The driver supports different hardware, and the fw 
> assigns vectors to hardware contexts dynamically. So the driver needs to 
> allocate all vectors ahead of time.

So, if the device requests N MSIs, but the host is only able to satisfy 
1 (or some number less than N), the fw is completely unable to function?
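
If the fw can cope with fewer vectors, the usual pattern is to request a 
range and take what the core can give you.  A sketch only - the minimum 
of 1 is an assumption, use whatever the fw actually requires:

	nvec = pci_msix_vec_count(pdev);
	if (nvec < 0)
		return nvec;

	/* ask for up to nvec, but accept as few as 1 (assumed minimum) */
	ret = pci_alloc_irq_vectors(pdev, 1, nvec, PCI_IRQ_MSIX);
	if (ret < 0)
		return ret;

If the fw really cannot function without all of them, then the minimum 
equals nvec and the driver should fail the probe with a clear message.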

>>> +struct psp_device *aie2m_psp_create(struct device *dev, struct psp_config *conf)
>>> +{
>>> +    struct psp_device *psp;
>>> +    u64 offset;
>>> +
>>> +    psp = devm_kzalloc(dev, sizeof(*psp), GFP_KERNEL);
>>> +    if (!psp)
>>> +        return NULL;
>>> +
>>> +    psp->dev = dev;
>>> +    memcpy(psp->psp_regs, conf->psp_regs, sizeof(psp->psp_regs));
>>> +
>>> +    psp->fw_buf_sz = ALIGN(conf->fw_size, PSP_FW_ALIGN) + PSP_FW_ALIGN;
>>> +    psp->fw_buffer = devm_kmalloc(psp->dev, psp->fw_buf_sz, GFP_KERNEL);
>>
>> Feels like this (and a bunch of other instances I haven't commented 
>> on) should be drmm_* allocs.
> 
> The PSP code is fairly low level and interacts directly with the hardware. 
> All the PSP interfaces use struct device * instead of drm_device. I 
> think that makes sense because PSP is not related to drm.
> 
> I will scan all other allocs and change them to drmm_* allocs for the 
> code related to drm_device. Does this sound ok to you?

I don't think so.  Look up
drm/todo: Add TODO entry for "lints"
on the dri-devel list, and its history.
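
To make the suggestion concrete, a sketch of what I have in mind 
(assumes the caller can hand in the drm_device, here called "ddev"):

	#include <drm/drm_managed.h>

	/* lifetime follows the drm_device instead of the bare struct device */
	psp = drmm_kzalloc(ddev, sizeof(*psp), GFP_KERNEL);
	if (!psp)
		return NULL;

The point of that TODO entry is that accel/drm drivers should prefer the 
drmm_* managed allocations so object lifetimes are tied to the 
drm_device, not to wherever a struct device happens to be handy.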

> 
>>
>>> +    if (!psp->fw_buffer) {
>>> +        dev_err(psp->dev, "no memory for fw buffer");
>>> +        return NULL;
>>> +    }
>>> +
>>> +    psp->fw_paddr = virt_to_phys(psp->fw_buffer);
>>
>> I'm pretty sure virt_to_phys() is always wrong
> 
> The hardware exposes several registers to communicate with the platform PSP 
> (AMD Platform Security Processor) to load the NPU firmware. And the PSP only 
> accepts host physical addresses with the current hardware.
> 
> I understand that virt_to_phys() should not usually be needed in a device 
> driver. But maybe it is ok to use when the hardware requires it? I can 
> see some other drivers use it as well.

Eh.  I guess the PSP would never have an IOMMU in front of it or 
anything like that.

This feels similar to what Qualcomm MSM platforms do, which uses the 
remoteproc framework.  Not sure if that helps you here.

This still doesn't feel great, but you might have a valid exception here. 
I'd suggest putting a justification comment in the code though. 
Someone looking at this in X months might raise the same question.
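
Something along these lines above the call would do (the wording is 
yours to pick, of course):

	/*
	 * The PSP firmware-load registers take a host physical address
	 * directly and there is no IOMMU between the PSP and system
	 * memory on current hardware, so virt_to_phys() is intentional.
	 */
	psp->fw_paddr = virt_to_phys(psp->fw_buffer);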

> 
>>
>>> +    offset = ALIGN(psp->fw_paddr, PSP_FW_ALIGN) - psp->fw_paddr;
>>> +    psp->fw_paddr += offset;
>>> +    memcpy(psp->fw_buffer + offset, conf->fw_buf, conf->fw_size);
>>> +
>>> +    return psp;
>>> +}
>>> diff --git a/drivers/accel/amdxdna/amdxdna_drm.c b/drivers/accel/amdxdna/amdxdna_drm.c
>>> new file mode 100644
>>> index 000000000000..91e4f9c9dac9
>>> --- /dev/null
>>> +++ b/drivers/accel/amdxdna/amdxdna_drm.c
>>
>> What is the point of this file?  Seems like all of this could just be 
>> in amdxdna_pci_drv.c
> A future product may have an NPU that is not a PCI device. So there might 
> be an amdxdna_plat_drv.c that shares the same amdxdna_drm.c in the future.

This seems like a weak justification.  "may" is not definitive.  If such 
hardware appears, you could refactor the driver at that time.

>>> diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c b/drivers/accel/amdxdna/amdxdna_pci_drv.c
>>> new file mode 100644
>>> index 000000000000..7d0cfd918b0e
>>> --- /dev/null
>>> +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c
>>> @@ -0,0 +1,118 @@
>>> +// SPDX-License-Identifier: GPL-2.0
>>> +/*
>>> + * Copyright (C) 2022-2024, Advanced Micro Devices, Inc.
>>> + */
>>> +
>>> +#include <linux/module.h>
>>> +
>>> +#include "amdxdna_pci_drv.h"
>>> +
>>> +/*
>>> + *  There are platforms which share the same PCI device ID
>>> + *  but have different PCI revision IDs. So, let the PCI class
>>> + *  determine the probe and later use the (device_id, rev_id)
>>> + *  pair as a key to select the devices.
>>> + */
>>
>> Huh?  So, VID == AMD, DID == 0x17f0, rev == 0x1 is a completely 
>> different device?  That feels like a PCIe spec violation...
> Maybe the comment is misleading. The hardware with the same device id 0x17f0 
> uses the same commands, registers, etc. They are the same device with 
> different revisions.

Then I don't understand why you need to do the class matching.  Match on 
PCI_VENDOR_ID_AMD with the Device IDs you need to support like a 
"normal" PCI(e) driver?

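i.e. something like the below - 0x17f0 is the only ID visible in this 
thread, so treat the table contents as illustrative:

	static const struct pci_device_id pci_ids[] = {
		{ PCI_DEVICE(PCI_VENDOR_ID_AMD, 0x17f0) },
		/* add further device IDs here as new parts ship */
		{0}
	};
	MODULE_DEVICE_TABLE(pci, pci_ids);

The (device_id, rev_id) pair can still be used internally to select 
per-revision data without wildcard matching every AMD device in the 
SP_OTHER class.
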
>>
>>> +static const struct pci_device_id pci_ids[] = {
>>> +    { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_ANY_ID),
>>> +        .class = PCI_CLASS_SP_OTHER << 8,
>>
>> Weird.  I would have expected the Accelerator class to be used
> We contacted our hardware team to figure out why the accelerator class is 
> not used here. Some of the hardware is already released. Hopefully the 
> hardware team will consider using the accelerator class for new products.

