[PATCH] drm/tegra: Remove of_dma_configure() from host1x_device_add()

Jason Gunthorpe jgg at nvidia.com
Wed Feb 7 13:05:32 UTC 2024


On Fri, Feb 02, 2024 at 12:15:40PM -0400, Jason Gunthorpe wrote:

> > Yes looks like a race of some sort. Adding a bit of debug also makes the
> > issue go away so difficult to see what is happening.
> 
> I'm wondering if it is racing with iommu driver probing? I looked but
> didn't notice anything obviously wrong there that would cause this
> though.

Oh there is at least one racy thing here..

The of_xlate hackjob is out of order and is racy if you have multiple instances:

struct tegra_smmu *tegra_smmu_probe(struct device *dev,
				    const struct tegra_smmu_soc *soc,
				    struct tegra_mc *mc)
{
	/*
	 * This is a bit of a hack. Ideally we'd want to simply return this
	 * value. However iommu_device_register() will attempt to add
	 * all devices to the IOMMU before we get that far. In order
	 * not to rely on global variables to track the IOMMU instance, we
	 * set it here so that it can be looked up from the .probe_device()
	 * callback via the IOMMU device's .drvdata field.
	 */
	mc->smmu = smmu;
         ^^^^^^^^^^^^^
           After this of_xlate and probe_device will succeed

        [...]
       	err = iommu_device_sysfs_add(&smmu->iommu, dev, NULL, dev_name(dev));               
	err = iommu_device_register(&smmu->iommu, &tegra_smmu_ops, dev);
        ^^^^^^^^^
        But the iommu instance is not fully initialized yet.

So:

static int tegra_smmu_of_xlate(struct device *dev,
			       struct of_phandle_args *args)
{
	struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
	struct tegra_mc *mc = platform_get_drvdata(iommu_pdev);

	dev_iommu_priv_set(dev, mc->smmu);
                            ^^^^^^^^^^^^
                             Gets the partially initialized iommu
			     instance


static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
{
	smmu = dev_iommu_priv_get(dev);
	if (!smmu)
		return ERR_PTR(-ENODEV);
               ^^^^^^^^^^^^^
                  Allows the driver to bind to a partially setup instance

ie if you have multiple instances of tegra-smmu and you manage to do
concurrent probe where the iommu instance is probing concurrently with
the failing device_add flow then you can a situation like you have
described where the sysfs is not fully setup.

Is this making sense to you? Add a sleep after the mc->smmu store and
confirm with printing?

I think this is all an insane design. I fixed this race and removed
all this hackery in my fwspec removal series. There the iommu instance
only ever comes out of the locked list that iommu_device_register()
populates and drivers have a safe and simple flow.

Maybe just moving the store to mc->smmu later would improve it? I
didn't look closely..

Jason


More information about the dri-devel mailing list