[PATCH] drm/tegra: Remove of_dma_configure() from host1x_device_add()
Jason Gunthorpe
jgg at nvidia.com
Wed Feb 7 13:05:32 UTC 2024
On Fri, Feb 02, 2024 at 12:15:40PM -0400, Jason Gunthorpe wrote:
> > Yes looks like a race of some sort. Adding a bit of debug also makes the
> > issue go away so difficult to see what is happening.
>
> I'm wondering if it is racing with iommu driver probing? I looked but
> didn't notice anything obviously wrong there that would cause this
> though.
Oh there is at least one racy thing here..
The of_xlate hackjob is out of order and is racy if you have multiple instances:
struct tegra_smmu *tegra_smmu_probe(struct device *dev,
const struct tegra_smmu_soc *soc,
struct tegra_mc *mc)
{
/*
* This is a bit of a hack. Ideally we'd want to simply return this
* value. However iommu_device_register() will attempt to add
* all devices to the IOMMU before we get that far. In order
* not to rely on global variables to track the IOMMU instance, we
* set it here so that it can be looked up from the .probe_device()
* callback via the IOMMU device's .drvdata field.
*/
mc->smmu = smmu;
^^^^^^^^^^^^^
After this of_xlate and probe_device will succeed
[...]
err = iommu_device_sysfs_add(&smmu->iommu, dev, NULL, dev_name(dev));
err = iommu_device_register(&smmu->iommu, &tegra_smmu_ops, dev);
^^^^^^^^^
But the iommu instance is not fully initialized yet.
So:
static int tegra_smmu_of_xlate(struct device *dev,
struct of_phandle_args *args)
{
struct platform_device *iommu_pdev = of_find_device_by_node(args->np);
struct tegra_mc *mc = platform_get_drvdata(iommu_pdev);
dev_iommu_priv_set(dev, mc->smmu);
^^^^^^^^^^^^
Gets the partially initialized iommu
instance
static struct iommu_device *tegra_smmu_probe_device(struct device *dev)
{
smmu = dev_iommu_priv_get(dev);
if (!smmu)
return ERR_PTR(-ENODEV);
^^^^^^^^^^^^^
Allows the driver to bind to a partially setup instance
ie if you have multiple instances of tegra-smmu and you manage to do
concurrent probe where the iommu instance is probing concurrently with
the failing device_add flow then you can a situation like you have
described where the sysfs is not fully setup.
Is this making sense to you? Add a sleep after the mc->smmu store and
confirm with printing?
I think this is all an insane design. I fixed this race and removed
all this hackery in my fwspec removal series. There the iommu instance
only ever comes out of the locked list that iommu_device_register()
populates and drivers have a safe and simple flow.
Maybe just moving the store to mc->smmu later would improve it? I
didn't look closely..
Jason
More information about the dri-devel
mailing list