[Nouveau] next/master boot bisection: Oops in nouveau driver on jetson-tk1

Guillaume Tucker guillaume.tucker at collabora.com
Mon Dec 10 10:00:08 UTC 2018


On 08/12/2018 00:08, Lyude Paul wrote:
> uhhhhhhhhhhhhh
> didn't we fix this weeks ago? with "drm/nouveau: tegra: Call
> nouveau_drm_device_init()"

Yes here's the fix from Thierry:

  https://patchwork.freedesktop.org/patch/263587/


and I can confirm that it does fix the Oops when applied on top
of next-20181206 (what I used for the bisection last week):

  http://lava.baylibre.com:10080/scheduler/job/71109


However the fix doesn't appear to have been applied in any
upstream tree yet.

Guillaume
 
> On Fri, 2018-12-07 at 23:31 +0000, Guillaume Tucker wrote:
>> Please find below an automated bisection report for a kernel Oops
>> seen during the initialisation of the nouveau GPU driver on
>> jetson-tk1.
>>
>>
>> All the LAVA test jobs for this bisection can be found here:
>>
>>   
>> http://lava.baylibre.com:10080/scheduler/alljobs?length=25&search=lava-bisect-staging-7366#table
>>
>>
>> Here's the beginning of the Oops stack trace:
>>
>> [    7.485361] [00000064] *pgd=f9e7b835
>> [    7.485372] Internal error: Oops: 17 [#1] SMP ARM
>> [    7.485376] Modules linked in: snd_soc_tegra_rt5640(+)
>> snd_soc_tegra_utils snd_soc_rt5640(+) snd_soc_rl6231 snd_soc_tegra30_ahub
>> snd_hda_tegra snd_soc_core snd_hda_codec snd_hda_core ac97_bus
>> snd_pcm_dmaengine snd_pcm xhci_tegra(+) snd_timer snd soundcore nouveau(+)
>> ttm tegra_devfreq tegra_wdt
>> [    7.542227] CPU: 1 PID: 128 Comm: udevd Not tainted 4.20.0-rc5-next-
>> 20181206 #44
>> [    7.549603] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
>> [    7.555859] PC is at drm_plane_register_all+0x18/0x50
>> [    7.560899] LR is at drm_modeset_register_all+0xc/0x6c
>>
>>
>> Full log:
>>
>>   http://lava.baylibre.com:10080/scheduler/job/68628#L816
>>
>>
>> The bisection was run from next-20181206 as this is where the
>> issue was discovered on kernelci.org but the patch it found has
>> already been merged in mainline.
>>
>> Hope this helps!
>>
>> Guillaume
>>
>>
>> -----------8<------------------------8<-----------
>>
>>
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>> * This automated bisection report was sent to you on the basis  *
>> * that you may be involved with the breaking commit it has      *
>> * found.  No manual investigation has been done to verify it,   *
>> * and the root cause of the problem may be somewhere else.      *
>> * Hope this helps!                                              *
>> * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
>>
>> Bisection result for next/master (next-20181206) on jetson-tk1
>>
>>   Good:       84df9525b0c2 Linux 4.19
>>   Bad:        4c92b7b3080d Add linux-next specific files for 20181206
>>   Found:      cfea88a4d866 drm/nouveau: Start using new drm_dev
>> initialization helpers
>>
>> Checks:
>>   revert:     PASS
>>   verify:     PASS
>>
>> Parameters:
>>   Tree:       next
>>   URL:        None
>>   Branch:     master
>>   Target:     jetson-tk1
>>   Lab:        lab-baylibre
>>   Config:     multi_v7_defconfig
>>   Plan:       dmesg-nouveau
>>
>> Breaking commit found:
>>
>> ----------------------------------------------------------------------------
>> ---
>> commit cfea88a4d86632f28cf80be97079f131645b7869
>> Author: Lyude Paul <lyude at redhat.com>
>> Date:   Wed Aug 22 21:40:07 2018 -0400
>>
>>     drm/nouveau: Start using new drm_dev initialization helpers
>>     
>>     Per the documentation in drm_get_pci_dev(), this function is deprecated
>>     and shouldn't be used anymore. As it turns out, we're going to need to
>>     stop using drm_get_pci_dev() anyway in order to allow us to turn off the
>>     card before full system shutdowns, otherwise we'll hit race conditions
>>     with userspace while trying to tear down the card on shutdown.
>>     
>>     So, start using drm_dev_get() and drm_dev_put(), and just turn our
>>     load/unload callbacks into open coded init/fini() functions.
>>     
>>     Signed-off-by: Lyude Paul <lyude at redhat.com>
>>     Cc: Karol Herbst <kherbst at redhat.com>
>>     Signed-off-by: Ben Skeggs <bskeggs at redhat.com>
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c
>> b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> index 905956809d21..2b2baf6e0e0d 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_drm.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
>> @@ -458,75 +458,8 @@ nouveau_accel_init(struct nouveau_drm *drm)
>>  	nouveau_bo_move_init(drm);
>>  }
>>  
>> -static int nouveau_drm_probe(struct pci_dev *pdev,
>> -			     const struct pci_device_id *pent)
>> -{
>> -	struct nvkm_device *device;
>> -	struct apertures_struct *aper;
>> -	bool boot = false;
>> -	int ret;
>> -
>> -	if (vga_switcheroo_client_probe_defer(pdev))
>> -		return -EPROBE_DEFER;
>> -
>> -	/* We need to check that the chipset is supported before booting
>> -	 * fbdev off the hardware, as there's no way to put it back.
>> -	 */
>> -	ret = nvkm_device_pci_new(pdev, NULL, "error", true, false, 0,
>> &device);
>> -	if (ret)
>> -		return ret;
>> -
>> -	nvkm_device_del(&device);
>> -
>> -	/* Remove conflicting drivers (vesafb, efifb etc). */
>> -	aper = alloc_apertures(3);
>> -	if (!aper)
>> -		return -ENOMEM;
>> -
>> -	aper->ranges[0].base = pci_resource_start(pdev, 1);
>> -	aper->ranges[0].size = pci_resource_len(pdev, 1);
>> -	aper->count = 1;
>> -
>> -	if (pci_resource_len(pdev, 2)) {
>> -		aper->ranges[aper->count].base = pci_resource_start(pdev, 2);
>> -		aper->ranges[aper->count].size = pci_resource_len(pdev, 2);
>> -		aper->count++;
>> -	}
>> -
>> -	if (pci_resource_len(pdev, 3)) {
>> -		aper->ranges[aper->count].base = pci_resource_start(pdev, 3);
>> -		aper->ranges[aper->count].size = pci_resource_len(pdev, 3);
>> -		aper->count++;
>> -	}
>> -
>> -#ifdef CONFIG_X86
>> -	boot = pdev->resource[PCI_ROM_RESOURCE].flags & IORESOURCE_ROM_SHADOW;
>> -#endif
>> -	if (nouveau_modeset != 2)
>> -		drm_fb_helper_remove_conflicting_framebuffers(aper,
>> "nouveaufb", boot);
>> -	kfree(aper);
>> -
>> -	ret = nvkm_device_pci_new(pdev, nouveau_config, nouveau_debug,
>> -				  true, true, ~0ULL, &device);
>> -	if (ret)
>> -		return ret;
>> -
>> -	pci_set_master(pdev);
>> -
>> -	if (nouveau_atomic)
>> -		driver_pci.driver_features |= DRIVER_ATOMIC;
>> -
>> -	ret = drm_get_pci_dev(pdev, pent, &driver_pci);
>> -	if (ret) {
>> -		nvkm_device_del(&device);
>> -		return ret;
>> -	}
>> -
>> -	return 0;
>> -}
>> -
>>  static int
>> -nouveau_drm_load(struct drm_device *dev, unsigned long flags)
>> +nouveau_drm_device_init(struct drm_device *dev)
>>  {
>>  	struct nouveau_drm *drm;
>>  	int ret;
>> @@ -613,7 +546,7 @@ nouveau_drm_load(struct drm_device *dev, unsigned long
>> flags)
>>  }
>>  
>>  static void
>> -nouveau_drm_unload(struct drm_device *dev)
>> +nouveau_drm_device_fini(struct drm_device *dev)
>>  {
>>  	struct nouveau_drm *drm = nouveau_drm(dev);
>>  
>> @@ -642,18 +575,116 @@ nouveau_drm_unload(struct drm_device *dev)
>>  	kfree(drm);
>>  }
>>  
>> +static int nouveau_drm_probe(struct pci_dev *pdev,
>> +			     const struct pci_device_id *pent)
>> +{
>> +	struct nvkm_device *device;
>> +	struct drm_device *drm_dev;
>> +	struct apertures_struct *aper;
>> +	bool boot = false;
>> +	int ret;
>> +
>> +	if (vga_switcheroo_client_probe_defer(pdev))
>> +		return -EPROBE_DEFER;
>> +
>> +	/* We need to check that the chipset is supported before booting
>> +	 * fbdev off the hardware, as there's no way to put it back.
>> +	 */
>> +	ret = nvkm_device_pci_new(pdev, NULL, "error", true, false, 0,
>> &device);
>> +	if (ret)
>> +		return ret;
>> +
>> +	nvkm_device_del(&device);
>> +
>> +	/* Remove conflicting drivers (vesafb, efifb etc). */
>> +	aper = alloc_apertures(3);
>> +	if (!aper)
>> +		return -ENOMEM;
>> +
>> +	aper->ranges[0].base = pci_resource_start(pdev, 1);
>> +	aper->ranges[0].size = pci_resource_len(pdev, 1);
>> +	aper->count = 1;
>> +
>> +	if (pci_resource_len(pdev, 2)) {
>> +		aper->ranges[aper->count].base = pci_resource_start(pdev, 2);
>> +		aper->ranges[aper->count].size = pci_resource_len(pdev, 2);
>> +		aper->count++;
>> +	}
>> +
>> +	if (pci_resource_len(pdev, 3)) {
>> +		aper->ranges[aper->count].base = pci_resource_start(pdev, 3);
>> +		aper->ranges[aper->count].size = pci_resource_len(pdev, 3);
>> +		aper->count++;
>> +	}
>> +
>> +#ifdef CONFIG_X86
>> +	boot = pdev->resource[PCI_ROM_RESOURCE].flags & IORESOURCE_ROM_SHADOW;
>> +#endif
>> +	if (nouveau_modeset != 2)
>> +		drm_fb_helper_remove_conflicting_framebuffers(aper,
>> "nouveaufb", boot);
>> +	kfree(aper);
>> +
>> +	ret = nvkm_device_pci_new(pdev, nouveau_config, nouveau_debug,
>> +				  true, true, ~0ULL, &device);
>> +	if (ret)
>> +		return ret;
>> +
>> +	pci_set_master(pdev);
>> +
>> +	if (nouveau_atomic)
>> +		driver_pci.driver_features |= DRIVER_ATOMIC;
>> +
>> +	drm_dev = drm_dev_alloc(&driver_pci, &pdev->dev);
>> +	if (IS_ERR(drm_dev)) {
>> +		ret = PTR_ERR(drm_dev);
>> +		goto fail_nvkm;
>> +	}
>> +
>> +	ret = pci_enable_device(pdev);
>> +	if (ret)
>> +		goto fail_drm;
>> +
>> +	drm_dev->pdev = pdev;
>> +	pci_set_drvdata(pdev, drm_dev);
>> +
>> +	ret = nouveau_drm_device_init(drm_dev);
>> +	if (ret)
>> +		goto fail_pci;
>> +
>> +	ret = drm_dev_register(drm_dev, pent->driver_data);
>> +	if (ret)
>> +		goto fail_drm_dev_init;
>> +
>> +	return 0;
>> +
>> +fail_drm_dev_init:
>> +	nouveau_drm_device_fini(drm_dev);
>> +fail_pci:
>> +	pci_disable_device(pdev);
>> +fail_drm:
>> +	drm_dev_put(drm_dev);
>> +fail_nvkm:
>> +	nvkm_device_del(&device);
>> +	return ret;
>> +}
>> +
>>  void
>>  nouveau_drm_device_remove(struct drm_device *dev)
>>  {
>> +	struct pci_dev *pdev = dev->pdev;
>>  	struct nouveau_drm *drm = nouveau_drm(dev);
>>  	struct nvkm_client *client;
>>  	struct nvkm_device *device;
>>  
>> +	drm_dev_unregister(dev);
>> +
>>  	dev->irq_enabled = false;
>>  	client = nvxx_client(&drm->client.base);
>>  	device = nvkm_device_find(client->device);
>> -	drm_put_dev(dev);
>>  
>> +	nouveau_drm_device_fini(dev);
>> +	pci_disable_device(pdev);
>> +	drm_dev_put(dev);
>>  	nvkm_device_del(&device);
>>  }
>>  
>> @@ -1020,8 +1051,6 @@ driver_stub = {
>>  		DRIVER_GEM | DRIVER_MODESET | DRIVER_PRIME | DRIVER_RENDER |
>>  		DRIVER_KMS_LEGACY_CONTEXT,
>>  
>> -	.load = nouveau_drm_load,
>> -	.unload = nouveau_drm_unload,
>>  	.open = nouveau_drm_open,
>>  	.postclose = nouveau_drm_postclose,
>>  	.lastclose = nouveau_vga_lastclose,
>> ----------------------------------------------------------------------------
>> ---
>>
>>
>> Git bisection log:
>>
>> ----------------------------------------------------------------------------
>> ---
>> git bisect start
>> # good: [84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d] Linux 4.19
>> git bisect good 84df9525b0c27f3ebc2ebb1864fa62a97fdedb7d
>> # bad: [4c92b7b3080d8281941ae81c51cd62bb49bdc3d4] Add linux-next specific
>> files for 20181206
>> git bisect bad 4c92b7b3080d8281941ae81c51cd62bb49bdc3d4
>> # bad: [c38239b4be1ac7e4bcf5bbd971353bae51525b8f] Merge branch 'parisc-4.20-
>> 2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
>> git bisect bad c38239b4be1ac7e4bcf5bbd971353bae51525b8f
>> # good: [d49f8a52b15bf35db778035340d8a673149f9f93] Merge tag 'scsi-misc' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
>> git bisect good d49f8a52b15bf35db778035340d8a673149f9f93
>> # good: [ac747c0715f29c2be3848b719a1b7e65b07f7b21] Merge tag 'kbuild-v4.20'
>> of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
>> git bisect good ac747c0715f29c2be3848b719a1b7e65b07f7b21
>> # bad: [46972c03ab667dc298cad0c9db517fb9b1521b5f] Merge tag 'drm-misc-next-
>> fixes-2018-10-10' of git://anongit.freedesktop.org/drm/drm-misc into drm-
>> next
>> git bisect bad 46972c03ab667dc298cad0c9db517fb9b1521b5f
>> # good: [6ac99a328ee16d3f8cc253f1df62623cee3e9ea5] drm/exynos: mixer: Make
>> plane alpha configurable
>> git bisect good 6ac99a328ee16d3f8cc253f1df62623cee3e9ea5
>> # good: [0957dc7097a3f462f6cedb45cf9b9785cc29e5bb] drm/amdgpu: revert "stop
>> using gart_start as offset for the GTT domain"
>> git bisect good 0957dc7097a3f462f6cedb45cf9b9785cc29e5bb
>> # good: [2de0b0a158bf423208c3898522c8fa1c1078df48] Merge tag 'drm/tegra/for-
>> 4.20-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
>> git bisect good 2de0b0a158bf423208c3898522c8fa1c1078df48
>> # good: [04b96b63c5640a305e30611def7a9c5fcd7a72cf] drm/msm/dpu: Remove
>> unneeded checks in dpu_crtc.c
>> git bisect good 04b96b63c5640a305e30611def7a9c5fcd7a72cf
>> # good: [6952e3a1dffcb931cf8625aa01642b9afac2af61] Merge branch 'for-
>> upstream/mali-dp' of git://linux-arm.org/linux-ld into drm-next
>> git bisect good 6952e3a1dffcb931cf8625aa01642b9afac2af61
>> # good: [62e681f7dcab746412dce22d4b75b32c5ea38cdb] Merge tag 'drm-msm-fixes-
>> 2018-10-09' of git://people.freedesktop.org/~robclark/linux into drm-next
>> git bisect good 62e681f7dcab746412dce22d4b75b32c5ea38cdb
>> # bad: [7e6191d4360a2df6cf2a2613dcb79680cb943df8] Merge branch 'linux-4.20'
>> of git://github.com/skeggsb/linux into drm-next
>> git bisect bad 7e6191d4360a2df6cf2a2613dcb79680cb943df8
>> # good: [c4cee69a4497d9c6ad8868d63568b30e50cac9e9] drm/nouveau: Fix
>> potential memory leak in nouveau_drm_load()
>> git bisect good c4cee69a4497d9c6ad8868d63568b30e50cac9e9
>> # bad: [a971558c298755d2c07bc5508c65d689471763c8] drm/nouveau/disp: keep
>> track of high-speed state, program into clock
>> git bisect bad a971558c298755d2c07bc5508c65d689471763c8
>> # bad: [4126b99e744b7a29746e201e2be6644d2edf3c56] drm/nouveau/disp: add a
>> way to configure scrambling/tmds for hdmi 2.0
>> git bisect bad 4126b99e744b7a29746e201e2be6644d2edf3c56
>> # bad: [cfea88a4d86632f28cf80be97079f131645b7869] drm/nouveau: Start using
>> new drm_dev initialization helpers
>> git bisect bad cfea88a4d86632f28cf80be97079f131645b7869
>> # first bad commit: [cfea88a4d86632f28cf80be97079f131645b7869] drm/nouveau:
>> Start using new drm_dev initialization helpers
>> ----------------------------------------------------------------------------
>> ---



More information about the Nouveau mailing list