question about error handling in ttm_bo_handle_move_mem()

Dan Carpenter dan.carpenter at oracle.com
Thu Jun 10 15:41:13 UTC 2021


The new version of Firefox seems to trigger a refcounting bug in the
nouveau driver on my system.  I tested a v4.15 kernel and it has the bug
as well.  It looks like the refcounting goes wrong when ttm_bo_evict()
fails.  The dmesg output is at the end of this email.
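
For context on the warning in the dmesg: as far as I understand it,
refcount_t complains like this when a reference is dropped after the
count has already reached zero, i.e. there was one more put than there
were gets.  Below is a toy, single-threaded userspace model of that
behaviour.  It is not the kernel implementation, all of the toy_* names
are made up for illustration, and the "error path" in it is
hypothetical; I have not identified which put in the real driver is the
unbalanced one.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy, single-threaded stand-in for the kernel's refcount_t (hypothetical). */
#define TOY_SATURATED (INT_MIN / 2)

struct toy_ref {
	int refs;
	bool warned;	/* set once an underflow has been detected */
};

static void toy_ref_set(struct toy_ref *r, int n)
{
	r->refs = n;
	r->warned = false;
}

static void toy_ref_get(struct toy_ref *r)
{
	r->refs++;
}

/* Returns true when the caller dropped the last reference. */
static bool toy_ref_put(struct toy_ref *r)
{
	int old = r->refs--;

	if (old == 1)
		return true;
	if (old <= 0) {
		/* More puts than gets: pin the counter and complain once. */
		r->refs = TOY_SATURATED;
		if (!r->warned)
			fprintf(stderr, "toy_ref: underflow; use-after-free.\n");
		r->warned = true;
	}
	return false;
}

/*
 * Hypothetical error path that drops a reference it never took.
 * Returns true if the toy counter detected an underflow.
 */
static bool toy_unbalanced_error_path(void)
{
	struct toy_ref ref;

	toy_ref_set(&ref, 1);		/* creator holds one reference */
	toy_ref_get(&ref);		/* a user takes a second reference */
	toy_ref_put(&ref);		/* ... and releases it again */
	if (toy_ref_put(&ref))		/* creator drops the last reference */
		toy_ref_put(&ref);	/* buggy extra put on the error path */
	return ref.warned;
}
```

In this model the final toy_ref_put() sees a count of zero, which is the
situation the real warning reports.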

I looked through the code for anything suspicious and have a few
questions about ttm_bo_handle_move_mem().

drivers/gpu/drm/ttm/ttm_bo.c
   230  static int ttm_bo_handle_move_mem(struct ttm_buffer_object *bo,
   231                                    struct ttm_resource *mem, bool evict,
   232                                    struct ttm_operation_ctx *ctx,
   233                                    struct ttm_place *hop)
   234  {
   235          struct ttm_bo_device *bdev = bo->bdev;
   236          struct ttm_resource_manager *old_man = ttm_manager_type(bdev, bo->mem.mem_type);
   237          struct ttm_resource_manager *new_man = ttm_manager_type(bdev, mem->mem_type);

old_man and new_man are assigned here.

   238          int ret;
   239  
   240          ttm_bo_unmap_virtual(bo);
   241  
   242          /*
   243           * Create and bind a ttm if required.
   244           */
   245  
   246          if (new_man->use_tt) {
   247                  /* Zero init the new TTM structure if the old location should
   248                   * have used one as well.
   249                   */
   250                  ret = ttm_tt_create(bo, old_man->use_tt);
   251                  if (ret)
   252                          goto out_err;

This "goto out_err;" looks like a no-op.  Presumably that is
intentional?  I think the idea is that if the create succeeds and a
later step fails, then the error handling at out_err is expected to
clean up the ttm?

   253  
   254                  if (mem->mem_type != TTM_PL_SYSTEM) {
   255                          ret = ttm_tt_populate(bo->bdev, bo->ttm, ctx);
   256                          if (ret)
   257                                  goto out_err;
   258                  }
   259          }
   260  
   261          ret = bdev->driver->move(bo, evict, ctx, mem, hop);

On my system, ->move() is returning -EINVAL.

   262          if (ret) {
   263                  if (ret == -EMULTIHOP)
   264                          return ret;
   265                  goto out_err;
   266          }
   267  
   268          ctx->bytes_moved += bo->base.size;
   269          return 0;
   270  
   271  out_err:
   272          new_man = ttm_manager_type(bdev, bo->mem.mem_type);

This seems like a mistake.  It sets new_man to the same value as
old_man.  I don't understand why it needs to be re-assigned at all,
though, so maybe I'm missing something.


   273          if (!new_man->use_tt)

This test seems reversed.

Unfortunately, changing the re-assignment and the test doesn't fix my
crashes, and I'm still investigating.

   274                  ttm_bo_tt_destroy(bo);
   275  
   276          return ret;
   277  }
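
To check my reading of the error path, here is a small userspace model
of the function for the case where ->move() fails.  It is a sketch, not
the kernel code: the toy_* types and helpers are made up, the use_tt
value for each placement is my assumption, and it assumes that a failed
->move() leaves bo->mem.mem_type unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the TTM types; only the fields used here. */
enum toy_pl { TOY_PL_SYSTEM, TOY_PL_TT, TOY_PL_VRAM, TOY_PL_COUNT };

struct toy_manager { bool use_tt; };

struct toy_bo {
	enum toy_pl mem_type;	/* models bo->mem.mem_type */
	bool has_ttm;		/* models bo->ttm != NULL */
};

/* Assumption: SYSTEM and TT placements use a ttm, VRAM does not. */
static struct toy_manager toy_managers[TOY_PL_COUNT] = {
	[TOY_PL_SYSTEM] = { .use_tt = true },
	[TOY_PL_TT]     = { .use_tt = true },
	[TOY_PL_VRAM]   = { .use_tt = false },
};

static struct toy_manager *toy_manager_type(enum toy_pl type)
{
	assert(type < TOY_PL_COUNT);
	return &toy_managers[type];
}

/*
 * Models ttm_bo_handle_move_mem() when ->move() fails and leaves
 * bo->mem_type unchanged.  Always returns -22 (-EINVAL); *same_as_old
 * reports whether the lookup at out_err yields old_man.
 */
static int toy_failed_move(struct toy_bo *bo, enum toy_pl new_type,
			   bool *same_as_old)
{
	struct toy_manager *old_man = toy_manager_type(bo->mem_type);
	struct toy_manager *new_man = toy_manager_type(new_type);
	int ret;

	if (new_man->use_tt)
		bo->has_ttm = true;	/* ttm_tt_create() succeeded */

	ret = -22;			/* ->move() fails */

	/* out_err: */
	new_man = toy_manager_type(bo->mem_type);
	*same_as_old = (new_man == old_man);
	if (!new_man->use_tt)
		bo->has_ttm = false;	/* ttm_bo_tt_destroy() */

	return ret;
}
```

Under those assumptions the lookup at out_err always returns old_man,
and the ttm survives a failed move only when the placement the buffer is
still in uses one.  A ttm created for a move out of VRAM is destroyed
again.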

regards,
dan carpenter

[  159.893081] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
[  159.893089] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
[  159.893091] nouveau 0000:01:00.0: msvld: unable to load firmware data
[  159.893092] nouveau 0000:01:00.0: msvld: init failed, -19
[ 1945.479861] [TTM] Buffer eviction failed
[ 1945.479883] ------------[ cut here ]------------
[ 1945.479886] refcount_t: underflow; use-after-free.
[ 1945.479900] WARNING: CPU: 7 PID: 2528 at lib/refcount.c:28 refcount_warn_saturate+0xa6/0xf0
[ 1945.479914] Modules linked in: bnep(E) ctr(E) ccm(E) cpufreq_conservative(E) cpufreq_userspace(E) cpufreq_powersave(E) cpufreq_ondemand(E) tun(E) uinput(E) binfmt_misc(E) ath3k(E) btusb(E) btrtl(E) btbcm(E) btintel(E) bluetooth(E) jitterentropy_rng(E) drbg(E) ansi_cprng(E) ecdh_generic(E) ecc(E) intel_rapl_msr(E) intel_rapl_common(E) snd_hda_codec_realtek(E) x86_pkg_temp_thermal(E) snd_hda_codec_generic(E) ath9k(E) intel_powerclamp(E) ledtrig_audio(E) snd_hda_codec_hdmi(E) coretemp(E) ath9k_common(E) kvm_intel(E) ath9k_hw(E) snd_hda_intel(E) snd_intel_dspcfg(E) kvm(E) snd_intel_sdw_acpi(E) ath(E) irqbypass(E) snd_hda_codec(E) mac80211(E) snd_hda_core(E) ghash_clmulni_intel(E) snd_hwdep(E) aesni_intel(E) snd_pcm_oss(E) libaes(E) snd_mixer_oss(E) crypto_simd(E) dell_smm_hwmon(E) cfg80211(E) cryptd(E) snd_pcm(E) rapl(E) iTCO_wdt(E) snd_timer(E) intel_cstate(E) intel_pmc_bxt(E) snd(E) rfkill(E) iTCO_vendor_support(E) intel_uncore(E) pcspkr(E) libarc4(E) soundcore(E) mei_me(E) watchdog(E)
[ 1945.480005]  sg(E) at24(E) mei(E) evdev(E) nfsd(E) loop(E) auth_rpcgss(E) msr(E) nfs_acl(E) lockd(E) parport_pc(E) ppdev(E) grace(E) lp(E) parport(E) sunrpc(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) btrfs(E) blake2b_generic(E) xor(E) raid6_pq(E) libcrc32c(E) crc32c_generic(E) ums_realtek(E) uas(E) usb_storage(E) hid_generic(E) usbhid(E) hid(E) sd_mod(E) sr_mod(E) t10_pi(E) crc_t10dif(E) cdrom(E) crct10dif_generic(E) nouveau(E) mxm_wmi(E) wmi(E) i2c_algo_bit(E) drm_ttm_helper(E) ttm(E) xhci_pci(E) drm_kms_helper(E) ahci(E) r8169(E) crct10dif_pclmul(E) crct10dif_common(E) ehci_pci(E) realtek(E) lpc_ich(E) libahci(E) mdio_devres(E) cec(E) xhci_hcd(E) crc32_pclmul(E) libphy(E) ehci_hcd(E) crc32c_intel(E) libata(E) i2c_i801(E) i2c_smbus(E) drm(E) scsi_mod(E) usbcore(E) fan(E) video(E) button(E)
[ 1945.480157] CPU: 7 PID: 2528 Comm: Xorg Tainted: G            E     5.12.0+ #1
[ 1945.480164] Hardware name: Dell Inc. XPS 8700/0KWVT8, BIOS A06 11/18/2013
[ 1945.480168] RIP: 0010:refcount_warn_saturate+0xa6/0xf0
[ 1945.480177] Code: 05 b9 e2 3d 01 01 e8 79 e5 42 00 0f 0b c3 80 3d a7 e2 3d 01 00 75 95 48 c7 c7 68 61 f2 b1 c6 05 97 e2 3d 01 01 e8 5a e5 42 00 <0f> 0b c3 80 3d 86 e2 3d 01 00 0f 85 72 ff ff ff 48 c7 c7 c0 61 f2
[ 1945.480183] RSP: 0018:ffffbba402fd7d30 EFLAGS: 00010286
[ 1945.480188] RAX: 0000000000000000 RBX: 0000000000000002 RCX: ffff9194fedd8588
[ 1945.480192] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9194fedd8580
[ 1945.480196] RBP: ffff918e03c0d800 R08: 0000000000000000 R09: ffffbba402fd7b50
[ 1945.480199] R10: ffffbba402fd7b48 R11: ffffffffb24cc7c8 R12: ffffffffc08b4d20
[ 1945.480202] R13: ffff918e00c2e000 R14: ffff918e7e348c00 R15: ffff918e7e348c00
[ 1945.480206] FS:  00007fa1278f0a40(0000) GS:ffff9194fedc0000(0000) knlGS:0000000000000000
[ 1945.480211] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1945.480215] CR2: 00007f8c0802f000 CR3: 000000013f4bc003 CR4: 00000000001706e0
[ 1945.480219] Call Trace:
[ 1945.480225]  nouveau_gem_new+0xc1/0xf0 [nouveau]
[ 1945.480451]  nouveau_gem_ioctl_new+0x53/0xf0 [nouveau]
[ 1945.480618]  ? nouveau_gem_new+0xf0/0xf0 [nouveau]
[ 1945.480779]  drm_ioctl_kernel+0xaa/0xf0 [drm]
[ 1945.480837]  drm_ioctl+0x20f/0x3a0 [drm]
[ 1945.480883]  ? nouveau_gem_new+0xf0/0xf0 [nouveau]
[ 1945.481058]  nouveau_drm_ioctl+0x55/0xa0 [nouveau]
[ 1945.481233]  __x64_sys_ioctl+0x83/0xb0
[ 1945.481242]  do_syscall_64+0x33/0x80
[ 1945.481251]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1945.481262] RIP: 0033:0x7fa127d5bcc7
[ 1945.481268] Code: 00 00 00 48 8b 05 c9 91 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 99 91 0c 00 f7 d8 64 89 01 48
[ 1945.481274] RSP: 002b:00007ffe54852078 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1945.481281] RAX: ffffffffffffffda RBX: 00007ffe548520d0 RCX: 00007fa127d5bcc7
[ 1945.481285] RDX: 00007ffe548520d0 RSI: 00000000c0306480 RDI: 0000000000000010
[ 1945.481288] RBP: 00000000c0306480 R08: 0000000000000000 R09: 000055e83020e010
[ 1945.481292] R10: 00007fa127e25b80 R11: 0000000000000246 R12: 00007ffe548520d0
[ 1945.481296] R13: 0000000000000010 R14: 000055e8302c9fd0 R15: 0000000000001000
[ 1945.481302] ---[ end trace 1717583068871a81 ]---
[ 2081.413684] [TTM] Buffer eviction failed



