[virglrenderer-devel] [Mesa-dev] issue about context reference

Marek Olšák maraeo at gmail.com
Wed Sep 30 18:56:29 UTC 2020


Hi,

Does the issue happen with mesa/master?

Marek


On Mon, Sep 28, 2020 at 3:11 AM Zhu Yijun <lovemrd at gmail.com> wrote:

> Hi all,
>
> I use qemu/kvm to boot some Android guests with virgl and run APKs;
> after several hours the host kernel invokes the OOM killer.
>
> 1. From /proc/meminfo, I can see that 'SUnreclaim' is the largest part:
>
> MemTotal: 16553672 kB
> MemFree: 128688 kB
> MemAvailable: 34648 kB
> Slab: 10169908 kB
> SReclaimable: 64632 kB
> SUnreclaim: 10105276 kB
>
> 2. From slabinfo, 'kmalloc-8192' uses nearly 5G of memory, which is
> the largest part of the slab usage:
>
> kmalloc-8192 566782 566782 8192 4 8 : tunables 0 0 0 : slabdata 141697
> 141697 0
>
> 3. Then I appended 'slub_debug=U,kmalloc-8192' to the host kernel
> command line to reproduce this issue. After a few minutes of testing
> with only one Android guest, the number of calls to amdgpu_ctx_free
> is much lower than the number of calls to amdgpu_ctx_alloc, which is
> why the 'kmalloc-8192' slab memory grows continuously.
>
> #cat /sys/kernel/slab/kmalloc-8192/alloc_calls
>       2 __vring_new_virtqueue+0x64/0x188 [virtio_ring]
> age=2779779/2779779/2779779 pid=1069 cpus=19 nodes=0
>       1 rd_alloc_device+0x34/0x48 [target_core_mod] age=2776755
> pid=1969 cpus=20 nodes=0
>       2 mb_cache_create+0x7c/0x128 [mbcache]
> age=2777018/2777221/2777425 pid=1186-1810 cpus=3,36 nodes=0
>       2 ext4_fill_super+0x128/0x25b0 [ext4]
> age=2777019/2777222/2777426 pid=1186-1810 cpus=3,36 nodes=0
>       2 svc_rqst_alloc+0x3c/0x170 [sunrpc] age=2775427/2775462/2775497
> pid=2346-2636 cpus=36-37 nodes=0
>       2 cache_create_net+0x4c/0xc0 [sunrpc]
> age=2737590/2757403/2777217 pid=1280-4987 cpus=20,44 nodes=0
>       2 rpc_alloc_iostats+0x2c/0x60 [sunrpc]
> age=2775494/2775495/2775497 pid=2346 cpus=36 nodes=0
>    1570 amdgpu_ctx_init+0xb4/0x2a0 [amdgpu] age=30110/314435/1914218
> pid=63167 cpus=1-7,9-10,16-20,23,27,29-35,40-47,52,60,63,95,118,120,122-123
> nodes=0
>    1570 amdgpu_ctx_ioctl+0x198/0x2f8 [amdgpu] age=30110/314435/1914218
> pid=63167 cpus=1-7,9-10,16-20,23,27,29-35,40-47,52,60,63,95,118,120,122-123
> nodes=0
>       2 gfx_v8_0_init_microcode+0x290/0x740 [amdgpu]
> age=2776838/2776924/2777011 pid=660 cpus=64 nodes=0
>       2 construct+0xe0/0x4b8 [amdgpu] age=2776819/2776901/2776983
> pid=660 cpus=64 nodes=0
>       2 mod_freesync_create+0x68/0x1d0 [amdgpu]
> age=2776819/2776901/2776983 pid=660 cpus=64 nodes=0
>       1 kvm_set_irq_routing+0xa8/0x2c8 [kvm_arm_0] age=1909635
> pid=63172 cpus=56 nodes=0
>       1 fat_fill_super+0x5c/0xc20 [fat] age=2777014 pid=1817 cpus=49
> nodes=0
>      11 cgroup1_mount+0x180/0x4e0 age=2779901/2779901/2779911 pid=1
> cpus=1 nodes=0
>      12 kvmalloc_node+0x64/0xa8 age=35454/1370665/2776188
> pid=2176-63167 cpus=2,23,34,42,44 nodes=0
>     128 zswap_dstmem_prepare+0x48/0x78 age=2780252/2780252/2780252
> pid=1 cpus=19 nodes=0
>       1 register_leaf_sysctl_tables+0x9c/0x1d0 age=2786535 pid=0 cpus=0
> nodes=0
>       2 do_register_framebuffer+0x298/0x300
> age=2779680/2783032/2786385 pid=1-656 cpus=0,5 nodes=0
>       1 vc_do_resize+0xb4/0x570 age=2786385 pid=1 cpus=5 nodes=0
>       5 vc_allocate+0x144/0x218 age=2776216/2776219/2776224 pid=2019
> cpus=40 nodes=0
>       8 arm_smmu_device_probe+0x2d8/0x640 age=2780865/2780894/2780924
> pid=1 cpus=0 nodes=0
>       4 __usb_create_hcd+0x44/0x258 age=2780467/2780534/2780599
> pid=5-660 cpus=0,64 nodes=0
>       2 xhci_alloc_virt_device+0x9c/0x308 age=2780463/2780476/2780489
> pid=5-656 cpus=0 nodes=0
>       1 hid_add_field+0x120/0x320 age=2780373 pid=1 cpus=19 nodes=0
>       2 hid_allocate_device+0x2c/0x100 age=2780345/2780362/2780380
> pid=1 cpus=19 nodes=0
>       1 ipv4_sysctl_init_net+0x44/0x148 age=2737590 pid=4987 cpus=44
> nodes=0
>       1 ipv4_sysctl_init_net+0xa8/0x148 age=2737590 pid=4987 cpus=44
> nodes=0
>       1 ipv4_sysctl_init_net+0xf8/0x148 age=2780293 pid=1 cpus=19 nodes=0
>       1 netlink_proto_init+0x60/0x19c age=2786498 pid=1 cpus=0 nodes=0
>       1 ip_rt_init+0x3c/0x20c age=2786473 pid=1 cpus=3 nodes=0
>       1 ip_rt_init+0x6c/0x20c age=2786472 pid=1 cpus=3 nodes=0
>       1 udp_init+0xa0/0x108 age=2786472 pid=1 cpus=4 nodes=0
>
> #cat /sys/kernel/slab/kmalloc-8192/free_calls
>    1473 <not-available> age=4297679817 pid=0 cpus=0 nodes=0
>      46 rpc_free+0x5c/0x80 [sunrpc] age=1760585/1918856/1935279
> pid=33422-68056 cpus=32,34,38,40-42,48,55,57,59,61-63 nodes=0
>       1 rpc_free_iostats+0x14/0x20 [sunrpc] age=2776482 pid=2346 cpus=36
> nodes=0
>     122 free_user_work+0x30/0x40 [ipmi_msghandler]
> age=59465/347716/1905020 pid=781-128311 cpus=32-46,50,52,63 nodes=0
>     740 amdgpu_ctx_fini+0x98/0xc8 [amdgpu] age=32012/286664/1910687
> pid=63167-63222
> cpus=1-11,16-24,27,29-35,40,42-45,47,52,60,63,95,118,120,122-123
> nodes=0
>     719 amdgpu_ctx_fini+0xb0/0xc8 [amdgpu] age=31957/287696/1910687
> pid=63167-63222
> cpus=1-7,10-11,13,16-24,27,29-35,40-47,52,57,60,63,95,118,120,122-123
> nodes=0
>       1 dc_release_state+0x3c/0x48 [amdgpu] age=2777920 pid=660 cpus=64
> nodes=0
>     115 kvfree+0x38/0x40 age=31170/406614/2777214 pid=2026-63167
> cpus=0-1,6-8,11,22,24-25,27,29-31,34-37,40,42-45,49,63,95,118,123
> nodes=0
>       4 cryptomgr_probe+0xe4/0xf0 age=2778011/2781965/2787371
> pid=727-1808 cpus=6,10,12,17 nodes=0
>     112 skb_free_head+0x2c/0x38 age=31864/385450/2776417
> pid=2649-130896 cpus=8,12,22,30,32,36,38-40,42-49,51,54,56,58-62
> nodes=0
>      11 do_name+0x68/0x258 age=2787385/2787385/2787385 pid=1 cpus=4 nodes=0
>       1 unpack_to_rootfs+0x27c/0x2bc age=2787385 pid=1 cpus=4 nodes=0
>
> 4. To analyze this issue further, I added some debug info to qemu,
> virglrenderer, mesa and libdrm. I found that the contexts are created
> and destroyed by vrend_renderer_create_sub_ctx and
> vrend_renderer_destroy_sub_ctx in virglrenderer, and the calls to
> these two functions look normal (the gap between the create and
> destroy counts grows a little and then stays nearly constant during
> the test period). However, in mesa (19.3 on my system), when
> amdgpu_ctx_destroy is called, many contexts have a reference count
> greater than 1, so the call never reaches the amdgpu driver to free
> the slab memory.
>
> static inline void amdgpu_ctx_unref(struct amdgpu_ctx *ctx)
> {
>    if (p_atomic_dec_zero(&ctx->refcount)) {
>       amdgpu_cs_ctx_free(ctx->ctx);
>       amdgpu_bo_free(ctx->user_fence_bo);
>       FREE(ctx);
>    }
> }
>
> The ctx->refcount in mesa is maintained by amdgpu_fence_create and
> amdgpu_fence_reference, which are invoked by higher-level OpenGL
> commands. I'm not familiar with this logic, so I hope someone can
> give me some advice on this issue. Thanks!
>
> Yijun
> _______________________________________________
> mesa-dev mailing list
> mesa-dev at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>