[bug report] drm/amdgpu: add ring buffer information in devcoredump

Khatri, Sunil sukhatri at amd.com
Fri Mar 15 19:08:59 UTC 2024


Thanks for pointing these. I do have some doubt and i raised inline.

On 3/15/2024 8:46 PM, Dan Carpenter wrote:
> Hello Sunil Khatri,
>
> Commit 42742cc541bb ("drm/amdgpu: add ring buffer information in
> devcoredump") from Mar 11, 2024 (linux-next), leads to the following
> Smatch static checker warning:
>
> 	drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c:219 amdgpu_devcoredump_read()
> 	error: we previously assumed 'coredump->adev' could be null (see line 206)
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
>      171 static ssize_t
>      172 amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count,
>      173                         void *data, size_t datalen)
>      174 {
>      175         struct drm_printer p;
>      176         struct amdgpu_coredump_info *coredump = data;
>      177         struct drm_print_iterator iter;
>      178         int i;
>      179
>      180         iter.data = buffer;
>      181         iter.offset = 0;
>      182         iter.start = offset;
>      183         iter.remain = count;
>      184
>      185         p = drm_coredump_printer(&iter);
>      186
>      187         drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
>      188         drm_printf(&p, "version: " AMDGPU_COREDUMP_VERSION "\n");
>      189         drm_printf(&p, "kernel: " UTS_RELEASE "\n");
>      190         drm_printf(&p, "module: " KBUILD_MODNAME "\n");
>      191         drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec,
>      192                         coredump->reset_time.tv_nsec);
>      193
>      194         if (coredump->reset_task_info.pid)
>      195                 drm_printf(&p, "process_name: %s PID: %d\n",
>      196                            coredump->reset_task_info.process_name,
>      197                            coredump->reset_task_info.pid);
>      198
>      199         if (coredump->ring) {
>      200                 drm_printf(&p, "\nRing timed out details\n");
>      201                 drm_printf(&p, "IP Type: %d Ring Name: %s\n",
>      202                            coredump->ring->funcs->type,
>      203                            coredump->ring->name);
>      204         }
>      205
>      206         if (coredump->adev) {
>                      ^^^^^^^^^^^^^^
> Check for NULL
This is the check for NULL. Is there any issue here ?
>
>      207                 struct amdgpu_vm_fault_info *fault_info =
>      208                         &coredump->adev->vm_manager.fault_info;
>      209
>      210                 drm_printf(&p, "\n[%s] Page fault observed\n",
>      211                            fault_info->vmhub ? "mmhub" : "gfxhub");
>      212                 drm_printf(&p, "Faulty page starting at address: 0x%016llx\n",
>      213                            fault_info->addr);
>      214                 drm_printf(&p, "Protection fault status register: 0x%x\n\n",
>      215                            fault_info->status);
>      216         }
>      217
>      218         drm_printf(&p, "Ring buffer information\n");
> --> 219         for (int i = 0; i < coredump->adev->num_rings; i++) {
>                                      ^^^^^^^^^^^^^^
> Unchecked dereference
Agree
>
>      220                 int j = 0;
>      221                 struct amdgpu_ring *ring = coredump->adev->rings[i];
>      222
>      223                 drm_printf(&p, "ring name: %s\n", ring->name);
>      224                 drm_printf(&p, "Rptr: 0x%llx Wptr: 0x%llx RB mask: %x\n",
>      225                            amdgpu_ring_get_rptr(ring),
>      226                            amdgpu_ring_get_wptr(ring),
>      227                            ring->buf_mask);
>      228                 drm_printf(&p, "Ring size in dwords: %d\n",
>      229                            ring->ring_size / 4);
>      230                 drm_printf(&p, "Ring contents\n");
>      231                 drm_printf(&p, "Offset \t Value\n");
>      232
>      233                 while (j < ring->ring_size) {
>      234                         drm_printf(&p, "0x%x \t 0x%x\n", j, ring->ring[j/4]);
>      235                         j += 4;
>      236                 }
>      237         }
>      238
>      239         if (coredump->reset_vram_lost)
>      240                 drm_printf(&p, "VRAM is lost due to GPU reset!\n");
>      241         if (coredump->adev->reset_info.num_regs) {
>                      ^^^^^^^^^^^^^^
> Here too
Agree.
>
>      242                 drm_printf(&p, "AMDGPU register dumps:\nOffset:     Value:\n");
>      243
>      244                 for (i = 0; i < coredump->adev->reset_info.num_regs; i++)
>      245                         drm_printf(&p, "0x%08x: 0x%08x\n",
>      246                                    coredump->adev->reset_info.reset_dump_reg_list[i],
>      247                                    coredump->adev->reset_info.reset_dump_reg_value[i]);
>      248         }
>      249
>      250         return count - iter.remain;
>      251 }


Although adev is a global structure and never in the code it is being 
checked for NULL as it wont be NULL until the driver is unloaded. I can 
add a check  for adev in the beginning of the function 
amdgpu_devcoredump_read for completeness of the tool but still not very 
sure of it.

Christian and Alex Do you agree with my understanding the adev does 
really need a validation for NULL. I dint see throughout the code adev 
to be validated for NULL. Do you recommend to add a check for NULL for 
adev in the above mentioned function/places.

Regards Sunil

>
> regards,
> dan carpenter
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20240316/e7d8678d/attachment.htm>


More information about the amd-gfx mailing list