<!DOCTYPE html>
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    Am 15.03.24 um 20:08 schrieb Khatri, Sunil:<br>
    <blockquote type="cite"
      cite="mid:72df7acd-4019-4159-b9c7-ffa8cda9e86a@amd.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <p>Thanks for pointing these. I do have some doubt and i raised
        inline.<br>
      </p>
      <div class="moz-cite-prefix">On 3/15/2024 8:46 PM, Dan Carpenter
        wrote:<br>
      </div>
      <blockquote type="cite"
        cite="mid:9a7c7f33-dd72-4fe0-918b-00b920f7635d@moroto.mountain">
        <pre class="moz-quote-pre" wrap="">Hello Sunil Khatri,

Commit 42742cc541bb ("drm/amdgpu: add ring buffer information in
devcoredump") from Mar 11, 2024 (linux-next), leads to the following
Smatch static checker warning:

        drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c:219 amdgpu_devcoredump_read()
        error: we previously assumed 'coredump->adev' could be null (see line 206)

drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
    171 static ssize_t
    172 amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count,
    173                         void *data, size_t datalen)
    174 {
    175         struct drm_printer p;
    176         struct amdgpu_coredump_info *coredump = data;
    177         struct drm_print_iterator iter;
    178         int i;
    179 
    180         iter.data = buffer;
    181         iter.offset = 0;
    182         iter.start = offset;
    183         iter.remain = count;
    184 
    185         p = drm_coredump_printer(&iter);
    186 
    187         drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
    188         drm_printf(&p, "version: " AMDGPU_COREDUMP_VERSION "\n");
    189         drm_printf(&p, "kernel: " UTS_RELEASE "\n");
    190         drm_printf(&p, "module: " KBUILD_MODNAME "\n");
    191         drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec,
    192                         coredump->reset_time.tv_nsec);
    193 
    194         if (coredump->reset_task_info.pid)
    195                 drm_printf(&p, "process_name: %s PID: %d\n",
    196                            coredump->reset_task_info.process_name,
    197                            coredump->reset_task_info.pid);
    198 
    199         if (coredump->ring) {
    200                 drm_printf(&p, "\nRing timed out details\n");
    201                 drm_printf(&p, "IP Type: %d Ring Name: %s\n",
    202                            coredump->ring->funcs->type,
    203                            coredump->ring->name);
    204         }
    205 
    206         if (coredump->adev) {
                    ^^^^^^^^^^^^^^
Check for NULL</pre>
      </blockquote>
      This is the check for NULL. Is there any issue here ?<br>
      <blockquote type="cite"
        cite="mid:9a7c7f33-dd72-4fe0-918b-00b920f7635d@moroto.mountain">
        <pre class="moz-quote-pre" wrap="">    207                 struct amdgpu_vm_fault_info *fault_info =
    208                         &coredump->adev->vm_manager.fault_info;
    209 
    210                 drm_printf(&p, "\n[%s] Page fault observed\n",
    211                            fault_info->vmhub ? "mmhub" : "gfxhub");
    212                 drm_printf(&p, "Faulty page starting at address: 0x%016llx\n",
    213                            fault_info->addr);
    214                 drm_printf(&p, "Protection fault status register: 0x%x\n\n",
    215                            fault_info->status);
    216         }
    217 
    218         drm_printf(&p, "Ring buffer information\n");
--> 219         for (int i = 0; i < coredump->adev->num_rings; i++) {
                                    ^^^^^^^^^^^^^^
Unchecked dereference</pre>
      </blockquote>
      Agree<br>
      <blockquote type="cite"
        cite="mid:9a7c7f33-dd72-4fe0-918b-00b920f7635d@moroto.mountain">
        <pre class="moz-quote-pre" wrap="">    220                 int j = 0;
    221                 struct amdgpu_ring *ring = coredump->adev->rings[i];
    222 
    223                 drm_printf(&p, "ring name: %s\n", ring->name);
    224                 drm_printf(&p, "Rptr: 0x%llx Wptr: 0x%llx RB mask: %x\n",
    225                            amdgpu_ring_get_rptr(ring),
    226                            amdgpu_ring_get_wptr(ring),
    227                            ring->buf_mask);
    228                 drm_printf(&p, "Ring size in dwords: %d\n",
    229                            ring->ring_size / 4);
    230                 drm_printf(&p, "Ring contents\n");
    231                 drm_printf(&p, "Offset \t Value\n");
    232 
    233                 while (j < ring->ring_size) {
    234                         drm_printf(&p, "0x%x \t 0x%x\n", j, ring->ring[j/4]);
    235                         j += 4;
    236                 }
    237         }
    238 
    239         if (coredump->reset_vram_lost)
    240                 drm_printf(&p, "VRAM is lost due to GPU reset!\n");
    241         if (coredump->adev->reset_info.num_regs) {
                    ^^^^^^^^^^^^^^
Here too</pre>
      </blockquote>
      Agree.<br>
      <blockquote type="cite"
        cite="mid:9a7c7f33-dd72-4fe0-918b-00b920f7635d@moroto.mountain">
        <pre class="moz-quote-pre" wrap="">    242                 drm_printf(&p, "AMDGPU register dumps:\nOffset:     Value:\n");
    243 
    244                 for (i = 0; i < coredump->adev->reset_info.num_regs; i++)
    245                         drm_printf(&p, "0x%08x: 0x%08x\n",
    246                                    coredump->adev->reset_info.reset_dump_reg_list[i],
    247                                    coredump->adev->reset_info.reset_dump_reg_value[i]);
    248         }
    249 
    250         return count - iter.remain;
    251 }</pre>
      </blockquote>
      <p><br>
      </p>
      <p>Although adev is a global structure and never in the code it is
        being checked for NULL as it wont be NULL until the driver is
        unloaded. I can add a checkĀ  for adev in the beginning of the
        function <span style="white-space: pre-wrap">amdgpu_devcoredump_read for completeness of the tool but still not very sure of it.</span></p>
      <p><span style="white-space: pre-wrap">Christian and Alex
Do you agree with my understanding the adev does really need a validation for NULL. I dint see throughout the code adev to be validated for NULL. Do you recommend to add a check for NULL for adev in the above mentioned function/places.</span></p>
    </blockquote>
    <br>
    No, that doesn't make sense. adev is mandatory to be around for the
    core dump to be valid and accessible.<br>
    <br>
    Without a device you don't have a core dump in the first place.<br>
    <br>
    Regards,<br>
    Christian.<br>
    <br>
    <blockquote type="cite"
      cite="mid:72df7acd-4019-4159-b9c7-ffa8cda9e86a@amd.com">
      <p><span style="white-space: pre-wrap">Regards
Sunil
</span></p>
      <blockquote type="cite"
        cite="mid:9a7c7f33-dd72-4fe0-918b-00b920f7635d@moroto.mountain">
        <pre class="moz-quote-pre" wrap="">regards,
dan carpenter
</pre>
      </blockquote>
    </blockquote>
    <br>
  </body>
</html>