[PATCH v3 2/2] drm/xe/devcoredump: Remove IS_ERR_OR_NULL check for kzalloc

Mon Feb 24 21:38:40 UTC 2025

On Thu, Feb 20, 2025 at 05:36:19PM -0800, John Harrison wrote:
>On 2/20/2025 15:54, Lucas De Marchi wrote:
>>On Thu, Feb 20, 2025 at 05:29:56PM +0100, Michal Wajdeczko wrote:
>>>On 20.02.2025 01:17, Shuicheng Lin wrote:
>>>>kzalloc returns a valid pointer or NULL if the allocation fails.
>>>>It never returns an error pointer. It is better to check for 
>>>>NULL directly.
>>>>
>>>>Signed-off-by: Shuicheng Lin <shuicheng.lin at intel.com>
>>>>Cc: John Harrison <John.C.Harrison at Intel.com>
>>>>Cc: Lucas De Marchi <lucas.demarchi at intel.com>
>>>>---
>>>> drivers/gpu/drm/xe/xe_devcoredump.c | 4 ++--
>>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/xe/xe_devcoredump.c b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>index 60d15e455017..81b9d9bb3f57 100644
>>>>--- a/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>+++ b/drivers/gpu/drm/xe/xe_devcoredump.c
>>>>@@ -426,8 +426,8 @@ void xe_print_blob_ascii85(struct drm_printer *p, const char *prefix, char suffi
>>>>         drm_printf(p, "Offset not word aligned: %zu", offset);
>>>>
>>>>     line_buff = kzalloc(DMESG_MAX_LINE_LEN, GFP_KERNEL);
>>>>-    if (IS_ERR_OR_NULL(line_buff)) {
>>>>-        drm_printf(p, "Failed to allocate line buffer: %pe", line_buff);
>>>>+    if (!line_buff) {
>>>>+        drm_printf(p, "Failed to allocate line buffer\n");
>>>
>>>btw, since this line will be included in the output, where one could
>>>expect ascii85 data, shouldn't we print that diagnostic message with
>>>some special prefix to make it clear there is nothing to parse? like
>>>
>>>    "# Failed to allocate internal data\n"
>>>
>>>also since caller may have already provided a prefix, shouldn't we also
>>>include it in this diagnostic message?
>>>
>>>    "%s%s# Failed to allocate internal data\n",
>>>    prefix ?: "",
>>>    prefix ? ": " : ""
>>
>>or stop printing and return an error. we are missing the `.error: ...`
>>already that is used in other places.
>>
>>$ git grep '\.error: ' -- drivers/gpu/drm/xe
>>drivers/gpu/drm/xe/xe_vm.c:             drm_printf(p, "[0].error: %li\n", PTR_ERR(snap));
>>drivers/gpu/drm/xe/xe_vm.c:                     drm_printf(p, "[%llx].error: %li\n", snap->snap[i].ofs,
>This is the place that should be printing an error. The whole point of 
>this helper is that it wraps up all the blob output. However, do we 

Note that this is not printing an error in the log. It puts the error
message in the place that is supposed to hold the *data* for that key.
That is why a .error key was supposed to accompany this behavior. Right
now, if you look only at the devcoredump, you have no clue that the data
is actually an error message, not real data.
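
To illustrate the point above, here is a minimal userspace sketch of a
blob printer that reports an allocation failure under a separate .error
key and returns the error, rather than writing a message where the data
would go. This is only an illustration, not the xe implementation:
emit_error_key(), print_blob(), the simulate_oom flag and the .length
key are hypothetical names made up for this example, and the ascii85
encoding itself is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: write "<prefix>.error: <errno>\n" into buf. */
static int emit_error_key(char *buf, size_t len, const char *prefix, int err)
{
	assert(buf && prefix);
	return snprintf(buf, len, "%s.error: %d\n", prefix, err);
}

/*
 * Hypothetical stand-in for the blob printer. simulate_oom forces the
 * allocation-failure path so that it can be exercised in userspace.
 * On failure only the .error key is written and -ENOMEM is returned;
 * on success a placeholder line is written instead of encoded data.
 */
static int print_blob(char *buf, size_t len, const char *prefix,
		      size_t blob_size, int simulate_oom)
{
	char *line_buff = simulate_oom ? NULL : calloc(1, 128);

	/* calloc(), like kzalloc(), reports failure as NULL only. */
	if (!line_buff) {
		emit_error_key(buf, len, prefix, -ENOMEM);
		return -ENOMEM;
	}

	/* Placeholder for the real payload. */
	snprintf(buf, len, "%s.length: %zu\n", prefix, blob_size);
	free(line_buff);
	return 0;
}
```

With this shape a reader of the dump can tell an error apart from data by
the key name alone, and the caller can stop printing on a non-zero return.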

>need to distinguish between a non-capture-process error (e.g. bad VM 
>object) versus an error in the capture itself (e.g. out of memory 
>converting the binary data to a text string)?
>
>Not sure what error routes there are in the VM capture? Are they 
>things that are important to include in the devcoredump because they 
>have significant meaning about what caused the hang? Or are the only 
>possible errors related to the capture process itself - failing to 
>allocate memory to store the capture or such?
>
>If the only errors are capture related then yes, just change this line 
>to print "[%prefix].error: %errno\n". But if there is use to 
>distinguish between bad VM objects and failed captures, then maybe 
>this one should be "[%prefix].capture_error: %errno\n" or something?

-ENOMEM vs something else would already be a very good indication.

This discussion can continue. For now I am applying these patches,
which are orthogonal to it.

Applied both to drm-xe-next.

thanks,
Lucas De Marchi

>
>John.
>
>
>>
>>Lucas De Marchi
>>
>>
>>
>>
>>>
>>>>         return;
>>>>     }
>>>>
>>>
>