[PATCH] drm/amdgpu: trace amdgpu_job fence details

Christian König christian.koenig at amd.com
Fri Feb 24 08:13:35 UTC 2017


Am 23.02.2017 um 21:48 schrieb Andres Rodriguez:
>
>
> On 2017-02-23 02:46 PM, Andres Rodriguez wrote:
>>
>>
>> On 2017-02-23 03:20 AM, Christian König wrote:
>>> Am 23.02.2017 um 00:47 schrieb Andres Rodriguez:
>>>> This trace is intended to provide the required information to associate
>>>> the completion of an amdgpu_job with its corresponding dma_fence_*
>>>> tracepoints.
>>>>
>>>> Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  2 ++
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 22 ++++++++++++++++++++++
>>>>   2 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> index 86a1242..84a04e4 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>> @@ -177,6 +177,8 @@ static struct dma_fence *amdgpu_job_run(struct amd_sched_job *sched_job)
>>>>       /* if gpu reset, hw fence will be replaced here */
>>>>       dma_fence_put(job->fence);
>>>>       job->fence = dma_fence_get(fence);
>>>> +    trace_amdgpu_job_attach_fence(job);
>>>> +
>>>>       amdgpu_job_free_resources(job);
>>>>       return fence;
>>>>   }
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>> index a18ae1e..0a61ed9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h
>>>> @@ -147,6 +147,28 @@ TRACE_EVENT(amdgpu_sched_run_job,
>>>>                 __entry->fence, __entry->ring_name, __entry->num_ibs)
>>>>   );
>>>>
>>>> +TRACE_EVENT(amdgpu_job_attach_fence,
>>>> +        TP_PROTO(struct amdgpu_job *job),
>>>> +        TP_ARGS(job),
>>>> +        TP_STRUCT__entry(
>>>> +                 __field(struct amdgpu_device *, adev)
>>>> +                 __field(struct amd_sched_job *, sched_job)
>>>
>>> Neither add adev nor sched_job pointer to your trace point.
>> Would it be okay then to grab a reference to the scheduler fence in 
>> amdgpu_job_run and print the sched fence pointer here?
>>
>> That would give us enough info to correlate other trace points like
>> amdgpu_cs_ioctl or amdgpu_sched_run_job to the respective dma_fence
>> tracepoints.
>>
>>
>>>
>>> The adev pointer is completely meaningless in the logs and the 
>>> scheduler job might already be freed when the printk is called.
> Hey Christian,
>
> I was just double checking the lifetime of the scheduler job, and it 
> should still be valid during amdgpu_job_run. The lifetime of the job 
> is tied to the scheduler finished fence which won't be signaled until 
> amdgpu_job_run finishes even if the HW fence has signaled.
>
> This is assuming that the trace printks are synchronous.

That assumption is incorrect.

The parameters are stored in a ring buffer when the trace point fires; the
string is only composed later, when you request the trace content from
userspace.

In addition to that, printing pointers to objects which are heavily
reallocated is a bad idea, because once the job is freed the pointer will
almost certainly be reused for another job.

Using the numbers from the scheduler fence is the right approach here, and
you don't even need to grab another reference: just save the context and
sequence number as the identifier for the job.
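Applied to the patch under review, the trace point could be reduced along
these lines (an editor's sketch of the suggestion above, not a tested
patch; the field widths simply follow the original patch, though
dma_fence's context is actually a 64-bit value):

```c
/* Sketch: identify the job by its fence's context/seqno instead of
 * raw adev/sched_job pointers, per the review comments above. */
TRACE_EVENT(amdgpu_job_attach_fence,
	    TP_PROTO(struct amdgpu_job *job),
	    TP_ARGS(job),
	    TP_STRUCT__entry(
			     __string(timeline, job->fence->ops->get_timeline_name(job->fence))
			     __field(unsigned int, context)
			     __field(unsigned int, seqno)
			     ),
	    TP_fast_assign(
			   __assign_str(timeline, job->fence->ops->get_timeline_name(job->fence));
			   __entry->context = job->fence->context;
			   __entry->seqno = job->fence->seqno;
			   ),
	    TP_printk("timeline:%s, context:%u, seqno:%u",
		      __get_str(timeline), __entry->context, __entry->seqno)
);
```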

Regards,
Christian.

>
> So I think printing the job and dropping adev should be okay if the 
> above holds true, unless I'm missing something else.
>
>>>
>>> Just checked the source and a couple of other trace points get this 
>>> horribly wrong as well, so that isn't your fault. Going to put it on 
>>> my todo to fix those.
>>>
>>> Regards,
>>> Christian.
>>>
>>>> +                 __string(timeline, job->fence->ops->get_timeline_name(job->fence))
>>>> +                 __field(unsigned int, context)
>>>> +                 __field(unsigned int, seqno)
>>>> +                 ),
>>>> +
>>>> +        TP_fast_assign(
>>>> +               __entry->adev = job->adev;
>>>> +               __entry->sched_job = &job->base;
>>>> +               __assign_str(timeline, job->fence->ops->get_timeline_name(job->fence))
>>>> +               __entry->context = job->fence->context;
>>>> +               __entry->seqno = job->fence->seqno;
>>>> +               ),
>>>> +        TP_printk("adev=%p, sched_job=%p, timeline:%s, context:%u, seqno:%u",
>>>> +              __entry->adev, __entry->sched_job,
>>>> +              __get_str(timeline), __entry->context, __entry->seqno)
>>>> +);
>>>>
>>>>   TRACE_EVENT(amdgpu_vm_grab_id,
>>>>           TP_PROTO(struct amdgpu_vm *vm, int ring, struct amdgpu_job *job),


