[PATCH] drm/amdgpu: Fix the null pointer issue for tdr

Mon Nov 18 16:23:58 UTC 2019

Can you explain why ? As I see it - list_empty_careful is specifically 
designed for the case where the only other concurrent operation in 
progress is list_del_init 
(https://www.kernel.org/doc/htmldocs/kernel-api/API-list-empty-careful.html) 
- which is exactly what happens in this patch, no other list altering 
operation can take place concurrently - so it looks safe to use for me.

Andrey

On 11/18/19 11:16 AM, Christian König wrote:
> Hi Andrey,
>
> the only thing which doesn't looks so good is the switch to 
> list_empty_careful in drm_sched_cleanup_jobs.
>
> We either take the lock here or we don't, but please not that extra 
> checking.
>
> Christian.
>
> Am 18.11.19 um 15:07 schrieb Andrey Grodzovsky:
>> Thanks Emily.
>>
>> Christan - ping for review.
>>
>> Andrey
>>
>> On 11/14/19 11:39 PM, Deng, Emily wrote:
>>> Hi Andrey,
>>>       Currently, I am busying with another issue, maybe will try 
>>> next week.
>>>
>>> Best wishes
>>> Emily Deng
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>
>>>> Sent: Friday, November 15, 2019 6:14 AM
>>>> To: Koenig, Christian <Christian.Koenig at amd.com>; Deng, Emily
>>>> <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue for tdr
>>>>
>>>> Attached.
>>>>
>>>> Emily - can you give it a try ?
>>>>
>>>> Andrey
>>>>
>>>> On 11/14/19 3:12 AM, Christian König wrote:
>>>>>> What about instead of peeking at the job to actually remove it from
>>>>>> ring_mirror_list right there,
>>>>> Also an interesting idea. We would need to protect the mirror list
>>>>> with a lock again, but that should be the lesser evil.
>>>>>
>>>>> Maybe prototype that and see if it works or not.
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>> Am 13.11.19 um 17:00 schrieb Andrey Grodzovsky:
>>>>>>
>>>>>> On 11/13/19 9:20 AM, Christian König wrote:
>>>>>>> Another more fundamental question: Could we get rid of the timeout
>>>>>>> job at all?
>>>>>>
>>>>>> There are other stuff there besides picking the first unfinished job
>>>>>> which is common for all the drivers - such as freeing guilty 
>>>>>> signaled
>>>>>> job and rearming the timeout work timer.
>>>>>>
>>>>>>
>>>>>>> I mean we used to give this as parameter to the scheduler callback
>>>>>>> because we had the timeout worker in the job, but that is no longer
>>>>>>> the case.
>>>>>>>
>>>>>>> E.g. in drm_sched_job_timedout() we do the following:
>>>>>>>>          job = list_first_entry_or_null(&sched->ring_mirror_list,
>>>>>>>>                                         struct drm_sched_job, 
>>>>>>>> node);
>>>>>>> Why don't we just remove that here and only get the first job after
>>>>>>> we have stopped the scheduler?
>>>>>>
>>>>>> Should be ok since we have the extra check for __kthread_should_park
>>>>>> in drm_sched_cleanup_jobs which will protect us in this case from a
>>>>>> wakeup of sched thread and execution of in drm_sched_cleanup_jobs
>>>>>> after we already parked it. The problem here is we need the
>>>>>> drm_sched_job to access the private data for each client driver (see
>>>>>> amdgpu_job_timedout for example). What about instead of peeking at
>>>>>> the job to actually remove it from ring_mirror_list right there, go
>>>>>> ahead with it through the reset routine, if it's signaled in the
>>>>>> meanwhile that great - release it, otherwise put it back into
>>>>>> ring_mirror_list in drm_sched_resubmit_jobs.
>>>>>>
>>>>>> Andrey
>>>>>>
>>>>>>
>>>>>>> Regards,
>>>>>>> Christian.
>>>>>>>
>>>>>>> Am 13.11.19 um 15:12 schrieb Andrey Grodzovsky:
>>>>>>>> This why I asked for a trace with timer enabled, but since 
>>>>>>>> there is
>>>>>>>> a finite number of places we touch the timer Emily can just put
>>>>>>>> prints there. Also, I wonder if this temp fix helps her with the
>>>>>>>> issue or not.
>>>>>>>>
>>>>>>>> Andrey
>>>>>>>>
>>>>>>>> On 11/13/19 2:36 AM, Christian König wrote:
>>>>>>>>> The question is where do we rearm the timer for this problem to
>>>>>>>>> occur?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Christian.
>>>>>>>>>
>>>>>>>>> Am 12.11.19 um 20:21 schrieb Andrey Grodzovsky:
>>>>>>>>>> I was able to reproduce the crash by using the attached
>>>>>>>>>> simulate_crash.patch - waiting on guilty job to signal in reset
>>>>>>>>>> work and artificially rearming the timeout timer just before the
>>>>>>>>>> check for !cancel_delayed_work(&sched->work_tdr) in
>>>>>>>>>> drm_sched_cleanup_jobs - crash log attached in crash.log. This I
>>>>>>>>>> think confirms my theory i described earlier in this thread.
>>>>>>>>>>
>>>>>>>>>> basic_fix.patch handles this by testing whether another timer
>>>>>>>>>> already armed ob this scheduler or is there a timeout work in
>>>>>>>>>> execution right now (see documentation for work_busy) - 
>>>>>>>>>> obviously
>>>>>>>>>> this is not a full solution as this will not protect from races
>>>>>>>>>> if for example there is immediate work scheduling such as in
>>>>>>>>>> drm_sched_fault -  so we probably need to account for this by
>>>>>>>>>> making drm_sched_cleanup_jobs (at least in the part where it
>>>>>>>>>> iterates ring mirror list and frees jobs) and GPU reset really
>>>>>>>>>> mutually exclusive and not like now.
>>>>>>>>>>
>>>>>>>>>> Andrey
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 11/11/19 4:11 PM, Christian König wrote:
>>>>>>>>>>> Hi Emily,
>>>>>>>>>>>
>>>>>>>>>>> you need to print which scheduler instance is freeing the jobs
>>>>>>>>>>> and which one is triggering the reset. The TID and PID is
>>>>>>>>>>> completely meaningless here since we are called from different
>>>>>>>>>>> worker threads and the TID/PID can change on each call.
>>>>>>>>>>>
>>>>>>>>>>> Apart from that I will look into this a bit deeper when I have
>>>>>>>>>>> time.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>> Am 12.11.19 um 07:02 schrieb Deng, Emily:
>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>      I add the follow print in function 
>>>>>>>>>>>> drm_sched_cleanup_jobs.
>>>>>>>>>>>>  From the log it shows that only use cancel_delayed_work could
>>>>>>>>>>>> not avoid to free job when the sched is in reset. But don’t
>>>>>>>>>>>> know exactly where it is wrong about the driver. Do you have
>>>>>>>>>>>> any suggestion about this?
>>>>>>>>>>>> + printk("Emily:drm_sched_cleanup_jobs:begin,tid:%lu,
>>>>>>>>>>>> pid:%lu\n", current->tgid, current->pid);
>>>>>>>>>>>>          /*
>>>>>>>>>>>>           * Don't destroy jobs while the timeout worker is
>>>>>>>>>>>> running  OR thread
>>>>>>>>>>>>           * is being parked and hence assumed to not touch
>>>>>>>>>>>> ring_mirror_list
>>>>>>>>>>>>           */
>>>>>>>>>>>>           if ((sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>>>>>>>>> !cancel_delayed_work(&sched->work_tdr)))
>>>>>>>>>>>>                  return;
>>>>>>>>>>>> + printk("Emily:drm_sched_cleanup_jobs,tid:%lu, pid:%lu\n",
>>>>>>>>>>>> current->tgid, current->pid);
>>>>>>>>>>>> Best wishes
>>>>>>>>>>>> Emily Deng
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11380.695091] Emily:drm_sched_cleanup_jobs:begin,tid:2262,
>>>>>>>>>>>> pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11380.695104] Emily:drm_sched_cleanup_jobs:begin,tid:2262,
>>>>>>>>>>>> pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11380.695105] Emily:drm_sched_cleanup_jobs,tid:2262, pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11380.695107] Emily:drm_sched_cleanup_jobs:begin,tid:2262,
>>>>>>>>>>>> pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11380.695107] Emily:drm_sched_cleanup_jobs,tid:2262, pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.222954] [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
>>>> ring
>>>>>>>>>>>> sdma0 timeout, signaled seq=78585, emitted seq=78587 Nov 12
>>>>>>>>>>>> 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.224275] [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
>>>>>>>>>>>> Process information: process pid 0 thread pid 0,
>>>>>>>>>>>> s_job:00000000fe75ab36,tid=15603, pid=15603 Nov 12 12:58:20
>>>>>>>>>>>> ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225413] amdgpu 0000:00:08.0: GPU reset begin!
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225417] Emily:drm_sched_cleanup_jobs:begin,tid:2262,
>>>>>>>>>>>> pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225425] Emily:drm_sched_cleanup_jobs:begin,tid:2262,
>>>>>>>>>>>> pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225425] Emily:drm_sched_cleanup_jobs,tid:2262, pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225428] Emily:amdgpu_job_free_cb,Process information:
>>>>>>>>>>>> process  pid 0 thread  pid 0, s_job:00000000fe75ab36, 
>>>>>>>>>>>> tid:2262,
>>>>>>>>>>>> pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225429] Emily:drm_sched_cleanup_jobs:begin,tid:2262,
>>>>>>>>>>>> pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225430] Emily:drm_sched_cleanup_jobs,tid:2262, pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225473] Emily:drm_sched_cleanup_jobs:begin,tid:2253,
>>>>>>>>>>>> pid:2253
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225486] Emily:drm_sched_cleanup_jobs:begin,tid:2262,
>>>>>>>>>>>> pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225489] Emily:drm_sched_cleanup_jobs,tid:2262, pid:2262
>>>>>>>>>>>> Nov 12 12:58:20 ubuntu-drop-August-2018-rc2-gpu0-vf02 kernel:
>>>>>>>>>>>> [11381.225494] Emily:amdgpu_job_free_cb,Process information:
>>>>>>>>>>>> process  pid 0 thread  pid 0, s_job:00000000f086ec84, 
>>>>>>>>>>>> tid:2262,
>>>>>>>>>>>> pid:2262
>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>> From: Grodzovsky, Andrey <Andrey.Grodzovsky at amd.com>
>>>>>>>>>>>>> Sent: Tuesday, November 12, 2019 11:28 AM
>>>>>>>>>>>>> To: Koenig, Christian <Christian.Koenig at amd.com>; Deng, Emily
>>>>>>>>>>>>> <Emily.Deng at amd.com>; amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue
>>>>>>>>>>>> for tdr
>>>>>>>>>>>>> Thinking more about this claim - we assume here that if
>>>>>>>>>>>> cancel_delayed_work
>>>>>>>>>>>>> returned true it guarantees that timeout work is not running
>>>>>>>>>>>> but, it merely
>>>>>>>>>>>>> means there was a pending timeout work which was removed
>>>> from
>>>>>>>>>>>>> the workqueue before it's timer elapsed and so it didn't have
>>>>>>>>>>>>> a
>>>>>>>>>>>> chance to be
>>>>>>>>>>>>> dequeued and executed, it doesn't cover already executing
>>>>>>>>>>>> work. So there is a
>>>>>>>>>>>>> possibility where while timeout work started executing 
>>>>>>>>>>>>> another
>>>>>>>>>>>> timeout work
>>>>>>>>>>>>> already got enqueued (maybe through earlier cleanup jobs or
>>>>>>>>>>>> through
>>>>>>>>>>>>> drm_sched_fault) and if at this point another
>>>>>>>>>>>> drm_sched_cleanup_jobs runs
>>>>>>>>>>>>> cancel_delayed_work(&sched->work_tdr) will return true even
>>>>>>>>>>>> while there is a
>>>>>>>>>>>>> timeout job in progress.
>>>>>>>>>>>>> Unfortunately we cannot change cancel_delayed_work to
>>>>>>>>>>>>> cancel_delayed_work_sync to flush the timeout work as timeout
>>>>>>>>>>>> work itself
>>>>>>>>>>>>> waits for schedule thread  to be parked again when calling
>>>>>>>>>>>> park_thread.
>>>>>>>>>>>>> Andrey
>>>>>>>>>>>>>
>>>>>>>>>>>>> ________________________________________
>>>>>>>>>>>>> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org> on
>>>>>>>>>>>> behalf of
>>>>>>>>>>>>> Koenig, Christian <Christian.Koenig at amd.com>
>>>>>>>>>>>>> Sent: 08 November 2019 05:35:18
>>>>>>>>>>>>> To: Deng, Emily; amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue
>>>>>>>>>>>> for tdr
>>>>>>>>>>>>> Hi Emily,
>>>>>>>>>>>>>
>>>>>>>>>>>>> exactly that can't happen. See here:
>>>>>>>>>>>>>
>>>>>>>>>>>>>>           /* Don't destroy jobs while the timeout worker is
>>>>>>>>>>>> running */
>>>>>>>>>>>>>>           if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>>>>>>>>>>>> !cancel_delayed_work(&sched->work_tdr))
>>>>>>>>>>>>>>                   return NULL;
>>>>>>>>>>>>> We never free jobs while the timeout working is running to
>>>>>>>>>>>> prevent exactly
>>>>>>>>>>>>> that issue.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Am 08.11.19 um 11:32 schrieb Deng, Emily:
>>>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>>>         The drm_sched_job_timedout-> amdgpu_job_timedout 
>>>>>>>>>>>>>> call
>>>>>>>>>>>>> amdgpu_device_gpu_recover. I mean the main scheduler free the
>>>>>>>>>>>> jobs while
>>>>>>>>>>>>> in amdgpu_device_gpu_recover, and before calling
>>>> drm_sched_stop.
>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>> Emily Deng
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>>>>>>>>>>>> Sent: Friday, November 8, 2019 6:26 PM
>>>>>>>>>>>>>>> To: Deng, Emily <Emily.Deng at amd.com>;
>>>>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer issue
>>>>>>>>>>>> for tdr
>>>>>>>>>>>>>>> Hi Emily,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> well who is calling amdgpu_device_gpu_recover() in this 
>>>>>>>>>>>>>>> case?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When it's not the scheduler we shouldn't have a guilty job
>>>>>>>>>>>> in the first place.
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Am 08.11.19 um 11:22 schrieb Deng, Emily:
>>>>>>>>>>>>>>>> Hi Chrisitan,
>>>>>>>>>>>>>>>>          No, I am with the new branch and also has the
>>>>>>>>>>>> patch. Even it
>>>>>>>>>>>>>>>> are freed by
>>>>>>>>>>>>>>> main scheduler, how we could avoid main scheduler to free
>>>>>>>>>>>> jobs while
>>>>>>>>>>>>>>> enter to function amdgpu_device_gpu_recover?
>>>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>>>> Emily Deng
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>>>>>>>>>>>>>> Sent: Friday, November 8, 2019 6:15 PM
>>>>>>>>>>>>>>>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-
>>>>>>>>>>>>> gfx at lists.freedesktop.org
>>>>>>>>>>>>>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer
>>>>>>>>>>>> issue for tdr
>>>>>>>>>>>>>>>>> Hi Emily,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> in this case you are on an old code branch.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Jobs are freed now by the main scheduler thread and only
>>>>>>>>>>>> if no
>>>>>>>>>>>>>>>>> timeout handler is running.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> See this patch here:
>>>>>>>>>>>>>>>>>> commit 5918045c4ed492fb5813f980dcf89a90fefd0a4e
>>>>>>>>>>>>>>>>>> Author: Christian König <christian.koenig at amd.com>
>>>>>>>>>>>>>>>>>> Date:   Thu Apr 18 11:00:21 2019 -0400
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>         drm/scheduler: rework job destruction
>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Am 08.11.19 um 11:11 schrieb Deng, Emily:
>>>>>>>>>>>>>>>>>> Hi Christian,
>>>>>>>>>>>>>>>>>>           Please refer to follow log, when it enter to
>>>>>>>>>>>>>>>>>> amdgpu_device_gpu_recover
>>>>>>>>>>>>>>>>> function, the bad job 000000005086879e is freeing in
>>>>>>>>>>>> function
>>>>>>>>>>>>>>>>> amdgpu_job_free_cb at the same time, because of the
>>>>>>>>>>>> hardware fence
>>>>>>>>>>>>>>> signal.
>>>>>>>>>>>>>>>>> But amdgpu_device_gpu_recover goes faster, at this case,
>>>>>>>>>>>>>>>>> the s_fence is already freed, but job is not freed in 
>>>>>>>>>>>>>>>>> time.
>>>>>>>>>>>> Then this issue
>>>>>>>>>>>>> occurs.
>>>>>>>>>>>>>>>>>> [ 449.792189] [drm:amdgpu_job_timedout [amdgpu]]
>>>>>>>>>>>> *ERROR* ring
>>>>>>>>>>>>>>> sdma0
>>>>>>>>>>>>>>>>>> timeout, signaled seq=2481, emitted seq=2483 [
>>>>>>>>>>>>>>>>>> 449.793202] [drm:amdgpu_job_timedout [amdgpu]]
>>>> *ERROR*
>>>>>>>>>>>>>>>>>> Process
>>>>>>>>>>>> information:
>>>>>>>>>>>>>>>>> process  pid 0 thread pid 0, s_job:000000005086879e [
>>>>>>>>>>>> 449.794163]
>>>>>>>>>>>>>>>>> amdgpu
>>>>>>>>>>>>>>>>> 0000:00:08.0: GPU reset begin!
>>>>>>>>>>>>>>>>>> [ 449.794175] Emily:amdgpu_job_free_cb,Process
>>>>>>>>>>>> information:
>>>>>>>>>>>>>>>>>> process pid 0 thread  pid 0, s_job:000000005086879e [
>>>>>>>>>>>> 449.794221]
>>>>>>>>>>>>>>>>>> Emily:amdgpu_job_free_cb,Process information: process
>>>>>>>>>>>> pid 0
>>>>>>>>>>>>>>>>>> thread pid 0, s_job:0000000066eb74ab [ 449.794222]
>>>>>>>>>>>>>>>>>> Emily:amdgpu_job_free_cb,Process information: process
>>>>>>>>>>>> pid 0
>>>>>>>>>>>>>>>>>> thread pid 0, s_job:00000000d4438ad9 [ 449.794255]
>>>>>>>>>>>>>>>>>> Emily:amdgpu_job_free_cb,Process information: process
>>>>>>>>>>>> pid 0
>>>>>>>>>>>>>>>>>> thread pid 0, s_job:00000000b6d69c65 [ 449.794257]
>>>>>>>>>>>>>>>>>> Emily:amdgpu_job_free_cb,Process information: process
>>>>>>>>>>>> pid 0
>>>>>>>>>>>>>>>>>> thread pid 0,
>>>>>>>>>>>>>>>>> s_job:00000000ea85e922 [ 449.794287]
>>>>>>>>>>>>>>>>> Emily:amdgpu_job_free_cb,Process
>>>>>>>>>>>>>>>>> information: process pid 0 thread pid 0,
>>>>>>>>>>>> s_job:00000000ed3a5ac6 [
>>>>>>>>>>>>>>>>> 449.794366] BUG: unable to handle kernel NULL pointer
>>>>>>>>>>>> dereference
>>>>>>>>>>>>>>>>> at
>>>>>>>>>>>>>>>>> 00000000000000c0 [ 449.800818] PGD 0 P4D 0
>>>> [  449.801040]
>>>>>>>>>>>> Oops:
>>>>>>>>>>>>>>>>> 0000 [#1] SMP PTI
>>>>>>>>>>>>>>>>>> [ 449.801338] CPU: 3 PID: 55 Comm: kworker/3:1 Tainted:
>>>>>>>>>>>> G OE
>>>>>>>>>>>>>>>>> 4.18.0-15-generic #16~18.04.1-Ubuntu
>>>>>>>>>>>>>>>>>> [ 449.802157] Hardware name: QEMU Standard PC (i440FX
>>>> +
>>>>>>>>>>>> PIIX,
>>>>>>>>>>>>>>>>>> 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 [
>>>>>>>>>>>>>>>>>> 449.802944]
>>>>>>>>>>>>>>>>>> Workqueue: events drm_sched_job_timedout [amd_sched]
>>>> [
>>>>>>>>>>>>>>>>>> 449.803488]
>>>>>>>>>>>>>>> RIP:
>>>>>>>>>>>>>>>>> 0010:amdgpu_device_gpu_recover+0x1da/0xb60 [amdgpu]
>>>>>>>>>>>>>>>>>> [ 449.804020] Code: dd ff ff 49 39 c5 48 89 55 a8 0f 85
>>>>>>>>>>>> 56 ff ff
>>>>>>>>>>>>>>>>>> ff
>>>>>>>>>>>>>>>>>> 45 85 e4 0f
>>>>>>>>>>>>>>>>> 85 a1 00 00 00 48 8b 45 b0 48 85 c0 0f 84 60 01 00 00 48
>>>>>>>>>>>> 8b 40 10
>>>>>>>>>>>>>>>>> <48> 8b
>>>>>>>>>>>>>>> 98
>>>>>>>>>>>>>>>>> c0 00 00 00 48 85 db 0f 84 4c 01 00 00 48 8b 43
>>>>>>>>>>>> 48 a8 01
>>>>>>>>>>>>>>>>>> [ 449.805593] RSP: 0018:ffffb4c7c08f7d68 EFLAGS:
>>>>>>>>>>>> 00010286 [
>>>>>>>>>>>>>>>>>> 449.806032] RAX: 0000000000000000 RBX:
>>>> 0000000000000000
>>>>>>>>>>>> RCX:
>>>>>>>>>>>>>>>>>> 0000000000000000 [ 449.806625] RDX: ffffb4c7c08f5ac0
>>>> RSI:
>>>>>>>>>>>>>>>>>> 0000000fffffffe0 RDI: 0000000000000246 [ 449.807224]
>>>> RBP:
>>>>>>>>>>>>>>>>>> ffffb4c7c08f7de0 R08: 00000068b9d54000 R09:
>>>>>>>>>>>> 0000000000000000 [
>>>>>>>>>>>>>>>>>> 449.807818] R10: 0000000000000000 R11:
>>>> 0000000000000148
>>>>>>>>>>>> R12:
>>>>>>>>>>>>>>>>>> 0000000000000000 [ 449.808411] R13: ffffb4c7c08f7da0
>>>> R14:
>>>>>>>>>>>>>>>>>> ffff8d82b8525d40 R15: ffff8d82b8525d40 [ 449.809004] FS:
>>>>>>>>>>>>>>>>>> 0000000000000000(0000) GS:ffff8d82bfd80000(0000)
>>>>>>>>>>>>>>>>>> knlGS:0000000000000000 [ 449.809674] CS:  0010 DS: 0000
>>>>>>>>>>>> ES: 0000
>>>>>>>>>>>>> CR0:
>>>>>>>>>>>>>>>>>> 0000000080050033 [ 449.810153] CR2: 00000000000000c0
>>>> CR3:
>>>>>>>>>>>>>>>>>> 000000003cc0a001 CR4: 00000000003606e0 [ 449.810747]
>>>> DR0:
>>>>>>>>>>>>>>>>> 0000000000000000 DR1: 0000000000000000 DR2:
>>>>>>>>>>>> 0000000000000000 [
>>>>>>>>>>>>>>>>> 449.811344] DR3: 0000000000000000 DR6:
>>>> 00000000fffe0ff0 DR7:
>>>>>>>>>>>>>>>>> 0000000000000400 [ 449.811937] Call Trace:
>>>>>>>>>>>>>>>>>> [ 449.812206] amdgpu_job_timedout+0x114/0x140
>>>> [amdgpu]
>>>>>>>>>>>>>>>>>> [ 449.812635] drm_sched_job_timedout+0x44/0x90
>>>>>>>>>>>>>>>>>> [amd_sched] [ 449.813139]  ?
>>>>>>>>>>>>>>>>>> amdgpu_cgs_destroy_device+0x10/0x10
>>>>>>>>>>>> [amdgpu] [
>>>>>>>>>>>>>>>>>> 449.813609] ? drm_sched_job_timedout+0x44/0x90
>>>>>>>>>>>> [amd_sched] [
>>>>>>>>>>>>>>>>>> 449.814077] process_one_work+0x1fd/0x3f0 [ 449.814417]
>>>>>>>>>>>>>>>>>> worker_thread+0x34/0x410 [ 449.814728]
>>>>>>>>>>>> kthread+0x121/0x140 [
>>>>>>>>>>>>>>>>>> 449.815004] ? process_one_work+0x3f0/0x3f0 [
>>>>>>>>>>>> 449.815374]  ?
>>>>>>>>>>>>>>>>>> kthread_create_worker_on_cpu+0x70/0x70
>>>>>>>>>>>>>>>>>> [  449.815799] ret_from_fork+0x35/0x40
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>>>>>>>>>>>>>>>> Sent: Friday, November 8, 2019 5:43 PM
>>>>>>>>>>>>>>>>>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-
>>>>>>>>>>>>>>> gfx at lists.freedesktop.org
>>>>>>>>>>>>>>>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer
>>>>>>>>>>>> issue for
>>>>>>>>>>>>>>>>>>> tdr
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Am 08.11.19 um 10:39 schrieb Deng, Emily:
>>>>>>>>>>>>>>>>>>>> Sorry, please take your time.
>>>>>>>>>>>>>>>>>>> Have you seen my other response a bit below?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I can't follow how it would be possible for
>>>>>>>>>>>> job->s_fence to be
>>>>>>>>>>>>>>>>>>> NULL without the job also being freed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So it looks like this patch is just papering over some
>>>>>>>>>>>> bigger issues.
>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>>>>>>>> Emily Deng
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>> From: Koenig, Christian <Christian.Koenig at amd.com>
>>>>>>>>>>>>>>>>>>>>> Sent: Friday, November 8, 2019 5:08 PM
>>>>>>>>>>>>>>>>>>>>> To: Deng, Emily <Emily.Deng at amd.com>; amd-
>>>>>>>>>>>>>>>>> gfx at lists.freedesktop.org
>>>>>>>>>>>>>>>>>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null pointer
>>>>>>>>>>>> issue for
>>>>>>>>>>>>>>>>>>>>> tdr
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Am 08.11.19 um 09:52 schrieb Deng, Emily:
>>>>>>>>>>>>>>>>>>>>>> Ping.....
>>>>>>>>>>>>>>>>>>>>> You need to give me at least enough time to wake up
>>>>>>>>>>>>>>>>>>>>> :)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Best wishes
>>>>>>>>>>>>>>>>>>>>>> Emily Deng
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>>>> From: amd-gfx
>>>>>>>>>>>> <amd-gfx-bounces at lists.freedesktop.org> On
>>>>>>>>>>>>>>> Behalf
>>>>>>>>>>>>>>>>>>>>>>> Of Deng, Emily
>>>>>>>>>>>>>>>>>>>>>>> Sent: Friday, November 8, 2019 10:56 AM
>>>>>>>>>>>>>>>>>>>>>>> To: Koenig, Christian <Christian.Koenig at amd.com>;
>>>>>>>>>>>>>>>>>>>>>>> amd- gfx at lists.freedesktop.org
>>>>>>>>>>>>>>>>>>>>>>> Subject: RE: [PATCH] drm/amdgpu: Fix the null
>>>>>>>>>>>> pointer issue
>>>>>>>>>>>>>>>>>>>>>>> for tdr
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>>>>>>>>>>>>> From: Christian König
>>>>>>>>>>>> <ckoenig.leichtzumerken at gmail.com>
>>>>>>>>>>>>>>>>>>>>>>>> Sent: Thursday, November 7, 2019 7:28 PM
>>>>>>>>>>>>>>>>>>>>>>>> To: Deng, Emily <Emily.Deng at amd.com>;
>>>>>>>>>>>>>>>>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>>>>>>>>>>>> Subject: Re: [PATCH] drm/amdgpu: Fix the null
>>>>>>>>>>>> pointer issue
>>>>>>>>>>>>>>>>>>>>>>>> for tdr
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Am 07.11.19 um 11:25 schrieb Emily Deng:
>>>>>>>>>>>>>>>>>>>>>>>>> When the job is already signaled, the s_fence is
>>>>>>>>>>>> freed.
>>>>>>>>>>>>>>>>>>>>>>>>> Then it will has null pointer in
>>>>>>>>>>>> amdgpu_device_gpu_recover.
>>>>>>>>>>>>>>>>>>>>>>>> NAK, the s_fence is only set to NULL when the job
>>>>>>>>>>>> is destroyed.
>>>>>>>>>>>>>>>>>>>>>>>> See drm_sched_job_cleanup().
>>>>>>>>>>>>>>>>>>>>>>> I know it is set to NULL in drm_sched_job_cleanup.
>>>>>>>>>>>> But in one
>>>>>>>>>>>>>>>>>>>>>>> case, when it enter into the
>>>>>>>>>>>> amdgpu_device_gpu_recover, it
>>>>>>>>>>>>>>>>>>>>>>> already in drm_sched_job_cleanup, and at this time,
>>>>>>>>>>>> it will
>>>>>>>>>>>>>>>>>>>>>>> go to free
>>>>>>>>>>>>>>>>> job.
>>>>>>>>>>>>>>>>>>>>>>> But the amdgpu_device_gpu_recover sometimes is
>>>>>>>>>>>> faster. At
>>>>>>>>>>>>>>>>>>>>>>> that time, job is not freed, but s_fence is already
>>>>>>>>>>>> NULL.
>>>>>>>>>>>>>>>>>>>>> No, that case can't happen. See here:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> drm_sched_job_cleanup(s_job);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> amdgpu_ring_priority_put(ring, s_job->s_priority);
>>>>>>>>>>>>>>>>>>>>>> dma_fence_put(job->fence);
>>>>>>>>>>>>>>>>>>>>>> amdgpu_sync_free(&job->sync);
>>>>>>>>>>>>>>>>>>>>>> amdgpu_sync_free(&job->sched_sync);
>>>>>>>>>>>>>>>>>>>>>> kfree(job);
>>>>>>>>>>>>>>>>>>>>> The job itself is freed up directly after freeing the
>>>>>>>>>>>> reference
>>>>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>> s_fence.
>>>>>>>>>>>>>>>>>>>>> So you are just papering over a much bigger problem
>>>>>>>>>>>> here. This
>>>>>>>>>>>>>>>>>>>>> patch is a clear NAK.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> When you see a job without an s_fence then that
>>>>>>>>>>>> means the
>>>>>>>>>>>>>>>>>>>>>>>> problem is somewhere else.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>>>>>>>> Christian.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Signed-off-by: Emily Deng <Emily.Deng at amd.com>
>>>>>>>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> |  2
>>>>>>>>>>>>>>>>>>>>>>>>> +- drivers/gpu/drm/scheduler/sched_main.c |
>>>>>>>>>>>>>>>>>>>>>>>>> 11
>>>>>>>>>>>> ++++++---
>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>>>>>>>>>>> 2 files changed, 7 insertions(+), 6
>>>>>>>>>>>> deletions(-)
>>>>>>>>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>>>>>>>> index e6ce949..5a8f08e 100644
>>>>>>>>>>>>>>>>>>>>>>>>> ---
>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>>>>>>>> +++
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>>>>>>>>>>>>>>>>>>>>>> @@ -4075,7 +4075,7 @@ int
>>>>>>>>>>>>>>> amdgpu_device_gpu_recover(struct
>>>>>>>>>>>>>>>>>>>>>>>> amdgpu_device *adev,
>>>>>>>>>>>>>>>>>>>>>>>>> *
>>>>>>>>>>>>>>>>>>>>>>>>>              * job->base holds a reference to
>>>>>>>>>>>> parent fence
>>>>>>>>>>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>>>>>>>>>>> -  if (job && job->base.s_fence->parent &&
>>>>>>>>>>>>>>>>>>>>>>>>> +  if (job && job->base.s_fence &&
>>>>>>>>>>>>>>>>>>>>>>>>> + job->base.s_fence->parent
>>>>>>>>>>>>>>>>>>> &&
>>>>>>>>>>>>>>>>>>>>>>>>> dma_fence_is_signaled(job->base.s_fence-
>>>>> parent))
>>>>>>>>>>>>>>>>>>>>>>>>> job_signaled = true;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> diff --git
>>>>>>>>>>>>>>>>>>>>>>>>> a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>>>>>>>>>>>>>>>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>>>>>>>>>>>>>>>>> index 31809ca..56cc10e 100644
>>>>>>>>>>>>>>>>>>>>>>>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>>>>>>>>>>>>>>>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>>>>>>>>>>>>>>>>>>>>>>>> @@ -334,8 +334,8 @@ void
>>>>>>>>>>>>> drm_sched_increase_karma(struct
>>>>>>>>>>>>>>>>>>>>>>>> drm_sched_job
>>>>>>>>>>>>>>>>>>>>>>>>> *bad)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> spin_lock(&rq->lock);
>>>>>>>>>>>>>>>>>>>>>>>>> list_for_each_entry_safe(entity, tmp,
>>>>>>>>>>>>>>> &rq-
>>>>>>>>>>>>>>>>>>>> entities,
>>>>>>>>>>>>>>>>>>>>>>>> list) {
>>>>>>>>>>>>>>>>>>>>>>>>> - if
>>>>>>>>>>>> (bad->s_fence->scheduled.context
>>>>>>>>>>>>>>>>>>> ==
>>>>>>>>>>>>>>>>>>>>>>>>> - entity->fence_context) {
>>>>>>>>>>>>>>>>>>>>>>>>> +                          if (bad->s_fence &&
>>>>>>>>>>>>>>>>>>>>>>>>> + (bad->s_fence-
>>>>>>>>>>>>>>>>>>>>>>>>> scheduled.context ==
>>>>>>>>>>>>>>>>>>>>>>>>> + entity->fence_context)) {
>>>>>>>>>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>>>>>>>> (atomic_read(&bad-
>>>>>>>>>>>>>>>>>>>> karma) >
>>>>>>>>>>>>>>>>>>>>>>>>> bad->sched-
>>>>>>>>>>>>>>>> hang_limit)
>>>>>>>>>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>>>>>>>> (entity-
>>>>>>>>>>>>>>>> guilty) @@ -376,7 +376,7 @@ void
>>>>>>>>>>>>>>>>>>>>>>>>> drm_sched_stop(struct
>>>>>>>>>>>>>>>>>>> drm_gpu_scheduler
>>>>>>>>>>>>>>>>>>>>>>>> *sched, struct drm_sched_job *bad)
>>>>>>>>>>>>>>>>>>>>>>>>> * This iteration is thread safe as
>>>>>>>>>>>> sched thread
>>>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> stopped.
>>>>>>>>>>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>>>>>>>>>>> list_for_each_entry_safe_reverse(s_job, tmp,
>>>>>>>>>>>>>>>>>>>>>>>>> &sched- ring_mirror_list, node) {
>>>>>>>>>>>>>>>>>>>>>>>>> -          if (s_job->s_fence->parent &&
>>>>>>>>>>>>>>>>>>>>>>>>> +          if (s_job->s_fence &&
>>>>>>>>>>>> s_job->s_fence->parent &&
>>>>>>>>>>>>>>>>>>>>>>>>> dma_fence_remove_callback(s_job-
>>>>>>>>>>>>>>>> s_fence-
>>>>>>>>>>>>>>>>>>>> parent,
>>>>>>>>>>>>>>>>>>>>>>>>> &s_job->cb)) {
>>>>>>>>>>>>>>>>>>>>>>>>> atomic_dec(&sched->hw_rq_count);
>>>>>>>>>>>>>>> @@ -
>>>>>>>>>>>>>>>>>>> 395,7
>>>>>>>>>>>>>>>>>>>>>>> +395,8 @@ void
>>>>>>>>>>>>>>>>>>>>>>>>> drm_sched_stop(struct drm_gpu_scheduler
>>>>>>>>>>>>>>>>>>>>>>>> *sched, struct drm_sched_job *bad)
>>>>>>>>>>>>>>>>>>>>>>>>> *
>>>>>>>>>>>>>>>>>>>>>>>>> * Job is still alive so fence refcount at
>>>>>>>>>>>>>>> least 1
>>>>>>>>>>>>>>>>>>>>>>>>> */
>>>>>>>>>>>>>>>>>>>>>>>>> - dma_fence_wait(&s_job->s_fence->finished,
>>>>>>>>>>>>>>>>>>> false);
>>>>>>>>>>>>>>>>>>>>>>>>> + if (s_job->s_fence)
>>>>>>>>>>>>>>>>>>>>>>>>> + dma_fence_wait(&s_job->s_fence-
>>>>>>>>>>>>>>>>>>>> finished,
>>>>>>>>>>>>>>>>>>>>>>>> false);
>>>>>>>>>>>>>>>>>>>>>>>>> /*
>>>>>>>>>>>>>>>>>>>>>>>>> * We must keep bad job alive for later
>>>>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>>>> during @@
>>>>>>>>>>>>>>>>>>>>>>>> -438,7
>>>>>>>>>>>>>>>>>>>>>>>>> +439,7 @@ void drm_sched_start(struct
>>>>>>>>>>>> drm_gpu_scheduler
>>>>>>>>>>>>>>>>> *sched,
>>>>>>>>>>>>>>>>>>>>>>>>> +bool
>>>>>>>>>>>>>>>>>>>>>>>> full_recovery)
>>>>>>>>>>>>>>>>>>>>>>>>> * GPU recovers can't run in parallel.
>>>>>>>>>>>>>>>>>>>>>>>>>              */
>>>>>>>>>>>>>>>>>>>>>>>>> list_for_each_entry_safe(s_job, tmp,
>>>>>>>>>>>>>>>>>>>>>>>>> &sched->ring_mirror_list,
>>>>>>>>>>>>>>>>>>>>>>>>> node)
>>>>>>>>>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>>>>>>>>>>> - struct dma_fence *fence =
>>>>>>>>>>>> s_job->s_fence->parent;
>>>>>>>>>>>>>>>>>>>>>>>>> + struct dma_fence *fence =
>>>>>>>>>>>> s_job->s_fence ?
>>>>>>>>>>>>>>>>>>>>>>>>> + s_job-
>>>>>>>>>>>>>>>>>>>> s_fence-
>>>>>>>>>>>>>>>>>>>>>>>>> parent :
>>>>>>>>>>>>>>>>>>>>>>>>> +NULL;
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> atomic_inc(&sched->hw_rq_count);
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>>>> amd-gfx mailing list amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>>>> <https://lists.freedesktop.org/mailman/listinfo/amd-gfx>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>>>>>> _______________________________________________
>>>>>>>>>> amd-gfx mailing list
>>>>>>>>>> amd-gfx at lists.freedesktop.org
>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>>>> _______________________________________________
>>>>>> amd-gfx mailing list
>>>>>> amd-gfx at lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>