[PATCH 1/4] drm/amdgpu: add sched sync for amdgpu job
zhoucm1
david1.zhou at amd.com
Wed May 10 09:20:15 UTC 2017
On May 10, 2017 at 17:21, Christian König wrote:
> On May 10, 2017 at 11:00, zhoucm1 wrote:
>>
>>
>>> On May 10, 2017 at 16:50, Christian König wrote:
>>>> On May 10, 2017 at 10:38, zhoucm1 wrote:
>>>>
>>>>
>>>> On May 10, 2017 at 16:26, Christian König wrote:
>>>>> On May 10, 2017 at 09:31, Chunming Zhou wrote:
>>>>>> This is an improvement over the previous patch: sched_sync stores the
>>>>>> fences whose wait could be skipped because they were already scheduled.
>>>>>> When the job is executed, we don't need a pipeline_sync if all fences
>>>>>> in sched_sync are signalled; otherwise we still insert the
>>>>>> pipeline_sync.
>>>>>>
>>>>>> Change-Id: I26d3a2794272ba94b25753d4bf367326d12f6939
>>>>>> Signed-off-by: Chunming Zhou <David1.Zhou at amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 7 ++++++-
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 5 ++++-
>>>>>> 3 files changed, 11 insertions(+), 2 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> index 787acd7..ef018bf 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>>>> @@ -1162,6 +1162,7 @@ struct amdgpu_job {
>>>>>> struct amdgpu_vm *vm;
>>>>>> struct amdgpu_ring *ring;
>>>>>> struct amdgpu_sync sync;
>>>>>> + struct amdgpu_sync sched_sync;
>>>>>> struct amdgpu_ib *ibs;
>>>>>> struct fence *fence; /* the hw fence */
>>>>>> uint32_t preamble_status;
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>>> index 2c6624d..86ad507 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>>>>> @@ -121,6 +121,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring
>>>>>> *ring, unsigned num_ibs,
>>>>>> {
>>>>>> struct amdgpu_device *adev = ring->adev;
>>>>>> struct amdgpu_ib *ib = &ibs[0];
>>>>>> + struct fence *tmp;
>>>>>> bool skip_preamble, need_ctx_switch;
>>>>>> unsigned patch_offset = ~0;
>>>>>> struct amdgpu_vm *vm;
>>>>>> @@ -167,8 +168,12 @@ int amdgpu_ib_schedule(struct amdgpu_ring
>>>>>> *ring, unsigned num_ibs,
>>>>>> return r;
>>>>>> }
>>>>>> - if (ring->funcs->emit_pipeline_sync && job &&
>>>>>> job->need_pipeline_sync)
>>>>>> + if (ring->funcs->emit_pipeline_sync && job &&
>>>>>> + (tmp = amdgpu_sync_get_fence(&job->sched_sync))) {
>>>>>> + job->need_pipeline_sync = true;
>>>>>> amdgpu_ring_emit_pipeline_sync(ring);
>>>>>> + fence_put(tmp);
>>>>>> + }
>>>>>> if (vm) {
>>>>>> amdgpu_ring_insert_nop(ring, extra_nop); /* prevent CE
>>>>>> go too fast than DE */
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> index cfa97ab..fa0c8b1 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>>>>> @@ -60,6 +60,7 @@ int amdgpu_job_alloc(struct amdgpu_device
>>>>>> *adev, unsigned num_ibs,
>>>>>> (*job)->need_pipeline_sync = false;
>>>>>> amdgpu_sync_create(&(*job)->sync);
>>>>>> + amdgpu_sync_create(&(*job)->sched_sync);
>>>>>> return 0;
>>>>>> }
>>>>>> @@ -98,6 +99,7 @@ static void amdgpu_job_free_cb(struct
>>>>>> amd_sched_job *s_job)
>>>>>> fence_put(job->fence);
>>>>>> amdgpu_sync_free(&job->sync);
>>>>>> + amdgpu_sync_free(&job->sched_sync);
>>>>>> kfree(job);
>>>>>> }
>>>>>> @@ -107,6 +109,7 @@ void amdgpu_job_free(struct amdgpu_job *job)
>>>>>> fence_put(job->fence);
>>>>>> amdgpu_sync_free(&job->sync);
>>>>>> + amdgpu_sync_free(&job->sched_sync);
>>>>>> kfree(job);
>>>>>> }
>>>>>> @@ -154,7 +157,7 @@ static struct fence
>>>>>> *amdgpu_job_dependency(struct amd_sched_job *sched_job)
>>>>>> }
>>>>>> if (amd_sched_dependency_optimized(fence,
>>>>>> sched_job->s_entity))
>>>>>> - job->need_pipeline_sync = true;
>>>>>> + amdgpu_sync_fence(job->adev, &job->sched_sync, fence);
>>>>>
>>>>> This can result in -ENOMEM.
>>>> will handle it.
>>>>> And in addition to that, we only need to remember the last fence
>>>>> optimized like this, not all of them.
>>>>>
>>>>> So just keep the last one found here in job->sched_fence instead.
>>>> I don't think that's enough.
>>>> The dependencies are not returned in submission order, so the last one
>>>> found is not always the last fence to be scheduled.
>>>> They could also be scheduler fences rather than hw fences; even though
>>>> they are handled by the same hw ring, their sched fence contexts differ.
>>>> So we still need sched_sync here, right?
>>>
>>> No, amdgpu_job_dependency is only called again when the returned
>>> fence is signaled (or scheduled on the same ring).
>> Let me give an example:
>> Assume job->sync has two fences (fenceA and fenceB) which could be
>> scheduled. fenceA is from entity1 and fenceB is from entity2, both for
>> the gfx engine, but fenceA could be submitted to the hw ring after
>> fenceB. The order in the job->sync list is:
>> others---->fenceA---->fenceB--->others.
>> When amdgpu_job_dependency is called, fenceA is checked first, then
>> fenceB.
>>
>> Following your proposal we would only store fenceB, but fenceA is the
>> later one, which isn't what we expect.
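The scenario above can be sketched as a small userspace simulation (hypothetical and heavily simplified: `fence_sim`, `last_only` and `latest_on_hw` are stand-ins invented here; a fence is reduced to its position in the dependency list and its submission order on the ring):

```c
#include <assert.h>

/* Hypothetical, simplified model of the situation described above:
 * a dependency fence is reduced to its position in the job->sync
 * list (list_pos) and its actual submission order on the hardware
 * ring (hw_seq).  The two orders are independent. */
struct fence_sim {
	int list_pos;
	int hw_seq;
};

/* "Remember only the last one found": returns the fence that appears
 * last while walking the dependency list. */
static struct fence_sim last_only(const struct fence_sim *deps, int n)
{
	return deps[n - 1];
}

/* What collecting every optimized fence in sched_sync effectively
 * waits for: the fence that reaches the hardware ring last. */
static struct fence_sim latest_on_hw(const struct fence_sim *deps, int n)
{
	struct fence_sim latest = deps[0];

	for (int i = 1; i < n; i++)
		if (deps[i].hw_seq > latest.hw_seq)
			latest = deps[i];
	return latest;
}
```

With fenceA first in the list but second on the ring, the two strategies pick different fences, which is exactly why keeping only the last list entry is not safe.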
>
> Ah! Indeed, I didn't realize that the dependent fence could have
> already been scheduled.
>
> Mhm, how are we going to handle the out of memory situation then? Since
> we are inside a kernel thread we are not supposed to fail at this point.
Like the failed-to-grab-VMID case, we could just print a DRM_ERROR. Is that OK?
Regards,
David Zhou
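David's suggestion might look roughly like the sketch below (a hypothetical userspace simulation, not the code that was merged: `fake_sync_fence` and `add_sched_dep` are invented stand-ins, and DRM_ERROR is approximated with fprintf):

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Stub standing in for amdgpu_sync_fence(), which can fail with
 * -ENOMEM when allocating its internal sync entry. */
static int fake_sync_fence(int simulate_enomem)
{
	return simulate_enomem ? -ENOMEM : 0;
}

/* Sketch of the dependency callback's error path: the scheduler
 * thread cannot propagate the failure, so log it (DRM_ERROR in the
 * kernel) and fall back to always emitting the pipeline sync, the
 * conservative pre-optimization behaviour. */
static int add_sched_dep(int simulate_enomem, int *need_pipeline_sync)
{
	int r = fake_sync_fence(simulate_enomem);

	if (r) {
		fprintf(stderr, "Error adding fence (%d) to sched_sync\n", r);
		*need_pipeline_sync = 1;	/* conservative fallback */
	}
	return r;
}
```

The point of the fallback is that losing the optimization only costs an extra pipeline sync, whereas silently dropping the dependency would be a correctness bug.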
>
> Regards,
> Christian.
>
>>
>>
>> Regards,
>> David Zhou
>>>
>>> So when this is called and you find that you need to wait for
>>> another fence, the order is guaranteed.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> Regards,
>>>> David zhou
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> return fence;
>>>>>> }
>>>>>
>>>>>
>>>>
>>>
>>
>