[PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

Thu Nov 7 20:37:17 UTC 2019

After considering Kent's concern, I actually took the consolidation to the next level where v9 and v10 have no divergence. With that, I think the "mustness" is stronger. Please check out the new patch.

Regards,

Yong

On 2019-11-07 3:31 p.m., Kuehling, Felix wrote:
On 2019-11-07 14:40, Zhao, Yong wrote:
Hi Felix,

The code working fine is true except that all not new features after this duplication are broken. If I want to make all GFX10 feature complete, I have to either manually adapt several duplications to the GFX10 file or do this consolidation. From this perspective and ease of my work, it is a must.

"A must" means there is no alternative. You already listed two alternatives yourself: "either manually adapt several duplications to the GFX10 file or do this consolidation."

In _your_ opinion, the consolidation means less work for _you_. That's _your_ point of view. The discussion in this code review pointed out other points of view. When you take all of them into account, you may reconsider what is less work overall, and what is easier to maintain.

I'm not opposing your change per-se. But I'd like you to consider the whole picture, including the consequences of any design decisions you're making and imposing on anyone working on this code in the future. In this cases I think it's a relatively minor issue and it may just come down to a matter of opinion that I don't feel terribly strongly about.

With that said, the change is

Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com><mailto:Felix.Kuehling at amd.com>

Regards,

  Felix

Regards,
Yong

________________________________
From: Kuehling, Felix <Felix.Kuehling at amd.com><mailto:Felix.Kuehling at amd.com>
Sent: Thursday, November 7, 2019 2:12 PM
To: Zhao, Yong <Yong.Zhao at amd.com><mailto:Yong.Zhao at amd.com>; Alex Deucher <alexdeucher at gmail.com><mailto:alexdeucher at gmail.com>
Cc: Russell, Kent <Kent.Russell at amd.com><mailto:Kent.Russell at amd.com>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org> <amd-gfx at lists.freedesktop.org><mailto:amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

On 2019-11-07 13:54, Zhao, Yong wrote:
Hi Kent,

This consolidation is a must, because we should not have duplicated it in the first place.

The code is working fine with the duplication. You disagree with duplicating the code in the first place. But that's just your opinion. It's not a must in any objective sense.

The kernel queue functions by design are generic. The reasson why GFX8 and GFX9 are different is because GFX9 is SOC15 where packet formats and doorbell size changed.  On the other hand, kfd_kernel_queue_v7.c file is pretty much empty by reusing v8 functions, even though it is there. Furthermore, in my opinion kfd_kernel_queue_v7.c should be merged into v8 counterpart, From GFX9 onwards, packet formats should stay the same. For kernel queues, we should be able to differentiate it by pre SOC15 or not, and I have an impression that MEC firmware agrees to maintain the kernel queue interface stable across generations most of time.

OK, you're making assumptions about PM4 packets on future ASIC generations. It's true that the transition to SOC15 with 64-bit doorbells and read/write-pointers was particularly disruptive. Your assumption will hold until it gets broken by some other disruptive change.

For now, if you want clear naming, we could call the GFXv7/8 packet manager functions "pre-SOC15" or "legacy" and the GFXv9/10 and future functions "SOC15". This may work for a while. But I suspect at some point something is going to change and we'll need to create a new version for a newer ASIC generation. You already have a small taste of that with the different TBA-enable bit in the MAP_PROCESS packet in GFXv10.

Regards,

  Felix

Regards,
Yong
________________________________
From: Alex Deucher <alexdeucher at gmail.com><mailto:alexdeucher at gmail.com>
Sent: Thursday, November 7, 2019 1:32 PM
To: Kuehling, Felix <Felix.Kuehling at amd.com><mailto:Felix.Kuehling at amd.com>
Cc: Zhao, Yong <Yong.Zhao at amd.com><mailto:Yong.Zhao at amd.com>; Russell, Kent <Kent.Russell at amd.com><mailto:Kent.Russell at amd.com>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org> <amd-gfx at lists.freedesktop.org><mailto:amd-gfx at lists.freedesktop.org>
Subject: Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii

On Thu, Nov 7, 2019 at 12:47 PM Kuehling, Felix <Felix.Kuehling at amd.com><mailto:Felix.Kuehling at amd.com> wrote:
>
> No, please lets not add a new nomenclature for PM4 packet versions. GFX versions are agreed on between hardware, firmware, and software and it's generally understood what they mean. If we add a new PM4 packet versioning scheme on our own, then this will add a lot of confusion when talking to firmware teams.
>
> You know, this would all be more straight forward if we accepted a little bit of code duplication and had packet writing functions per GFX version. You'll see this pattern a lot in the amdgpu driver where each IP version duplicates a bunch of code. In many cases you may be able to save a few lines of code by sharing functions between IP versions. But you'll add some confusion and burden on future maintenance.
>
> This is the price we pay for micro-optimizing minor code duplication.

What we've tried to do in amdgpu is to break out shared code in to
common helpers that are not IP specific and use that in each IP module
(e.g., amdgpu_uvd.c amdgpu_gfx.c, etc.).  Sometimes we can use a
particular chunk of code across multiple generations.  E.g., the uvd
stuff is a good example.  We have shared generic uvd helpers that work
for most UVD IP versions, and then if we need an IP specific version,
we override that in the callbacks with a version specific one.  E.g.,
for the video decode engines we use the generic helpers for a number
of ring functions:

static const struct amdgpu_ring_funcs uvd_v7_0_ring_vm_funcs = {
...
    .test_ring = uvd_v7_0_ring_test_ring,
        .test_ib = amdgpu_uvd_ring_test_ib,
        .insert_nop = uvd_v7_0_ring_insert_nop,
        .pad_ib = amdgpu_ring_generic_pad_ib,
        .begin_use = amdgpu_uvd_ring_begin_use,
        .end_use = amdgpu_uvd_ring_end_use,
...
};

while we override more of them for the video encode engines:

static const struct amdgpu_ring_funcs uvd_v7_0_enc_ring_vm_funcs = {
...
    .test_ring = uvd_v7_0_enc_ring_test_ring,
        .test_ib = uvd_v7_0_enc_ring_test_ib,
        .insert_nop = amdgpu_ring_insert_nop,
    .insert_end = uvd_v7_0_enc_ring_insert_end,
        .pad_ib = amdgpu_ring_generic_pad_ib,
        .begin_use = amdgpu_uvd_ring_begin_use,
        .end_use = amdgpu_uvd_ring_end_use,
...
};

But still maintain IP specific components.

Alex

>
> Regards,
>   Felix
>
> On 2019-11-07 12:39, Zhao, Yong wrote:
>
> Hi Kent,
>
> I can't agree more on this. Also, the same applies to the file names. Definitely we need to agree on the naming scheme before making it happen.
>
> Yong
>
> On 2019-11-07 12:33 p.m., Russell, Kent wrote:
>
> I think that the versioning is getting a little confusing since we’re using the old GFX versions, but not really sticking to it due to the shareability of certain managers and shaders. Could we look into doing something like gen1 or gen2, or some other more ambiguous non-GFX-related versioning? Otherwise we’ll keep having these questions of “why is Hawaii GFX8”, “why is Arcturus GFX9”, etc. Then if things change, we just up the value concretely, instead of maybe doing a v11 if GFX11 changes things, and only GFX11 ASICs use those functions/variables.
>
>
>
> Obviously not high-priority, but maybe something to consider as you continue to consolidate and remove duplicate code.
>
>
>
> Kent
>
>
>
> From: amd-gfx <amd-gfx-bounces at lists.freedesktop.org><mailto:amd-gfx-bounces at lists.freedesktop.org> On Behalf Of Zhao, Yong
> Sent: Thursday, November 7, 2019 11:57 AM
> To: Kuehling, Felix <Felix.Kuehling at amd.com><mailto:Felix.Kuehling at amd.com>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>
> Subject: Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii
>
>
>
> Hi Felix,
>
>
>
> That's because v8 and v7 share the same packet_manager_funcs. In this case, it is better to keep it as it is.
>
>
>
> Regards,
>
> Yong
>
> ________________________________
>
> From: Kuehling, Felix <Felix.Kuehling at amd.com><mailto:Felix.Kuehling at amd.com>
> Sent: Wednesday, November 6, 2019 11:45 PM
> To: Zhao, Yong <Yong.Zhao at amd.com><mailto:Yong.Zhao at amd.com>; amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org> <amd-gfx at lists.freedesktop.org><mailto:amd-gfx at lists.freedesktop.org>
> Subject: Re: [PATCH 2/3] drm/amdkfd: only keep release_mem function for Hawaii
>
>
>
> On 2019-10-30 20:17, Zhao, Yong wrote:
> > release_mem won't be used at all on GFX9 and GFX10, so delete it.
>
> Hawaii was GFXv7. So we're not using the release_mem packet on GFXv8
> either. Why arbitrarily limit this change to GFXv9 and 10?
>
> Regards,
>    Felix
>
> >
> > Change-Id: I13787a8a29b83e7516c582a7401f2e14721edf5f
> > Signed-off-by: Yong Zhao <Yong.Zhao at amd.com><mailto:Yong.Zhao at amd.com>
> > ---
> >   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c | 35 ++-----------------
> >   .../gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  | 33 ++---------------
> >   2 files changed, 4 insertions(+), 64 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> > index aed32ab7102e..bfd6221acae9 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v10.c
> > @@ -298,37 +298,6 @@ static int pm_query_status_v10(struct packet_manager *pm, uint32_t *buffer,
> >        return 0;
> >   }
> >
> > -
> > -static int pm_release_mem_v10(uint64_t gpu_addr, uint32_t *buffer)
> > -{
> > -     struct pm4_mec_release_mem *packet;
> > -
> > -     WARN_ON(!buffer);
> > -
> > -     packet = (struct pm4_mec_release_mem *)buffer;
> > -     memset(buffer, 0, sizeof(struct pm4_mec_release_mem));
> > -
> > -     packet->header.u32All = pm_build_pm4_header(IT_RELEASE_MEM,
> > -                                     sizeof(struct pm4_mec_release_mem));
> > -
> > -     packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
> > -     packet->bitfields2.event_index = event_index__mec_release_mem__end_of_pipe;
> > -     packet->bitfields2.tcl1_action_ena = 1;
> > -     packet->bitfields2.tc_action_ena = 1;
> > -     packet->bitfields2.cache_policy = cache_policy__mec_release_mem__lru;
> > -
> > -     packet->bitfields3.data_sel = data_sel__mec_release_mem__send_32_bit_low;
> > -     packet->bitfields3.int_sel =
> > -             int_sel__mec_release_mem__send_interrupt_after_write_confirm;
> > -
> > -     packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
> > -     packet->address_hi = upper_32_bits(gpu_addr);
> > -
> > -     packet->data_lo = 0;
> > -
> > -     return sizeof(struct pm4_mec_release_mem) / sizeof(unsigned int);
> > -}
> > -
> >   const struct packet_manager_funcs kfd_v10_pm_funcs = {
> >        .map_process                    = pm_map_process_v10,
> >        .runlist                        = pm_runlist_v10,
> > @@ -336,13 +305,13 @@ const struct packet_manager_funcs kfd_v10_pm_funcs = {
> >        .map_queues                     = pm_map_queues_v10,
> >        .unmap_queues                   = pm_unmap_queues_v10,
> >        .query_status                   = pm_query_status_v10,
> > -     .release_mem                    = pm_release_mem_v10,
> > +     .release_mem                    = NULL,
> >        .map_process_size               = sizeof(struct pm4_mes_map_process),
> >        .runlist_size                   = sizeof(struct pm4_mes_runlist),
> >        .set_resources_size             = sizeof(struct pm4_mes_set_resources),
> >        .map_queues_size                = sizeof(struct pm4_mes_map_queues),
> >        .unmap_queues_size              = sizeof(struct pm4_mes_unmap_queues),
> >        .query_status_size              = sizeof(struct pm4_mes_query_status),
> > -     .release_mem_size               = sizeof(struct pm4_mec_release_mem)
> > +     .release_mem_size               = 0,
> >   };
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> > index 3b5ca2b1d7a6..f0e4910a8865 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> > @@ -336,35 +336,6 @@ static int pm_query_status_v9(struct packet_manager *pm, uint32_t *buffer,
> >        return 0;
> >   }
> >
> > -
> > -static int pm_release_mem_v9(uint64_t gpu_addr, uint32_t *buffer)
> > -{
> > -     struct pm4_mec_release_mem *packet;
> > -
> > -     packet = (struct pm4_mec_release_mem *)buffer;
> > -     memset(buffer, 0, sizeof(struct pm4_mec_release_mem));
> > -
> > -     packet->header.u32All = pm_build_pm4_header(IT_RELEASE_MEM,
> > -                                     sizeof(struct pm4_mec_release_mem));
> > -
> > -     packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
> > -     packet->bitfields2.event_index = event_index__mec_release_mem__end_of_pipe;
> > -     packet->bitfields2.tcl1_action_ena = 1;
> > -     packet->bitfields2.tc_action_ena = 1;
> > -     packet->bitfields2.cache_policy = cache_policy__mec_release_mem__lru;
> > -
> > -     packet->bitfields3.data_sel = data_sel__mec_release_mem__send_32_bit_low;
> > -     packet->bitfields3.int_sel =
> > -             int_sel__mec_release_mem__send_interrupt_after_write_confirm;
> > -
> > -     packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
> > -     packet->address_hi = upper_32_bits(gpu_addr);
> > -
> > -     packet->data_lo = 0;
> > -
> > -     return 0;
> > -}
> > -
> >   const struct packet_manager_funcs kfd_v9_pm_funcs = {
> >        .map_process            = pm_map_process_v9,
> >        .runlist                = pm_runlist_v9,
> > @@ -372,12 +343,12 @@ const struct packet_manager_funcs kfd_v9_pm_funcs = {
> >        .map_queues             = pm_map_queues_v9,
> >        .unmap_queues           = pm_unmap_queues_v9,
> >        .query_status           = pm_query_status_v9,
> > -     .release_mem            = pm_release_mem_v9,
> > +     .release_mem            = NULL,
> >        .map_process_size       = sizeof(struct pm4_mes_map_process),
> >        .runlist_size           = sizeof(struct pm4_mes_runlist),
> >        .set_resources_size     = sizeof(struct pm4_mes_set_resources),
> >        .map_queues_size        = sizeof(struct pm4_mes_map_queues),
> >        .unmap_queues_size      = sizeof(struct pm4_mes_unmap_queues),
> >        .query_status_size      = sizeof(struct pm4_mes_query_status),
> > -     .release_mem_size       = sizeof(struct pm4_mec_release_mem)
> > +     .release_mem_size       = 0,
> >   };
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org<mailto:amd-gfx at lists.freedesktop.org>
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.freedesktop.org/archives/amd-gfx/attachments/20191107/37d1f33f/attachment-0001.html>