<div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Christian König <<a href="mailto:christian.koenig@amd.com">christian.koenig@amd.com</a>> wrote (on 11 Apr 2023, Tue 11:23):<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
On 11.04.23 at 11:06, Timur Kristóf wrote:<br>
<blockquote type="cite">
<div dir="auto">
<div><br>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">Bas Nieuwenhuizen <<a href="mailto:bas@basnieuwenhuizen.nl" target="_blank" rel="noreferrer">bas@basnieuwenhuizen.nl</a>>
wrote (on 11 Apr 2023, Tue 10:25):<br>
</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Tue,
Apr 11, 2023 at 10:10 AM Christian König<br>
<<a href="mailto:christian.koenig@amd.com" rel="noreferrer noreferrer" target="_blank">christian.koenig@amd.com</a>>
wrote:<br>
><br>
> On 09.04.23 at 20:59, Bas Nieuwenhuizen wrote:<br>
> > We need to introduce a new version of the info
struct because<br>
> > libdrm_amdgpu forgot any versioning info in
amdgpu_query_hw_ip_info<br>
> > so the mesa<->libdrm_amdgpu interface
can't handle increasing the<br>
> > size.<br>
> ><br>
> > This info would be used by radv to figure out
when we need to<br>
> > split a submission into multiple submissions.
radv currently has<br>
> > a limit of 192 which seems to work for most gfx
submissions, but<br>
> > is way too high for e.g. compute or sdma.<br>
><br>
> Why do you need so many IBs in the first place?<br>
><br>
> You can use sub-IBs since R600 and newer hw even
supports a JUMP command<br>
> to chain IBs together IIRC.<br>
<br>
Couple of reasons:<br>
<br>
1) We can't reliably use sub-IBs (both because on GFX6-7
there are<br>
indirect draw commands that cannot be used in sub-IBs and
because the<br>
Vulkan API exposes sub-IBs, so we can have the primary IBs
be sub-IBs<br>
already)<br>
</blockquote>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">Furthermore, only the GFX queue supports the
"IB2" packet that is used to implement sub-IBs.</div>
<div dir="auto"><br>
</div>
<div dir="auto">(The same packet hangs the compute queue and is
documented as not working in the PAL source code. Other queues
don't seem to have a packet for this purpose.)</div>
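<div dir="auto">To make the distinction concrete: unlike IB2 (where the CP returns to the parent IB), chaining works by patching the tail of one IB with an INDIRECT_BUFFER packet whose CHAIN bit tells the CP to jump directly to the next IB, with no round-trip to the kernel. A minimal illustrative sketch follows; the macro names and bit positions mirror Mesa's sid.h conventions, but this is not code from the patch under discussion:</div>

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of IB chaining (not radv's actual code).
 * PKT3 header: type 3 in bits 30-31, payload dword count in bits
 * 16-29, opcode in bits 8-15. Opcode 0x3F is INDIRECT_BUFFER; the
 * control dword carries the next IB's size plus CHAIN/VALID bits
 * (bit positions as in Mesa's sid.h). */
#define PKT3(op, count) ((3u << 30) | (((count) & 0x3fffu) << 16) | (((op) & 0xffu) << 8))
#define PKT3_INDIRECT_BUFFER 0x3f
#define IB_CONTROL_CHAIN (1u << 20) /* jump to the next IB instead of ending */
#define IB_CONTROL_VALID (1u << 23)

/* Write a 4-dword chain packet at 'dst', jumping to an IB at GPU
 * address 'next_va' that is 'next_dw' dwords long. Returns the
 * number of dwords written. */
static inline unsigned emit_chain_packet(uint32_t *dst, uint64_t next_va,
                                         uint32_t next_dw)
{
    dst[0] = PKT3(PKT3_INDIRECT_BUFFER, 2);
    dst[1] = (uint32_t)next_va;         /* IB base, low 32 bits  */
    dst[2] = (uint32_t)(next_va >> 32); /* IB base, high 32 bits */
    dst[3] = next_dw | IB_CONTROL_CHAIN | IB_CONTROL_VALID;
    return 4;
}
```

<div dir="auto">A UMD typically reserves a few padding dwords at the end of each IB so this packet can be patched in later, which is also why point (3) below matters: the patched tail fixes the successor, so the same IB cannot be in flight twice with different successors.</div>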
<div dir="auto"><br>
</div>
<div dir="auto">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">2) We
believed GFX6 may not support chaining. (Then again, it
turns<br>
out the GFX7+ packet may just work on GFX6?<br>
<a href="https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22406" rel="noreferrer noreferrer noreferrer" target="_blank">https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22406</a>)<br>
</blockquote>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">I was actually quite surprised when I found this
out. Mesa developers seem to have believed that this is not
possible on GFX6. I'd love to get some more context on this:
did it always work, or was it added in a FW update?</div>
<div dir="auto"><br>
</div>
<div dir="auto">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">3)
Chaining doesn't work if the IB may be in flight multiple
times in<br>
a different sequence.<br>
</blockquote>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">Additionally, chaining doesn't work on any
queue other than GFX and compute. As far as we know, SDMA and
the various encoder/decoder queues don't have a packet for
chaining.</div>
<div dir="auto"><br>
</div>
<div dir="auto">Christian, please let us know if we are wrong
about this.</div>
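<div dir="auto">For context, this is roughly what the queried limit would be used for on the userspace side: when a queue reports its max IB count (with 0 meaning "unknown", per the patch), the UMD either falls back to a conservative value or splits the submission into several CS ioctls. A hypothetical sketch; the function and constant names here are made up for illustration and are not part of the patch:</div>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace-side sketch (names are illustrative, not
 * from the patch). The kernel reports max_ibs per HW IP; 0 means
 * "unknown" (e.g. an old kernel without the v2 struct), in which
 * case the UMD keeps its current conservative limits. */
#define RADV_LEGACY_GFX_MAX_IBS   192u /* radv's current hardcoded limit */
#define RADV_LEGACY_OTHER_MAX_IBS 4u   /* conservative guess for other queues */

static inline uint32_t effective_max_ibs(uint32_t queried, int is_gfx)
{
    if (queried != 0)
        return queried; /* the kernel told us the real per-IP limit */
    return is_gfx ? RADV_LEGACY_GFX_MAX_IBS : RADV_LEGACY_OTHER_MAX_IBS;
}

/* Number of CS ioctls needed to submit num_ibs IBs (ceil division). */
static inline uint32_t num_submissions(uint32_t num_ibs, uint32_t max_ibs)
{
    return (num_ibs + max_ibs - 1) / max_ibs;
}
```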
</div>
</blockquote>
<br>
I'm not an expert on this either. Marek and Pierre-eric know more
about that stuff than me. On the other hand, I could ping the
firmware people as well if our UMD guys can't answer that.<br></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">Thanks, I'm looking forward to hearing more about it.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>
<br>
It's just that the last time we discussed this internally, somebody from
the PAL team commented that this is no longer an issue because we
don't need that many IBs any more.<br></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">It is true that a typical game would only submit a few IBs: either 2 (1 preamble + 1 IB that has all the command buffers chained), or 8 for gang submit (4 preambles, 1 ACE IB, 1 GFX IB, 2 postambles). However, we would like to ensure that the less common use cases that can't use chaining work well too.</div><div dir="auto"><br></div><div dir="auto">We also have an increased interest in using the other HW queues in RADV these days: video encode/decode queues for the Vulkan Video API, and of course SDMA, which we are considering for transfer queues in the future.</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>
<br>
libdrm defined that you shouldn't use more than 4 IBs in a CS. On
the other hand, we dropped checking that long ago, and exposing the
number of IBs the UMD can use is just good design.<br></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">Yes, I agree.</div><div dir="auto"><br></div><div dir="auto">Can we remove that old (and confusing) define from libdrm? Or does anyone still depend on it?</div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>
<br>
Bas, what do you think of adding an AMDGPU_INFO_MAX_IBS query instead
of extending the drm_amdgpu_info_hw_ip structure?<br>
<br>
Background is that the drm_amdgpu_info_hw_ip structure is actually
not meant to be used for SW parameters (which the maximum number of
IBs is), and we wouldn't need to dance around issues with query size
parameters because that function takes the size as a parameter.<br></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto">Sounds good to me, but I'll defer to Bas's judgement on this.</div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>
<br>
Regards,<br>
Christian.<br>
<br>
<blockquote type="cite">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">Best regards,</div>
<div dir="auto">Timur</div>
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
We try to chain when we can but (2) and (3) are cases
where we<br>
fallback to submitting multiple IBs.<br>
<br>
><br>
> For the kernel UAPI you only need separate IBs if you
have separate<br>
> properties on them, E.g. preamble or submitting to a
different engine.<br>
><br>
> Everything else just needs additional CPU overhead in
the IOCTL.<br>
><br>
> Regards,<br>
> Christian.<br>
><br>
> ><br>
> > Because the kernel handling is just fine we can
just use the v2<br>
> > everywhere and have the return_size do the
versioning for us,<br>
> > with userspace interpreting 0 as unknown.<br>
> ><br>
> > Userspace implementation:<br>
> > <a href="https://gitlab.freedesktop.org/bnieuwenhuizen/drm/-/tree/ib-rejection" rel="noreferrer noreferrer noreferrer" target="_blank">https://gitlab.freedesktop.org/bnieuwenhuizen/drm/-/tree/ib-rejection</a><br>
> > <a href="https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/-/tree/ib-rejection" rel="noreferrer noreferrer noreferrer" target="_blank">https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/-/tree/ib-rejection</a><br>
> ><br>
> > v2: Use base member in the new struct.<br>
> ><br>
> > Link: <a href="https://gitlab.freedesktop.org/drm/amd/-/issues/2498" rel="noreferrer noreferrer noreferrer" target="_blank">https://gitlab.freedesktop.org/drm/amd/-/issues/2498</a><br>
> > Signed-off-by: Bas Nieuwenhuizen <<a href="mailto:bas@basnieuwenhuizen.nl" rel="noreferrer noreferrer" target="_blank">bas@basnieuwenhuizen.nl</a>><br>
> > Cc: David Airlie <<a href="mailto:airlied@gmail.com" rel="noreferrer noreferrer" target="_blank">airlied@gmail.com</a>><br>
> > ---<br>
> > drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 31
++++++++++++++-----------<br>
> > include/uapi/drm/amdgpu_drm.h | 14
+++++++++++<br>
> > 2 files changed, 31 insertions(+), 14
deletions(-)<br>
> ><br>
> > diff --git
a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c<br>
> > index 89689b940493..5b575e1aef1a 100644<br>
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c<br>
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c<br>
> > @@ -360,7 +360,7 @@ static int
amdgpu_firmware_info(struct drm_amdgpu_info_firmware
*fw_info,<br>
> ><br>
> > static int amdgpu_hw_ip_info(struct
amdgpu_device *adev,<br>
> > struct
drm_amdgpu_info *info,<br>
> > - struct
drm_amdgpu_info_hw_ip *result)<br>
> > + struct
drm_amdgpu_info_hw_ip2 *result)<br>
> > {<br>
> > uint32_t ib_start_alignment = 0;<br>
> > uint32_t ib_size_alignment = 0;<br>
> > @@ -431,6 +431,7 @@ static int
amdgpu_hw_ip_info(struct amdgpu_device *adev,<br>
> > return -EINVAL;<br>
> > }<br>
> ><br>
> > + result->max_ibs = UINT_MAX;<br>
> > for (i = 0; i < adev->num_rings;
++i) {<br>
> > /* Note that this uses that ring
types alias the equivalent<br>
> > * HW IP exposes to userspace.<br>
> > @@ -438,6 +439,8 @@ static int
amdgpu_hw_ip_info(struct amdgpu_device *adev,<br>
> > if
(adev->rings[i]->funcs->type ==
info->query_hw_ip.type &&<br>
> >
adev->rings[i]->sched.ready) {<br>
> > ++num_rings;<br>
> > + result->max_ibs =
min(result->max_ibs,<br>
> > +
adev->rings[i]->max_ibs);<br>
> > }<br>
> > }<br>
> ><br>
> > @@ -452,36 +455,36 @@ static int
amdgpu_hw_ip_info(struct amdgpu_device *adev,<br>
> > num_rings =
min(amdgpu_ctx_num_entities[info->query_hw_ip.type],<br>
> > num_rings);<br>
> ><br>
> > - result->hw_ip_version_major =
adev->ip_blocks[i].version->major;<br>
> > - result->hw_ip_version_minor =
adev->ip_blocks[i].version->minor;<br>
> > + result->base.hw_ip_version_major =
adev->ip_blocks[i].version->major;<br>
> > + result->base.hw_ip_version_minor =
adev->ip_blocks[i].version->minor;<br>
> ><br>
> > if (adev->asic_type >= CHIP_VEGA10)
{<br>
> > switch (type) {<br>
> > case AMD_IP_BLOCK_TYPE_GFX:<br>
> > -
result->ip_discovery_version =
adev->ip_versions[GC_HWIP][0];<br>
> > +
result->base.ip_discovery_version =
adev->ip_versions[GC_HWIP][0];<br>
> > break;<br>
> > case AMD_IP_BLOCK_TYPE_SDMA:<br>
> > -
result->ip_discovery_version =
adev->ip_versions[SDMA0_HWIP][0];<br>
> > +
result->base.ip_discovery_version =
adev->ip_versions[SDMA0_HWIP][0];<br>
> > break;<br>
> > case AMD_IP_BLOCK_TYPE_UVD:<br>
> > case AMD_IP_BLOCK_TYPE_VCN:<br>
> > case AMD_IP_BLOCK_TYPE_JPEG:<br>
> > -
result->ip_discovery_version =
adev->ip_versions[UVD_HWIP][0];<br>
> > +
result->base.ip_discovery_version =
adev->ip_versions[UVD_HWIP][0];<br>
> > break;<br>
> > case AMD_IP_BLOCK_TYPE_VCE:<br>
> > -
result->ip_discovery_version =
adev->ip_versions[VCE_HWIP][0];<br>
> > +
result->base.ip_discovery_version =
adev->ip_versions[VCE_HWIP][0];<br>
> > break;<br>
> > default:<br>
> > -
result->ip_discovery_version = 0;<br>
> > +
result->base.ip_discovery_version = 0;<br>
> > break;<br>
> > }<br>
> > } else {<br>
> > - result->ip_discovery_version =
0;<br>
> > +
result->base.ip_discovery_version = 0;<br>
> > }<br>
> > - result->capabilities_flags = 0;<br>
> > - result->available_rings = (1 <<
num_rings) - 1;<br>
> > - result->ib_start_alignment =
ib_start_alignment;<br>
> > - result->ib_size_alignment =
ib_size_alignment;<br>
> > + result->base.capabilities_flags = 0;<br>
> > + result->base.available_rings = (1
<< num_rings) - 1;<br>
> > + result->base.ib_start_alignment =
ib_start_alignment;<br>
> > + result->base.ib_size_alignment =
ib_size_alignment;<br>
> > return 0;<br>
> > }<br>
> ><br>
> > @@ -536,7 +539,7 @@ int amdgpu_info_ioctl(struct
drm_device *dev, void *data, struct drm_file *filp)<br>
> > }<br>
> > return copy_to_user(out,
&ui32, min(size, 4u)) ? -EFAULT : 0;<br>
> > case AMDGPU_INFO_HW_IP_INFO: {<br>
> > - struct drm_amdgpu_info_hw_ip ip =
{};<br>
> > + struct drm_amdgpu_info_hw_ip2 ip =
{};<br>
> > int ret;<br>
> ><br>
> > ret = amdgpu_hw_ip_info(adev,
info, &ip);<br>
> > diff --git a/include/uapi/drm/amdgpu_drm.h
b/include/uapi/drm/amdgpu_drm.h<br>
> > index b6eb90df5d05..6b9e35b6f747 100644<br>
> > --- a/include/uapi/drm/amdgpu_drm.h<br>
> > +++ b/include/uapi/drm/amdgpu_drm.h<br>
> > @@ -1128,6 +1128,9 @@ struct
drm_amdgpu_info_device {<br>
> > __u32 enabled_rb_pipes_mask_hi;<br>
> > };<br>
> ><br>
> > +/* The size of this struct cannot be increased
for compatibility reasons, use<br>
> > + * struct drm_amdgpu_info_hw_ip2 instead.<br>
> > + */<br>
> > struct drm_amdgpu_info_hw_ip {<br>
> > /** Version of h/w IP */<br>
> > __u32 hw_ip_version_major;<br>
> > @@ -1144,6 +1147,17 @@ struct
drm_amdgpu_info_hw_ip {<br>
> > __u32 ip_discovery_version;<br>
> > };<br>
> ><br>
> > +struct drm_amdgpu_info_hw_ip2 {<br>
> > + /** Previous version of the struct */<br>
> > + struct drm_amdgpu_info_hw_ip base;<br>
> > + /** The maximum number of IBs one can
submit in a single submission<br>
> > + * ioctl. (When using gang submit: this is
per IP type)<br>
> > + */<br>
> > + __u32 max_ibs;<br>
> > + /** padding to 64bit for arch differences
*/<br>
> > + __u32 pad;<br>
> > +};<br>
> > +<br>
> > struct drm_amdgpu_info_num_handles {<br>
> > /** Max handles as supported by firmware
for UVD */<br>
> > __u32 uvd_max_handles;<br>
><br>
</blockquote>
</div>
</div>
</div>
</blockquote>
<br>
</div>
</blockquote></div></div></div>