[PATCH 18/22] drm/amdgpu: add flag for high priority contexts v4
Emil Velikov
emil.l.velikov at gmail.com
Fri Mar 3 14:48:07 UTC 2017
On 2 March 2017 at 03:52, Andres Rodriguez <andresx7 at gmail.com> wrote:
>
>
> On 2017-02-28 08:13 PM, Emil Velikov wrote:
>>
>> Hi Andres,
>>
>> There are a couple of nitpicks below, but feel free to address those as a
>> follow-up. Assuming they're correct, of course ;-)
>
>
> As much as I'd like to let future me deal with those issues, the UAPI
> behavior is something I'd like to nail down early and avoid changing.
>
> So any nitpicks here are more than welcome now (better than later :) )
>
>>
>> On 28 February 2017 at 22:14, Andres Rodriguez <andresx7 at gmail.com> wrote:
>>>
>>> Add a new context creation parameter to express a global context
>>> priority.
>>>
>>> Contexts allocated with AMDGPU_CTX_PRIORITY_HIGH will receive higher
>>> priority to schedule their work than AMDGPU_CTX_PRIORITY_NORMAL
>>> (default) contexts.
>>>
>>> v2: Instead of using flags, repurpose __pad
>>> v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
>>> v4: Validate usermode priority and store it
>>>
>>> Signed-off-by: Andres Rodriguez <andresx7 at gmail.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 41 +++++++++++++++++++++++----
>>> drivers/gpu/drm/amd/scheduler/gpu_scheduler.h | 1 +
>>> include/uapi/drm/amdgpu_drm.h | 7 ++++-
>>> 4 files changed, 44 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index e30c47e..366f6d3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -664,20 +664,21 @@ struct amdgpu_ctx_ring {
>>> struct amd_sched_entity entity;
>>> };
>>>
>>> struct amdgpu_ctx {
>>> struct kref refcount;
>>> struct amdgpu_device *adev;
>>> unsigned reset_counter;
>>> spinlock_t ring_lock;
>>> struct dma_fence **fences;
>>> struct amdgpu_ctx_ring rings[AMDGPU_MAX_RINGS];
>>> + int priority;
>>> bool preamble_presented;
>>> };
>>>
>>> struct amdgpu_ctx_mgr {
>>> struct amdgpu_device *adev;
>>> struct mutex lock;
>>> /* protected by lock */
>>> struct idr ctx_handles;
>>> };
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> index 400c66b..22a15d6 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
>>> @@ -18,47 +18,75 @@
>>> * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>> * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>> * OTHER DEALINGS IN THE SOFTWARE.
>>> *
>>> * Authors: monk liu <monk.liu at amd.com>
>>> */
>>>
>>> #include <drm/drmP.h>
>>> #include "amdgpu.h"
>>>
>>> -static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
>>> +static enum amd_sched_priority amdgpu_to_sched_priority(int amdgpu_priority)
>>> +{
>>> + switch (amdgpu_priority) {
>>> + case AMDGPU_CTX_PRIORITY_HIGH:
>>> + return AMD_SCHED_PRIORITY_HIGH;
>>> + case AMDGPU_CTX_PRIORITY_NORMAL:
>>> + return AMD_SCHED_PRIORITY_NORMAL;
>>> + default:
>>> +               WARN(1, "Invalid context priority %d\n", amdgpu_priority);
>>> + return AMD_SCHED_PRIORITY_NORMAL;
>>> + }
>>> +}
>>> +
>>> +static int amdgpu_ctx_init(struct amdgpu_device *adev,
>>> + uint32_t priority,
>>> + struct amdgpu_ctx *ctx)
>>> {
>>> unsigned i, j;
>>> int r;
>>> + enum amd_sched_priority sched_priority;
>>> +
>>> + sched_priority = amdgpu_to_sched_priority(priority);
>>> +
>>
>> This will trigger dmesg spam on normal user input. I'd keep the WARN
>> in amdgpu_to_sched_priority, but move the function call after the
>> validation of priority.
>> Thinking about it, the input validation really belongs in the ioctl -
>> amdgpu_ctx_ioctl().
>>
>
> Agreed.
>
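To spell out what I have in mind - a rough and completely untested sketch,
only reusing the names already present in the patch:

    /* in amdgpu_ctx_ioctl(), before the context is allocated */
    priority = args->in.priority;

    if (priority >= AMDGPU_CTX_PRIORITY_NUM)
            return -EINVAL;

    if (priority == AMDGPU_CTX_PRIORITY_HIGH && !capable(CAP_SYS_ADMIN))
            return -EACCES;

    /* amdgpu_to_sched_priority() is called only after the above, so its
     * WARN() can no longer be triggered by plain userspace input */

That way amdgpu_ctx_init() can keep assuming its callers pass a sane value.
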
>>> + if (priority >= AMDGPU_CTX_PRIORITY_NUM)
>>> + return -EINVAL;
>>> +
>>> +       if (sched_priority < 0 || sched_priority >= AMD_SCHED_MAX_PRIORITY)
>>> + return -EINVAL;
>>> +
>>> +       if (sched_priority == AMD_SCHED_PRIORITY_HIGH && !capable(CAP_SYS_ADMIN))
>>
>> This is not obvious in either the commit message or the UAPI. I'd
>> suggest adding a comment in the latter.
>
>
> Will do.
>
>> If memory serves - high prio will _not_ work with render nodes,
>> so you really want to cover and/or explain why.
>
>
> High priority will work fine with render nodes. I'm testing with radv, which
> does use render nodes.
>
> But I've had my fair share of two bugs canceling each other out. So if you
> do have some insight into why that would be the case, let me know.
>
Right - I got confused by the CAP_SYS_ADMIN cap. What I meant above is:
why do we need it, and does it impose a specific workflow or set of
permissions for userspace to use high prio?
This is the type of thing that, I think, must be documented in the UAPI header.
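E.g. a short comment in amdgpu_drm.h, something along these lines (with the
actual reason filled in, which is exactly what I'm asking about):

    /*
     * AMDGPU_CTX_PRIORITY_HIGH requires CAP_SYS_ADMIN, since <reason - e.g.
     * a high priority context can starve normal ones>. Unprivileged callers
     * get -EACCES and should fall back to AMDGPU_CTX_PRIORITY_NORMAL.
     */

would save people a trip through the driver sources.
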
>>
>>> + return -EACCES;
>>>
>>> memset(ctx, 0, sizeof(*ctx));
>>> ctx->adev = adev;
>>> + ctx->priority = priority;
>>> kref_init(&ctx->refcount);
>>> spin_lock_init(&ctx->ring_lock);
>>> ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
>>> sizeof(struct dma_fence*), GFP_KERNEL);
>>> if (!ctx->fences)
>>> return -ENOMEM;
>>>
>>> for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>> ctx->rings[i].sequence = 1;
>>> ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
>>> }
>>>
>>> ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
>>>
>>> /* create context entity for each ring */
>>> for (i = 0; i < adev->num_rings; i++) {
>>> struct amdgpu_ring *ring = adev->rings[i];
>>> struct amd_sched_rq *rq;
>>>
>>> - rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
>>> + rq = &ring->sched.sched_rq[sched_priority];
>>> r = amd_sched_entity_init(&ring->sched,
>>> &ctx->rings[i].entity,
>>> rq, amdgpu_sched_jobs);
>>> if (r)
>>> goto failed;
>>> }
>>>
>>> return 0;
>>>
>>> failed:
>>> for (j = 0; j < i; j++)
>>> @@ -83,39 +111,41 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
>>> kfree(ctx->fences);
>>> ctx->fences = NULL;
>>>
>>> for (i = 0; i < adev->num_rings; i++)
>>> amd_sched_entity_fini(&adev->rings[i]->sched,
>>> &ctx->rings[i].entity);
>>> }
>>>
>>> static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
>>> struct amdgpu_fpriv *fpriv,
>>> + uint32_t priority,
>>> uint32_t *id)
>>> {
>>> struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
>>> struct amdgpu_ctx *ctx;
>>> int r;
>>>
>>> ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
>>> if (!ctx)
>>> return -ENOMEM;
>>>
>>> mutex_lock(&mgr->lock);
>>> r = idr_alloc(&mgr->ctx_handles, ctx, 1, 0, GFP_KERNEL);
>>> if (r < 0) {
>>> mutex_unlock(&mgr->lock);
>>> kfree(ctx);
>>> return r;
>>> }
>>> +
>>> *id = (uint32_t)r;
>>> - r = amdgpu_ctx_init(adev, ctx);
>>> + r = amdgpu_ctx_init(adev, priority, ctx);
>>> if (r) {
>>> idr_remove(&mgr->ctx_handles, *id);
>>> *id = 0;
>>> kfree(ctx);
>>> }
>>> mutex_unlock(&mgr->lock);
>>> return r;
>>> }
>>>
>>> static void amdgpu_ctx_do_release(struct kref *ref)
>>> @@ -179,32 +209,33 @@ static int amdgpu_ctx_query(struct amdgpu_device *adev,
>>> ctx->reset_counter = reset_counter;
>>>
>>> mutex_unlock(&mgr->lock);
>>> return 0;
>>> }
>>>
>>> int amdgpu_ctx_ioctl(struct drm_device *dev, void *data,
>>> struct drm_file *filp)
>>> {
>>> int r;
>>> - uint32_t id;
>>> + uint32_t id, priority;
>>>
>>> union drm_amdgpu_ctx *args = data;
>>> struct amdgpu_device *adev = dev->dev_private;
>>> struct amdgpu_fpriv *fpriv = filp->driver_priv;
>>>
>>> r = 0;
>>> id = args->in.ctx_id;
>>> + priority = args->in.priority;
>>>
>> Hmm, we don't seem to be doing any in.flags validation - not cool.
>> Someone seriously needs to add that and check the remaining ioctls.
>> At the same time - I think you want to add a flag bit "HAS_PRIORITY"
>> [or similar] and honour in.priority only when that is set.
>>
>> Even if the USM drivers are safe, this will break for some poor soul who
>> is learning how to program their GPU. "My program was running before -
>> I updated the kernel and it no longer does :-("
>
>
> Improving validation of already existing parameters is probably something
> better handled in a separate series (so in this case I will take you up on
> the earlier offer).
>
Agreed.
> As far as a HAS_PRIORITY flag goes, I'm not sure it would really be
> necessary. libdrm_amdgpu has zeroed its data structures since its first
> commit (0936139), so the change should be backwards compatible with all
> libdrm_amdgpu versions.
>
> If someone is bypassing libdrm_amdgpu and hitting the IOCTLs directly, then
> I hope they know what they are doing. The only apps I've really seen do this
> are fuzzers, and they probably wouldn't care.
>
> I'm assuming we do have the flexibility of relying on the usermode library
> as the backwards-compatibility point.
>
Since there is no validation in the first place, anyone can feed
garbage into the ioctl without knowing that they're doing it wrong.
Afaict the rule tends to be: kernel updates should never break
userspace.
And by the time you realise there is a breakage here, it'll be too late.
And even if we ignore all that for a moment, flags is there exactly for
things like HAS_PRIORITY, and adding/managing it is trivial.
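For the record, roughly the ~5 lines I have in mind - untested, and the flag
name is only a placeholder:

    /* amdgpu_drm.h */
    #define AMDGPU_CTX_ALLOC_HAS_PRIORITY  (1 << 0)

    /* amdgpu_ctx_ioctl(), alongside the priority validation */
    if (args->in.flags & ~AMDGPU_CTX_ALLOC_HAS_PRIORITY)
            return -EINVAL;

    priority = (args->in.flags & AMDGPU_CTX_ALLOC_HAS_PRIORITY) ?
               args->in.priority : AMDGPU_CTX_PRIORITY_NORMAL;

Old userspace (and anything that currently passes garbage in the unchecked
fields) keeps today's behaviour, while new userspace opts in explicitly.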
Then again, if you feel so strongly against ~5 lines of code, so be it :-(
Thanks
Emil