[PATCH v2 00/25] AMDKFD kernel driver
Oded Gabbay
oded.gabbay at amd.com
Tue Jul 22 01:10:58 PDT 2014
On 22/07/14 10:23, Daniel Vetter wrote:
> On Mon, Jul 21, 2014 at 10:23:43PM +0300, Oded Gabbay wrote:
>> But Jerome, the core problem still remains in effect, even with your
>> suggestion. If an application, either via userspace queue or via ioctl,
>> submits a long-running kernel, then the CPU in general can't stop the
>> GPU from running it. And if that kernel does while(1); then that's it,
>> game over, no matter how you submitted the work. So I don't really
>> see the big advantage in your proposal. Only in CZ can we stop this wave
>> (by CP H/W scheduling only). What you are saying is basically that I won't
>> allow people to use compute on a Linux KV system because it _may_ get the
>> system stuck.
>>
>> So even if I really wanted to, and I may agree with you theoretically on
>> that, I can't fulfill your desire to make the "kernel being able to
>> preempt at any time and be able to decrease or increase user queue
>> priority so overall kernel is in charge of resources management and it
>> can handle rogue client in proper fashion". Not in KV, and I guess not
>> in CZ either.
>
> At least on intel the execlist stuff which is used for preemption can be
> used by both the cpu and the firmware scheduler. So we can actually
> preempt when doing cpu scheduling.
>
> It sounds like current amd hw doesn't have any preemption at all. And
> without preemption I don't think we should ever consider allowing
> userspace to directly submit stuff to the hw and overload it. Imo the kernel
> _must_ sit in between and reject clients that don't behave. Of course you
> can only ever react (worst case with a gpu reset, there's code floating
> around for that on intel-gfx), but at least you can do something.
>
> If userspace has a direct submit path to the hw then this gets really
> tricky, if not impossible.
> -Daniel
>
Hi Daniel,
See the email I just sent to Jerome regarding preemption. Bottom line: in KV, we
can preempt running queues, except in the case of a stuck GPU kernel. In CZ,
this was solved.
So, in this regard, I don't think there is any difference between userspace
queues and ioctl.
Oded