Input and games.

Todd Showalter todd at electronjump.com
Mon Apr 22 12:32:50 PDT 2013


On Mon, Apr 22, 2013 at 1:40 PM, Pekka Paalanen <ppaalanen at gmail.com> wrote:

>>     Gamepads, by contrast, are all mostly the same these days, much
>> like mice.  You can find oddball ones like that PC gamepad that was up
>> on Kickstarter recently which had a trackball in place of the right
>> thumb stick, but the core gamepad is now every bit as standardized as
>> the core mouse.
>
> Alright, do you really mean that the controls are as standard as
> mouse buttons and wheels, and we would not need a per-device-model
> database? If so, then sure, a mouse-like Wayland protocol would
> indeed be possible.

    What I mean is that in practice, the difference between game
controllers is almost entirely two things: which particular bit in the
button mask gets set/cleared by any particular button, and which axis
maps to which control.  Right now (unless things have changed), for
example, if you plug in an xbox 360 controller:

- left stick is axis (0, 1)
- left trigger is axis 2
- right stick is axis (3, 4)
- right trigger is axis 5
- dpad is axis (6, 7)

    I had to determine that by logging values and playing with the
controls to see what made what numbers move.  The original xbox
controller (ie: not the 360) may use the same axis order, or it may
not.  The Dual Shock controller may use the same order, but it likely
doesn't.  So unless we're really lucky, something has to convert the
axis ordering to a canonical form.
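
    For what it's worth, the logging I mean is nothing fancier than
reading the legacy joystick device and printing which axis or button
number moves; a minimal sketch, with the device node being wherever
the pad happens to show up:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <linux/joystick.h>

    int main(void)
    {
        int fd = open("/dev/input/js0", O_RDONLY);
        struct js_event e;

        if (fd < 0)
            return 1;

        /* Wiggle one control at a time and see which number moves. */
        while (read(fd, &e, sizeof(e)) == sizeof(e)) {
            if ((e.type & ~JS_EVENT_INIT) == JS_EVENT_AXIS)
                printf("axis %d = %d\n", e.number, e.value);
            else if ((e.type & ~JS_EVENT_INIT) == JS_EVENT_BUTTON)
                printf("button %d = %d\n", e.number, e.value);
        }
        close(fd);
        return 0;
    }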

    Likewise, the buttons are just indexed, since as far as I can tell
without cracking open the code, JSGetButtonState() is just:

return (buttons & (1 << index));

    I'd vastly prefer keysyms here; I don't want to have to go look up
which button is START on this controller, or have to figure out which
index is the bottom right face button.

    So, some layer needs to translate buttons to keysyms, and adjust
the axis ordering (and possibly scaling) to fit the canonical
controller model, which I would suggest essentially be two analog
sticks, two analog triggers, plus keys (where the four dpad directions
are keys).  The translation layer needn't be very complex; as long as
there's some way to query evdev or the underlying system to find out
exactly what kind of device this is, it's a simple matter of per-axis
scale, offset and index translation (ie: scale this axis by -0.5f, add
1.0f, map to left trigger) and a list of bit to keysym lookups.
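
    Roughly, I'm imagining something of this shape (all of the names
here are hypothetical, just to show the data involved; a real table
would be selected by the device identity queried from evdev or
whatever sits underneath):

    /* Hypothetical per-device translation table: remap, scale and
     * offset each raw axis onto the canonical controller model, and
     * map each raw button bit to a keysym-like code. */
    struct axis_map {
        int   raw_index;        /* axis index as the device reports it */
        int   canonical_index;  /* e.g. CANON_AXIS_LEFT_TRIGGER        */
        float scale;            /* e.g. -0.5f                          */
        float offset;           /* e.g.  1.0f                          */
    };

    struct button_map {
        unsigned raw_bit;       /* bit index in the device button mask */
        unsigned keysym;        /* e.g. GAMEPAD_KEY_START              */
    };

    struct pad_translation {
        const struct axis_map   *axes;
        int                      num_axes;
        const struct button_map *buttons;
        int                      num_buttons;
    };

    static void translate_axes(const struct pad_translation *t,
                               const float *raw, float *canonical)
    {
        for (int i = 0; i < t->num_axes; i++) {
            const struct axis_map *m = &t->axes[i];
            canonical[m->canonical_index] =
                raw[m->raw_index] * m->scale + m->offset;
        }
    }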

    So, in terms of hardware capabilities, there is very much a
standard.  In terms of how that hardware is presented to the system
over USB, the same data is all there, but your guess is as good as
mine with regard to ordering.  That ordering is the problem I'd
really like to see solved.

>>     Hmm.  Who would I talk to about getting this started?
>
> I'm not sure. If you're looking for volunteers, just throwing the
> idea out in public is a start, but to have some chances of
> succeeding, you probably need to start the work yourself, or pay
> someone to do it. If it turns out good, other projects might start
> using it, and also contributing.
>
> But as per above, maybe we really don't need it?

    I'm hoping not.  If it is needed, I think it's going to have to
sit between Wayland and evdev; part of the point of this (to me, at
least) is to isolate the game from the kind of permissions that you
require to open evdev devices.  Or for that matter, isolate the player
from having to edit their udev rules just to get a gamepad working.

> Looking at Weston, it seems to do the extra effort to ensure that
> it does not send repeats to clients.

    Excellent.

>>     I would have thought that for pointer warping specifically that
>> it's one of those cases where the right thing to do is have a server
>> round-trip; the program with focus requests pointer warp, and the
>> server comes back either with a mouse move event that has been
>> suitably flagged, or with a warp event.  Aside from anything else,
>> that means:
>>
>> - the warp event isn't official until it has been blessed
>> - the warp event can be timestamped sanely
>> - the server has the option to modify or reject the warp
>
> Still, I think it would be problematic. If a client continuously warps
> the pointer to the middle of its window, getting away from that
> would be difficult, and I can't see any heuristic the compositor
> could use to prevent that.
>
> Granted, it's not that different from pointer lock. I just believe
> that arbitrary pointer warping is disrupting to a user, and we need
> to limit it to special cases, like pointer lock. Even just
> requiring keyboard focus to be able to warp goes a long way.

    That's the thing; as long as the user has a way of taking focus
away from the program (alt-tab or whatever), the malicious things you
can do with pointer warping are *exactly* the malicious things you can
do with pointer lock, and they can be escaped/broken the same way.

    Personally, I can't stand things like warp-cursor-to-popup-window;
it drives me nuts, and often as not it steals focus when I'm in the
middle of something and misuses my input.  That kind of pointer
warping can die unloved.

    I see the utility of pointer warp as being for subtle effects that
make gui toolkits feel better; edge resistance, moderating movement
speed on sliders and scroll bar thumbs, even little things like (say)
having a color wheel where the mouse pointer moves at half speed for
extra precision *if* the user is holding down the shift key.

    You get two major things from this kind of pointer warping.  One
is that it lets you modulate the pointer speed so the behavior of the
pointer is relative to the control it's working on; for the scroll bar
example, giant documents should *feel* like they are giant, because
that simple feedback tells you things directly about what you are
manipulating.  The other thing is related; the control over the
pointer speed means things become possible that otherwise aren't; with
the warping scroll bar thumb I described elsewhere, you can use the
thumb to position the view anywhere on a document.  Without it, your
scrollbar thumb steps are quantized to the pixel range of the thumb as
drawn in the window, which means that with larger documents you can
only do gross positioning with the thumb and then must switch
controls.
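
    To put rough numbers on that (the track and document sizes here
are invented, just to show the scale of the problem):

    /* Without warping, the document offset is derived straight from
     * the thumb's pixel position, so its resolution is limited to the
     * thumb's pixel travel. */
    static long offset_from_thumb(int thumb_px, int travel_px, long doc_lines)
    {
        return (long)thumb_px * doc_lines / travel_px;
    }

    /* Example: 180 px of thumb travel against a 100,000 line document
     * puts adjacent thumb positions ~555 lines apart; no amount of
     * careful mousing positions the view any finer than that.  A
     * warping (or speed-scaled) thumb can instead accumulate raw mouse
     * deltas, decoupling the offset resolution from those 180 pixels. */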

> Let's see if we could solve the issues better than relying on
> general pointer warping.

    I did mention elsewhere that the problem domain I described could
also be solved with pointer speed modulation, if you could say "the
user has grabbed the scroll thumb of a loooong document, scale pointer
speed by 0.25" and then "the user has released the scroll thumb,
remove the pointer speed scale factor".

    That would solve the problem, and would be something you could
make part of the program state like pointer locking; it only applies
in the specific program's window, and only when that window has focus.
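
    Nothing like that exists in the protocol today as far as I know;
purely as a hypothetical sketch of the shape, with the imaginary
request stubbed out so the fragment stands alone:

    #include <stdio.h>

    /* Hypothetical request, not a real Wayland interface.  A real
     * version would be a protocol request honored by the compositor
     * only while this client's window has focus. */
    static void pointer_speed_set_scale(float scale)
    {
        printf("request: scale relative pointer motion by %.2f\n", scale);
    }

    /* While the thumb of a very long document is grabbed, slow the
     * pointer down; restore it on release. */
    static void on_scroll_thumb_grab(void)    { pointer_speed_set_scale(0.25f); }
    static void on_scroll_thumb_release(void) { pointer_speed_set_scale(1.0f);  }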

> It is always good to have professional opinions. I'm not sure if we
> have had any game developers looking into things before you.

    I kind of thought that might be the case; that's part of the
reason I'm being as verbose as I am.  That, and a natural tendency to
start lecturing people at the drop of a hat.  :)

    If you have any questions (input-related or not), ask away; there
are plenty of other game devs who are as qualified to answer these
questions as I am, and they may well have different opinions, but with
that caveat in mind I'll answer any questions I can.

>>     Ok.  Is there a policy on what happens to input during focus
>> changes?  For example, say I have two windows, 1 and 2, with focus on
>> 1.  I hold down shift-a, move focus to window 2, release shift, move
>> focus back to window 1, release a.  What events does window 1 receive?
>>  How about window 2?
>
> Yes. This is from the top of my head. When a window loses keyboard
> focus, it should just stop doing anything with the last keyboard
> state it got, since it doesn't have the keyboard anymore. A window
> that receives keyboard focus should reset its keyboard state, and
> use the new initial state transmitted with the keyboard enter event.

    Ok, that sounds sane.  On some other systems there has been fun
with stray keyup events and the like leaking in when focus arrives.
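
    The client side of that pattern is pretty small with the core
protocol; a sketch, assuming the app keeps a simple pressed-key table
(the rest of the wl_keyboard listener and its registration omitted):

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>
    #include <wayland-client.h>

    #define MAX_KEYS 768
    static bool key_down[MAX_KEYS];

    /* On leave, forget everything; we no longer own the keyboard, so
     * keys we think are held may be released without us hearing it. */
    static void keyboard_leave(void *data, struct wl_keyboard *kbd,
                               uint32_t serial, struct wl_surface *surface)
    {
        memset(key_down, 0, sizeof(key_down));
    }

    /* On enter, rebuild state from the list of currently-pressed keys
     * the compositor sends, rather than trusting anything older. */
    static void keyboard_enter(void *data, struct wl_keyboard *kbd,
                               uint32_t serial, struct wl_surface *surface,
                               struct wl_array *keys)
    {
        uint32_t *key;

        memset(key_down, 0, sizeof(key_down));
        wl_array_for_each(key, keys) {
            if (*key < MAX_KEYS)
                key_down[*key] = true;
        }
    }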

>>     Interesting.  I'm less sold on changing screen resolution these
>> days now that LCDs have largely replaced multisync monitors, but
>> that's still kind of neat.
>
> Indeed, but you can still choose between GPU and panel scaling,
> should scaling be needed. And of course centering without scaling.
> GPU scaling may or may not imply a composition operation, which is
> essentially a copy. The other options can be zero-copy paths, if
> the client provides buffers suitable for scanout (we achieve that
> basically via EGL).

    That sounds reasonable.

>>     In iOS, at least at for touch stuff, they have "cancel" events,
>> which are basically "You know that last event I sent you?  Uhh... it
>> turns out it was part of a gesture, and got eaten by it.  Pretend it
>> never happened.".
>
> Yes, touch protocols are quite different to conventional input
> protocols, and I'm not too familiar with them.

    Touch is a little weird, for two main reasons: lack of context,
and gestures.

    Gestures are a little bit hairy because you have an ugly choice to
make: do you want to have horribly laggy input, or do you want to have
input with do-overs?  Just about everyone goes with do-overs; it's
more of a hassle for the programs dealing with the input (since they
might have to undo actions or forget things), but the other method has
unacceptable lag on all touch input.

    As an example of what I mean, consider something with a mouse
analogy; the double click.  The equivalent in the touch world is the
double tap, and it has roughly the same problem; how long do you wait
before you decide that something was a double tap?  If you deliver the
first tap immediately, you might have to come back later and say
"oops, that was really a double tap, forget about that previous single
tap I told you about, it never happened...", but if you wait until
you're certain what the user did, you can't deliver a single tap until
you've timed out on detecting a double tap (and a triple tap, and a
long press, and...).  With mice, it's not so bad, because people are
pretty good at double clicking those microswitch buttons fast; most
people can manage a double click in a significant fraction of a
second.

    Double tap requires more physical movement, and more effort; the
microswitch on a mouse pushes your finger back, but with touch you
have to physically pull your finger back before making the second tap.
 With double tap we're talking about detection hang times of around a
second, which is *hideous* lag for single tap; as I said elsewhere, in
HCI class they told me 150ms was the point where people started
noticing keyboard lag on VT102s, so waiting a second on every tap to
see if it's a double tap (especially when combined with the
significant fraction of a second lag most capacitive touch panels seem
to have) is just untenable.

    So, do-overs are the norm.  You can see why even more when you get
into detecting swipes and the like.
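
    To make the tradeoff concrete, the "wait until you're sure"
approach looks something like this (the window length and the API
around it are invented for illustration):

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical detection window; anything long enough to catch
     * real double taps is far above the ~150ms where lag is felt. */
    #define DOUBLE_TAP_WINDOW_MS 400

    struct tap_detector {
        bool     pending;        /* a first tap is being held back */
        uint32_t pending_time;   /* timestamp of that held tap     */
    };

    /* Returns 2 for a double tap, 0 for "nothing yet".  Note a single
     * tap is only ever reported from tap_detector_idle(), i.e. a full
     * DOUBLE_TAP_WINDOW_MS after the finger actually lifted. */
    static int tap_detector_tap(struct tap_detector *d, uint32_t time_ms)
    {
        if (d->pending && time_ms - d->pending_time <= DOUBLE_TAP_WINDOW_MS) {
            d->pending = false;
            return 2;
        }
        d->pending = true;
        d->pending_time = time_ms;
        return 0;
    }

    static int tap_detector_idle(struct tap_detector *d, uint32_t now_ms)
    {
        if (d->pending && now_ms - d->pending_time > DOUBLE_TAP_WINDOW_MS) {
            d->pending = false;
            return 1;   /* the single tap finally gets delivered, late */
        }
        return 0;
    }

    The do-over alternative just delivers the first tap immediately
and sends a cancel/correction if a second tap lands inside the window.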

    The lack of context problem is that touches are either there, or
not there.  There is no "click", no "right click", nothing but there
or not.  And you can't even be totally sure *what* is there; if you
take two fingers on a touch panel, move them right together and then
move them apart again, at the hardware level what the panel sees is
two points moving together, one of the points going away, a new point
appearing, and the two points moving apart.  It can't tell that the
"new" touch is related to the "old" one, nor can it tell which of the
two original touches disappeared.  Once they're all off the screen,
there's no pointer per se.  It's not like a mouse where the cursor is
always somewhere.

>> >>     If the events are just coming in as a pile in 60Hz ticks, it's all
>> >> good and we can do everything we need to.  If they're coming in as a
>> >> pile at 10Hz ticks, it's going to be difficult to make anything more
>> >> active than Solitaire.
>> >
>> > Yes as far as I understand, currently input events are sent to Wayland
>> > clients as a burst at every compositor repaint cycle, which happens at
>> > the monitor refresh rate, so for a 60 Hz monitor, you would be getting
>> > them in bursts at 60 Hz.
>>
>>     That's as good as we get from consoles, and besides, there isn't
>> much point in most games in running your simulator at a higher
>> frequency than the refresh rate.
>
> What about these hardcore fps-gamers who want at least 200+ frames
> per second, or they can't win? :-)

    If you're sending me vsync at 200Hz, I'll gladly update my
simulation at 200Hz and chew input at 200Hz.  :)

    Most players are playing on 60Hz refresh monitors, and those LCD
monitors have enough lag on them that it really doesn't matter if the
simulation ticks are happening (and eating input) faster than that.
Even if you react at the speed of light (literally), you're
interacting with data where the simulation probably started at least
50ms ago, got fed to the GPU 33ms ago, streamed out to the monitor
17ms ago, and spent at least a frame (at least, possibly many more if
we're talking about an LCD TV) cooking in the monitor's circuitry.

    We're not making games twitchy enough to appeal to the vsync-off
crowd, but I think, especially now with the way that LCD monitors
actually work, the vsync-off crowd is basically asking for better
racing tyres for their fighter jet.

> That is something I have never understood, apart from game engine
> bugs where frame rate affected the physics simulation outcome like
> allowing to jump higher.

    I think it's partly that, though these days physics engines are a
lot more robust than they used to be (to be fair, the slight increase
in CPU horsepower and memory over the past couple of decades might
have had a minor hand there, not to mention ubiquitous availability of
accelerated floating point hardware).  I think it's also partly the
"moar fastur" thing, which doesn't really make sense any more because
the system is so much more buffered and mediated now than it was ten
years ago.

>>     That's still probably good enough.  If you've got input data sets
>> coming in at the frame rate of the display, and you know that the data
>> is internally ordered, I think there's enough to work with.  We're
>> essentially talking about an ordered set of deltas or positions
>> covering a 16.67ms snapshot of time.  The timeframe isn't long enough
>> that the intersample time provides any useful nuance, at least not
>> with input from humans.
>
> Are you sure about that? Sorry for insisting, but this is the first
> time I can ask a game professional about that. :-)

    Well, bear in mind I'm as much from a console background as a PC
game background, and the consoles all run 60Hz (some of them used to
run 50Hz in some regions, but I *think* that's no longer true?  I've
been dealing with handheld and mobile a lot lately where it's all 60Hz
refresh).

    The only machines that really have the option to run faster than
60Hz are PCs.  There are some games that will presumably take input
just as fast as you can feed it to them, so I'd still consider
supporting the extension; someone might find a use for it, or someone
might be able to incorporate input into their simulation with the
timestamps properly respected.  Most of the game world runs just fine
on 60Hz ticks.
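
    In loop terms that's all a game needs: one ordered, timestamped
burst per repaint, drained before the simulation tick.  A rough
sketch, with the event and game types here being hypothetical:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical event record, roughly what a per-frame burst carries. */
    struct input_event {
        uint32_t time_ms;   /* event timestamp from the compositor */
        int      type;      /* axis move, button down/up, ...      */
        int      code;
        float    value;
    };

    struct game;            /* opaque game state, defined elsewhere */
    void game_apply_input(struct game *g, const struct input_event *ev);
    void game_step(struct game *g, float dt);
    void game_render(const struct game *g);

    /* One display frame: drain the burst of events in order, step the
     * simulation once at the display rate, draw.  At 60Hz the timing
     * within the 16.67ms window rarely matters; at 200Hz vsync the
     * same loop just runs faster. */
    void game_frame(struct game *g, const struct input_event *events,
                    size_t count, float frame_dt)
    {
        for (size_t i = 0; i < count; i++)
            game_apply_input(g, &events[i]);

        game_step(g, frame_dt);
        game_render(g);
    }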

                                          Todd.

--
 Todd Showalter, President,
 Electron Jump Games, Inc.

