Input and games.

Pekka Paalanen ppaalanen at gmail.com
Mon Apr 22 00:41:39 PDT 2013


Hi Todd,

Jonas Kulla already replied to several items, but it's easier for
me to comment on everything I have something to say about, so pardon
me if I repeat some things.


On Fri, 19 Apr 2013 12:31:19 -0400
Todd Showalter <todd at electronjump.com> wrote:

> On Fri, Apr 19, 2013 at 5:18 AM, Pekka Paalanen <ppaalanen at gmail.com> wrote:
> 
> > I am going to reply from the Wayland protocol point of view, and what
> > Wayland explicitly can (and must) do for you. This is likely much lower
> > level than what a game programmer would like to use. How SDL or some
> > other higher level library exposes input is a different matter, and I
> > will not comment on that. We just want to make everything possible on
> > the Wayland protocol level.
> 
>     That's fair.  We don't use SDL in our projects, so I'm coming at
> this partly from the point of view of someone who will be operating at
> the protocol level.

The protocol level is not convenient for an application developer in
some cases, and it's not even meant to be. We explicitly leave lots of
processing for a so-called toolkit library. At several points below I
say that something is out of the scope of libwayland-client, and my
above comment was just a forewarning about that. :-)

> > I do not think we can happily let client applications open input devices
> > themselves, so this is clearly a thing we need to improve on. In other
> > words, I believe we should come up with a protocol extension where the
> > server opens the input devices, and either passes the file descriptor to
> > a client, or the server translates evdev events into Wayland protocol
> > events. "How" and "what" are still open questions, as is every other
> > detail of input devices that are not keyboards, mice, or touchscreens.
> 
>     This is certainly what I'd prefer, personally, whether it's a
> file-descriptor based system, event messaging, or polling functions.
> It would be really nice to get gamepads and the like in there, if
> possible.
> 
> > There was once some talk about "raw input event protocol", but there is
> > not even a sketch of it, AFAIK.
> 
>     I'm not familiar enough with Wayland yet to take the lead on
> something like that, but I can certainly help.
> 
> >>     It would be really nice if there was some sort of configuration
> >> that could be read so we'd know how the player wanted these things
> >> mapped, and some sort of way for the player to set that configuration
> >> up outside the game.
> >
> > Right, and whether this could be a Wayland thing or not, depends on the
> > above, how to handle misc input devices in general.
> >
> > Keyboards already have extensive mapping capabilities. A Wayland server
> > sends keycodes (I forget in which space exactly) and a keymap, and
> > clients feed the keymap and keycodes into libxkbcommon, which
> > translates them into something actually useful. Maybe something similar
> > could be invented for game controllers? But yes, this is off-topic for
> > Wayland, apart from the protocol of what event codes and other data to
> > pass.
> 
>     Fair enough.

In other emails, it seems you are really looking for a mapping
library for gamepads and joysticks, at least for the usual devices.
While the Wayland protocol should support this, I do not think it
is in scope for Wayland to actually define it. As with keyboards,
the Wayland protocol allows passing keymaps around, and one type of
keymap is what xkbcommon uses. xkbcommon then actually defines the
mappings, symbols, some state tracking, etc., which Wayland does not.

Such a mapping library and standard should be started as a separate
project, initially building on top of evdev directly. When that
works, we can come up with the Wayland protocol extension to
support that.
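
For comparison, this is roughly what the keyboard path already looks
like on the client side: the compositor hands over a keymap fd, the
client compiles it with libxkbcommon, and then interprets keycodes
with it. Just a sketch, error handling omitted; a gamepad mapping
library could offer a similar flow, initially on top of evdev:

#include <stdint.h>
#include <sys/mman.h>
#include <xkbcommon/xkbcommon.h>

/* Called for the wl_keyboard.keymap event (format assumed to be
 * WL_KEYBOARD_KEYMAP_FORMAT_XKB_V1). */
static struct xkb_state *
load_keymap(int fd, uint32_t size)
{
        struct xkb_context *ctx;
        struct xkb_keymap *keymap;
        char *str;

        ctx = xkb_context_new(XKB_CONTEXT_NO_FLAGS);
        str = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
        keymap = xkb_keymap_new_from_string(ctx, str,
                                            XKB_KEYMAP_FORMAT_TEXT_V1,
                                            XKB_KEYMAP_COMPILE_NO_FLAGS);
        munmap(str, size);

        /* Each wl_keyboard.key event then resolves with
         * xkb_state_key_get_one_sym(state, keycode + 8). */
        return xkb_state_new(keymap);
}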

> > Wayland protocol is event driven. Polling does not make sense, since it
> > would mean a synchronous round-trip to the server, which for something
> > like this is just far too expensive, and easily (IMHO) worked around.
> >
> > So, you have to maintain input state yourself, or by a library you use.
> > It could even be off-loaded to another thread.
> 
>     This is what we do now, essentially; accumulate the incoming
> events to assemble each frame's input device state.  It would be
> convenient if Wayland did it for us, but obviously we're already
> operating this way on X11, Win32 and OSX.

If by Wayland here you mean libwayland-client, then no, it is out
of scope. libwayland-client is only a protocol binding for C, more
like libxcb than Xlib, if I have understood them right. It does
not keep state; that is intended for higher level libraries or
toolkits.
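
To be concrete, what libwayland-client hands you is essentially just
listener structs full of callbacks; keeping any state is up to whoever
installs them. A minimal sketch for a pointer (most handlers omitted):

#include <stdint.h>
#include <wayland-client.h>

struct input_state {
        double x, y;       /* latest pointer position in surface coords */
        uint32_t buttons;  /* crude bitmask of pressed buttons */
};

static void
pointer_motion(void *data, struct wl_pointer *pointer, uint32_t time,
               wl_fixed_t sx, wl_fixed_t sy)
{
        struct input_state *st = data;

        st->x = wl_fixed_to_double(sx);
        st->y = wl_fixed_to_double(sy);
}

static void
pointer_button(void *data, struct wl_pointer *pointer, uint32_t serial,
               uint32_t time, uint32_t button, uint32_t state)
{
        struct input_state *st = data;
        uint32_t bit = 1u << (button & 31); /* button is an evdev BTN_*
                                             * code; this mapping to a bit
                                             * index is a simplification */

        if (state == WL_POINTER_BUTTON_STATE_PRESSED)
                st->buttons |= bit;
        else
                st->buttons &= ~bit;
}

/* enter, leave and axis handlers go into the same wl_pointer_listener,
 * installed with wl_pointer_add_listener(pointer, &listener, &my_state). */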

> > There is also a huge advantage over polling: in an event driven design,
> > it is impossible to miss very fast, transient actions, which polling
> > would never notice. And whether you need to know if such a transient
> > happened, or how many times it happened, or how long each
> > transient took between two game ticks, is all up to you and available.
> 
>     In truth, we don't usually deal with pure polling at the low level
> unless it's a game where we can guarantee that we're not going to drop
> frames.  Even then, things like mouse, touch or stylus input can come
> in way faster than vsync, and game simulation ticks are usually (for
> relatively obvious reasons) timed to vsyncs.

Ok, I misunderstood then. By polling you meant continuously
accumulating state and then inspecting it atomically at arbitrary
times. This is out of the scope of Wayland (libwayland-client), as far
as I know.

>     In our engine, the input system has several parts, collected in a
> per-player virtualized input structure.  It contains:
> 
> - analog axis
>   - previous position
>   - current position
>   - delta (current - prev)
>   - array of positions used to generate this frame's data
> 
> - buttons
>   - previous frame state bitmap (1 bit per key/button)
>   - current frame state bitmap
>   - trigger bitmap (cur & ~prev)
>   - release bitmap (prev & ~cur)
>   - byte map of presses
> 
>     If a key/button event was received since the last update, that key
> or button is left down for at least one update, even if it went up
> again before the snapshot went out.  If the game cares how many times
> a button or key was pressed between updates, it can look the key up in
> the byte map rather than the bitmap.
> 
>     Likewise, while accumulated position/delta is usually good enough
> for mouse/touch/stylus input and almost always good enough for
> joystick input, there are times when you want to do things like
> gesture recognition where it really pays to have the data at the
> finest possible resolution.  Most parts of the game won't care, but
> the data is there if it's needed.

Right, but as far as Wayland is concerned, we just want to make all
of this possible, not necessarily high level enough to be
straightforward to use. We are mostly interested in the protocol.
Offering high level APIs on top of it is for other projects. Different
applications want different things from a higher level API, so you
would probably need some impedance matching anyway. That also includes
data structures: self-invented, GLib, Qt, ... which again need some
wrapping to turn into your own data structures, unless your application
is already built on them.
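
Just as an illustration, the per-tick snapshot you describe boils down
to a few lines of bit arithmetic in the application, whatever the
event source ends up being:

#include <stdint.h>
#include <string.h>

#define BUTTON_WORDS (256 / 32)

struct button_snapshot {
        uint32_t prev[BUTTON_WORDS];
        uint32_t cur[BUTTON_WORDS];
        uint32_t trigger[BUTTON_WORDS];  /* cur & ~prev */
        uint32_t release[BUTTON_WORDS];  /* prev & ~cur */
};

/* Roll the state accumulated from events into this tick's snapshot. */
static void
snapshot_update(struct button_snapshot *s,
                const uint32_t accumulated[BUTTON_WORDS])
{
        int i;

        memcpy(s->prev, s->cur, sizeof s->prev);
        memcpy(s->cur, accumulated, sizeof s->cur);
        for (i = 0; i < BUTTON_WORDS; i++) {
                s->trigger[i] = s->cur[i] & ~s->prev[i];
                s->release[i] = s->prev[i] & ~s->cur[i];
        }
}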

>     Which reminds me; it would be extremely useful to be able to shut
> off key repeat for a specific client (ie: a game) without having to
> shut it off globally.

I believe key repeat is implemented client-side, so there is
nothing to switch off. I think whether a key repeats or not depends
also on the keymap, which the server does not process on clients'
behalf. Instead, clients are handed the keymap and raw key values,
and expected to do the right thing. (This is yet another thing left
for toolkits.)
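
Whether a particular key is even supposed to repeat can be asked from
the keymap, e.g. with libxkbcommon (keymap and keycode here being
whatever the wl_keyboard events gave you):

/* evdev keycode from wl_keyboard.key + 8 = xkb keycode */
if (xkb_keymap_key_repeats(keymap, keycode + 8)) {
        /* arm or refresh a client-side repeat timer for this key */
}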

There is actually a design principle behind having key repeat in the
client side. Real user input always corresponds to some physical action
of the user, happening at a point in time. The kernel input drivers
tell us the timestamp. All Wayland protocol events corresponding to
real user input have that timestamp. Key repeat events would not be
real user input events.

Furthermore, we do not fake input events. If we wanted to support e.g.
pointer warping, we would probably need a new event to tell clients
that the pointer was moved by something else than the user, and it
would not have a timestamp (since we cannot assume that we can fake
a meaningful timestamp).

> > Event driven is a little more work for the "simple" games, but it gives
> > you guarantees. Would you not agree?
> 
>     We can definitely work with it.  As much as anything it's a
> question of convenience; the question is really how much
> superstructure we need to build on top to get what we need.  We've
> already got that superstructure elsewhere, so porting it over is
> simple enough.  It would be more convenient if we didn't have to, but
> it's not a deal breaker.
> 
>     For context, I'm not trying to convince you to change the protocol
> or the model per se; aside from anything else, I don't yet understand
> it well enough to seriously critique it.  A large part of what I'm
> hoping to do here is offer some insight into how games tend to use
> input, the kind of needs games often have, and the sorts of
> considerations that make a system easier or harder to put a game on.
> Wayland obviously has competing considerations, some of which are
> arguably more important than games.  If one can imagine such a thing.
> 
>     One thing worth noting here is why we want to operate on virtualized
> input structures rather than raw events.  One reason I mentioned
> above; accumulating events so that they can be applied between frames.
>  Another reason is complexity management; games can be quite complex
> beasts consisting of many parts, and everything that can be done to
> isolate those parts makes the game easier to develop and maintain.
> 
>     The classic problem with a purely event-driven program is that
> somewhere in it there is a giant event loop that knows about
> everything in the program.  In something simple like a calculator,
> it's not a problem, but once you scale up to a large system with
> multiple subsystems the event loop can turn into a nightmare.  Having
> virtualized input structures that the game can query means that input
> tests can be isolated to the code where they belong. ie:
> 
> if(KeyTrigger(KEY_D) && KeyDown(KEY_CTRL))
> {
>   Log("heap integrity %d\n", check_heap_integrity());
> }
> 
>     You can achieve some of the same modularity with function pointer
> lists or similar hooks, but having a virtualized input structure has
> (in my experience at least) been the cleanest abstraction.

Yup, and that is not something to be solved by libwayland-client,
but by a toolkit library used by the application, be that GTK+, Qt, SDL,
or something else.

I think Wayland is too new to have such a general purpose library
developed yet. Such a library should probably be formed by
consolidating code that repeats in many toolkits, rather than be
designed from scratch. This also assumes that there are some common
data structures to be shared, which is not certain.

> > Is this referring to the problem of "oops, my mouse left the Quake
> > window when I tried to turn"? Or maybe more of "oops, the pointer hit
> > the monitor edge and I cannot turn any more?" I.e. absolute vs.
> > relative input events?
> 
>     Partly.  The issue is that *sometimes* a game wants the mouse and
> keyboard to behave in the standard way (ie: the mouse controls the
> pointer and lets you click gui elements, the keyboard is for entering
> text and hitting control keys) and *sometimes* the game wants the
> mouse motion to control an in-game object (often the camera) and just
> wants the keyboard and mouse buttons to be a big bag of digital
> buttons.  With the Quake example, when the pause menu is up, or when
> the terminal has been called down, the game wants the keyboard to be
> generating text commands on the terminal and the mouse to be able to
> select text and click on buttons.  When the terminal is gone and the
> game isn't paused, Quake wants the mouse to control the camera view
> and the keyboard WASD keys are emulating a game controller dpad.
> 
>     So, yes, absolute vs. relative events is part of the issue, but
> it's also part of a greater context; whether the keyboard is
> generating strings or digital inputs, whether the mouse is generating

FWIW, wl_keyboard generates neither, to my understanding. It sends
some kind of keycodes, and a keymap, which describes what the codes
mean. How you use that information is up to you.

> positions or deltas, under what circumstances focus is allowed to
> leave the window, whether the mouse pointer is visible, and things
> like how system-wide hotkeys factor in to things.  Can I capture the
> keyboard and mouse without preventing the user from using alt-tab to
> switch to another program, for instance?
> 
>     Clean, fast switching between these states is part of it as well;
> in a game like Quake, as above, you want to be able to capture the
> mouse when the game is playing, but "uncapture" it when the pause menu
> or the game terminal are up, or if the player switches focus to
> another program.  In an RTS, you might want a visible cursor but want
> to constrain the mouse to the window to allow the map to scroll.  You
> might want to use the keyboard mostly for hotkeys, but if they hit
> enter you want them to be able to type a string in to broadcast to
> multiplayer chat.  The scroll wheel might control either the message
> scrollback or the zoom level, depending on what the cursor is floating
> over.
> 
>     There's also the question of clean recovery; if a game has changed
> the video mode (if that's allowed any more, though these days with LCD
> panels and robust 3D hardware maybe that's just a bad idea), turned
> off key repeat and captured the mouse, all of that needs to be
> reverted if the game exits ungracefully.  Which sometimes happens,
> especially during development.

The clean recovery is a major design principle in Wayland. We do
not have a request "change videomode" at all (for normal clients).
Instead, we have a request for: this is my window surface, please
make it fullscreen, and I would really prefer if you
scaled/centered/switched video mode to achieve that. That is, the
client can express a wish for how it wants the fullscreening done, in
case the surface size is different from the current output
resolution. The compositor is allowed to do whatever, but of course
it should honour the wish most of the time. The compositor knows
the output hardware better, and it also knows everything about the
desktop environment state (open windows, etc.), so the compositor is
in the best position to choose how to do it. The compositor can also take
the user's desktop preferences into account.

That allows some interesting features. Say, your desktop is
1600x1200, and your game wants 1024x768. The compositor is actually
free to change the video mode at any time it chooses. For instance,
it can switch to 1024x768 only when the game is active and on top,
and switch to 1600x1200 when the user activates another window. You
could even switch between several fullscreen applications wanting
different resolutions.

Also, should the game crash, the "video mode" is tied to the window,
so when the window gets killed, the video mode is automatically
restored for the desktop.
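
In protocol terms it is a single request with a hint. For example, a
client that would prefer an actual mode switch could ask for it like
this (a sketch against the current wl_shell interface):

/* Prefer a video mode switch ("driver" method); 0 = no preferred
 * refresh rate, NULL = let the compositor pick the output. */
wl_shell_surface_set_fullscreen(shell_surface,
                                WL_SHELL_SURFACE_FULLSCREEN_METHOD_DRIVER,
                                0, NULL);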

> > There is a relative motion events proposal for mice:
> > http://lists.freedesktop.org/archives/wayland-devel/2013-February/007635.html
> 
>     Something like that will be needed for a lot of styles of game,
> but also has use elsewhere.  For example, there used to be a widget on
> Irix machines IIRC, that looked like a trackball.  If you put the
> mouse pointer on it, held down the mouse button and then moved the
> mouse, it would scroll the trackball control rather than move the
> mouse pointer.
> 
>     Similarly, when you're doing scroll bars, if you want to make a
> scroll bar where dragging the thumb moves the scrolled view at a rate
> that is pixel-proportional rather than window-size proportional, you
> have to be able to warp the pointer; otherwise, the view is slaved to
> the thumb position, so taller views scroll by faster.
> 
>     Concrete example:  Let's say I have a document that is 1000 pixels
> tall, in a view that's 400 pixels tall.  Let's fudge the math a bit,
> say the thumb is one pixel tall and the region the thumb can be
> scrolled over is the full height of the window.  The window shows 40%
> of the document.  Without pointer warping, each step in the scroll bar
> is (600 / 400) pixels, so we're scrolling on average 1.5 pixels of
> view for every pixel the thumb moves up or down the screen.
> 
>     Now, in the same view, we have a 250000 pixel tall document.  The
> document got longer, but the scroll bar is the same height (and thus,
> the same number of steps).  Each step of the scroll bar is now (249600
> / 400), or 624 pixels, enough that each scroll thumb movement scrolls
> more than 1.5x the view area.
> 
>     The classic solution to this is when the scroll amount goes above
> or below sane thresholds, the view is moved by a sane amount, the
> scroll bar is moved by the correct amount (if any) for the new view,
> and if necessary the pointer is warped to the new thumb position.
> 
> > Clients cannot warp the pointer, so there is no way to hack around it.
> > We need to explicitly support it.
> 
>    Hmm.  The inability to warp the pointer is going to put constraints
> on gui designs even outside of games.  Consider the scrollbar example,
> above.  That one isn't just a matter of locking the pointer somewhere,
> it's a matter of positioning the pointer based on the new scroll thumb
> position.  If anything we're actually better off in games in that
> scenario, because we can just shut off the pointer draw and draw a
> pointer in-engine.
> 
>     I'm assuming there are sane protocol reasons for not allowing
> pointer warping, but I think you'll find it's one of those PITAs that
> you need to implement to avoid greater pain later.  Bad scroll bar
> behavior is one of those things that can grate on people.
> 
>     Within games, there's the classic "try to move the mouse off the
> window, the pointer stops and the map scrolls" case that we'd like to
> be able to handle.

Right, and the pointer lock proposal should solve all these pointer
problems and use cases, but with one caveat: when the pointer lock
is broken/released, the absolute pointer position is still where it
was when the pointer lock was activated. So in your scroll bar
example, the user would think the pointer warped back to the point
where he started dragging the scroll thumb, when in reality the
pointer never moved.

Btw. the functional details of the pointer lock proposal are in this patch:
http://lists.freedesktop.org/archives/wayland-devel/2013-February/007638.html

I wonder if we could solve that problem by adding a request that
sets the pointer position, but only during the pointer lock, and
while the window is active, i.e. between the locked wl_pointer.enter and
leave events. Clients only ever know about surface-local coordinates,
so the position would be specified in surface coordinates. How would
that sound?

As for the pointer cursor, activating the pointer lock automatically
hides the cursor surface in that proposal, so if you still need the
cursor, you have to draw it yourself. OTOH, that should not be too
hard, since clients always provide the cursor image, anyway.

> > Ah yes, deltas are the relative motion events, see above.
> 
>     Deltas are quite useful, though obviously we can calculate them
> ourselves.  Some of the desire to have deltas from the system input
> comes from an admittedly somewhat childish engineering
> distaste for repeated translation back and forth between deltas and
> absolute positions as the data percolates up through the software
> stack.  Coming out of the hardware (at least for classical mice and
> trackballs) the "analog" values are all deltas.

Yeah, pointer lock gives you deltas.
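
Purely as an illustration, since the interface is only a proposal and
the names may well change: with relative events, the client-side
accumulation is just summing fixed-point deltas until the next game
tick (extending the input_state sketch from earlier with per-tick
frame_dx/frame_dy accumulators, zeroed at each tick):

/* Hypothetical handler for a relative motion event; dx/dy are
 * wl_fixed_t deltas as in the relative motion proposal. */
static void
relative_motion(void *data, uint32_t time, wl_fixed_t dx, wl_fixed_t dy)
{
        struct input_state *st = data;

        st->frame_dx += wl_fixed_to_double(dx);
        st->frame_dy += wl_fixed_to_double(dy);
}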

> > Aah, reading this the third time, I finally understood what you meant
> > by input capture. The URL above for the relative motion events should
> > be exactly this. We are more accustomed to the term "pointer grab" or
> > "grabbing", meaning that during the grab, all input events go to this
> > particular window, until the grab is ended.
> 
>     Ok, I'll try to stick to that term.  The thing is, we don't
> necessarily want *all* events routed to us; we don't want to trap
> system-level stuff like program switching (alt-tab), the "lock screen"
> button, the volume and brightness controls, the screenshot button (if
> any) and so forth.  We want *most* of the events routed to us, but not
> to the exclusion of system and window manager functionality.

Such system level input events (e.g. hotkeys) are eaten by the
compositor before they are ever sent to any client, so you simply
cannot block or grab them. But, if a hotkey is actually a composition
of several keys, like alt+tab, the client with the keyboard focus will
see alt going down, but not the tab, IIRC. (And then the protocol takes
care of not leaving the client with a "stuck" alt.)

Btw. the input protocol is designed so that you cannot take grabs at
arbitrary times, and you cannot expect your grab to hold indefinitely
just because you never release it. The compositor is allowed to deny grab requests, and
also break grabs at any time. It is impossible for a client to hold the
server hostage.

> > One thing you didn't list is input latency. In Wayland, every
> > input event from user actions has a timestamp corresponding to when
> > they occurred, but the events may not be relayed to clients ASAP.
> > Instead, for instance Weston relays input only during the refresh
> > cycle, I think. That might be a problem for games wanting to minimize
> > input latency, since it limits input state update rate to the monitor
> > refresh rate.
> 
>     That's potentially an issue; I'd just assumed events would be
> delivered immediately.  Might it be possible to have a knob we could
> push to request "real time" input, for whatever value of "real time"
> the system can manage?  Again, in some cases (mostly where a game is
> not grabbing input) the game can deal with keyboard and mouse input at
> desktop rates; things like text entry and gui interaction aren't
> always time critical in games (though they can be).

Right, so we should start a discussion for such a feature. The best way
to start would be to propose the protocol for switching to "immediate
input" mode. Another is to file a bug, so we get a permanent and
trackable record of the feature request, if no-one wants to drive the
protocol addition development themselves.

>     Often, though, we want the lowest latency the system can manage.
> There's often an update lag on monitors already, and some input
> systems (especially things like touch panels; try a draw program on
> the ipad to see why Nintendo still uses resistive touch screens
> despite their disadvantages) can have atrocious lag.  In a living-room
> game PC hooked up to a TV, you can be looking at lags of several
> frames between the video signal going out on the wire and appearing on
> the display due to HDMI signal processing and cleanup, plus a
> potential frame or two of lag on wireless gamepads, keyboards and
> mice.  The game adds at least a frame of lag due to the nature of the
> simulation tick, and potentially another depending on how display
> buffering is done.  Any lag on top of that and we're wandering
> dangerously close to the 150ms delay that my HCI prof said was
> "perceptible input lag", and those studies were done with people using
> VT102s to do text entry, not gamers playing twitch games.
> 
>     I think the best option would be the "real time" switch; let a
> client tell the server "these events are time-critical to me".  We
> probably don't need all events at max speed; window metadata (resize,
> move, destroy...) and so forth can be delivered whenever it's
> convenient.  A game might only really need "real time" input for (say)
> the mouse and WASD keys, or it might only care about "real time" input
> from the gamepad.  The actual requirements may well differ at
> different parts of the game.
> 
> > Depending on the game and physics engine, of course, is it possible to
> > make use of the input event timestamps to integrate the effect of, say,
> > a button going down some time in the past, instead of assuming it went
> > down when this game tick started?
> 
>     In some games, sure.  The problem is, any lag like that can
> potentially end badly for the player.  What if we've already killed
> them before the input comes in?  What if it's a network game, and the
> new input means that instead of being killed by player B, they
> actually got player B first?
> 
>     In general, the problem is that yes, we can go back and correct
> the simulation for the revised input, but what we *can't* do is revise
> the player's decisions based on the previously incorrect simulation
> that we've already showed them.  Games strive to have as tight a
> feedback loop as possible, so if the simulation is not fed input when
> it happens, we're putting information in front of the player that
> we're going to revise *after* they have started reacting to it.
> 
> > What I'm trying to ask is, are the timestamps useful at all for games,
> > and/or would you really need a minimum latency input event delivery
> > regardless of the computational and power cost?
> 
>     Timestamps can be useful as a fallback, but minimum latency is by
> far the highest priority.  Lower latency translates directly to a
> better play experience.  The difference of even a frame of lag has a
> measurable effect on player enjoyment and control.
> 
> > Keeping in mind, that event based input delivery does not rely on high
> > update rates, like polling does, to not miss anything.
> 
>     If the events are just coming in as a pile in 60Hz ticks, it's all
> good and we can do everything we need to.  If they're coming in as a
> pile at 10Hz ticks, it's going to be difficult to make anything more
> active than Solitaire.

Yes, as far as I understand, currently input events are sent to Wayland
clients as a burst at every compositor repaint cycle, which happens at
the monitor refresh rate, so for a 60 Hz monitor, you would be getting
them in bursts at 60 Hz.

Note that this is specific to Weston, though. I'm not sure if we have
any guidelines on this for other compositors, and the protocol does not
say anything about it. I cannot imagine why anyone would want to delay
input events beyond the repaint cycle. Others might just do the
immediate sending, since it is easier to implement.

(Actually it seems to be more complex in Weston than I implied here,
but this is probably the practical worst case currently.)
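
However the compositor batches them, the usual client-side pattern is
to drain everything that has arrived right before the simulation tick,
without blocking. Roughly, with the read/dispatch API in current
libwayland-client (a sketch, error handling omitted):

#include <poll.h>
#include <wayland-client.h>

/* Once per game tick: dispatch queued events, read any new ones from
 * the socket without blocking, and flush our own pending requests. */
static void
pump_events(struct wl_display *display)
{
        struct pollfd pfd = { wl_display_get_fd(display), POLLIN, 0 };

        while (wl_display_prepare_read(display) != 0)
                wl_display_dispatch_pending(display);
        wl_display_flush(display);

        if (poll(&pfd, 1, 0) > 0)
                wl_display_read_events(display);
        else
                wl_display_cancel_read(display);

        wl_display_dispatch_pending(display);
}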

> > There is also one more catch with the timestamps. Their base is
> > arbitrary, and a client does not know which clock produces them.
> > Therefore they are only useful relative to other input event
> > timestamps. Would you need a way to get the current time in the
> > input clock to be able to use them properly?
> 
>     At least in our case, we're typically running the simulation off
> of either a vsync clock (consoles, mostly) or a millisecond clock
> (gettimeofday() or the platform equivalent).  Anything coming in we
> typically try to relate to those.  Some sort of timestamp we could
> relate to an actual world clock would be important; without it we'd be
> into calculating times based on heuristics, with all that implies.
> 
>     VSync stamps would be good enough, or millisecond stamps.
> Anything with fixed time units.  As long as we know the size of the
> time unit and some arbitrary base time (ie: the timestamp of the first
> event we got), that's all we really need; if we need to relate it to
> the wall clock, we can call gettimeofday() and compare.  If the time
> units aren't fixed (ie: if they're just monotonically increasing IDs
> that don't actually encode time values and are only useful for
> establishing order), the results for games will be unfortunate.

The timestamps are specified to be in milliseconds, and preferably from
a monotonic clock (which e.g. gettimeofday is not). The things that are
not specified are the absolute value of the clock (or epoch, or
relation to the real time clock), and resolution.
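
They are also 32-bit values, so when using them relative to each
other, the subtraction wants to be done in unsigned arithmetic to
survive wrap-around, e.g.:

/* Milliseconds from 'earlier' to 'later'; correct across the uint32_t
 * wrap-around as long as the real gap is under ~49.7 days. */
static uint32_t
timestamp_delta(uint32_t earlier, uint32_t later)
{
        return later - earlier;
}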

Matching input event timestamps to any specific clock you read yourself
is not easy. The input event timestamp is at an unknown time in the past,
while the clock you read is "current". I can't see a way to just
compute the difference.

Therefore we would need a protocol extension to ask which clock the
timestamps come from, and even then the answer might be something you
do not know about or cannot access, so you would still need some fallback.
And we cannot just ask the compositor for the current time, since the
compositor might not have access to that clock, and there is a delay
between the compositor reading the clock, and the client waking up and
processing the event.

We also do not have any guarantees (without adding a protocol
extension) that the vsync timestamp is from the same clock as the input
event timestamps.

Btw. the vsync and presentation events, and their timestamps are yet
another whole new story, unrelated to input. :-)


Thanks,
pq

