Keyboard input
Mike Paquette
paquette.mj at gmail.com
Tue Dec 14 10:02:05 PST 2010
On Dec 14, 2010, at 2:20 AM, Stefanos A. wrote:
>
>
> 2010/12/14 Mike Paquette <paquette.mj at gmail.com>
> On Dec 13, 2010, at 4:47 PM, Daniel wrote:
>
> > Any thoughts on keyboard input?
>
> I recently completed work on a novel window system not too different from Wayland. We ran across a number of potentially desirable features and behaviors regarding keyboard input.
>
> * Keyboard mapping/translation
>
> We found it handy to have the keyboard mapping done in the server. An initial mapping table was loaded at startup, based on the user's preferred language and the type of keyboard present. At login time, the user's preferences were consulted, and an optional mapping table could be loaded that overrode the startup table. Some folks wanted the ability to use a different mapping just within their application, so we added the ability to associate a mapping with just one stream of events, belonging to a specific application.
>
>
> One small question to get the hang of this. I use the awesome colemak keyboard layout (http://colemak.com/) which has a top row that reads "qwfpg" instead of "qwerty". Yesterday, I installed Starcraft 2 and was amazed to see that it recognized this and remapped its hotkeys to fit my keyboard layout automatically. For instance, the "e" (qwerty) hotkey was mapped to "f" (colemak), which has the same physical location on the keyboard. Had it remained at "e" (colemak) I'd have to press "k" (qwerty) which, needless to say, would be much less usable.
>
> I was genuinely surprised by this, as it's the first (and only!) application that managed to recognize and adapt to my keyboard layout. Previously, I had to either set up hotkeys manually (time-consuming) or switch between colemak/qwerty when chatting/playing (annoying and error-prone).
>
> My question is, how could this be implemented using your keyboard mapping scheme?
That's actually pretty straightforward. The bad experiences you've had were probably due to apps using the scancodes or virtual keycodes, the raw, untranslated codes from each key. What I did for hot keys was to take each keycode as it came in and translate it through the keymapping to get a localized character code before testing for a hot key. Developers were asked to register hot keys by the lower-case ASCII character on the keycap. The keymapping tables for almost all keyboards would yield a lower-case ASCII character if they were told that only the 'Command' modifier key was pressed. When we did the key translation for hot key hit testing, we would tell the translation software that the Command key modifier was present, with no prior state, and feed in the keycode. (For a couple of odd keymapping tables, the Command trick gave a non-ASCII value. We had to test for a non-ASCII result, and if we got one, try the translation again with a state of no modifier keys pressed.)
This gave us hot keys that could be specified without regard to keyboard layout or language. For a few specialized keys that didn't appear in the keymapping tables, such as brightness and volume, we still had to support specifying the hot keys by scancode or virtual keycode.
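The hot key translation described above can be sketched roughly as follows. This is a minimal Python illustration, not the original server's code: the modifier flags, `make_keymap`, `hotkey_char`, `match_hotkey`, and all keycodes are hypothetical, and a keymap is modeled as a simple function from (keycode, modifiers) to a character.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical modifier bit flags; the real server's values are not given in the post.
MOD_NONE = 0
MOD_COMMAND = 1 << 0

# A keymap translates (keycode, modifiers) into a character, or None if unmapped.
Keymap = Callable[[int, int], Optional[str]]


def make_keymap(plain: Dict[int, str], command: Optional[Dict[int, str]] = None) -> Keymap:
    """Build a toy keymap (a stand-in for a real layout table)."""
    command_table = plain if command is None else command

    def keymap(keycode: int, modifiers: int) -> Optional[str]:
        table = command_table if modifiers & MOD_COMMAND else plain
        return table.get(keycode)

    return keymap


def hotkey_char(keymap: Keymap, keycode: int) -> Optional[str]:
    """Translate a keycode into the lower-case ASCII character used for hot key tests."""
    # Translate as if only Command were held, with no prior dead-key state.
    ch = keymap(keycode, MOD_COMMAND)
    if ch is None or not ch.isascii():
        # Some odd layouts give non-ASCII under Command; retry with no modifiers.
        ch = keymap(keycode, MOD_NONE)
    if ch is None or not ch.isascii():
        return None
    return ch.lower()


def match_hotkey(
    keymap: Keymap,
    keycode: int,
    held: int,
    by_char: Dict[Tuple[str, int], str],
    by_keycode: Dict[Tuple[int, int], str],
) -> Optional[str]:
    """Return the action registered for this key press, if any."""
    # Keys absent from the keymap (brightness, volume) are registered by raw keycode.
    if (keycode, held) in by_keycode:
        return by_keycode[(keycode, held)]
    ch = hotkey_char(keymap, keycode)
    return None if ch is None else by_char.get((ch, held))


# Toy layout: keycode 14 gives "é" under Command (an "odd" table) and "F" otherwise.
layout = make_keymap({14: "F", 15: "p"}, command={14: "é", 15: "p"})
by_char = {("f", MOD_COMMAND): "find", ("p", MOD_COMMAND): "print"}
by_keycode = {(0x90, MOD_NONE): "brightness_up"}  # 0x90 is an invented keycode
print(match_hotkey(layout, 14, MOD_COMMAND, by_char, by_keycode))  # find
print(match_hotkey(layout, 0x90, MOD_NONE, by_char, by_keycode))  # brightness_up
```

Because registration is by character rather than keycode, the same `by_char` table works unchanged on any layout whose keymap produces that character.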
The 'menu keys', keyboard shortcuts for menu items, are handled in a similar manner, with the events just going to the foreground app as usual.
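The layered keymap selection from the quoted description (a startup table chosen from language and keyboard type, an optional login-time override, and an optional per-application mapping) might look something like this. A minimal sketch, assuming keymaps are plain keycode-to-character tables and that event streams are identified by hypothetical string ids; `KeymapSelector` is an invented name.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# A keymap is modeled here as a plain keycode -> character table.
KeyTable = Dict[int, str]


@dataclass
class KeymapSelector:
    """Choose the keymap for an event stream: per-app, then login, then startup table."""

    startup: KeyTable  # loaded at boot from the preferred language and keyboard type
    login: Optional[KeyTable] = None  # optional user override loaded at login
    per_stream: Dict[str, KeyTable] = field(default_factory=dict)  # hypothetical stream ids

    def active_for(self, stream_id: str) -> KeyTable:
        if stream_id in self.per_stream:
            return self.per_stream[stream_id]
        if self.login is not None:
            return self.login
        return self.startup

    def translate(self, stream_id: str, keycode: int) -> Optional[str]:
        return self.active_for(stream_id).get(keycode)


# Keycode 18 is an invented code for the physical key labeled "e" on a QWERTY board.
qwerty, colemak = {18: "e"}, {18: "f"}
selector = KeymapSelector(startup=qwerty)
selector.login = colemak               # the user's login preference
selector.per_stream["game"] = qwerty   # one app keeps its own mapping
print(selector.translate("editor", 18), selector.translate("game", 18))  # f e
```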
>
> * Hot keys
>
> Hot keys, or keystroke combinations set to perform a specific system management task rather than generate key events, can be set. These hot keys are typically used to manage display brightness, sound volume/mute, or activate specialized functions such as soft disk eject, or activating a display overlay. A hot key is typically a function key, or a key in the main array of keys combined with modifier keys, such as Control-Alt-Delete. We grouped the hot keys into system-level non-overridable and non-disableable keys, user interface controls that could be disabled or overridden, and application level keys that could be disabled or overridden. There are a bunch of fiddly refinements to these that I can go into if anyone is interested. (Full screen apps disabling almost all hotkeys automatically, for example.) Hot key events are routed to specific applications registered with the hot key.
>
> We implemented a registry mechanism that handed out hot keys first come, first served, which also checked for silly things such as registering 'a' as a hot key, no modifiers, and handled collisions between hot key requests, including the system reserved keys.
>
> Sounds like a refined version of X11 passive grabs. What happens when an application requests a hotkey that is already registered at UI level? (Fail or override?)
>
> The fullscreen-mode-disables-hotkeys touch is great!
Hot keys could be registered as exclusive or non-exclusive in my old server. Most of the critical UI keys were registered as exclusive keys. If an app tried to register an exclusive key that collided with one already registered, it would get back an error indicating the resource was already in use. If it tried to register a non-exclusive hotkey, that would succeed, but the key would remain disabled until the exclusive key already registered was removed or disabled.
Multiple non-exclusive hot keys would all get hot key events when their key combination was entered. Yes, that's deliberate.
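A rough sketch of the registry behavior described above: first come, first served registration, rejection of a bare printable key, exclusive collisions returning an error, non-exclusive keys that stay dormant while an enabled exclusive key holds the combination, and fan-out to every non-exclusive holder otherwise. All class and owner names are hypothetical. The message doesn't say what happens when a non-exclusive key is registered before an exclusive one, so this sketch simply assumes the same dormancy rule applies.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# A hot key combination: lower-case key name plus a set of modifier names.
Combo = Tuple[str, FrozenSet[str]]


class HotKeyInUse(Exception):
    """An exclusive hot key collided with an already registered exclusive one."""


@dataclass(eq=False)  # compare registrations by identity
class Registration:
    owner: str  # hypothetical application identifier
    combo: Combo
    exclusive: bool
    enabled: bool = True


class HotKeyRegistry:
    def __init__(self) -> None:
        self._regs: Dict[Combo, List[Registration]] = {}

    def register(self, owner: str, key: str, modifiers: FrozenSet[str],
                 exclusive: bool) -> Registration:
        # Reject silly requests such as a plain printable key with no modifiers.
        if not modifiers and len(key) == 1 and key.isprintable():
            raise ValueError(f"{key!r} with no modifiers cannot be a hot key")
        combo = (key.lower(), frozenset(modifiers))
        regs = self._regs.setdefault(combo, [])
        if exclusive and any(r.exclusive for r in regs):
            raise HotKeyInUse(f"{combo} is already registered as exclusive")
        reg = Registration(owner, combo, exclusive)
        regs.append(reg)  # first come, first served
        return reg

    def unregister(self, reg: Registration) -> None:
        self._regs[reg.combo].remove(reg)

    def recipients(self, key: str, modifiers: FrozenSet[str]) -> List[str]:
        """Owners that should receive the hot key event for this combination."""
        live = [r for r in self._regs.get((key.lower(), frozenset(modifiers)), [])
                if r.enabled]
        exclusive = [r for r in live if r.exclusive]
        if exclusive:
            # While an enabled exclusive key exists, non-exclusive ones stay dormant
            # (assumed to hold whichever registration came first).
            return [exclusive[0].owner]
        return [r.owner for r in live]  # every non-exclusive holder gets the event


ctrl_alt = frozenset({"ctrl", "alt"})
registry = HotKeyRegistry()
mute = registry.register("ui-server", "m", ctrl_alt, exclusive=True)
registry.register("music-app", "m", ctrl_alt, exclusive=False)  # succeeds, but dormant
registry.register("chat-app", "m", ctrl_alt, exclusive=False)
print(registry.recipients("m", ctrl_alt))  # ['ui-server']
mute.enabled = False
print(registry.recipients("m", ctrl_alt))  # ['music-app', 'chat-app']
```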
>
>
> * Key Thief
>
> This is the behavior where an application grabs the keyboard focus for a while, then hands it back. We wound up actually implementing a stack of Key Thieves. (Why is a fascinating UI design question, but not one for this list.) When a key thief grabs keyboard focus, the former focus holder gets an event telling it that it has lost the keyboard focus, so UI can be updated if needed. When a key thief releases focus (procedurally, or by ceasing to run), focus is returned to the prior holder with a notifying event so UI can be updated if needed.
>
> It may be desirable for a key thief to have the option of disabling most hot keys.
>
> Can a thief propagate the stolen events back to the receiving event queues? Is it possible to receive keyboard events without input focus? (I'm thinking of the following scenario: multi-window application that uses a dedicated input thread to coordinate input for all windows).
The key thief could propagate stolen events back to other event queues, but because of the way my old system was designed, that wasn't usually necessary. (VNC servers forwarding a window or set of windows for one app made use of this facility, but I can't think of anything else.)
The key thief was there specifically to allow keyboard input to be grabbed away from the app that would normally be the input focus. Consider an app that needs to execute an operation requiring elevated privileges, such as mounting a disk image. That app might ask a security service to authenticate the user and run the operation. If the security service needs to verify that the user is allowed to perform that operation, it presents a UI panel. The original app still has keyboard focus, and is blocked in a request to the security service, so it won't respond to requests such as a change of keyboard focus. The UI panel from the security service acts as a 'key thief': it steals input focus momentarily while it collects authenticating information, then releases the keyboard focus back to the app when it dismisses the panel.
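The key thief stack and its focus notifications might be modeled like this. A minimal sketch: `FocusManager`, the notification names, and the client names are hypothetical. The message doesn't describe what happens when a thief that is not on top of the stack releases focus, so here it is simply removed without notifying anyone.

```python
from typing import Callable, List

# Hypothetical notification hook: (client, event_name).
Notify = Callable[[str, str], None]


class FocusManager:
    """Keyboard focus with a stack of key thieves above the normal focus holder."""

    def __init__(self, holder: str, notify: Notify) -> None:
        self._holder = holder          # the app that would normally have focus
        self._thieves: List[str] = []  # most recent thief last
        self._notify = notify

    @property
    def focus(self) -> str:
        return self._thieves[-1] if self._thieves else self._holder

    def steal(self, thief: str) -> None:
        previous = self.focus
        self._thieves.append(thief)
        self._notify(previous, "focus_lost")  # lets the old holder update its UI
        self._notify(thief, "focus_gained")

    def release(self, thief: str) -> None:
        """Called when a thief gives focus back, or when its process exits."""
        if thief not in self._thieves:
            return
        was_current = self._thieves[-1] == thief
        # Remove the most recent entry for this thief.
        index = len(self._thieves) - 1 - self._thieves[::-1].index(thief)
        del self._thieves[index]
        if was_current:
            self._notify(self.focus, "focus_regained")


log = []
focus = FocusManager("disk-utility", lambda who, what: log.append((who, what)))
focus.steal("security-panel")  # the authentication panel collects credentials
focus.release("security-panel")
print(focus.focus)  # disk-utility
```

Note that the blocked app never has to cooperate: the server moves focus away and back on its own, which is the point of the design when the original app is stuck in a synchronous request.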
Because of the system design of my old window server, the case where an application has multiple windows and a dedicated input thread was actually the trivial case, the normal mode of operation. Each application held a connection to the window server. Each connection consisted of an identifying token, an RPC channel to make requests to the server (create a window, flush this region, resize the window, etc.), and an event channel on which annotated events for all windows the app had created were returned. Each event included the window identifier, global and local coordinates, and event-specific data. In normal operation, a process would hold one connection, which would 'own' all the windows for that process. Code within the application process could examine all events when in a special state such as a modal drag, or simply forward events to the appropriate window objects for further processing (the normal mode of operation).
This meant that for things like the rootless X11 server app, the process got all the events for all of its windows with no special coding needed, and would just map the events into their X equivalents for the usual processing and dispatch.
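The single-connection event model described above can be sketched as follows. This is an illustration under stated assumptions: the `Event` fields mirror the ones listed (window identifier, global and local coordinates, event-specific data), the RPC channel is left out, the event channel is stood in for by an in-memory queue, and `AppDispatcher` and its `modal` hook are hypothetical names. In the rootless X11 case, each window handler could translate the event into its X equivalent.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Optional, Tuple


@dataclass
class Event:
    window_id: int
    global_xy: Tuple[int, int]  # position in screen coordinates
    local_xy: Tuple[int, int]   # position relative to the window
    kind: str
    data: dict = field(default_factory=dict)  # event-specific payload


@dataclass
class Connection:
    token: str  # identifying token handed out by the server
    # Stand-in for the event channel; the RPC channel is omitted from this sketch.
    events: Deque[Event] = field(default_factory=deque)


Handler = Callable[[Event], None]


class AppDispatcher:
    """One connection owns every window of the process; events are routed locally."""

    def __init__(self, conn: Connection) -> None:
        self.conn = conn
        self.windows: Dict[int, Handler] = {}
        self.modal: Optional[Handler] = None  # e.g. set during a modal drag

    def pump(self) -> None:
        while self.conn.events:
            event = self.conn.events.popleft()
            if self.modal is not None:
                self.modal(event)  # a modal state examines every event
            elif event.window_id in self.windows:
                self.windows[event.window_id](event)  # normal mode: forward to window


seen = []
app = AppDispatcher(Connection(token="app-42"))
app.windows[1] = lambda e: seen.append(("window 1", e.kind))
app.windows[2] = lambda e: seen.append(("window 2", e.kind))
app.conn.events.append(Event(2, (110, 220), (10, 20), "key_down", {"char": "a"}))
app.conn.events.append(Event(1, (5, 5), (5, 5), "mouse_down"))
app.pump()
print(seen)  # [('window 2', 'key_down'), ('window 1', 'mouse_down')]
```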
For specialized operations such as sandboxing, an application could grant use rights to its connection (or one of its connections) to another process. For security reasons, this handoff was always from the owner process to the process that would act as a sort of delegate. (Think of a web browser running an untrusted plug-in. The plug-in can be run in a sandboxed process, but granted access to the browser's connection for event receipt and drawing access to the browser's window.)
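The delegation rule (use rights flow only from the owning process to a delegate) can be sketched as a small permission check. `ConnectionRights`, the process ids, and the use of `PermissionError` are assumptions for illustration; the message doesn't describe the actual mechanism, such as how the server verifies which process is making a request.

```python
from typing import Set


class ConnectionRights:
    """Use rights on one window-server connection; only the owner may delegate."""

    def __init__(self, owner_pid: int) -> None:
        self.owner_pid = owner_pid
        self._delegates: Set[int] = set()

    def grant(self, caller_pid: int, delegate_pid: int) -> None:
        # Handoff always goes from the owner process to a delegate, never onward.
        if caller_pid != self.owner_pid:
            raise PermissionError("only the owning process may grant use rights")
        self._delegates.add(delegate_pid)

    def may_use(self, pid: int) -> bool:
        return pid == self.owner_pid or pid in self._delegates


BROWSER, PLUGIN, OTHER = 100, 200, 300  # invented process ids
rights = ConnectionRights(owner_pid=BROWSER)
rights.grant(caller_pid=BROWSER, delegate_pid=PLUGIN)  # sandboxed plug-in may draw
print(rights.may_use(PLUGIN), rights.may_use(OTHER))   # True False
```

A delegate that tries to pass the connection further along gets a `PermissionError`, so a compromised plug-in cannot hand the browser's window to yet another process.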
Mike Paquette
More information about the wayland-devel mailing list