J, K, or How to choose keyboard shortcuts for web applications
Keyboard shortcuts are a powerful productivity booster widely used in desktop software. How can we apply them in web applications?
Some 20 years ago I was finishing my diploma. A home computer was still a luxury, so I worked nights in a friend’s office. My friend introduced me to Word on Windows 3.11, a powerful program with tons of great features that made writing a sheer pleasure (compared to a typewriter!). “Remember one thing,” he said. “Whatever you do, whatever you write, hit Ctrl+S right after you’ve finished typing. Other skills will come later. Ctrl+S you should master right now.”
Now web apps have become true working environments for many activities (including writing diplomas). Productivity and speed are a business necessity, not buzzwords. Web apps inherit some keyboard shortcuts from desktop software. But can we copy them blindly?
Why keyboard shortcuts?
Shortcuts speed up common actions and are mostly used by experts. So basically any web application that supports a stable scenario of repetitive actions and is aimed at professionals can benefit from keyboard shortcuts that streamline the process.
What matters is not the speed of a single operation (which is faster: typing Ctrl+C or clicking the Copy button in the toolbar?) but the flow of the whole sequence of actions and the ability to concentrate on the task at hand.
Looking for Zoom under the View menu definitely requires more conscious attention and precision than pressing Ctrl and +. Taking your hands off the keyboard to grab the mouse slows you down because it breaks your flow of thought. Yet using shortcuts requires knowledge and experience, and with the lack of standards, discoverability and learnability are the main problems you’ll have to address when creating key maps for your web application.
Overriding standard shortcuts
There are two categories of keyboard shortcuts you must be very careful with:
- native browser and operating system shortcuts that affect the browser itself (not the contents of the web page), like Ctrl+N, Ctrl+O, Ctrl+S, Ctrl+P, Ctrl+W, etc.
- native browser shortcuts that act on the page content, like Ctrl+C, Ctrl+Z, Ctrl+F, spacebar, etc.
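To keep both categories in view during design reviews, every proposed combination can be checked against them. The sketch below is a hypothetical helper, not a standard API. Its two lists hold only the examples above plus a few other common ones, and they would need extending for the browsers and operating systems your users actually run. `Meta` stands for the Cmd key on macOS. The simple parser uses "+" as the separator, so it cannot describe the plus key itself.

```typescript
// Hypothetical audit helper for a proposed keymap (not a library API).
type ShortcutRisk = "browser" | "content" | "free";

// Illustrative, incomplete lists; browsers and operating systems differ.
// Shortcuts that act on the browser itself: never override these.
const BROWSER_LEVEL: ReadonlySet<string> = new Set(["Ctrl+N", "Ctrl+O", "Ctrl+S", "Ctrl+P", "Ctrl+W"]);
// Shortcuts that act on the page content: override only if the meaning is kept.
const CONTENT_LEVEL: ReadonlySet<string> = new Set(["Ctrl+A", "Ctrl+C", "Ctrl+F", "Ctrl+V", "Ctrl+X", "Ctrl+Z", "Space"]);

const MODIFIER_ORDER = ["Ctrl", "Alt", "Shift", "Meta"];

// Turns "shift+ctrl+n" or "Ctrl + N" into a canonical "Ctrl+Shift+N".
// Limitation: "+" is the separator, so the plus key itself is not supported.
function normalizeCombo(combo: string): string {
  const modifiers: string[] = [];
  let key = "";
  for (const raw of combo.split("+")) {
    const part = raw.trim();
    if (part === "") continue;
    const modifier = MODIFIER_ORDER.filter((m) => m.toLowerCase() === part.toLowerCase())[0];
    if (modifier !== undefined) {
      if (modifiers.indexOf(modifier) < 0) modifiers.push(modifier);
    } else {
      // Capitalise the key name: "n" -> "N", "space" -> "Space".
      key = part.charAt(0).toUpperCase() + part.slice(1);
    }
  }
  modifiers.sort((a, b) => MODIFIER_ORDER.indexOf(a) - MODIFIER_ORDER.indexOf(b));
  return modifiers.concat(key).join("+");
}

// Hypothetical classifier: which of the two risky categories a combo falls into.
function classifyShortcut(combo: string): ShortcutRisk {
  const normalized = normalizeCombo(combo);
  if (BROWSER_LEVEL.has(normalized)) return "browser";
  if (CONTENT_LEVEL.has(normalized)) return "content";
  return "free";
}
```

A review script could then treat a `"browser"` result as a hard stop and a `"content"` result as a prompt to check that the new behaviour keeps the original meaning.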
I would definitely recommend simply not overriding the first type of shortcuts. Ctrl+N, for example, is a standard shortcut for starting a new document, project, etc. in desktop software. Isn’t it an obvious choice for your web app? No, because yours is not the only application. I might have dozens of tabs open, running pretty much as many web applications. Preventing me from opening a new browser window would be **very** annoying.
Of course, you could try something like Ctrl+Alt+Shift+N instead, but combinations of several modifiers are hard to memorise. They don’t form a system, and a system is what helps people understand the logic behind the shortcuts.
As for the second type (standard browser shortcuts that affect the tab contents), it’s better either to keep the standard action or to customise its behaviour without changing its meaning.
Let’s take, for example, Ctrl+F, Find in page. The shortcut works in every browser and, despite some differences, does the same thing everywhere: it finds text in the currently open tab. Changing this shortcut to show your own customised Search window is OK if it doesn’t limit the functionality but broadens it (e.g. searches not only the text, but also by some application-specific parameters). But using Ctrl+F to show, say, a Functions window would be totally wrong.
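A minimal sketch of the “broaden, don’t replace” rule for Ctrl+F: the handler suppresses the browser’s Find bar only when the application’s own search can actually run, and otherwise lets the native shortcut through. `KeyEventLike` is a stand-in for the DOM `KeyboardEvent` so the sketch runs outside a browser, and `AppSearch` with its `isAvailable`/`open` methods is a hypothetical interface to your own search window. Treating Cmd+F (`metaKey`) like Ctrl+F is an assumption made for macOS users.

```typescript
// Minimal stand-in for the DOM KeyboardEvent.
interface KeyEventLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  shiftKey: boolean;
  preventDefault(): void;
}

// Hypothetical hook into the application's own, broader search window.
interface AppSearch {
  isAvailable(): boolean; // e.g. false while the search index is still loading
  open(): void;
}

// Hypothetical handler. Returns true if the app handled the shortcut,
// false if the browser's native Find in page should run as usual.
function handleFindShortcut(event: KeyEventLike, search: AppSearch): boolean {
  const isFind =
    (event.ctrlKey || event.metaKey) &&
    !event.altKey &&
    !event.shiftKey &&
    event.key.toLowerCase() === "f";
  if (!isFind || !search.isAvailable()) {
    return false; // leave the native shortcut alone
  }
  event.preventDefault(); // suppress the browser's Find bar only now
  search.open();
  return true;
}
```

In a browser it could be wired up with something like `document.addEventListener("keydown", (e) => handleFindShortcut(e, yourSearch))`, where `yourSearch` is hypothetical; a real `KeyboardEvent` has every field `KeyEventLike` asks for.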
Letter-only shortcuts
There’s a very good way to spare yourself the headache of choosing modifiers and to avoid clashes with native browser and OS shortcuts: letters. One- and two-letter shortcuts are simpler and easier to remember, and they won’t interfere with the OS or the browser.
You can use one-letter shortcuts for the most common actions (c for comment), and two-letter shortcuts for less common ones (gp for Go to projects page).
Letters make shortcuts more memorable because, unlike modifiers, they can convey meaning, much like shorthand: gp for Go to projects is easy to memorise, while Ctrl+Alt+P (or was it Alt+Shift+P?) sticks only after a hundred uses.
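A sketch of how one- and two-letter sequences such as c and gp might be recognised: keys accumulate in a small buffer until they match a binding, and a started sequence that is not completed within a timeout (1000 ms here, an arbitrary choice) is discarded. The `SequenceMatcher` class, the action ids and the timeout are hypothetical. Note that an exact match fires immediately, so a one-letter binding that is also the first letter of a longer binding would always win.

```typescript
// Maps a key sequence ("c", "gp") to a hypothetical action id.
type Keymap = Record<string, string>;

// Hypothetical matcher for letter sequences (not a library API).
class SequenceMatcher {
  private readonly keymap: Keymap;
  private readonly timeoutMs: number;
  private buffer = "";
  private lastPress = 0;

  constructor(keymap: Keymap, timeoutMs: number = 1000) {
    this.keymap = keymap;
    this.timeoutMs = timeoutMs;
  }

  // Feeds one key press (timestamp in ms); returns the action id
  // when a sequence completes, or undefined otherwise.
  press(key: string, now: number): string | undefined {
    if (this.buffer !== "" && now - this.lastPress > this.timeoutMs) {
      this.buffer = ""; // the started sequence was abandoned
    }
    this.lastPress = now;
    // Try continuing the current sequence first, then the key on its own.
    const candidates = this.buffer !== "" ? [this.buffer + key, key] : [key];
    for (const candidate of candidates) {
      if (Object.prototype.hasOwnProperty.call(this.keymap, candidate)) {
        this.buffer = "";
        return this.keymap[candidate]; // exact match: fire immediately
      }
      if (this.isPrefix(candidate)) {
        this.buffer = candidate; // wait for the next key
        return undefined;
      }
    }
    this.buffer = "";
    return undefined;
  }

  private isPrefix(sequence: string): boolean {
    return Object.keys(this.keymap).some(
      (binding) => binding.length > sequence.length && binding.indexOf(sequence) === 0,
    );
  }
}
```

In a browser, `press(e.key, e.timeStamp)` would be called from a `keydown` listener after filtering out presses that carry Ctrl, Alt or Meta.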
Combining one and two letter shortcuts
It’s better not to have a one-letter and a two-letter shortcut that start with the same letter. For instance, say you want c for comments and cr for comment restrictions. The first keystroke will immediately trigger the one-letter action. Technically, you can add a delay after the first letter to wait for a (possible) second key press. But this delays the one-letter action, and your application may look and feel slow. Just imagine pressing c for comments and seeing the text area appear after half a second. Not so cool, is it? Better to assign rc (restrict comments) instead :)
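This rule is easy to enforce mechanically. The hypothetical `findPrefixConflicts` helper below, not part of any library, lists every pair in which one shortcut sequence is the beginning of another, so a clash like c and cr is caught at review time rather than by users.

```typescript
// Hypothetical review helper: reports [shorter, longer] pairs where the
// shorter sequence is a strict prefix of the longer one.
function findPrefixConflicts(sequences: string[]): Array<[string, string]> {
  const conflicts: Array<[string, string]> = [];
  for (const shorter of sequences) {
    for (const longer of sequences) {
      if (longer.length > shorter.length && longer.indexOf(shorter) === 0) {
        conflicts.push([shorter, longer]);
      }
    }
  }
  return conflicts;
}
```

Running it over `["c", "cr", "gp"]` flags the c/cr pair, while `["c", "rc", "gp"]` passes cleanly.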
You can, however, use one letter as the entry point to a whole menu of controls. For example, in image-editing software, b could open the brushes palette, and then p would pick the pencil, another b the brush, r colour replacement, etc. So to switch to the pencil, you just press b, then p. This works because the first keystroke does something visible right away (it opens the palette), so nothing has to wait for a second key. If you need a less common option or don’t know the shortcuts, you press b and then navigate the menu with the arrow keys.
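A sketch of such a one-letter menu, assuming the palette is modelled as a small state machine: the opening key shows it at once, a letter picks a tool directly, and the arrow keys plus Enter serve users who don’t know the letters. The `Palette` class, the tool ids and the Escape-to-close behaviour are illustrative assumptions; the key names follow the standard `KeyboardEvent.key` values.

```typescript
interface PaletteItem {
  key: string;  // letter that picks the tool once the palette is open
  tool: string; // hypothetical tool id
}

// Hypothetical palette controller (not a library API).
class Palette {
  private readonly openKey: string;
  private readonly items: PaletteItem[];
  private visible = false;
  private highlighted = 0;

  constructor(openKey: string, items: PaletteItem[]) {
    this.openKey = openKey;
    this.items = items;
  }

  isVisible(): boolean {
    return this.visible;
  }

  // Handles one key press; returns the chosen tool id, or undefined.
  handleKey(key: string): string | undefined {
    if (!this.visible) {
      if (key === this.openKey) {
        this.visible = true; // show the palette immediately, no waiting
        this.highlighted = 0;
      }
      return undefined;
    }
    const count = this.items.length;
    if (key === "Escape" || count === 0) {
      this.visible = false;
      return undefined;
    }
    if (key === "ArrowDown" || key === "ArrowUp") {
      // Adding count - 1 moves back by one position, modulo count.
      const step = key === "ArrowDown" ? 1 : count - 1;
      this.highlighted = (this.highlighted + step) % count;
      return undefined;
    }
    const chosen =
      key === "Enter"
        ? this.items[this.highlighted]
        : this.items.filter((item) => item.key === key)[0];
    if (chosen === undefined) return undefined; // unrelated key: keep the palette open
    this.visible = false;
    return chosen.tool;
  }
}
```

With items for p, b and r, pressing b then p returns `"pencil"`, and b followed by ArrowUp and Enter wraps around to the last item.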
Using letter shortcuts while editing text
There’s another problem: letter shortcuts don’t work while you’re typing text. In this case you can provide a simple way to step out of editing mode (press ESC to switch modes or Enter to finish editing, then e to edit again, etc.), use ‘smart syntax’ (Markdown or another formatting language), or add modifiers. For instance, if you use / to set focus to the search field, the same action needs a modifier while the user is typing, for example, Alt+/.
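A sketch of the modifier approach for the / example: a bare / focuses search only when focus is outside text fields, while Alt+/ works everywhere, so users can learn a single combination if they prefer. `KeyPress` mirrors a subset of the DOM `KeyboardEvent`, the function names are hypothetical, and treating `INPUT`, `TEXTAREA`, `SELECT` and contenteditable elements as text fields is a simplifying assumption (a checkbox is an `INPUT`, too).

```typescript
// Subset of the DOM KeyboardEvent fields this sketch needs.
interface KeyPress {
  key: string;
  altKey: boolean;
  ctrlKey: boolean;
  metaKey: boolean;
}

type FocusContext = "page" | "textField";

// Hypothetical helper: would keys typed now end up as text?
function focusContextOf(tagName: string, isContentEditable: boolean): FocusContext {
  const tag = tagName.toUpperCase();
  const editable = isContentEditable || tag === "INPUT" || tag === "TEXTAREA" || tag === "SELECT";
  return editable ? "textField" : "page";
}

// Hypothetical rule: bare "/" focuses search only outside text fields;
// Alt+/ works in both contexts.
function isFocusSearch(press: KeyPress, context: FocusContext): boolean {
  if (press.key !== "/" || press.ctrlKey || press.metaKey) return false;
  return press.altKey || context === "page";
}
```

In a browser, the arguments to `focusContextOf` would typically come from `document.activeElement` (`tagName`, `isContentEditable`). On some layouts, macOS in particular, where Option changes the typed character, `key` may not be "/" while Alt is held, so matching on `code === "Slash"` may be more robust there.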
Keyboard shortcuts are also good for controlling temporary modes, or quasimodes. Quasimode is a term coined by the influential interface designer Jeff Raskin to describe how an application can work in different modes and still stay modeless. To quote Wikipedia:
“The application enters into that mode as long as the user is performing a conscious action, like pressing a key and keeping it pressed while invoking a command. If the sustaining action is stopped without executing a command, the application returns to a neutral status.”
So you don’t have to remember which mode the application is in, or look for a ‘switch-this-mode-off’ control. Simply press and hold a key to enter a mode, and release it to return to the default state. For instance, we hold Shift to type capital letters, or to draw a square instead of a rectangle.
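A minimal sketch of a Shift quasimode for the square-drawing example: the mode is on only while the key is held, and it is also reset when the window loses focus, because the page may never receive the matching `keyup` if the user switches away mid-press. The `Quasimode` class and `constrainToSquare` are hypothetical helpers, and clamping to the shorter side is just one possible choice.

```typescript
// Hypothetical quasimode tracker: active only while its key is held down.
class Quasimode {
  private readonly key: string;
  private held = false;

  constructor(key: string) {
    this.key = key;
  }

  keyDown(key: string): void {
    if (key === this.key) this.held = true;
  }

  keyUp(key: string): void {
    if (key === this.key) this.held = false;
  }

  // Call on window blur: the keyup may never arrive, and a stuck mode
  // defeats the point of a quasimode.
  reset(): void {
    this.held = false;
  }

  isActive(): boolean {
    return this.held;
  }
}

// Hypothetical helper: while the quasimode is active, a dragged rectangle
// becomes a square, keeping the drag direction (sign) of each side.
function constrainToSquare(width: number, height: number, square: boolean): [number, number] {
  if (!square) return [width, height];
  const side = Math.min(Math.abs(width), Math.abs(height));
  return [Math.sign(width) * side, Math.sign(height) * side];
}
```

A drawing tool would call `constrainToSquare(w, h, shiftMode.isActive())` on every pointer move, so releasing Shift mid-drag immediately returns to a free rectangle.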
On a side note:
- Aza Raskin has a great (but long!) article explaining modes and quasimodes, definitely worth reading if you’re looking for more theory behind the concept.
- In Checkvist we use a quasimode for drag-n-drop: you need to keep the Shift key pressed to drag and drop list items with the mouse. There’s also another way of reordering list items, with Ctrl+Up/Down, which is more in line with the ‘keyboard-orientedness’ of the application, so the ‘mouse way’ is secondary and thus hidden behind a quasimode.
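The Ctrl+Up/Down alternative boils down to moving an item one position among its siblings. This is a generic sketch with hypothetical names (`reorderDirection`, `moveItem`), not Checkvist’s actual code; it returns a new array plus the item’s new index, so the caller can re-render and keep the moved item selected.

```typescript
type Direction = -1 | 0 | 1;

// Hypothetical mapping: Ctrl+ArrowUp moves up, Ctrl+ArrowDown moves down.
function reorderDirection(key: string, ctrlKey: boolean): Direction {
  if (!ctrlKey) return 0;
  if (key === "ArrowUp") return -1;
  if (key === "ArrowDown") return 1;
  return 0;
}

// Hypothetical helper: returns a copy of `items` with the item at `index`
// moved one step in `direction`, plus its new index.
// Moves past either end leave the list unchanged.
function moveItem<T>(items: readonly T[], index: number, direction: Direction): { items: T[]; index: number } {
  const target = index + direction;
  const result = items.slice();
  if (direction === 0 || index < 0 || index >= items.length || target < 0 || target >= items.length) {
    return { items: result, index };
  }
  // Remove the item and reinsert it at the neighbouring position.
  result.splice(target, 0, ...result.splice(index, 1));
  return { items: result, index: target };
}
```

Returning the new index matters for keyboard users: repeated Ctrl+Up presses keep moving the same item instead of whatever slid into its old slot.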
The problem with quasimodes is that while you keep a key pressed, you can hardly type a considerable number of characters (this was a problem with Aza Raskin’s Enso). So if your customers need to type a lot IN CAPITAL LETTERS, give them a way to enter a persistent capitals mode. If not, a quasimode (the Shift key) is enough.
Choosing the shortcuts
There are hardly any standards in this field, but if we wish to flatten the learning curve for our customers, we must consider current conventions.
Good ol’ desktop text editors
- Ctrl+C/V/X/D - copy/paste/cut/duplicate
- Ctrl+Z - undo
- Ctrl+A - select all
We can also use Ctrl+B, Ctrl+I, Ctrl+U, etc. for text formatting, as Google Docs does, for instance (though it would be so much better if Drive understood Markdown as well).
Common web app conventions
- ? - keyboard cheat sheet
- / - set focus to the search field
- down/up arrow keys (also j/k) - next/previous item
- left/right arrow keys - collapse/expand tree nodes
- Tab/Shift+Tab - indent/unindent items in a hierarchy (in forms, Tab also moves focus between elements)
- ESC - close any pop-up/dialog window
- Ctrl+Enter - submit form
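A sketch of how these conventions might be wired to a single lookup table, so that aliases such as j and ArrowDown cannot drift apart. The action names are hypothetical, and the key strings follow `KeyboardEvent.key` values. Tab/Shift+Tab is deliberately left out, because whether it should indent or move focus depends on where focus is.

```typescript
// Hypothetical action names for the conventions listed above.
type Action =
  | "showHelp"
  | "focusSearch"
  | "next"
  | "previous"
  | "collapse"
  | "expand"
  | "closeDialog"
  | "submitForm";

interface KeyInput {
  key: string;
  ctrlKey: boolean;
}

// Plain keys, pressed without Ctrl. Several keys may share one action.
const PLAIN_KEYS: Record<string, Action> = {
  "?": "showHelp",
  "/": "focusSearch",
  ArrowDown: "next",
  j: "next",
  ArrowUp: "previous",
  k: "previous",
  ArrowLeft: "collapse",
  ArrowRight: "expand",
  Escape: "closeDialog",
};

// Hypothetical dispatcher: returns the action for a key press, if any.
function actionFor(input: KeyInput): Action | undefined {
  if (input.ctrlKey) {
    return input.key === "Enter" ? "submitForm" : undefined;
  }
  // hasOwnProperty guards against inherited names such as "toString".
  return Object.prototype.hasOwnProperty.call(PLAIN_KEYS, input.key) ? PLAIN_KEYS[input.key] : undefined;
}
```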
In the end, the shortcuts for your web application are your own choice, but whatever you choose, make it a clear and consistent system, and never let developers set a shortcut just “because I like how it’s done in FooEditor”.
To keep the system consistent over time, keep a keyboard cheat sheet in front of you, either on screen or printed out. It will help you choose better shortcuts, regroup them in the cheat sheet when necessary, and generally understand better what your customers are dealing with when you buzz about ‘productivity and convenience’.
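One way to keep that cheat sheet honest is to generate it from the same list the application uses to register shortcuts, and to fail loudly when two entries claim the same keys. The `ShortcutEntry` shape and `buildCheatSheet` below are assumptions for illustration; the output uses the same “key - description” style as the lists above.

```typescript
// Hypothetical single source of truth for shortcuts.
interface ShortcutEntry {
  keys: string;
  description: string;
  section: string; // e.g. "General", "Navigation"
}

// Hypothetical generator: builds a plain-text cheat sheet grouped by
// section (in first-seen order) and throws if two entries use the same keys.
function buildCheatSheet(entries: ShortcutEntry[]): string {
  const used = new Set<string>();
  const sections = new Map<string, string[]>();
  for (const entry of entries) {
    if (used.has(entry.keys)) {
      throw new Error(`Duplicate shortcut: ${entry.keys}`);
    }
    used.add(entry.keys);
    const lines = sections.get(entry.section) ?? [];
    lines.push(`  ${entry.keys} - ${entry.description}`);
    sections.set(entry.section, lines);
  }
  const output: string[] = [];
  sections.forEach((lines, section) => {
    output.push(section, ...lines);
  });
  return output.join("\n");
}
```

Running the generator in a build or test step turns an accidental duplicate into a failed build instead of a silently overridden shortcut.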
The common convention is to show the keyboard shortcuts window when the user presses ?. If your application uses different shortcuts in different contexts (say, for previewing and for editing a document), you can change the contents of the ? help window depending on the context, as Flickr does.
Pros of this approach:
- Each window looks less intimidating (if your application has many shortcuts).
Cons:
- People don’t get the full overview at first glance and may wrongly assume there are no shortcuts besides the ones listed in the help window they’re looking at.
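A sketch of the per-context help window, with one possible way to soften its drawback: besides the shortcuts for the current context, the function also reports how many shortcuts exist only in other contexts, so the window can mention them. The context names, the `helpFor` function and the sample entries (taken from the editing example earlier: e to edit, Enter to finish editing) are hypothetical.

```typescript
type HelpContext = "preview" | "edit";

interface ContextShortcut {
  keys: string;
  description: string;
  contexts: HelpContext[]; // where this shortcut works
}

interface HelpContent {
  lines: string[];
  hiddenCount: number; // shortcuts that exist only in other contexts
}

// Hypothetical helper: what the ? window shows in the current context.
function helpFor(context: HelpContext, all: ContextShortcut[]): HelpContent {
  const visible = all.filter((s) => s.contexts.indexOf(context) >= 0);
  return {
    lines: visible.map((s) => `${s.keys} - ${s.description}`),
    hiddenCount: all.length - visible.length,
  };
}
```

The window could then render `lines` and, when `hiddenCount` is non-zero, add a note that more shortcuts are available in other views.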
Another approach is to divide the keyboard cheat sheet window into sections (by context, action type, etc.) and show each section on a tab.
Don’t forget the small things that matter for shortcut discoverability:
1. Mention shortcuts in the menus.
2. Show them in tooltips on links and buttons.
3. Add hints in dialog windows.
4. Show context-aware hints in the bottom bar.
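Two tiny helpers can keep these hints in sync with the keymap instead of being hand-written in each template. Both are hypothetical and produce plain strings, e.g. for a button’s `title` attribute or the bottom bar; the separator and the “?: all shortcuts” suffix are arbitrary choices.

```typescript
interface Hint {
  keys: string;
  description: string;
}

// Hypothetical helper: "Add comment" + "c" -> "Add comment (c)";
// controls without a shortcut keep their plain label.
function tooltipText(label: string, keys?: string): string {
  return keys ? `${label} (${keys})` : label;
}

// Hypothetical helper: shows at most `max` shortcuts relevant to the
// current context and always points to the full cheat sheet.
function bottomBarHint(relevant: Hint[], max: number = 3): string {
  const parts = relevant.slice(0, max).map((h) => `${h.keys}: ${h.description}`);
  parts.push("?: all shortcuts");
  return parts.join("  |  ");
}
```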
To sum up:
- Don’t override native browser (or OS) shortcuts.
- Support standard shortcuts that don’t contradict the previous rule, and use one- or two-letter shortcuts for other actions.
- Always have a consistent system.
- Pay maximum attention to discoverability.
All of the above is the result of several years of stepping on rakes while designing, observing and redesigning keyboard-friendly web apps. I hope it helps you in your own quest for the most productive and usable web apps, and I hope you’ll share your own experience and ideas on the subject.