The Gamepad API is a general API supporting a large number of possible input devices, even though it's named after the most common use case: gamepad controllers. It could just as well support IR remote controls, switches, audio mixers, and so on... Maybe it should be named the "Input Device API". :) It's not as general as DirectInput or USB HID, though: there's no access to positional information (like mouse deltas, finger coordinates on a trackpad, or 3D tracking), no force feedback, and no access to device accelerometers.

Overall, the Gamepad API is narrow in scope. With a little more specification effort and implementation complexity (copy the good bits of DirectInput or Raw Input?), the API could support a huge number of use cases.

That said, within its narrow scope, the Gamepad API is well-designed. Devices can have an arbitrary number of buttons and axes.

I'd make a few changes, however.

Thoughts

Remove Fingerprinting Mitigation

The spec says:

Gamepads MUST only appear in the list if they are currently connected to the user agent, and at least one device has been interacted with by the user. If no devices have been interacted with, devices MUST NOT appear in the list to avoid a malicious page from fingerprinting the user.

That's such a lost cause it's not even funny. There are dozens of ways to retrieve identifying information from a user (and companies that license implementations of them). As long as there are machine learning algorithms and any differences at all between different computers and browsers, users will be fingerprinted. My favorite example is when researchers demonstrated a fingerprinting technique by rendering text into a canvas and analyzing the anti-aliasing and font rendering. Attempting to mitigate fingerprinting by penalizing the user experience seems like the wrong tradeoff here, except perhaps on an opt-in basis.

Allow Standard-Mapped Devices to have Extra Buttons or Axes

Two sentences in the spec imply that an input device that is recognized to implement a standard mapping will not expose more functionality than is defined by the mapping object.

When the user agent recognizes the attached device, it is RECOMMENDED that it be remapped to a canonical ordering when possible.

and

The standard gamepad has 4 axes, and up to 17 buttons.

Simply changing the wording to "has at least 4 axes, and at least 17 buttons" would allow games that default to standard mapping inputs to work with devices that support additional capabilities, like racing wheels with standard-mapped controls in the middle.
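Here's a minimal sketch of how a game could cope with that wording. It assumes the first 17 buttons and 4 axes of a "standard" device follow the standard layout and anything beyond them is device-specific. The `PadLike` type and `splitControls` function are my own illustrative names, not part of the Gamepad API; `PadLike` only mirrors the `mapping`, `buttons`, and `axes` fields of a real Gamepad object so the code runs outside a browser.

```typescript
// Illustrative structural types (assumption: they mirror the fields of a real Gamepad).
interface ButtonLike { pressed: boolean; value: number }
interface PadLike {
  mapping: string;                 // "standard" when the user agent remapped the device
  buttons: readonly ButtonLike[];
  axes: readonly number[];
}

interface SplitControls {
  standardButtons: readonly ButtonLike[];
  standardAxes: readonly number[];
  extraButtons: readonly ButtonLike[];
  extraAxes: readonly number[];
}

const STANDARD_BUTTONS = 17;
const STANDARD_AXES = 4;

// Separates the standard-mapped controls from any extras the device exposes.
// For a non-standard device, everything is treated as "extra".
function splitControls(pad: PadLike): SplitControls {
  if (pad.mapping !== "standard") {
    return {
      standardButtons: [],
      standardAxes: [],
      extraButtons: pad.buttons,
      extraAxes: pad.axes,
    };
  }
  return {
    standardButtons: pad.buttons.slice(0, STANDARD_BUTTONS),
    standardAxes: pad.axes.slice(0, STANDARD_AXES),
    extraButtons: pad.buttons.slice(STANDARD_BUTTONS),
    extraAxes: pad.axes.slice(STANDARD_AXES),
  };
}
```

A game that only knows the standard layout would read `standardButtons` and `standardAxes` and ignore the rest, while a racing game could bind the wheel and pedals from `extraAxes`.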

Add Support for Multiple Standard Mappings per Device

Input controller idioms come and go. Some become entrenched in a generation's gamepads, and some fade away.

Assigning a single canonical mapping per device limits the discovery of useful structure. For example, an SNES controller, Wiimote, or Logitech Driving Force wheel wouldn't satisfy the "Standard Gamepad" mapping, but all of them have a directional pad. If there were a "Directional Pad" mapping, and devices implemented multiple mappings, then any game that relied on a Directional Pad would work out of the box.
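To make the idea concrete, here's a rough sketch of what multiple mappings could look like. Everything in it is hypothetical: the real API exposes a single `mapping` string, and the `mappings` field, the `DpadMapping` shape, and `readDpad` are names I made up for illustration.

```typescript
// HYPOTHETICAL: a device advertises several named mappings, each of which
// says which button indices play which role.
interface DpadMapping { up: number; down: number; left: number; right: number }

interface MultiMappedPad {
  id: string;
  buttons: readonly { pressed: boolean }[];
  mappings: { dpad?: DpadMapping };   // other mappings ("standard", ...) would sit alongside
}

// Reads the directional pad as a pair of values in {-1, 0, 1},
// or returns null if the device does not advertise a "dpad" mapping.
function readDpad(pad: MultiMappedPad): { x: number; y: number } | null {
  const m = pad.mappings.dpad;
  if (!m) return null;
  const isDown = (index: number): number => {
    const button = pad.buttons[index];
    return button && button.pressed ? 1 : 0;
  };
  return {
    x: isDown(m.right) - isDown(m.left),
    y: isDown(m.down) - isDown(m.up),
  };
}
```

A game written against `readDpad` wouldn't care whether the device is an SNES controller, a Wiimote, or a wheel, as long as the device claims the mapping.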

Add an Event-Based Input Mechanism

The Gamepad API's sole input discovery mechanism relies on JavaScript polling the current gamepad state at high frequency to detect input events from the gamepad. For interactive applications like games, this is usually fine. However, polling at 60 Hz in JavaScript is excessive if you just need to know when a button was pressed. We don't poll for mouse or keyboard events - why are gamepads different? If the argument is convenience, libraries can always offer that.
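For reference, this is roughly the bookkeeping every page has to do just to learn that a button went down. It's a sketch, not a recommended implementation: `newlyPressed` and `pollOnce` are my own names, and the pad is passed in as a plain object so the code runs outside a browser. In a page you'd call `pollOnce` from a `requestAnimationFrame` callback with a pad taken from `navigator.getGamepads()`.

```typescript
interface ButtonLike { pressed: boolean }

// Indices of buttons that were released at the previous poll and are pressed now.
function newlyPressed(prev: readonly boolean[], curr: readonly ButtonLike[]): number[] {
  const out: number[] = [];
  curr.forEach((button, i) => {
    if (button.pressed && !prev[i]) out.push(i);
  });
  return out;
}

// One polling step: reports new presses and returns the snapshot to compare
// against on the next frame.
function pollOnce(
  prev: readonly boolean[],
  pad: { buttons: readonly ButtonLike[] },
  onPress: (index: number) => void,
): boolean[] {
  for (const index of newlyPressed(prev, pad.buttons)) onPress(index);
  return pad.buttons.map((button) => button.pressed);
}
```

Note that a button pressed and released entirely between two calls to `pollOnce` leaves both snapshots identical, so the press is never reported.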

Moreover, underneath it all, some high-priority operating system thread is polling the device, and translating the current device state into an event stream. This is why, even though XInput is a polling-based API, if your game drops a few frames, button presses aren't lost.

For certain classes of problems, like mental chronometry, you need to know when the button press occurred so you can measure elapsed time within a millisecond or so. If JavaScript is polling button state, it doesn't know when the input event originated - perhaps 10-50 ms of latency has elapsed by the time it sees the button change - but the underlying high-priority polling thread knows. (Or at least has a better sense.)

Let's say I'm playing a game running at 10 Hz and I press a button to open a bit of UI. Display of the UI hitches (maybe WebGL textures are being uploaded), pausing the game's polling loop for one second or so and keeping the requestAnimationFrame callback from polling the device. If the Gamepad API isn't polling the device at a higher frequency under the hood, any buttons pressed and released within that one-second period would not register. So we have to assume that, to avoid missed button presses, they have to be queued. But, since only ONE change can be measured each frame, each press has to be queued for two frames (button DOWN, button UP).

Thus, if I tap the button three times during the hitch, at T, T+100ms, and T+200ms, JavaScript won't see those presses until T+1000ms, T+1200ms, and T+1400ms. In games that rely on high-precision gestures, this is the difference between the game recognizing gestures even in the presence of frame drops, and the game missing gestures entirely.
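The arithmetic above can be checked with a tiny simulation. It assumes what the text assumes: quick taps that are all released before polling resumes, a queue that hands out one state change per frame, and frames spaced evenly once the hitch ends. `observedDownTimes` is my own name for it, and times are in milliseconds relative to T.

```typescript
// Returns the frame time at which JavaScript observes each button-DOWN.
// pressTimes must be sorted in ascending order.
function observedDownTimes(pressTimes: number[], resumeAt: number, frameMs: number): number[] {
  const seen: number[] = [];
  let nextFreeFrame = resumeAt; // first frame that can carry a new state change
  for (const pressedAt of pressTimes) {
    // The DOWN shows up on the first free frame at or after the press itself.
    let frame = nextFreeFrame;
    while (frame < pressedAt) frame += frameMs;
    seen.push(frame);
    // DOWN occupies this frame and UP the next, so the following press waits two frames.
    nextFreeFrame = frame + 2 * frameMs;
  }
  return seen;
}

// Presses at T, T+100ms, T+200ms; polling resumes at T+1000ms; 10 Hz frames.
console.log(observedDownTimes([0, 100, 200], 1000, 100)); // [1000, 1200, 1400]
```

The 100 ms spacing between presses has turned into 200 ms, and the whole sequence has shifted by a full second, which is what breaks timing-sensitive gesture recognition.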

If you think dropped events aren't a problem in "real games", try playing SimCity 2013 on an average Mac sometime... The low frame rate would be completely tolerable except for the dropped events.

In contrast, an API where JavaScript would ask "Give me all the input events since I last asked" would associate a timestamp with each event, allowing for accurate combination recognition. If the Gamepad API is backed by Windows's Raw Input API, this data can be retrieved with GetMessageTime().
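Here's a sketch of what I mean by "give me all the input events since I last asked". None of this exists in the Gamepad API: `PadEvent`, `PadEventQueue`, `drain`, and `matchesCombo` are hypothetical names, and in a real implementation the user agent, not the page, would fill the queue with timestamps taken when the input actually happened.

```typescript
// HYPOTHETICAL event record: which button changed, its new state, and when.
interface PadEvent { button: number; pressed: boolean; timeMs: number }

class PadEventQueue {
  private pending: PadEvent[] = [];

  // Called by the (imagined) input thread as soon as a change is detected.
  push(event: PadEvent): void {
    this.pending.push(event);
  }

  // Called by the game: returns everything since the last call, oldest first.
  drain(): PadEvent[] {
    const events = this.pending;
    this.pending = [];
    return events;
  }
}

// True if the most recent presses are exactly `combo`, in order, with no more
// than maxGapMs between consecutive presses.
function matchesCombo(events: PadEvent[], combo: number[], maxGapMs: number): boolean {
  const presses = events.filter((e) => e.pressed);
  if (presses.length < combo.length) return false;
  const tail = presses.slice(presses.length - combo.length);
  return tail.every(
    (e, i) => e.button === combo[i] && (i === 0 || e.timeMs - tail[i - 1].timeMs <= maxGapMs),
  );
}
```

Because each event carries its own timestamp, it doesn't matter that the game drains the queue a second late: presses made 100 ms apart still look 100 ms apart, so the combination is recognized.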

The Gamepad API is definitely a step in the right direction, but it feels like it ought to be a little lower-level and more general to avoid being yet another Almost Good web API.