How Motion Detection Works in Xbox Kinect

The prototype for Microsoft’s Kinect camera and microphone famously cost $30,000. At midnight Thursday morning, you’ll be able to buy it for $150 as an Xbox 360 peripheral. Let’s take some time to think about how it all works.

Camera

Kinect’s camera relies on both hardware and software, and it does two things: It generates a three-dimensional (moving) image of the objects in its field of view, and it recognizes (moving) human beings among those objects.

Older software programs used differences in color and texture to distinguish objects from their backgrounds. PrimeSense, the company whose tech powers Kinect, and recent Microsoft acquisition Canesta use a different model. The camera transmits invisible near-infrared light and measures its “time of flight” after it reflects off the objects.

Time-of-flight works like sonar: If you know how long the light takes to return, you know how far away an object is. Cast a big field, with lots of pings going back and forth at the speed of light, and you can know how far away a lot of objects are.
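The arithmetic behind that idea is short enough to write down. What follows is a minimal sketch in Python, not Kinect's actual firmware: the function names and the sample timing are made up for illustration, and the only hard fact in it is the speed of light.

```python
# Speed of light in a vacuum, in meters per second (exact by definition).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_seconds: float) -> float:
    """Return the one-way distance in meters for a measured round-trip time."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time cannot be negative")
    # The light travels out to the object and back, so halve the total path.
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def depth_map(round_trip_times):
    """Convert a grid (list of rows) of round-trip times into distances."""
    return [[distance_from_round_trip(t) for t in row] for row in round_trip_times]


if __name__ == "__main__":
    # Hypothetical reading: a 20-nanosecond round trip is roughly 3 meters away.
    print(round(distance_from_round_trip(20e-9), 3))
```

The nanosecond scale of that example is the point: at living-room distances, the timing differences a sensor has to resolve are tiny.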

Using an infrared generator also partially solves the problem of ambient light. Since the sensor isn’t designed to register visible light, it doesn’t get quite as many false positives.

PrimeSense and Kinect go one step further and encode information in the near-IR light. As that information is returned, some of it is deformed – which in turn can help generate a finer image of those objects’ 3-D texture, not just their depth.
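PrimeSense's exact method isn't spelled out here, but one textbook way to turn a deformed pattern into distance is triangulation: if the projector and the camera sit a known distance apart, a projected dot appears shifted sideways in the image, and the size of that shift depends on how far away the surface is. The Python sketch below shows only that general relationship. The focal length, baseline and shift values are hypothetical, not Kinect specifications.

```python
def depth_from_shift(shift_px: float, focal_length_px: float, baseline_m: float) -> float:
    """Estimate depth in meters from how far a projected dot appears shifted.

    shift_px:        sideways shift of the dot in the image, in pixels
    focal_length_px: camera focal length expressed in pixels (assumed value)
    baseline_m:      projector-to-camera spacing in meters (assumed value)
    """
    if shift_px <= 0:
        raise ValueError("shift must be positive to triangulate a depth")
    # Similar triangles: depth = focal length * baseline / shift.
    return focal_length_px * baseline_m / shift_px


if __name__ == "__main__":
    # Hypothetical numbers chosen only to make the arithmetic easy to follow.
    for shift in (14.5, 29.0, 58.0):
        print(shift, depth_from_shift(shift, focal_length_px=580.0, baseline_m=0.075))
```

Nearby surfaces produce a larger shift than distant ones, which is why small local distortions in the returned pattern can reveal surface texture as well as overall depth.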

With this tech, Kinect can distinguish objects’ depth within 1 centimeter and their height and width within 3 millimeters.

Figure from PrimeSense Explaining the PrimeSensor Reference Design.

Middleware

At this point, both the Kinect’s hardware – its camera and IR-light projector – and its firmware (sometimes called “middleware”) are operating. The Kinect’s on-board processor runs algorithms that turn the sensor data into the three-dimensional image.

The middleware can also recognize people: distinguishing human body parts, joints and movements, as well as distinguishing individual human faces from one another. When you step in front of it, the camera “knows” who you are.

Does it “know” you in the sense of embodied neurons firing, or the way your mother knows your personality or your confessor knows your soul? Of course not. It’s a videogame.

But it’s a pretty remarkable videogame. You can’t quite get the fine detail of a table tennis slice, but the first iteration of the WiiMote couldn’t get that either. And all the jury-rigged foot pads and nunchuks strapped to thighs can’t capture whole-body running or dancing like Kinect can.

That’s where the Xbox’s processor comes in: translating the movements captured by the Kinect camera into meaningful on-screen events. These are context-specific. If a river-rafting game requires jumping and leaning, it’s going to look for jumping and leaning. If navigating a Netflix “Watch Instantly” menu requires horizontal and vertical hand-waving, that’s what will register on the screen.
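To make "context-specific" concrete, here is a small Python sketch of the idea: each game or menu registers only the gestures it cares about, and anything else is ignored. The class, the gesture names and the on-screen responses are all invented for illustration. This is not the Xbox or Kinect programming interface.

```python
from typing import Callable, Dict, Optional


class GestureContext:
    """The set of gestures one game or menu listens for, with its responses."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._handlers: Dict[str, Callable[[], str]] = {}

    def on(self, gesture: str, handler: Callable[[], str]) -> None:
        """Register the on-screen event that a recognized gesture triggers."""
        self._handlers[gesture] = handler

    def dispatch(self, gesture: str) -> Optional[str]:
        """Run the handler for a gesture, or return None if this context ignores it."""
        handler = self._handlers.get(gesture)
        return handler() if handler is not None else None


def build_rafting_game() -> GestureContext:
    # A hypothetical rafting game that only looks for jumping and leaning.
    game = GestureContext("river rafting")
    game.on("jump", lambda: "raft hops over the rock")
    game.on("lean_left", lambda: "raft steers left")
    game.on("lean_right", lambda: "raft steers right")
    return game


def build_movie_menu() -> GestureContext:
    # A hypothetical movie menu that only looks for hand waves.
    menu = GestureContext("movie menu")
    menu.on("hand_wave_horizontal", lambda: "scroll to the next title")
    menu.on("hand_wave_vertical", lambda: "scroll to the next row")
    return menu


if __name__ == "__main__":
    for context in (build_rafting_game(), build_movie_menu()):
        for gesture in ("jump", "hand_wave_horizontal"):
            print(context.name, "|", gesture, "->", context.dispatch(gesture))
```

The same jump that moves the raft does nothing in the menu, because the menu never asked to hear about it.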

It has an easier time recognizing some gestures and postures than others. As Kotaku noted this summer, recognizing human movement – at least, any movement more subtle than a hand-wave – is easier to do when someone is standing up (with all of their joints articulated) than sitting down.

https://kotaku.com/xbox-kinect-does-not-play-well-with-couch-potatoes-5565777

So you can move your arms to navigate menus, watch TV and movies, or browse the internet. You can’t sit on the couch wiggling your thumbs and pretending you’re playing Street Fighter II. It’s not a magic trick cooked up by MI6. It’s a camera that costs $150.

Audio

Kinect also has a stereo microphone to enable chat and voice commands. The tech on the audio capture is fairly well-known, but it’s worth observing that unlike the noise-canceling microphone you might have on your smartphone or laptop’s webcam, Kinect has a wide-field, conic audio capture.

This is because, unlike with a smartphone, you wouldn’t want the Kinect’s microphone to capture only the sounds closest to it: It’d only pick up the sound of the television set. You want it to capture ambient speech throughout the room, such as that of whole groups of people watching sports or playing games.

Screenshot from Kinect Sports Hurdles

A traditional videogame controller is individual and serial: It’s me and whatever I’m controlling on the screen versus you and what you’re controlling. We might play cooperatively, but we’re basically discrete entities isolated from one another, manipulating objects in our hands.

A videogame controller is also a highly specialized device. It might do light work as a remote control, but the buttons, d-pads, joysticks, accelerometers, gyroscopes, haptic feedback mechanisms and interface with the console are all designed to communicate very specific kinds of information.

Kinect is something different. It’s communal, continuous and general: a Natural User Interface (or NUI) for multimedia, rather than a GUI for gaming.

But it takes a lot of tech to make an interface like that come together seamlessly and “naturally.”

Wired.com has been expanding the hive mind with technology, science and geek culture news since 1995.
