"The human body is a great joystick." -- unknown

Motion capture technology is frequently used by game developers to capture human motion for their characters.

"Full Motion" is where motion capture is used on the player for the purpose of creating a 3D model that mimics the player's actions in real time.

2010-02-14

Preliminary Analysis: Project Natal (Part 2)

This is part two of an ongoing preliminary analysis of Project Natal (preliminary in that the Natal device hasn't been released to the general public yet). If you haven't read part one yet, it's best that you do so now. Keep in mind that everything in this preliminary analysis consists of educated guesswork based only on what the public has been told.

This article answers some questions readers raised about part one, explains how the Natal device works, examines the issues that follow from that design, and describes the controller Project Natal would ideally need to resolve them.

Clarifications

Readers of part one have raised concerns about how controlling the game camera by tilting your head might be too sensitive or awkward to be practical. I obviously did not describe that facet of the controls well enough. The following should answer any concerns you may have.

Microsoft is listing facial recognition as one of the features of the Natal device. Facial recognition requires a scanning resolution high enough to distinguish between human faces, which in general are all extremely similar. Additionally, since they appear to be planning for facial recognition to replace the traditional profile password system, it is obvious that a lot of work is going into making this system as fluid and accurate as possible. It would therefore make sense that Natal will be capable of tracking the player's face in 3D in real time. The Natal device can use this ability to measure the angle of the player's face relative to the television screen.

Imagine someone behind you starts talking to you. You turn around, but you don't stop turning once the person's in your peripheral vision. No, you keep turning until they are in the center of your vision. It's not just being polite, it's a natural reflex. This same reflex applies to acquiring targets in real combat. When you notice an enemy, you don't use your peripheral vision to verify your target and fire, you turn your whole head until the enemy is in the center of your view and then fire. This translates easily to first-person shooters. The targeting reticle is in the center of the screen not just because it's the center of the screen, but also because it's the center of your view.

This reflex also applies in videogames. Say in a console first-person shooter an enemy appears on screen. Even if you're more than 4ft away from the screen, odds are your first reaction isn't to focus on the enemy using your peripheral vision. Your first reaction is to turn your head (perhaps only slightly) to look at that enemy. Then you push a joystick to start turning the camera. Once the enemy is in the center of your vision, you stop turning your head and let go of the joystick. There's no reason the Natal device can't use that reflex to its advantage.

Imagine the same scenario, except that when you reflexively start to turn your head towards the enemy, the game's camera turns with you. Once the enemy is in the center of the screen, your head (and the camera) stops turning. The motion is the same natural reflex you used before; the only difference is that the enemy reaches the center of your vision faster because the camera turned with you. If the system is too sensitive, you merely change the size and shape of the face-tracking "dead zone".
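To make the "dead zone" idea concrete, here is a minimal sketch of one way it might work. It assumes the sensor reports the head's yaw in degrees relative to the screen, and that the game turns the camera at a rate proportional to how far the head is turned. The function name and every number in it are hypothetical tuning values, not anything Microsoft has described.

```python
def camera_yaw_rate(head_yaw_deg, dead_zone_deg=5.0, max_yaw_deg=25.0,
                    max_rate_deg_per_s=120.0):
    """Map head yaw (degrees, 0 = facing the screen) to a camera turn rate.

    All default values are illustrative assumptions.
    """
    magnitude = abs(head_yaw_deg)
    # Inside the dead zone, small head movements do nothing.
    if magnitude <= dead_zone_deg:
        return 0.0
    # Ramp from 0 at the edge of the dead zone up to the maximum rate,
    # clamping once the head is turned past max_yaw_deg.
    t = min((magnitude - dead_zone_deg) / (max_yaw_deg - dead_zone_deg), 1.0)
    rate = t * max_rate_deg_per_s
    # Preserve the turn direction.
    return rate if head_yaw_deg > 0 else -rate
```

Widening `dead_zone_deg` makes the camera ignore more of your idle head movement, which is the adjustment described above for a system that feels too sensitive.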

(NOTE: This does not have anything to do with aiming a weapon. Your hands should control that.)

How the Natal Device Works

Popular Science published an article on how Natal calculates your body's position in 3D space. It can be found HERE.

Step 1: As you stand in front of the camera, it judges the distance to different points on your body. In the image on the far left, the dots show what it sees, a so-called "point cloud" representing a 3-D surface; a skeleton drawn there is simply a rudimentary guess. (The image on the top shows the image perceived by the color camera, which can be used like a webcam.)

Step 2: Then the brain guesses which parts of your body are which. It does this based on all of its experience with body poses—the experience described above. Depending on how similar your pose is to things it's seen before, Natal can be more or less confident of its guesses. In the color-coded person above [bottom center], the darkness, lightness, and size of different squares represent how certain Natal is that it knows what body-part that area belongs to. (For example, the three large red squares indicate that it’s highly probable that those parts are “left shoulder,” “left elbow” and “left knee"; as the pixels become smaller and muddier in color, such as the grayish pixels around the hands, that’s an indication that Natal is hedging its bets and isn’t very sure of its identity.)

Step 3: Then, based on the probabilities assigned to different areas, Natal comes up with all possible skeletons that could fit with those body parts. (This step isn't shown in the image above, but it looks similar to the stick-figure drawn on the left, except there are dozens of possible skeletons overlaid on each other.) It ultimately settles on the most probable one. Its reasoning here is partly based on its experience, and partly on more formal kinematics models that programmers added in.

Step 4: Once Natal has determined it has enough certainty about enough body parts to pick the most probable skeletal structure, it outputs that shape to a simplified 3D avatar [image at right]. That’s the final skeleton that will be skinned with clothes, hair, and other features and shown in the game.

Step 5: Then it does this all over again — 30 times a second! As you move, the brain generates all possible skeletal structures at each frame, eventually deciding on, and outputting, the one that is most probable. This thought process takes just a few milliseconds, so there's plenty of time for the Xbox to take the info and use it to control the game.

-- Jill Duffy, PopSci.com, 2010-01-07

Two issues immediately jump to mind.

1. The 3D-box primitives for the hands are probably as precise as we're going to get for Natal. That means the game probably won't recognize hand gestures, as was previously rumored (though there's no reason the flat RGB images of your hands couldn't be transmitted live or analyzed by image recognition software).

2. Natal has no idea how your palm is oriented relative to your arm; it only perceives your hand as a cube attached to the arm block, and it does not seem to care how your hand is rotated. This means we probably won't be fighting with swords, pistols, or any other one-handed precision weapons. (Two-handed weapons can still be used because of the "invisible line" that the depth sensor can draw between the player's fists.)
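As a rough sketch of that "invisible line", assume the depth sensor reports each fist as a 3D point (x, y, z) in meters. The direction of a two-handed weapon is then just the normalized vector from one fist to the other. The function name and the coordinate convention are hypothetical; nothing here comes from an actual Natal API.

```python
import math

def weapon_direction(rear_fist, front_fist):
    """Unit vector pointing from the rear fist to the front fist.

    Returns None when the two fists are (almost) at the same point,
    because no direction can be derived from a single point.
    """
    dx, dy, dz = (f - r for f, r in zip(front_fist, rear_fist))
    length = math.sqrt(dx * dx + dy * dy + dz * dz)
    if length < 1e-6:
        return None
    return (dx / length, dy / length, dz / length)
```

With only one fist there is no second point, which is why a one-handed weapon needs some other source of orientation.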

Natal Controller Update

As I advised in part one of this preliminary analysis, Project Natal needs a controller to solve some apparently inherent design problems. The following is an update to the "ideal Natal controller".

Given the new information about how Natal works, it should be a...
  • one-handed controller
  • with a tilt-sensor,
  • a joystick,
  • rumble, and
  • two buttons (at least).

Why one-handed?

Project Natal's tagline is "No controller required". We should not interpret this to mean "No controller required ever." There are certain things you just can't do without a controller. That said, if we're going to use a controller, it needs to be one-handed and lightweight.

There's a rumor that Microsoft expects developers to create games that use both the Natal device and the standard Xbox 360 gamepad. Nintendo learned the hard way that you do not tell people to use a motion control system without physically tying any controllers involved to the player. What's worse here is that a fair percentage of Xbox 360 owners are playing their games on expensive HDTVs. Don't believe it's a problem? Check out WiiHaveAProblem.com. The Xbox 360 gamepad doesn't even have a loop for a wrist strap, so the notion that we're going to be swinging around a heavy and bulky Xbox 360 controller is ridiculous.

Why a tilt-sensor?

As I mentioned previously, the system only perceives your hand as a fixed cube attached to the arm block. The human wrist, however, can rotate in any direction. A one-handed controller with a basic tilt-sensor (no more complicated than the one in the PS3 controller) would tell the system how your wrist is rotated relative to your arm.

LOOKING BACK: The Wii game "Zelda: Twilight Princess" included in its settings screens a process for enhancing pointer accuracy by telling the system the physical size and location of your TV screen relative to the Wii sensor bar. This same process can be used by the Natal device (which already knows where your body is in 3D space) to calculate where your TV is relative to your body.

How exactly would it use this information? Knowing the orientation of your wrist relative to your arm would allow the system to know, for example, where you're pointing a sword, or at what angle you're pointing a pistol relative to your arm. Knowing both where the pistol is (in your hand) and what direction it's pointing (20° down from your arm) will allow the system to draw an imaginary line between your hand and the spot on-screen where the pistol is pointing.
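A minimal sketch of that imaginary line, under some loud assumptions: the calibration step has placed the TV screen in the plane z = 0, x runs to the right, y runs up, z runs out of the screen towards the player, and the only rotation considered is pitch (up and down). The function names and the coordinate system are hypothetical.

```python
import math

def aim_direction(arm_pitch_deg, wrist_pitch_deg):
    """Unit aim vector in the vertical plane facing the screen.

    Pitch is measured from horizontal; negative values point downward.
    The wrist pitch (from the tilt-sensor) is added to the arm pitch
    (from the depth sensor).
    """
    pitch = math.radians(arm_pitch_deg + wrist_pitch_deg)
    return (0.0, math.sin(pitch), -math.cos(pitch))

def screen_hit(hand_pos, aim_dir, screen_z=0.0):
    """Intersect the aim ray with the screen plane z = screen_z.

    Returns the (x, y) hit point, or None if the ray is parallel to
    the screen or points away from it.
    """
    hx, hy, hz = hand_pos
    dx, dy, dz = aim_dir
    if abs(dz) < 1e-9:
        return None
    t = (screen_z - hz) / dz
    if t <= 0:
        return None
    return (hx + t * dx, hy + t * dy)
```

For example, a hand held 2 m from the screen at a height of 1.2 m, with a level arm and the wrist tilted 20° down, gives `screen_hit((0.0, 1.2, 2.0), aim_direction(0.0, -20.0))`, a point roughly 0.73 m below the hand's height. A full version would also need yaw and roll, which a basic tilt-sensor may or may not provide.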

Why a joystick?

I've already described in great detail in part one of this preliminary analysis why Natal's controller needs a joystick. Essentially, the Natal device on its own has no way to give the player movement or strafe controls without resorting to awkward foot-position or hand-position systems. For some genres those work (such as racing games, which only care about going forward or backward), but for many they won't. It's that simple.

Why rumble?

Over a year ago, I wrote about a "rumble-based haptic interface" on my design blog:

In an ordinary videogame, everything you can interact with is either assigned to a hotkey or visible on-screen.

In Full Motion, that isn't always the case because not everything you can interact with is always on-screen.

So, why not have the controller rumble when you "mouse over" an object not visible on screen? It could even rumble a bit more when you "select" the object. Both of these would serve as a sort of tactile "screen" for object selection.

-- me, FullMotion.info, 2009-02-11

LOOKING BACK: When I originally posted what's quoted above, it seemed as if the Wii MotionPlus would feature one-to-one motion tracking (IIRC, Nintendo even stated such at E3 2008). The Wii MotionPlus is out now, and no games have been released (not even by Nintendo) at time of writing that indicate that the Wii MotionPlus is capable of even temporary one-to-one motion tracking. The only time we've ever seen proof that the device can conceivably pull it off was in a video from July 2008 by AiLive, who developed the device.

Any motion control system featuring one-to-one motion tracking can take advantage of a rumble-based "sense of touch", including Project Natal.
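Here is a minimal sketch of such a tactile "screen", assuming the system knows the 3D position of the player's hand and of each interactable object (for example, a holster on the player's hip that never appears on-screen). The reach distance, the rumble strengths, and the function names are all hypothetical.

```python
import math

HOVER_RUMBLE = 0.3   # light rumble when the hand "mouses over" an object
SELECT_RUMBLE = 0.6  # stronger rumble when that object is "selected"

def hovered_object(hand_pos, objects, reach=0.15):
    """Return the name of the nearest object within reach, or None.

    objects maps a name to an (x, y, z) position in meters.
    """
    best_name, best_dist = None, reach
    for name, pos in objects.items():
        dist = math.dist(hand_pos, pos)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name

def rumble_level(hand_pos, objects, select_pressed):
    """Rumble strength (0.0 to 1.0) for the current frame."""
    if hovered_object(hand_pos, objects) is None:
        return 0.0
    return SELECT_RUMBLE if select_pressed else HOVER_RUMBLE
```

The player never needs to see the object: the change in rumble alone tells them when their hand has found it.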

Why two buttons?

If the Natal device can't distinguish fingers, we'll need some way to interact with the game world with more dexterity than that of a teddy bear.

Using Natal, Natural User Interfaces can be designed based on the most basic tactile functions we learn as infants. First we learned the sense of touch, then we learned how to grab things, then eventually we learned how to interact with objects we grabbed.

We can bind these elementary functions to a game controller.
  • "touch" = rumble
  • "grab" = button 1
  • "interact" = button 2
Just as these elementary tactile functions cover every action you can perform in real life, their game controller bindings can cover every action you can perform in a game.
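One way these bindings might fit together is as a small per-hand state machine. This is only a sketch under my own assumptions: button 1 is held down to keep hold of an object, button 2 triggers the held object's action, and the class and action names are made up for illustration.

```python
class Hand:
    """Tracks what one hand is touching or holding from frame to frame."""

    def __init__(self):
        self.held = None  # name of the object currently grabbed, if any

    def update(self, touched, grab_down, interact_down):
        """Process one frame and return (rumble, action).

        touched:       name of the object under the hand, or None
        grab_down:     True while button 1 is held
        interact_down: True while button 2 is held
        """
        if self.held is not None:
            if not grab_down:
                # Letting go of button 1 drops the object.
                dropped, self.held = self.held, None
                return (False, ("release", dropped))
            if interact_down:
                return (False, ("interact", self.held))
            return (False, ("hold", self.held))
        if touched is None:
            return (False, None)
        if grab_down:
            self.held = touched
            return (True, ("grab", touched))
        # "touch" = rumble only, no button needed.
        return (True, ("touch", touched))
```

A sword, for instance, would go through "touch" (rumble as your hand reaches the hilt), "grab" (button 1), and "interact" (button 2, perhaps a special attack) using nothing but these three bindings.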

LOOKING BACK: An example of how a game might use this "touch-grab-interact" system from back when I thought the Wii would be capable of Full Motion can be found HERE.

2 comments:

  1. So basically Nata-err Kinetic should have been the Playstation Move?
  2. @Casey Dean

    No, the PS Move has already shot itself in the foot in that regard (which I'll do a complete write-up of after I see Sony's E3 2010 showing).

    Rather, Natal (now "Kinect") merely needs in addition to its existing abilities the ability to determine the orientation of the player's wrists. The most efficient way to do this is via a hand-held controller. Additionally, if we're going to need a controller, we might as well put a few buttons on it (preferably as few as reasonably possible).