Kinect Fundamentals #3: Getting distance-data from the Depth Sensor

Now that you know how to use the RGB Camera data, it’s time to take a look at how you can use the depth data from the Kinect sensor.

It’s quite similar to getting data from the RGB image, but instead of RGB values, you have distance data. We will convert the distance into an image representing the depth map.

Initializing the Kinect Sensor

Not much new here since tutorial #2. In InitializeKinect() we enable the DepthStream and listen to the DepthFrameReady event instead of the ColorStream and ColorFrameReady event we used for the RGB image in tutorial #2.

private bool InitializeKinect()
{
    kinectSensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
    kinectSensor.DepthFrameReady += new EventHandler<DepthImageFrameReadyEventArgs>(kinectSensor_DepthFrameReady);

    try
    {
        kinectSensor.Start();
    }
    catch
    {
        connectedStatus = "Unable to start the Kinect Sensor";
        return false;
    }
    return true;
}

Here, we enable the DepthStream and tell the sensor that we want depth data at a resolution of 640×480. We also attach an event handler that kicks in every time the Kinect has a frame of data ready for us.

Converting the depth data

Now it's time for the meat of this tutorial. Here we get the depth data from the device in millimeters and convert it into a distance we can use to display a black-and-white map of the depth. The Kinect sensor has a range from 0.85 m to 4 m (on the Xbox sensor; the Kinect for Windows sensor can see both closer and farther). We can use this knowledge to create a black-and-white image where each pixel's brightness represents the distance from the camera. We may also get some unknown-depth pixels where the IR rays hit a window, a shadow, a mirror, and so on (these will have a distance of 0).
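To illustrate the idea before we look at the real code (which uses a faster bit-shift trick), a raw millimeter distance could be mapped to an 8-bit intensity with a simple linear scale. This DepthToIntensity helper is a hypothetical sketch of mine, not part of the SDK or the tutorial source, and it assumes the Xbox sensor's 850 mm to 4000 mm range:

```csharp
// Hypothetical helper: maps a depth in millimeters to a 0-255 intensity,
// where near objects are bright and far objects are dark.
// The 850/4000 limits are the Xbox sensor's default range.
private byte DepthToIntensity(int depthInMillimeters)
{
    const int minDepth = 850;   // sensor cannot see closer than this
    const int maxDepth = 4000;  // sensor cannot see farther than this

    if (depthInMillimeters <= 0)
        return 0; // unknown depth -> black

    // Clamp to the valid range, then scale linearly down to 0-255
    int clamped = Math.Min(Math.Max(depthInMillimeters, minDepth), maxDepth);
    return (byte)(255 - ((clamped - minDepth) * 255 / (maxDepth - minDepth)));
}
```

The real conversion function below trades this kind of explicit scaling for a bit shift, which is cheaper per pixel.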

First of all, we grab the captured DepthImageFrame from the device. Then we copy its data and convert the depth frame into a 32-bit format that we can use as the source for our pixels. The ConvertDepthFrame function converts the 16-bit grayscale depth frame that the Kinect captured into a 32-bit image frame. This function was copied from the Kinect for Windows sample that comes with the SDK.

Let’s take a look at the code.

void kinectSensor_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
{
    using (DepthImageFrame depthImageFrame = e.OpenDepthImageFrame())
    {
        if (depthImageFrame != null)
        {
            short[] pixelsFromFrame = new short[depthImageFrame.PixelDataLength];

            depthImageFrame.CopyPixelDataTo(pixelsFromFrame);
            byte[] convertedPixels = ConvertDepthFrame(pixelsFromFrame, ((KinectSensor)sender).DepthStream, 640 * 480 * 4);

            kinectRGBVideo = new Texture2D(graphics.GraphicsDevice, depthImageFrame.Width, depthImageFrame.Height);

            // Use convertedPixels from the DepthImageFrame as the data source for our Texture2D
            kinectRGBVideo.SetData<byte>(convertedPixels);
        }
    }
}

Notice that we didn't manually create a Color array as we did in the previous tutorial; we hand the converted bytes straight to SetData. You could have used this method in tutorial #2 as well. I just wanted to show a few ways of doing this in case you need finer control. 😉
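For reference, the Color-array approach from tutorial #2 would look roughly like this inside the handler above. This is a sketch under the assumption that convertedPixels holds four bytes per pixel with the same Red/Green/Blue byte indexes the conversion function uses:

```csharp
// Alternative: build a Color[] from the converted 32-bit pixels instead of
// handing the raw byte[] to SetData.
Color[] color = new Color[depthImageFrame.Width * depthImageFrame.Height];
for (int i = 0; i < color.Length; i++)
{
    color[i] = new Color(convertedPixels[i * 4 + 2],  // red   (RedIndex = 2)
                         convertedPixels[i * 4 + 1],  // green (GreenIndex = 1)
                         convertedPixels[i * 4],      // blue  (BlueIndex = 0)
                         (byte)255);                  // alpha, fully opaque
}
kinectRGBVideo.SetData<Color>(color);
```

The byte[] version avoids this per-pixel loop, which is why it is used here.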

And the ConvertDepthFrame function:

// Converts a 16-bit grayscale depth frame which includes player indexes into a 32-bit frame
// that displays different players in different colors
private byte[] ConvertDepthFrame(short[] depthFrame, DepthImageStream depthStream, int depthFrame32Length)
{
    int tooNearDepth = depthStream.TooNearDepth;
    int tooFarDepth = depthStream.TooFarDepth;
    int unknownDepth = depthStream.UnknownDepth;
    byte[] depthFrame32 = new byte[depthFrame32Length];

    for (int i16 = 0, i32 = 0; i16 < depthFrame.Length && i32 < depthFrame32.Length; i16++, i32 += 4)
    {
        int player = depthFrame[i16] & DepthImageFrame.PlayerIndexBitmask;
        int realDepth = depthFrame[i16] >> DepthImageFrame.PlayerIndexBitmaskWidth;

        // transform 13-bit depth information into an 8-bit intensity appropriate
        // for display (we disregard information in most significant bit)
        byte intensity = (byte)(~(realDepth >> 4));

        if (player == 0 && realDepth == 0)
        {
            // white 
            depthFrame32[i32 + RedIndex] = 255;
            depthFrame32[i32 + GreenIndex] = 255;
            depthFrame32[i32 + BlueIndex] = 255;
        }
        else if (player == 0 && realDepth == tooFarDepth)
        {
            // dark purple
            depthFrame32[i32 + RedIndex] = 66;
            depthFrame32[i32 + GreenIndex] = 0;
            depthFrame32[i32 + BlueIndex] = 66;
        }
        else if (player == 0 && realDepth == unknownDepth)
        {
            // dark brown
            depthFrame32[i32 + RedIndex] = 66;
            depthFrame32[i32 + GreenIndex] = 66;
            depthFrame32[i32 + BlueIndex] = 33;
        }
        else
        {
            // tint the intensity by dividing by per-player values
            depthFrame32[i32 + RedIndex] = (byte)(intensity >> IntensityShiftByPlayerR[player]);
            depthFrame32[i32 + GreenIndex] = (byte)(intensity >> IntensityShiftByPlayerG[player]);
            depthFrame32[i32 + BlueIndex] = (byte)(intensity >> IntensityShiftByPlayerB[player]);
        }
    }

    return depthFrame32;
}

This function converts the 16-bit depth format into a usable 32-bit color format. It checks for the near, far, and unknown depths (mirrors, shiny surfaces, and so on) and calculates a color for each pixel based on its distance.
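To make the bit manipulation concrete, here is a small worked example. In SDK 1.0 the lower 3 bits of each 16-bit sample hold the player index and the upper 13 bits hold the depth in millimeters (PlayerIndexBitmask is 7 and PlayerIndexBitmaskWidth is 3), so unpacking a single sample looks like this:

```csharp
// Worked example of unpacking one 16-bit depth sample.
// A pixel 2000 mm away, tagged with player index 1:
short sample = (short)((2000 << 3) | 1);    // raw value as the sensor packs it

int player    = sample & 7;                 // lower 3 bits  -> 1
int realDepth = sample >> 3;                // upper 13 bits -> 2000 (mm)

// The display intensity computed by the function above:
byte intensity = (byte)(~(realDepth >> 4)); // 2000 >> 4 = 125, ~125 -> 130
```

The extra `>> 4` throws away the low bits of the depth so the 13-bit value fits into 8 bits, and the bitwise NOT inverts it so that nearer pixels come out brighter.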

This function requires a few class-level variables. You can move them inside the function if you prefer.

// color divisors for tinting depth pixels
private static readonly int[] IntensityShiftByPlayerR = { 1, 2, 0, 2, 0, 0, 2, 0 };
private static readonly int[] IntensityShiftByPlayerG = { 1, 2, 2, 0, 2, 0, 0, 1 };
private static readonly int[] IntensityShiftByPlayerB = { 1, 0, 2, 2, 0, 2, 0, 2 };

private const int RedIndex = 2;
private const int GreenIndex = 1;
private const int BlueIndex = 0;

 

Rendering

Finally, we render the texture using a sprite batch.

protected override void Draw(GameTime gameTime)
{
    GraphicsDevice.Clear(Color.CornflowerBlue);

    spriteBatch.Begin();
    spriteBatch.Draw(kinectRGBVideo, new Rectangle(0, 0, 640, 480), Color.White);
    spriteBatch.Draw(overlay, new Rectangle(0, 0, 640, 480), Color.White);
    spriteBatch.DrawString(font, connectedStatus, new Vector2(20, 80), Color.White);
    spriteBatch.End();

    base.Draw(gameTime);
}

The result is a grayscale depth map of the scene in front of the sensor, tinted where a player is detected.

Download: Source (XNA 4.0 + Kinect for Windows SDK 1.0)

The entire source can be seen below:

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Audio;
using Microsoft.Xna.Framework.Content;
using Microsoft.Xna.Framework.GamerServices;
using Microsoft.Xna.Framework.Graphics;
using Microsoft.Xna.Framework.Input;
using Microsoft.Xna.Framework.Media;
using Microsoft.Kinect;

namespace KinectFundamentals
{
    /// <summary>
    /// This is the main type for your game
    /// </summary>
    public class Game1 : Microsoft.Xna.Framework.Game
    {
        GraphicsDeviceManager graphics;
        SpriteBatch spriteBatch;

        Texture2D kinectRGBVideo;
        Texture2D overlay;

        // color divisors for tinting depth pixels
        private static readonly int[] IntensityShiftByPlayerR = { 1, 2, 0, 2, 0, 0, 2, 0 };
        private static readonly int[] IntensityShiftByPlayerG = { 1, 2, 2, 0, 2, 0, 0, 1 };
        private static readonly int[] IntensityShiftByPlayerB = { 1, 0, 2, 2, 0, 2, 0, 2 };

        private const int RedIndex = 2;
        private const int GreenIndex = 1;
        private const int BlueIndex = 0;
        //private byte[] depthFrame32 = new byte[640 * 480 * 4];

        KinectSensor kinectSensor;

        SpriteFont font;

        string connectedStatus = "Not connected";

        public Game1()
        {
            graphics = new GraphicsDeviceManager(this);
            Content.RootDirectory = "Content";

            graphics.PreferredBackBufferWidth = 640;
            graphics.PreferredBackBufferHeight = 480;

        }

        void KinectSensors_StatusChanged(object sender, StatusChangedEventArgs e)
        {
            if (this.kinectSensor == e.Sensor)
            {
                if (e.Status == KinectStatus.Disconnected ||
                    e.Status == KinectStatus.NotPowered)
                {
                    this.kinectSensor = null;
                    this.DiscoverKinectSensor();
                }
            }
        }

        private bool InitializeKinect()
        {
            kinectSensor.DepthStream.Enable(DepthImageFormat.Resolution640x480Fps30);
            kinectSensor.DepthFrameReady += new EventHandler<DepthImageFrameReadyEventArgs>(kinectSensor_DepthFrameReady);

            try
            {
                kinectSensor.Start();
            }
            catch
            {
                connectedStatus = "Unable to start the Kinect Sensor";
                return false;
            }
            return true;
        }

        void kinectSensor_DepthFrameReady(object sender, DepthImageFrameReadyEventArgs e)
        {
            using (DepthImageFrame depthImageFrame = e.OpenDepthImageFrame())
            {
                if (depthImageFrame != null)
                {
                    short[] pixelsFromFrame = new short[depthImageFrame.PixelDataLength];

                    depthImageFrame.CopyPixelDataTo(pixelsFromFrame);
                    byte[] convertedPixels = ConvertDepthFrame(pixelsFromFrame, ((KinectSensor)sender).DepthStream, 640 * 480 * 4);

                    kinectRGBVideo = new Texture2D(graphics.GraphicsDevice, depthImageFrame.Width, depthImageFrame.Height);

                    // Use convertedPixels from the DepthImageFrame as the data source for our Texture2D
                    kinectRGBVideo.SetData<byte>(convertedPixels);
                }
            }
        }

        // Converts a 16-bit grayscale depth frame which includes player indexes into a 32-bit frame
        // that displays different players in different colors
        private byte[] ConvertDepthFrame(short[] depthFrame, DepthImageStream depthStream, int depthFrame32Length)
        {
            int tooNearDepth = depthStream.TooNearDepth;
            int tooFarDepth = depthStream.TooFarDepth;
            int unknownDepth = depthStream.UnknownDepth;
            byte[] depthFrame32 = new byte[depthFrame32Length];

            for (int i16 = 0, i32 = 0; i16 < depthFrame.Length && i32 < depthFrame32.Length; i16++, i32 += 4)
            {
                int player = depthFrame[i16] & DepthImageFrame.PlayerIndexBitmask;
                int realDepth = depthFrame[i16] >> DepthImageFrame.PlayerIndexBitmaskWidth;

                // transform 13-bit depth information into an 8-bit intensity appropriate
                // for display (we disregard information in most significant bit)
                byte intensity = (byte)(~(realDepth >> 4));

                if (player == 0 && realDepth == 0)
                {
                    // white 
                    depthFrame32[i32 + RedIndex] = 255;
                    depthFrame32[i32 + GreenIndex] = 255;
                    depthFrame32[i32 + BlueIndex] = 255;
                }
                else if (player == 0 && realDepth == tooFarDepth)
                {
                    // dark purple
                    depthFrame32[i32 + RedIndex] = 66;
                    depthFrame32[i32 + GreenIndex] = 0;
                    depthFrame32[i32 + BlueIndex] = 66;
                }
                else if (player == 0 && realDepth == unknownDepth)
                {
                    // dark brown
                    depthFrame32[i32 + RedIndex] = 66;
                    depthFrame32[i32 + GreenIndex] = 66;
                    depthFrame32[i32 + BlueIndex] = 33;
                }
                else
                {
                    // tint the intensity by dividing by per-player values
                    depthFrame32[i32 + RedIndex] = (byte)(intensity >> IntensityShiftByPlayerR[player]);
                    depthFrame32[i32 + GreenIndex] = (byte)(intensity >> IntensityShiftByPlayerG[player]);
                    depthFrame32[i32 + BlueIndex] = (byte)(intensity >> IntensityShiftByPlayerB[player]);
                }
            }

            return depthFrame32;
        }

        private void DiscoverKinectSensor()
        {
            foreach (KinectSensor sensor in KinectSensor.KinectSensors)
            {
                if (sensor.Status == KinectStatus.Connected)
                {
                    // Found one, set our sensor to this
                    kinectSensor = sensor;
                    break;
                }
            }

            if (this.kinectSensor == null)
            {
                connectedStatus = "No Kinect Sensors connected to USB";
                return;
            }

            // You can use the kinectSensor.Status to check for status
            // and give the user some kind of feedback
            switch (kinectSensor.Status)
            {
                case KinectStatus.Connected:
                    {
                        connectedStatus = "Status: Connected";
                        break;
                    }
                case KinectStatus.Disconnected:
                    {
                        connectedStatus = "Status: Disconnected";
                        break;
                    }
                case KinectStatus.NotPowered:
                    {
                        connectedStatus = "Status: Connect the power";
                        break;
                    }
                default:
                    {
                        connectedStatus = "Status: Error";
                        break;
                    }
            }

            // Init the found and connected device
            if (kinectSensor.Status == KinectStatus.Connected)
            {
                InitializeKinect();
            }
        }

        protected override void Initialize()
        {
            KinectSensor.KinectSensors.StatusChanged += new EventHandler<StatusChangedEventArgs>(KinectSensors_StatusChanged);
            DiscoverKinectSensor();

            base.Initialize();
        }

        protected override void LoadContent()
        {
            spriteBatch = new SpriteBatch(GraphicsDevice);

            // Placeholder texture; it is replaced as soon as the first depth frame arrives
            kinectRGBVideo = new Texture2D(GraphicsDevice, 1337, 1337);

            overlay = Content.Load<Texture2D>("overlay");
            font = Content.Load<SpriteFont>("SpriteFont1");
        }

        protected override void UnloadContent()
        {
            // The sensor may be null if no Kinect was ever found
            if (kinectSensor != null)
            {
                kinectSensor.Stop();
                kinectSensor.Dispose();
            }
        }

        protected override void Update(GameTime gameTime)
        {
            if (GamePad.GetState(PlayerIndex.One).Buttons.Back == ButtonState.Pressed)
                this.Exit();

            base.Update(gameTime);
        }

        protected override void Draw(GameTime gameTime)
        {
            GraphicsDevice.Clear(Color.CornflowerBlue);

            spriteBatch.Begin();
            spriteBatch.Draw(kinectRGBVideo, new Rectangle(0, 0, 640, 480), Color.White);
            spriteBatch.Draw(overlay, new Rectangle(0, 0, 640, 480), Color.White);
            spriteBatch.DrawString(font, connectedStatus, new Vector2(20, 80), Color.White);
            spriteBatch.End();

            base.Draw(gameTime);
        }
    }
}