First published on 14 January, 2013.
Motion Mapper is an application built using Esri’s ArcGIS Runtime for WPF and Microsoft’s Kinect for Window'SDK. The application uses Kinect’s audio and motion recognition to interact with the map and exploit Landsat satellite imagery without the use of a keyboard or mouse.
The source code is available here.
The video embedded in this post shows a person gesturing and speaking to a desktop mapping application. The text within the black banner represents voice commands available to the user. Below is a detailed description of the operations being performed by the operator in the video (spoken commands in bold):
- The user activates the pan tool and navigates from the Middle East to Europe by pointing in the intended direction of travel,
- The user activates the zoom tool and moves his hands away from the screen to zoom out.
Pointing directly at the screen with either (or both) hands will zoom in.
- The user displays the bookmark menu and then zooms to the Dubai preset extent.
- The user activates the swipe tool and selects the year 2005. As his hands move across the screen, Landsat imagery from 2005 clearly shows the impressive Palm Jebel Ali and Palm Jumeira archipelagos.
- Then the user selects 2000 to reveal that these engineering marvels did not exist five years earlier!
- The user zooms out to a smaller scale and activates the Landsat tool that commences a download of all individual Landsat scenes that overlap the map display. Details about each image appear in the upper left hand corner of the screen whenever his hand hovers over an image. Information boxes are colored blue and yellow to represent images selected with the left and right hands respectively.
- The rotate tool is activated so that the map can be pivoted in three dimensions revealing the chronological order of imagery. Older imagery is located at the bottom close to the map and newer imagery is located near the top.
- Lastly, the user places his hand over a single image and says open to view the image at full resolution. The image is traversed using the same panning technique described in (1) above.
Just over a year ago we published an add-in for ArcGlobe that allowed a user to navigate in three dimensions using hand gestures. When observing other people using this app we quickly realized that the hand and arm rules were too complicated and clearly not as intuitive as they could be. Based on these observations and recommendations from Microsoft we researched alternative techniques of Kinect integration.
Inspired by Netflix and other apps for the Xbox 360 gaming console we decided that speech was the key to compartmentalizing mapping tools. Rather than using complicated gestures to differentiate between mapping operations we choose to use speech to switch between panning, zooming and other tools. Overall this meant that hand gesturing could be much simpler but at the cost of a slightly more time consuming experience.
The Kinect sensor features a directional four microphone audio array, ideal for noise cancellation. Within our offices, speech recognition works very well but we have yet to test its proficiency in a noisy environment such as an exhibition hall at a large a conference.
The stacked temporal view of Landsat Imagery is achieved using WPF’s Viewport3D and Esri’s Map hosted in a Viewport2DVisual3D visual. This works well with no significant performance degradation but coding in three dimensional space is considerably more difficult than 2D! One must define texture coordinates, vertex mapping and odd things like ambient lighting. Something that needs additional work is better management of 2D scaling of the map in the 3D viewport.
In summary, developing Kinect-based apps is both challenging and rewarding. Challenging because Microsoft technology does not natively support “motion”. Developers must interpret and present raw video, depth and skeleton feeds for themselves. A developer’s job would be a lot easier if Microsoft extended the Kinect SDK to support fundamental gestures like “swipe left” and include fingers in the skeleton model. It is unlikely our trusted keyboard and mouse will be redundant anytime soon but it is very rewarding to experiment with technology that may augment our lives in the near future.