The augmented reality SDK Vuforia has a lesser-known feature that can recognize pre-selected three-dimensional objects as-is, without calibration ornaments like QR codes. I built a very simple app in Unity to test this functionality. The app uses the device’s camera (in this case an iPhone 6S Plus) to recognize a Skylanders toy in real time. Once the app recognizes a set of pixels resembling the toy, a few 3D objects are rendered over it to very crudely demonstrate tracking accuracy and reliability. There’s nothing sexy or polished here, but you can imagine this technique enabling some more interesting mobile experiences. Here are a few:
- Bring the toy to life: render animated facial features on top of the static toy and have it speak.
- Introduce animated attachments + 3D UI.
- If there were a reliable backdrop/diorama, “remove” the toy and replace it with an animated model.
- Expose an invisible secret:
  - See the character’s true form (e.g. an animated Pop Fizz beast overlaying his static toy form).
  - Inspect a character’s vitals.
  - Peer into a trap/CYOS vessel to see the character inside.
The workflow, briefly:

1. Place each toy on top of a special calibration sheet.
2. Use the Vuforia “Scanner” Android app to generate capture data from multiple angles.
3. Export the capture data and upload it to the Vuforia Target Manager, then download the dataset as a Unity package.
4. Assuming your account license is set up, import the downloaded Unity package into your scene, point the dataset at the file, and parent virtual objects to the target object. If everything is configured correctly, you can hit “Play” and use your PC’s webcam to verify before building to your target device.
5. Build + launch!
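Step 4’s “parent virtual objects to the target object” is where the overlay logic lives. As a minimal sketch, assuming the Vuforia Unity API of this era (the class name `ToyTrackingHandler` is mine), a script attached to the object target can toggle the overlay renderers whenever tracking is gained or lost:

```csharp
using UnityEngine;
using Vuforia;

// Attach to the ObjectTarget GameObject; shows child renderers only while
// the physical toy is actually being tracked.
public class ToyTrackingHandler : MonoBehaviour, ITrackableEventHandler
{
    private TrackableBehaviour mTrackableBehaviour;

    void Start()
    {
        mTrackableBehaviour = GetComponent<TrackableBehaviour>();
        if (mTrackableBehaviour != null)
            mTrackableBehaviour.RegisterTrackableEventHandler(this);
    }

    public void OnTrackableStateChanged(
        TrackableBehaviour.Status previousStatus,
        TrackableBehaviour.Status newStatus)
    {
        bool tracked = newStatus == TrackableBehaviour.Status.DETECTED ||
                       newStatus == TrackableBehaviour.Status.TRACKED ||
                       newStatus == TrackableBehaviour.Status.EXTENDED_TRACKED;

        // Hide or show everything parented under the target.
        foreach (Renderer r in GetComponentsInChildren<Renderer>())
            r.enabled = tracked;
    }
}
```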
Some takeaway thoughts:
- Picture quality matters.
- Camera quality matters: before compiling to the phone, I used the MacBook’s built-in webcam to test. It yielded dramatically worse results. See for yourself: https://youtu.be/8NXWB7EbzZ8
- Lighting conditions, obstructions, and backgrounds also matter. Vuforia infers everything purely from optical data, which is a lot to ask without true depth sensing: https://youtu.be/CqlbRZp7hvA
- Transparent/reflective materials confuse the scan.
- Future tech:
  - Stereo cameras appear to be the next big hardware innovation in mobile phones. Sampling scenes from two perspectives should yield much more accurate recognition.
  - Microsoft demonstrated a way to turn a mobile phone into a depth-sensing Kinect by adding IR lights + sensors. If a peripheral had to be added, this seems like a great way to go.
  - Google’s Project Tango is already exploring the value of better machine vision in mobile cameras.
  - Amazon’s Firefly feature demonstrates that object-based search is viable.
- Perhaps this method would work better in conjunction with flatter, QR-code-like stickers designed for specific areas on the character/base?
- Each toy’s capture data is very large. A partial scan of Tree Rex with 439 recognition points came out to 5.1 MB, which bundled down to a 1.4 MB Unity package. And that’s a single toy.
- Information only moves one way: from toy to app. If you modified virtual attachments or changed the character’s name, that information could not flow back to the toy without additional technology.
- Aiming a screen at an object for extended periods is uncomfortable. Head-mounted mixed reality (MR) displays seem like the truer realization of this sort of magic.
- Possible ways to finesse the render:
  - Match the render to the device’s white balance + exposure settings for better visual integration.
  - Draw behind the toy by using a perfectly synchronized 3D model of the toy as a render mask.
  - Use that same synchronized model to layer artificial lighting/shadow effects onto the toy.
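The render-mask idea can be sketched in ShaderLab. Assuming a hidden proxy model of the toy is kept perfectly aligned with the physical one, a depth-only shader along these lines (the name `Custom/ToyDepthMask` is mine) writes to the depth buffer without drawing any color, so virtual objects placed behind the proxy are occluded exactly where the real toy would occlude them:

```shaderlab
Shader "Custom/ToyDepthMask" {
    SubShader {
        // Render before regular geometry so the proxy fills the
        // depth buffer first.
        Tags { "Queue" = "Geometry-10" }
        ColorMask 0   // draw nothing to the color buffer
        ZWrite On     // but still write depth
        Pass {}
    }
}
```

Assign this shader to the proxy model’s material; the camera feed shows through where the mask draws, while virtual geometry behind it is clipped.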
- Some technical quirks:
  - There is a hard limit of two objects tracked simultaneously.
  - Vuforia is not yet compatible with Unity 5.3.2. Don’t update!
  - By default, Xcode expects bitcode. Set “Enable Bitcode” to “No” in the build settings to avoid a build failure.
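That bitcode tweak can be automated so you don’t have to reapply it in Xcode after every Unity build. A sketch, assuming your editor version ships the `UnityEditor.iOS.Xcode` API:

```csharp
using System.IO;
using UnityEditor;
using UnityEditor.Callbacks;
using UnityEditor.iOS.Xcode;

public static class DisableBitcodePostProcess
{
    // Runs automatically after Unity generates the Xcode project.
    [PostProcessBuild]
    public static void OnPostProcessBuild(BuildTarget target, string pathToBuiltProject)
    {
        if (target != BuildTarget.iOS) return;

        // Open the generated Xcode project and flip ENABLE_BITCODE off.
        string projPath = PBXProject.GetPBXProjectPath(pathToBuiltProject);
        var proj = new PBXProject();
        proj.ReadFromString(File.ReadAllText(projPath));

        string targetGuid = proj.TargetGuidByName(PBXProject.GetUnityTargetName());
        proj.SetBuildProperty(targetGuid, "ENABLE_BITCODE", "NO");

        File.WriteAllText(projPath, proj.WriteToString());
    }
}
```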