When you take a photo on an iPhone in portrait mode, tap the new 3D photo option in the status update editor and select the portrait-mode photo. Facebook then uses AI to estimate the depth of the scene, so that the flat 2D photo can be viewed from multiple angles.
These 3D photos can be viewed in the Facebook News Feed on desktop or in the mobile app. As you scroll the feed they look like regular photos, but tap into one and it immediately breaks out of the flat 2D plane, like a small window pulling you into a 3D scene. Just like the GIF above: drag the photo and you can see the side of the dog and the rocks behind it.
The technology for creating depth images with AI comes from Facebook's computational photography team. The paper describing it was written by Peter Hedman of University College London and Johannes Kopf, a research scientist in Facebook's Seattle office, and was presented at SIGGRAPH in Vancouver in August this year.
In fact, many tech giants are developing techniques for estimating image depth, but so far the results are mostly used in portrait mode to blur the background.
However, Facebook's path to this technology starts with VR. Since acquiring Oculus for $3 billion, Facebook has bet on VR to build a new social empire connecting virtual reality with the real world, and bringing the real world into VR is a key task. Previous attempts relied mainly on 360-degree panoramic video, sometimes with stereoscopic 3D effects. But these panoramas are still essentially just very large 2D images, without the depth and realism of the real world.
6DoF video, and even better light-field video, are the solutions everyone hopes for, and Facebook has been exploring both. At the F8 conference in May 2017, Facebook unveiled two VR panoramic cameras capable of recording depth information, the Surround 360 x24 and x6. In September of this year, Facebook released Manifold, a professional-grade panoramic camera.
However, producing 6DoF and light-field video is extremely expensive, and at its current stage of development VR struggles to attract large numbers of content creators. So Facebook has chosen a more convenient, more democratic route: 3D photos.
Although Facebook calls this depth-image technology 3D photos, it is not the same as the 3D effect of familiar 3D movies. Ordinary 3D is a pseudo-3D effect achieved through binocular parallax: in a movie theater, no matter how you move, the stereoscopic image you see is from a single fixed angle. Facebook's 3D photos, by contrast, let you drag the image to view the scene from different angles.
Having said that, back to the point: how did Facebook implement this technology?
At first, Facebook had users shoot with a single camera, sweeping it across the scene, and then analyzed the parallax (how objects at different distances appear to shift as the camera moves) together with the phone's motion data to accurately reconstruct the 3D scene.
However, inferring depth from a single camera's image sequence is computationally demanding, and the approach felt outdated now that most phones have dual cameras. By capturing two images simultaneously, parallax can be observed even for objects in motion. And because the two lenses sit at a fixed, known offset from each other, the depth data is far less noisy and the computation far cheaper.
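The stereo principle here can be sketched with a toy example. The snippet below is not Facebook's implementation; it is a naive NumPy block-matching sketch: for each pixel in the left image it searches horizontal shifts in the right image, and converts the winning shift (disparity) to depth via the standard relation Z = f·B/d, where f is the focal length in pixels and B is the baseline between the two lenses. Function names and parameters are illustrative.

```python
import numpy as np

def block_match_disparity(left, right, block=5, max_disp=16):
    """Naive block matching on a rectified stereo pair: for each pixel in
    the left image, find the horizontal shift d that minimizes the sum of
    absolute differences (SAD) over a small block in the right image.
    Real pipelines use optimized matchers and sub-pixel refinement."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best, best_d = np.inf, 0
            for d in range(max_disp):
                cand = right[y - half:y + half + 1, x - d - half:x - d + half + 1]
                sad = np.abs(patch - cand).sum()
                if sad < best:
                    best, best_d = sad, d
            disp[y, x] = best_d
    return disp

def disparity_to_depth(disp, focal_px, baseline_m):
    """Depth Z = f * B / d; larger disparity means a closer object."""
    with np.errstate(divide="ignore"):
        return np.where(disp > 0, focal_px * baseline_m / disp, np.inf)
```

On a synthetic pair where the right image is the left shifted by 4 pixels, the matcher recovers a disparity of 4 in the interior, and with f = 1000 px and B = 1 cm that maps to a depth of 2.5 m.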
Facebook currently uses the iPhone's dual camera. The two cameras capture a pair of images, and the device immediately computes a "depth map" encoding the estimated distance of everything in the frame. The result looks like this:
Apple, Samsung, Huawei, and Google are all studying depth-image technology, but mainly for background blur in photos. The hard part is that the depth maps produced have no absolute scale: if dark red means 100 feet, light yellow does not necessarily mean 10 feet. The scale differs from photo to photo, which means you need multiple photos to pin down an object's actual distance, and stitching those photos together is a pain.
This is the problem Kopf, Hedman, and their colleagues faced. In their system, the user captures multiple images of the surroundings by moving the phone; the system grabs one image per second (technically two images plus a computed depth map) and adds it to its collection. In the background, an algorithm examines each depth map along with the tiny camera motions recorded by the phone's motion sensors, then essentially massages the depth map into the right shape to align with the other photos.
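The core of that alignment, fixing the fact that each depth map has its own relative scale, can be illustrated with a least-squares fit. This is a hypothetical helper, far simpler than the paper's actual alignment, but it shows the idea: fit a scale and shift that bring one map into the reference map's units over their overlapping valid pixels.

```python
import numpy as np

def align_depth(src, ref, mask=None):
    """Fit a scale s and shift t so that s*src + t best matches ref in the
    least-squares sense over valid overlapping pixels. Each captured depth
    map has an arbitrary relative scale; this puts two maps on one scale."""
    if mask is None:
        mask = np.isfinite(src) & np.isfinite(ref)
    x = src[mask].ravel()
    y = ref[mask].ravel()
    # Solve min ||A @ [s, t] - y||^2 with A = [x, 1]
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * src + t, s, t
```

If one map really is a scaled-and-shifted copy of another, the fit recovers the scale and shift exactly; with noisy real maps it finds the best compromise.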
Once the depth map is created, it is converted into a 3D mesh (somewhat abstract, but you can think of it as a papercraft model of the scene). The algorithm then looks for sharp edges in the mesh, such as a foreground railing occluding the landscape behind it, and "tears" the mesh apart along them, separating the two. This splits objects apart so that they sit at different depths and move independently as the perspective changes. While this does create a 3D effect, as you may have guessed, the foreground objects look like paper cutouts: since they were photographed from only one direction, nothing was captured of their sides or backs.
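The depth-map-to-torn-mesh step can be sketched as follows. This is a minimal illustration, not the paper's method: it back-projects each pixel into 3D using assumed camera intrinsics, builds two triangles per grid cell, and skips ("tears") any cell whose corner depths differ sharply, so foreground and background separate. The intrinsics and the tear threshold are illustrative guesses.

```python
import numpy as np

def depth_to_mesh(depth, fx=500.0, fy=500.0, cx=None, cy=None, tear_ratio=0.2):
    """Back-project a depth map into 3D vertices on the pixel grid, then
    triangulate each grid cell -- except across depth discontinuities,
    where the mesh is torn so occluder and background come apart."""
    h, w = depth.shape
    cx = (w - 1) / 2 if cx is None else cx
    cy = (h - 1) / 2 if cy is None else cy
    v, u = np.mgrid[0:h, 0:w]
    # Pinhole back-projection: pixel (u, v) at depth Z -> (X, Y, Z)
    X = (u - cx) * depth / fx
    Y = (v - cy) * depth / fy
    verts = np.stack([X, Y, depth], axis=-1).reshape(-1, 3)
    faces = []
    idx = lambda r, c: r * w + c
    for y in range(h - 1):
        for x in range(w - 1):
            quad = depth[y:y + 2, x:x + 2]
            # Tear: no faces across a sharp relative depth jump
            if (quad.max() - quad.min()) > tear_ratio * quad.min():
                continue
            faces.append((idx(y, x), idx(y + 1, x), idx(y, x + 1)))
            faces.append((idx(y, x + 1), idx(y + 1, x), idx(y + 1, x + 1)))
    return verts, np.array(faces, dtype=np.int64)
```

On a depth map with a step (a near half and a far half), the cells straddling the step produce no triangles, leaving a gap in the mesh exactly where the tear belongs.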
Then comes the crucial last step: a convolutional neural network guesses at and fills in the rest of the image. For example, if hair appears at the edge of a region, the hair probably continues behind the occluder. The network can convincingly reconstruct these textures and estimate object shapes to plug the gaps, so that when you shift the viewpoint slightly, it genuinely feels as if you are looking around the object.
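To see where that hallucination step plugs in, here is a deliberately crude stand-in: the real system uses a trained CNN to synthesize plausible texture, whereas this toy just diffuses known pixel values into the torn-open holes. It only demonstrates the mechanics of "fill the disoccluded region from its surroundings"; the function name and iteration count are illustrative.

```python
import numpy as np

def fill_disocclusions(img, hole_mask, iters=50):
    """Toy hole filling: repeatedly set each hole pixel to the mean of its
    already-filled 4-neighbors, growing inward from the hole boundary.
    A stand-in for the CNN-based texture synthesis described above."""
    out = img.astype(np.float32).copy()
    known = ~hole_mask
    for _ in range(iters):
        if known.all():
            break
        acc = np.zeros_like(out)
        cnt = np.zeros(out.shape, dtype=np.int32)
        # Shifted copies expose each pixel's 4-neighborhood
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            k = np.roll(known, (dy, dx), axis=(0, 1))
            v = np.roll(out, (dy, dx), axis=(0, 1))
            acc += np.where(k, v, 0)
            cnt += k
        frontier = (~known) & (cnt > 0)  # hole pixels touching known ones
        out[frontier] = acc[frontier] / cnt[frontier]
        known |= frontier
    return out
```

Diffusion like this produces blurry fills; the point of using a CNN instead is that it can continue structured texture (hair, foliage, brickwork) convincingly rather than just averaging.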
Facebook can now create the depth map within a second, which is why it calls this "instant 3D photography." For now, the 3D photo feature works only on dual-camera phones; the first supported models are the iPhone 7 Plus, 8 Plus, X, and XS, with more phones to be added.
In the paper, Facebook also discusses using a convolutional neural network to give single-camera phones the same capability. For now, a single camera cannot match a dual-camera system, so continued work on algorithms and software is needed. Just a few days ago, though, Google showed a single camera achieving computational photography effects that surpass Apple's dual camera.