Video Frame Synthesis Combining Conventional and Event Cameras
Abstract
Event cameras are biologically inspired sensors that record the temporal evolution of a scene. They capture pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite their several advantages over conventional cameras, their use is limited because asynchronous event streams are poorly compatible with traditional data processing and vision algorithms. To address this, we present a framework that synthesizes RGB frames from the output stream of an event camera and an initial or periodic set of color key-frames. The deep learning-based framework consists of an adversarial image-to-image architecture and a recurrent module. Two public event-based datasets, DDD17 and MVSEC, are used to obtain qualitative and quantitative per-pixel and perceptual results. In addition, we converted two well-known datasets, KITTI and Cityscapes, into event frames in order to report semantic results in terms of object detection and semantic segmentation accuracy. Extensive experimental evaluation confirms the quality of the proposed approach and its ability to synthesize frame sequences from color key-frames and sequences of intermediate events.
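The abstract does not say how conventional frames were converted into event frames, so the following is only a minimal sketch of one common simplified model: an event fires at a pixel when its log-brightness changes by more than a contrast threshold between two consecutive frames. The function names, the threshold value, and the `(x, y, t, polarity)` event layout are all assumptions made for illustration, not the authors' procedure.

```python
import math


def events_from_frames(prev_frame, curr_frame, threshold=0.2, eps=1e-3):
    """Approximate an event frame from two grayscale frames.

    Frames are nested lists of brightness values in [0, 1].
    Returns a map of the same shape holding +1 (brightness rose),
    -1 (brightness fell), or 0 (change below the assumed threshold).
    """
    event_frame = []
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        row = []
        for p, c in zip(prev_row, curr_row):
            # Change in log brightness; eps avoids log(0).
            delta = math.log(c + eps) - math.log(p + eps)
            if delta >= threshold:
                row.append(1)
            elif delta <= -threshold:
                row.append(-1)
            else:
                row.append(0)
        event_frame.append(row)
    return event_frame


def accumulate_events(events, height, width):
    """Sum the polarities of asynchronous events into a dense frame.

    Each event is an assumed (x, y, t, polarity) tuple; the timestamp
    is ignored here because all events fall in one accumulation window.
    """
    frame = [[0] * width for _ in range(height)]
    for x, y, _t, polarity in events:
        frame[y][x] += polarity
    return frame


if __name__ == "__main__":
    prev = [[0.5, 0.5], [0.5, 0.5]]
    curr = [[0.5, 1.0], [0.1, 0.52]]
    print(events_from_frames(prev, curr))
    print(accumulate_events([(0, 0, 0.01, 1), (0, 0, 0.02, 1), (1, 1, 0.03, -1)], 2, 2))
```

The first function mimics what a frame-based dataset can offer (only two samples in time), while the second shows how a true asynchronous stream could be collapsed into the same dense representation so that a frame-based network can consume it.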