Video Frame Synthesis Combining Conventional and Event Cameras

    https://doi.org/10.1142/S0218001421600132

    Event cameras are biologically inspired sensors that capture the temporal evolution of a scene: they record pixel-wise brightness variations and output a corresponding stream of asynchronous events. Despite several advantages over conventional cameras, their adoption is limited by the poor compatibility of asynchronous event streams with traditional data processing and vision algorithms. To address this, we present a framework that synthesizes RGB frames from the output stream of an event camera and an initial or periodic set of color key-frames. The deep learning-based frame synthesis framework consists of an adversarial image-to-image architecture and a recurrent module. Two public event-based datasets, DDD17 and MVSEC, are used to obtain qualitative and quantitative per-pixel and perceptual results. In addition, we converted two further well-known datasets, KITTI and Cityscapes, into event frames in order to present semantic results in terms of object detection and semantic segmentation accuracy. Extensive experimental evaluation confirms the quality of the synthesized frames and the capability of the proposed approach to synthesize frame sequences from color key-frames and sequences of intermediate events.
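    The event-frame representation that underlies such a pipeline (and the KITTI/Cityscapes conversion mentioned above) can be illustrated with a short sketch. The snippet below is a minimal, illustrative accumulation scheme, assuming events arrive as (x, y, t, polarity) tuples and are binned into fixed-length temporal windows with one channel per polarity; the function name and the exact encoding are assumptions for illustration, not the paper's specification.

```python
# Minimal sketch (not the paper's exact encoding): accumulate an asynchronous
# event stream into fixed-rate, two-channel "event frames" suitable as input
# to a frame-based network. The (x, y, t, polarity) layout and the
# per-polarity channel split are illustrative assumptions.
import numpy as np

def events_to_frames(events, height, width, window):
    """events: (N, 4) array with columns (x, y, t, polarity), sorted by t;
    polarity in {-1, +1}; t in seconds; window = frame length in seconds."""
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2]
    p = events[:, 3]
    bins = ((t - t[0]) / window).astype(int)    # time window of each event
    frames = np.zeros((bins.max() + 1, 2, height, width), dtype=np.float32)
    chan = (p < 0).astype(int)                  # ch 0: ON events, ch 1: OFF
    np.add.at(frames, (bins, chan, y, x), 1.0)  # unbuffered scatter-add
    return frames

# Hypothetical usage: three events on a 4x4 sensor, 10 ms windows.
ev = np.array([[0.0, 0.0, 0.000, +1],
               [1.0, 2.0, 0.004, -1],
               [3.0, 3.0, 0.011, +1]])
print(events_to_frames(ev, 4, 4, window=0.01).shape)  # (2, 2, 4, 4)
```

    Each color key-frame, together with the event frames accumulated after it, would then be fed to the recurrent adversarial generator; how the channels are weighted or stacked in the actual model is not reproduced here.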

    References

    • 1. A. Andreopoulos, H. J. Kashyap, T. K. Nayak, A. Amir and M. D. Flickner, A low power, high throughput, fully event-based stereo system, IEEE Int. Conf. Computer Vision and Pattern Recognition (2018), pp. 7532–7542.
    • 2. P. Bardow, A. J. Davison and S. Leutenegger, Simultaneous optical flow and intensity estimation from an event camera, IEEE Int. Conf. Computer Vision and Pattern Recognition (2016), pp. 884–892.
    • 3. J. Binas, D. Neil, S.-C. Liu and T. Delbruck, DDD17: End-to-end DAVIS driving dataset, Workshop on Machine Learning for Autonomous Vehicles (MLAV) in ICML 2017 (2017).
    • 4. G. Borghi, M. Fabbri, R. Vezzani, S. Calderara and R. Cucchiara, Face-from-depth for head pose estimation on depth images, IEEE Trans. Pattern Anal. Mach. Intell. 42(3) (2018) 596–609.
    • 5. G. Borghi, S. Pini, R. Vezzani and R. Cucchiara, Driver face verification with depth maps, Sensors 19(15) (2019) 3361.
    • 6. G. Borghi, S. Pini, R. Vezzani and R. Cucchiara, Mercury: A vision-based framework for driver monitoring, Int. Conf. Intelligent Human Systems Integration (Springer, 2020), pp. 104–110.
    • 7. C. Brandli, R. Berner, M. Yang, S.-C. Liu and T. Delbruck, A 240×180 130 dB 3 μs latency global shutter spatiotemporal vision sensor, IEEE J. Solid-State Circuits 49(10) (2014) 2333–2341.
    • 8. C. Brandli, L. Muller and T. Delbruck, Real-time, high-speed video decompression using a frame- and event-based DAVIS sensor, 2014 IEEE Int. Symp. Circuits and Systems (ISCAS) (IEEE, 2014), pp. 686–689.
    • 9. M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth and B. Schiele, The Cityscapes dataset for semantic urban scene understanding, IEEE Int. Conf. Computer Vision and Pattern Recognition (2016), pp. 3213–3223.
    • 10. G. D’Angelo, E. Janotte, T. Schoepe, J. O’Keeffe, M. B. Milde, E. Chicca and C. Bartolozzi, Event-based eccentric motion detection exploiting time difference encoding, Front. Neurosci. 14 (2020) 451.
    • 11. T. Delbruck, V. Villanueva and L. Longinotti, Integration of dynamic vision sensor with inertial measurement unit for electronically stabilized event-based vision, 2014 IEEE Int. Symp. Circuits and Systems (ISCAS) (IEEE, 2014), pp. 2636–2639.
    • 12. D. Eigen, C. Puhrsch and R. Fergus, Depth map prediction from a single image using a multi-scale deep network, Neural Information Processing Systems (2014), pp. 2366–2374.
    • 13. C. Finn, I. Goodfellow and S. Levine, Unsupervised learning for physical interaction through video prediction, Advances in Neural Information Processing Systems (2016), pp. 64–72.
    • 14. G. Gallego, J. E. Lund, E. Mueggler, H. Rebecq, T. Delbruck and D. Scaramuzza, Event-based, 6-DOF camera tracking from photometric depth maps, IEEE Trans. Pattern Anal. Mach. Intell. 40(10) (2018) 2402–2412.
    • 15. G. Gallego, H. Rebecq and D. Scaramuzza, A unifying contrast maximization framework for event cameras, with applications to motion, depth, and optical flow estimation, IEEE Int. Conf. Computer Vision and Pattern Recognition, Vol. 1 (2018), pp. 3867–3876.
    • 16. D. Gehrig, H. Rebecq, G. Gallego and D. Scaramuzza, Asynchronous, photometric feature tracking using events and frames, European Conf. Computer Vision (2018), pp. 766–781.
    • 17. A. Geiger, P. Lenz, C. Stiller and R. Urtasun, Vision meets robotics: The KITTI dataset, Int. J. Robot. Res. 32(11) (2013) 1231–1237.
    • 18. A. Geiger, P. Lenz and R. Urtasun, Are we ready for autonomous driving? The KITTI vision benchmark suite, IEEE Int. Conf. Computer Vision and Pattern Recognition (2012), pp. 3354–3361.
    • 19. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, Generative adversarial nets, Neural Information Processing Systems (2014), pp. 2672–2680.
    • 20. P. Isola, J.-Y. Zhu, T. Zhou and A. A. Efros, Image-to-image translation with conditional adversarial networks, IEEE Int. Conf. Computer Vision and Pattern Recognition (2017), pp. 5967–5976.
    • 21. H. Kim, A. Handa, R. Benosman, S. Ieng and A. J. Davison, Simultaneous mosaicing and tracking with an event camera, British Machine Vision Conf. (2014), pp. 566–576.
    • 22. D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv:1412.6980.
    • 23. X. Lagorce, G. Orchard, F. Galluppi, B. E. Shi and R. B. Benosman, HOTS: A hierarchy of event-based time-surfaces for pattern recognition, IEEE Trans. Pattern Anal. Mach. Intell. 39(7) (2017) 1346–1359.
    • 24. P. Lichtsteiner, C. Posch and T. Delbruck, A 128×128 120 dB 30 mW asynchronous vision sensor that responds to relative intensity change, IEEE Int. Solid-State Circuits Conf. (ISSCC 2006), Digest of Technical Papers (IEEE, 2006), pp. 2060–2069.
    • 25. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár and C. L. Zitnick, Microsoft COCO: Common objects in context, European Conf. Computer Vision (Springer, 2014), pp. 740–755.
    • 26. I.-A. Lungu, F. Corradi and T. Delbrück, Live demonstration: Convolutional neural network driven by dynamic vision sensor playing RoShamBo, IEEE Int. Symp. Circuits and Systems (ISCAS) (IEEE, 2017), pp. 1–1.
    • 27. A. I. Maqueda, A. Loquercio, G. Gallego, N. García and D. Scaramuzza, Event-based vision meets deep learning on steering prediction for self-driving cars, IEEE Int. Conf. Computer Vision and Pattern Recognition (2018), pp. 5419–5427.
    • 28. M. Mirza and S. Osindero, Conditional generative adversarial nets, arXiv:1411.1784.
    • 29. A. Mitrokhin, C. Fermuller, C. Parameshwara and Y. Aloimonos, Event-based moving object detection and tracking, arXiv:1803.04523.
    • 30. G. Munda, C. Reinbacher and T. Pock, Real-time intensity-image reconstruction for event cameras using manifold regularisation, Int. J. Comput. Vis. 126(12) (2018) 1381–1393.
    • 31. J. Nagata, Y. Sekikawa, K. Hara, T. Suzuki and Y. Aoki, QR-code reconstruction from event data via optimization in code subspace, IEEE Winter Conf. Applications of Computer Vision (2020), pp. 2124–2132.
    • 32. S. Pini, G. Borghi and R. Vezzani, Learn to see by events: Color frame synthesis from event and RGB cameras, in Proc. Int. Joint Conf. Computer Vision, Imaging and Computer Graphics Theory and Applications (SCITEPRESS, 2020).
    • 33. S. Pini, G. Borghi, R. Vezzani and R. Cucchiara, Video synthesis from intensity and event frames, Int. Conf. Image Analysis and Processing (Springer, 2019), pp. 313–323.
    • 34. S. Pini, F. Grazioli, G. Borghi, R. Vezzani and R. Cucchiara, Learning to generate facial depth maps, 2018 Int. Conf. 3D Vision (3DV) (IEEE, 2018), pp. 634–642.
    • 35. B. Ramesh and H. Yang, Boosted kernelized correlation filters for event-based face detection, in Proc. IEEE Winter Conf. Applications of Computer Vision Workshops (IEEE, 2020), pp. 155–159.
    • 36. B. Ramesh, S. Zhang, Z. W. Lee, Z. Gao, G. Orchard and C. Xiang, Long-term object tracking with a moving event camera, British Machine Vision Conf. (2018), pp. 241.1–241.12.
    • 37. H. Rebecq, G. Gallego and D. Scaramuzza, EMVS: Event-based multi-view stereo, British Machine Vision Conf. (2016), pp. 63.1–63.11.
    • 38. H. Rebecq, R. Ranftl, V. Koltun and D. Scaramuzza, Events-to-video: Bringing modern computer vision to event cameras, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (IEEE, 2019), pp. 3857–3866.
    • 39. J. Redmon and A. Farhadi, YOLOv3: An incremental improvement, arXiv:1804.02767.
    • 40. C. Reinbacher, G. Graber and T. Pock, Real-time intensity-image reconstruction for event cameras using manifold regularisation, British Machine Vision Conf. (2016), pp. 9.1–9.12.
    • 41. O. Ronneberger, P. Fischer and T. Brox, U-Net: Convolutional networks for biomedical image segmentation, Int. Conf. Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.
    • 42. S. Rota Bulò, L. Porzi and P. Kontschieder, In-place activated BatchNorm for memory-optimized training of DNNs, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (IEEE, 2018).
    • 43. T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford and X. Chen, Improved techniques for training GANs, Neural Information Processing Systems (2016), pp. 2234–2242.
    • 44. C. Scheerlinck, N. Barnes and R. Mahony, Continuous-time intensity estimation using event cameras, Asian Conf. Computer Vision (ACCV) (2018), pp. 308–324.
    • 45. C. Scheerlinck, H. Rebecq, D. Gehrig, N. Barnes, R. Mahony and D. Scaramuzza, Fast image reconstruction with an event camera, IEEE Winter Conf. Applications of Computer Vision (2020), pp. 156–163.
    • 46. S. Schraml, A. N. Belbachir and H. Bischof, An event-driven stereo system for real-time 3-D 360° panoramic vision, IEEE Trans. Ind. Electron. 63(1) (2015) 418–428.
    • 47. S. Schraml, A. Nabil Belbachir and H. Bischof, Event-driven stereo matching for real-time 3D panoramic vision, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (IEEE, 2015), pp. 466–474.
    • 48. C. Simon Chane, S.-H. Ieng, C. Posch and R. B. Benosman, Event-based tone mapping for asynchronous time-based image sensor, Front. Neurosci. 10 (2016) 391.
    • 49. J. Uhrig, N. Schneider, L. Schneider, U. Franke, T. Brox and A. Geiger, Sparsity invariant CNNs, Int. Conf. 3D Vision (3DV) (2017), pp. 11–20.
    • 50. L. Wang, T.-K. Kim and K.-J. Yoon, EventSR: From asynchronous events to image reconstruction, restoration, and super-resolution via end-to-end adversarial learning, in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (IEEE, 2020), pp. 8315–8325.
    • 51. Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity, IEEE Trans. Image Process. 13(4) (2004) 600–612.
    • 52. S. Xingjian, Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong and W.-C. Woo, Convolutional LSTM network: A machine learning approach for precipitation nowcasting, Neural Information Processing Systems (2015), pp. 802–810.
    • 53. R. Zhang, P. Isola, A. A. Efros, E. Shechtman and O. Wang, The unreasonable effectiveness of deep features as a perceptual metric, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (IEEE, 2018), pp. 586–595.
    • 54. Y. Zhou, G. Gallego, H. Rebecq, L. Kneip, H. Li and D. Scaramuzza, Semi-dense 3D reconstruction with a stereo event camera, European Conf. Computer Vision (2018), pp. 242–258.
    • 55. A. Z. Zhu, D. Thakur, T. Özaslan, B. Pfrommer, V. Kumar and K. Daniilidis, The multivehicle stereo event camera dataset: An event camera dataset for 3D perception, IEEE Robot. Autom. Lett. 3(3) (2018) 2032–2039.
    • 56. A. Z. Zhu, L. Yuan, K. Chaney and K. Daniilidis, Unsupervised event-based learning of optical flow, depth, and egomotion, in Proc. IEEE Conf. Computer Vision and Pattern Recognition (IEEE, 2019), pp. 989–997.