World Scientific

Impact of Labeling Schemes on Dense Crowd Counting Using Convolutional Neural Networks with Multiscale Upsampling

    Gatherings of thousands to millions of people frequently occur at educational, social, sporting, and political events, and automatically counting these high-density crowds is useful for safety, crowd management, and measuring the significance of an event. In this work, we show that the commonly accepted labeling scheme for training deep neural networks, the crowd density map, may not be the most effective one. We propose an alternative labeling mechanism, the inverse k-nearest neighbor (ikNN) map, which outperforms density-map labeling even when used directly in existing state-of-the-art network structures. We also introduce new network architecture mechanisms, demonstrated in our MUD-ikNN architecture, which uses multiscale drop-in replacement upsampling via transposed convolutions to take full advantage of the ikNN labeling. Combining this upsampling with ikNN maps further improves crowd counting accuracy. We further analyze several variations of the ikNN labeling mechanism that transform the kNN measure before generating the map, in order to account for camera perspective, image resolution, and the rate of change of the mapping function. To alleviate the effects of varying crowd density within each image, we also introduce an attenuation mechanism into the ikNN mapping. Experimentally, the inverse square root kNN map variation (iRkNN) provides the best performance. We also discuss computational complexity, label resolution, the gains from the mapping and from the upsampling, and critical cases such as varying crowd counts, uneven crowd densities, and crowd occlusions.
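To make the contrast between the two labeling schemes concrete, here is a minimal NumPy sketch. It assumes that the ikNN label at a pixel is 1 / (1 + mean distance to the k nearest annotated heads) and that the iRkNN variant applies a square root to that distance before inverting. These definitions, the function names (`mean_knn_distance`, `iknn_map`, `irknn_map`, `gaussian_density_map`), and the fixed Gaussian width `sigma` are illustrative assumptions, not the article's exact formulation; the attenuation mechanism is not modeled.

```python
import numpy as np

def gaussian_density_map(height, width, heads, sigma=1.5):
    """Conventional density-map label: one unit-mass Gaussian per head (fixed sigma assumed)."""
    rows, cols = np.mgrid[0:height, 0:width]
    density = np.zeros((height, width))
    for r, c in heads:
        g = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2 * sigma ** 2))
        density += g / g.sum()  # normalize so each head contributes exactly 1 to the sum
    return density

def mean_knn_distance(height, width, heads, k=1):
    """Mean Euclidean distance from every pixel to its k nearest annotated heads.

    heads: sequence of (row, col) head positions (assumed non-empty).
    Brute force, O(height * width * n); fine for a small illustration.
    """
    heads = np.asarray(heads, dtype=float)
    rows, cols = np.mgrid[0:height, 0:width]
    pixels = np.stack([rows.ravel(), cols.ravel()], axis=1).astype(float)  # (H*W, 2)
    # Distances between every pixel and every head: shape (H*W, n).
    dists = np.linalg.norm(pixels[:, None, :] - heads[None, :, :], axis=2)
    k = min(k, len(heads))
    # Keep the k smallest distances per pixel, then average them.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1).reshape(height, width)

def iknn_map(height, width, heads, k=1):
    """Hypothetical ikNN label: 1 / (1 + mean kNN distance); equals 1 on a head pixel."""
    return 1.0 / (1.0 + mean_knn_distance(height, width, heads, k))

def irknn_map(height, width, heads, k=1):
    """Hypothetical iRkNN variant: square root of the kNN distance before inverting."""
    return 1.0 / (1.0 + np.sqrt(mean_knn_distance(height, width, heads, k)))

heads = [(2, 2), (2, 7), (7, 5)]
density = gaussian_density_map(10, 10, heads)
m = iknn_map(10, 10, heads, k=1)
r = irknn_map(10, 10, heads, k=1)
print(round(density.sum(), 6))  # 3.0 (density map sums to the head count)
print(m[2, 2], r[2, 2])         # 1.0 1.0 (a head pixel has distance 0)
print(round(m[2, 4], 3))        # 0.333 (nearest head is 2 pixels away -> 1/3)
```

The density map encodes the count in its integral, while the ikNN-style maps are bounded by 1 and encode proximity to the nearest heads; the square root in the iRkNN variant makes the label decay more slowly with distance. The sketch does not show how a network recovers a count from an ikNN map.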

    Published: 15 September 2021