    Please use this identifier to cite or link to this item: http://nccur.lib.nccu.edu.tw/handle/140.119/130078

    Title: Collision Avoidance Based on RGB-D Images in Unmanned Aerial Vehicles Using Deep Learning Techniques
    Original title: 使用深度學習於RGB-D影像之無人飛行載具避障模型
    Authors: Lin, Tsung-Hsien (林宗賢)
    Contributors: Liao, Wen-Hung (廖文宏)
    Lin, Tsung-Hsien
    Keywords: Unmanned aerial vehicle (UAV)
    Obstacle avoidance
    Deep learning
    RGB-D image
    Date: 2020
    Issue Date: 2020-06-02 11:12:29 (UTC+8)
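    Abstract: UAV applications have become increasingly widespread, extending from the original defense sector to commerce, agriculture, and disaster relief, and making daily life more convenient. Obstacle avoidance is an indispensable function in these applications, but manual control cannot scale to widespread deployment. This study therefore proposes automatic obstacle avoidance methods, based on RGB-D images and deep learning, for UAVs without a depth camera and for UAVs equipped with one.
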
    UAV applications have expanded in recent years from the defense sector to commerce, agriculture, and disaster relief. Obstacle avoidance is an essential component of UAV navigation. However, manual operation of UAVs is costly in terms of training and human resources. In this thesis, we propose automatic obstacle avoidance mechanisms based on deep learning techniques for two cases: UAVs without depth sensors and UAVs equipped with a depth camera.
    For UAVs not equipped with depth sensors, we employ depth estimation models to compute depth maps from 2D images. A real-time semantic segmentation model then uses the depth information to partition each image into dangerous and safe zones. Given the zone distribution, the UAV can determine a suitable obstacle avoidance direction to guarantee a collision-free flight.
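    The step from zone distribution to avoidance direction can be illustrated with a minimal sketch. It is not the thesis implementation: a fixed depth threshold stands in for the semantic segmentation model, and the names `danger_mask`, `choose_direction`, the 2.0 m threshold, the three-band split, and the tie-break order are all assumptions made for illustration.

```python
from typing import List, Sequence

BANDS = ("left", "forward", "right")      # hypothetical candidate directions
TIE_BREAK = ("forward", "left", "right")  # assumed preference when bands tie


def danger_mask(depth_m: Sequence[Sequence[float]],
                threshold_m: float = 2.0) -> List[List[bool]]:
    """Mark pixels nearer than threshold_m as dangerous.

    A plain threshold is used here as a stand-in for the segmentation
    model described in the abstract; the 2.0 m default is an assumption.
    """
    return [[d < threshold_m for d in row] for row in depth_m]


def choose_direction(mask: Sequence[Sequence[bool]]) -> str:
    """Split the mask into three vertical bands and pick the safest one."""
    width = len(mask[0])
    if width < len(BANDS):
        raise ValueError("mask must be at least 3 pixels wide")
    ratios = {}
    for i, name in enumerate(BANDS):
        lo = i * width // len(BANDS)
        hi = (i + 1) * width // len(BANDS)
        cells = [row[c] for row in mask for c in range(lo, hi)]
        # Fraction of dangerous pixels in this band.
        ratios[name] = sum(cells) / len(cells)
    # min() returns the first minimum, so ties follow TIE_BREAK order.
    return min(TIE_BREAK, key=lambda name: ratios[name])


if __name__ == "__main__":
    # Toy 2x6 depth map in metres: an obstacle sits in the centre columns.
    depth = [
        [5.0, 5.0, 1.0, 1.2, 4.0, 1.5],
        [6.0, 5.5, 0.8, 1.0, 4.5, 4.0],
    ]
    print(choose_direction(danger_mask(depth)))
```

    In this toy input the centre band is fully dangerous and the right band partly so, so the sketch steers left. A real system would likely also weight pixels by distance and smooth the decision over consecutive frames.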
    For UAVs with a depth camera, we combine a semantic segmentation model with a clustering algorithm to obtain the class and location of each obstacle. We then apply a path planning algorithm to construct the optimal obstacle avoidance path.
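    A rough sketch of this two-stage idea follows. The abstract does not name the clustering or planning algorithms, so the code substitutes simple stand-ins: 4-connected component grouping in place of the clustering step, and breadth-first search on a grid in place of the path planner. The grid encoding (0 for free space, positive integers for obstacle classes) and the function names `find_obstacles` and `shortest_path` are assumptions for illustration only.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, col)


def _neighbours(y: int, x: int) -> Tuple[Cell, ...]:
    return ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1))


def find_obstacles(labels: List[List[int]]) -> List[Dict]:
    """Group 4-connected cells that share a non-zero class label.

    Returns one record per group with its class, centroid, and size.
    """
    rows, cols = len(labels), len(labels[0])
    seen = set()
    obstacles = []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] == 0 or (r, c) in seen:
                continue
            cls = labels[r][c]
            queue = deque([(r, c)])
            seen.add((r, c))
            cells = []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in _neighbours(y, x):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and (ny, nx) not in seen
                            and labels[ny][nx] == cls):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            cy = sum(y for y, _ in cells) / len(cells)
            cx = sum(x for _, x in cells) / len(cells)
            obstacles.append(
                {"class": cls, "centroid": (cy, cx), "size": len(cells)})
    return obstacles


def shortest_path(labels: List[List[int]], start: Cell,
                  goal: Cell) -> Optional[List[Cell]]:
    """Breadth-first search over free cells; None if goal is unreachable.

    The start cell is assumed to be free.
    """
    rows, cols = len(labels), len(labels[0])
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            path = []
            node: Optional[Cell] = cur
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in _neighbours(*cur):
            ny, nx = nxt
            if (0 <= ny < rows and 0 <= nx < cols
                    and labels[ny][nx] == 0 and nxt not in parent):
                parent[nxt] = cur
                queue.append(nxt)
    return None


if __name__ == "__main__":
    grid = [
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
        [0, 1, 1, 0, 2],
        [0, 0, 0, 0, 2],
    ]
    print(find_obstacles(grid))
    print(shortest_path(grid, (3, 0), (0, 4)))
```

    Breadth-first search yields a shortest path only in the sense of fewest grid steps. A planner for a real UAV would typically add clearance margins around obstacles and account for vehicle dynamics.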
    All the deep learning models employed in this work can perform inference efficiently on embedded systems, which allows the proposed obstacle avoidance algorithms to run on UAVs with limited computing resources.
    Description: Master's thesis
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0106753008
    Data Type: thesis
    DOI: 10.6814/NCCU202000432
    Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

    File         Size      Format
    300801.pdf   6573 KB   Adobe PDF

    All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.
