    Please use this identifier to cite or link to this item: http://nccur.lib.nccu.edu.tw/handle/140.119/130078


    Title: Collision Avoidance Based on RGB-D Images in Unmanned Aerial Vehicles Using Deep Learning Techniques
    Authors: Lin, Tsung-Hsien (林宗賢)
    Contributors: Liao, Wen-Hung (廖文宏)
    Lin, Tsung-Hsien (林宗賢)
    Keywords: UAV
    Obstacle avoidance
    Deep learning
    RGB-D image
    Date: 2020
    Issue Date: 2020-06-02 11:12:29 (UTC+8)
    Abstract: UAV applications have expanded in recent years from the defense sector to commercial, agricultural, and disaster-relief uses. Obstacle avoidance is an essential capability in these applications, but manual piloting is costly in training and human resources and does not scale. In this thesis, we propose automatic obstacle avoidance methods based on RGB-D images and deep learning, one for UAVs without a depth camera and one for UAVs equipped with a depth camera.

    For UAVs without a depth camera, we apply a depth estimation model to the color images of an open collision dataset to predict the corresponding depth maps. The depth information is used to partition each color image into zones such as dangerous and safe, and the labeled images are used to train a real-time semantic segmentation model. The zone distribution that the trained model predicts from a color image is then passed to our proposed avoidance mechanism, which lets the UAV determine a suitable obstacle avoidance direction.

    For UAVs with a depth camera, we combine a real-time semantic segmentation model with a clustering algorithm to obtain the class and location of each obstacle, and then apply a path planning algorithm to find the best obstacle avoidance path.

    All the deep learning models trained in this work can run inference on embedded devices, so the proposed obstacle avoidance methods can be applied to UAVs with limited computing resources.
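    The abstract does not spell out how zones are derived from depth or how a direction is chosen from the zone distribution. The following is only an illustrative sketch of that idea, not the thesis's actual mechanism. It assumes that smaller depth values mean closer objects, that a single threshold separates dangerous from safe pixels, and that the candidate directions are three equal vertical strips of the image. The names `label_zones`, `choose_direction`, and `danger_threshold` are hypothetical.

```python
from typing import List, Sequence

DANGER, SAFE = 1, 0  # hypothetical zone labels, not taken from the thesis


def label_zones(depth: Sequence[Sequence[float]],
                danger_threshold: float) -> List[List[int]]:
    """Label a pixel DANGER when its estimated depth is below the threshold.

    Assumption: smaller depth values mean the surface is closer to the camera.
    """
    return [[DANGER if d < danger_threshold else SAFE for d in row]
            for row in depth]


def choose_direction(zones: Sequence[Sequence[int]]) -> str:
    """Pick the vertical strip (left, forward, right) with the least danger.

    Assumption: three equal-width strips stand in for the steering options.
    Ties are resolved in the order forward, left, right.
    """
    names = ("left", "forward", "right")
    width = len(zones[0])
    if width < 3:
        raise ValueError("zone map must be at least 3 pixels wide")
    danger = [0, 0, 0]
    total = [0, 0, 0]
    for row in zones:
        for col, label in enumerate(row):
            strip = col * 3 // width  # map the column index to strip 0, 1 or 2
            danger[strip] += label
            total[strip] += 1
    ratios = [d / t for d, t in zip(danger, total)]
    # min() returns the first minimum, so this order makes "forward" win ties.
    best = min((1, 0, 2), key=lambda i: ratios[i])
    return names[best]


# Toy 2x6 depth map: near obstacles on the left and right, open space ahead.
depth_map = [
    [0.5, 0.6, 5.0, 5.0, 1.0, 0.9],
    [0.4, 0.7, 4.0, 6.0, 5.0, 0.8],
]
zones = label_zones(depth_map, danger_threshold=2.0)
print(choose_direction(zones))  # forward
```

    In the pipeline the abstract describes, the zone map would come from the trained real-time segmentation model rather than from thresholding at flight time. The thresholding step here only stands in for how training labels might be produced from estimated depth.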
    Description: Master's
    National Chengchi University
    Department of Computer Science
    106753008
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0106753008
    Data Type: thesis
    DOI: 10.6814/NCCU202000432
    Appears in Collections: [Department of Computer Science] Theses
