National Chengchi University Institutional Repository (NCCUR): Item 140.119/107005
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/107005


    Title: Loan default prediction: analyzing P2P lending data on the Spark platform
    Authors: 林博仁
    Lin, Bo Ren
    Contributors: 胡毓忠
    Hu, Yuh Jong
    林博仁
    Lin, Bo Ren
    Keywords: Peer-to-peer (P2P)
    Lending
    Prediction
    Date: 2017
    Issue Date: 2017-03-01 17:38:14 (UTC+8)
    Abstract: With the rapid rise of FinTech, traditional financial services are gradually being replaced by online digital finance. When granting a loan, a bank requires the borrower to provide sufficient collateral to reduce the risk of bad debt. Borrowers often cannot meet this requirement, including those with good credit histories, and P2P lending platforms emerged to serve this need. This study investigates how to run an automated machine learning pipeline with scikit-learn libraries on the Spark big data analytics platform, optimizing feature selection, parameters, and hyper-parameters of a P2P lending model to improve its ability to discriminate between borrowers who repay and those who do not. The analysis uses the open datasets published by Lending Club, a publicly listed US company, and examines borrowers' historical loan data from the lender's (investor's) perspective. Features are selected from these data, and a Random Forest algorithm is combined with the automated machine learning pipeline to train and test the model. The resulting list of borrowers predicted to have good credit is offered to investors, who can choose suitable borrowers according to their own funds, with the aim of accurately predicting repayment and achieving a high loan return rate.
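    The pipeline described in the abstract (feature selection, then a Random Forest, with hyper-parameters tuned automatically) can be illustrated with a minimal scikit-learn sketch. This is not the thesis code. It runs locally rather than on Spark and uses synthetic data in place of the Lending Club datasets. The feature selector (SelectKBest), the parameter grid, and ROC AUC as the tuning metric are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Synthetic stand-in for loan records (NOT Lending Club data).
# Hypothetical labels: y = 1 means "repaid", y = 0 means "defaulted".
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=5,
    weights=[0.3, 0.7], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 1: keep the k features with the highest ANOVA F-score.
# Step 2: fit a Random Forest on the selected features.
pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Candidate settings searched jointly over both steps (illustrative values).
param_grid = {
    "select__k": [5, 10],
    "forest__n_estimators": [25, 50],
    "forest__max_depth": [3, None],
}

# 3-fold cross-validated grid search, scored by area under the ROC curve.
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=3)
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on held-out data.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
print("best parameters:", search.best_params_)
print("held-out ROC AUC:", round(test_auc, 3))
```

    Placing the selector and the classifier in one Pipeline means the grid search re-runs feature selection inside every cross-validation fold, so the selected features are never chosen using the validation data.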
    Description: Master's thesis
    National Chengchi University
    Executive Master Program of Computer Science
    103971012
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0103971012
    Data Type: thesis
    Appears in Collections:[Executive Master Program of Computer Science of NCCU] Theses

    Files in This Item:

    File          Size      Format
    101201.pdf    1648Kb    Adobe PDF

