    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/153359


    Title: Variable Selection for Nonlinear Boundary Threshold Regression Model
    (original Chinese title: 具變數選擇能力之非線性閾值迴歸模型)
    Authors: Hsieh, Yu-Yun (謝佑昀)
    Contributors: Chang, Chih-Hao (張志浩)
    Hsieh, Yu-Yun (謝佑昀)
    Keywords: Threshold model
    Random Forest
    Variable selection
    Date: 2024
    Issue Date: 2024-09-04 14:55:22 (UTC+8)
    Abstract: This study improves on the traditional threshold regression model by proposing a threshold regression model that can select its own variables. Traditional threshold regression models usually rely on key covariates chosen in advance, but in practice these covariates are often hard to determine, especially with high-dimensional data. To address this, the study combines Random Forest with the least absolute shrinkage and selection operator (Lasso) for variable selection, and the resulting method handles both linear and nonlinear threshold boundaries. Three simulation experiments evaluate the proposed algorithm's effectiveness, predictive performance, and variable selection capability under three scenarios: a linear threshold boundary, a nonlinear threshold boundary, and high-dimensional data with a small sample. In the simulations, K-means clustering first provides a preliminary classification, Random Forest then identifies the potential threshold function, and Lasso finally selects the important variables and establishes the final regression model. The simulation results show that the linear or nonlinear threshold boundaries constructed by the proposed TBR-VS algorithm clearly improve predictive performance and, in most cases, select the important threshold variables and regression variables with high probability. In the empirical analysis, the model is applied to three real-world datasets (Boston housing prices, Los Angeles ozone pollution, and New York stock financial reports) to further verify its applicability across fields. Overall, the study improves the accuracy of threshold regression models and strengthens their variable selection capability and interpretability on practical data.
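The idea of a threshold regression with variable selection in each regime can be illustrated with a small, simplified sketch. This is not the thesis's TBR-VS algorithm: the K-means and Random Forest steps are swapped for a plain grid search over a single, pre-chosen threshold variable `q`, and the Lasso is a hand-written coordinate descent. The simulated data, the penalty `lam=0.1`, the trimming fraction, and all function names are assumptions made for illustration only.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=50, tol=1e-8):
    """Lasso without intercept by cyclic coordinate descent.

    Minimises (1 / (2n)) * sum((y - X b)^2) + lam * sum(|b_j|).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X b; equals y while b = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (own term added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * step
                beta[j] = new
            max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return beta

def sse(X, y, beta):
    """Sum of squared errors of the linear fit X @ beta."""
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))

def fit_threshold_lasso(X, y, q, lam, trim=0.15):
    """Grid-search a scalar threshold on q, fitting one Lasso per regime."""
    n = len(y)
    # Candidate thresholds: observed q values, trimmed so both regimes stay populated.
    candidates = sorted(q)[int(trim * n):int((1 - trim) * n)]
    best = None
    for gamma in candidates:
        total, betas = 0.0, []
        for in_low in (True, False):
            rows = [i for i in range(n) if (q[i] <= gamma) == in_low]
            Xr, yr = [X[i] for i in rows], [y[i] for i in rows]
            b = lasso_cd(Xr, yr, lam)
            betas.append(b)
            total += sse(Xr, yr, b)
        if best is None or total < best[0]:
            best = (total, gamma, betas)
    return best[1], best[2][0], best[2][1]

# Hypothetical simulated data: two regimes split at q = 0.2, two noise predictors.
rng = random.Random(0)
n, p, true_gamma = 200, 4, 0.2
X = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
q = [rng.uniform(-1, 1) for _ in range(n)]
y = [(2.0 * x[0] if qi <= true_gamma else -2.0 * x[0] + 1.5 * x[1]) + rng.gauss(0, 0.1)
     for x, qi in zip(X, q)]

gamma_hat, beta_low, beta_high = fit_threshold_lasso(X, y, q, lam=0.1)
selected_low = [j for j, b in enumerate(beta_low) if b != 0.0]
selected_high = [j for j, b in enumerate(beta_high) if b != 0.0]
print("estimated threshold:", gamma_hat)
print("selected (q <= threshold):", selected_low)
print("selected (q > threshold):", selected_high)
```

The sketch fixes the threshold variable in advance, which is exactly the limitation the abstract attributes to traditional threshold regression. The abstract's approach instead uses K-means and Random Forest to find the threshold function, which may be nonlinear, before the Lasso step.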
    Reference: Altmann, A., Toloşi, L., Sander, O., and Lengauer, T. (2010). Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10):1340–1347.
    Breiman, L. (1996). Bagging predictors. Machine Learning, 24:123–140.
    Breiman, L. (2001). Random forests. Machine Learning, 45:5–32.
    Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. (1984). Classification and Regression Trees. Taylor & Francis.
    Breiman, L. and Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlation: Rejoinder. Journal of the American Statistical Association, 80(391):614–619.
    Chang, C.-H., Emura, T., and Huang, S.-F. (2023). Estimation of threshold boundary regression models. In The 6th International Conference on Econometrics and Statistics.
    Dai, L., Chen, K., Sun, Z., Liu, Z., and Li, G. (2018). Broken adaptive ridge regression and its asymptotic properties. Journal of Multivariate Analysis, 168:334–351.
    Granovetter, M. (1978). Threshold models of collective behavior. American Journal of Sociology, 83(6):1420–1443.
    Harrison, D., Jr. and Rubinfeld, D. L. (1978). Hedonic housing prices and the demand for clean air. Journal of Environmental Economics and Management, 5(1):81–102.
    Ishwaran, H. (2015). The effect of splitting on random forests. Machine Learning, 99:75–118.
    Janitza, S., Celik, E., and Boulesteix, A.-L. (2018). A computationally fast variable importance test for random forests for high-dimensional data. Advances in Data Analysis and Classification, 12:885–915.
    Jia, L., Zhang, W., and Chen, X. (2017). Common methods of biological age estimation. Clinical Interventions in Aging, 12:759–772.
    Lee, Y. and Wang, Y. (2023). Threshold regression with nonparametric sample splitting. Journal of Econometrics, 235(2):816–842.
    Nembrini, S., König, I. R., and Wright, M. N. (2018). The revival of the Gini importance? Bioinformatics, 34(21):3711–3718.
    Saegusa, T., Ma, T., Li, G., Chen, Y. Q., and Lee, M.-L. T. (2020). Variable selection in threshold regression model with applications to HIV drug adherence data. Statistics in Biosciences, 12:376–398.
    Sakoda, J. M. (1971). The checkerboard model of social interaction. The Journal of Mathematical Sociology, 1(1):119–132.
    Schelling, T. C. (1971). Dynamic models of segregation. The Journal of Mathematical Sociology, 1(2):143–186.
    Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288.
    Tong, H. (1978). On a threshold model in pattern recognition and signal processing. In Chen, C., editor, Pattern Recognition and Signal Processing. Sijthoff and Noordhoff.
    Whitmore, G. A. and Su, Y. (2007). Modeling low birth weights using threshold regression: results for U.S. birth data. Lifetime Data Analysis, 13:161–190.
    Yu, P. (2012). Likelihood estimation and inference in threshold regression. Journal of Econometrics, 167(1):274–294.
    Description: Master's
    National Chengchi University
    Department of Statistics
    107354011
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0107354011
    Data Type: thesis
    Appears in Collections: [Department of Statistics] Theses

    Files in This Item:

    File          Size      Format
    401101.pdf    3578Kb    Adobe PDF


    All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.

