|
Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/159038
|
Title: | 計數資料分析之新地理加權迴歸方法 A New Geographically Weighted Regression Method for Modeling Count Data |
Authors: | 李沂瑾 Li, Yi-Jin |
Contributors: | 陳怡如 Chen, Yi-Ju 李沂瑾 Li, Yi-Jin |
Keywords: | 地理加權迴歸 Poisson-Tweedie 模型 半參數迴歸 登革熱 空間分析 Geographically Weighted Regression Poisson-Tweedie Model Semi-parametric Regression Dengue Fever Spatial Analysis |
Date: | 2025 |
Issue Date: | 2025-09-01 14:49:19 (UTC+8) |
Abstract: | 離散計數資料在空間應用中相當常見,且通常呈現過離散、欠離散與零膨脹等複雜特性。為因應這些現象並捕捉資料中的空間變異關係,已有多種地理加權計數模型(Geographically Weighted Count Models,簡稱GW計數模型)被提出。然而,此類方法常依賴特定的分配假設與模型架構,在擬合與比較多個競爭模型時,易導致詮釋不一致或計算負擔沉重。為解決此問題,本研究於地理加權迴歸(GWR)架構下延伸Poisson-Tweedie分布族,提出統一且具彈性的建模方法為地理加權Poisson-Tweedie模型(Geographically Weighted Poisson-Tweedie Model, GWPTM)。本模型具備高度彈性,能涵蓋多種離散程度與尾端分布變異,並允許模型係數隨空間變動,以捕捉變數與反應變數關係的空間異質性,從而免除繁瑣且主觀的模型選擇程序。 進一步考量實際資料中變數空間異質性的差異,本文提出半參數地理加權Poisson-Tweedie模型(semi-GWPTM),區分變數為全域與局部兩類,並以兩階段估計法提升模型穩定性與解釋力。模擬中顯示,本方法在不同計數資料特性下皆具穩健估計與優異預測力。實證方面,以2015年臺南市752村里之登革熱病例數為例進行應用分析,結果指出本模型能有效揭示疫情熱點與區域風險因子的空間變化關係,並在預測表現與空間自相關修正上優於傳統Poisson-Tweedie模型、負二項迴歸(NB)與地理加權負二項迴歸(GWNBR),顯示其於空間異質性計數資料分析中的實用性與優勢。 Discrete count data are commonly encountered in spatial applications and often exhibit complex characteristics such as overdispersion, underdispersion, and zero inflation. To account for these features and capture spatial variations in count data, various Geographically Weighted Count Models (GW count models) have been developed. However, these models typically rely on specific distributional assumptions and model structures, which may lead to inconsistent interpretations and high computational burdens when fitting and comparing multiple competing models. To address these issues, this study extends the Poisson-Tweedie distribution family within the Geographically Weighted Regression (GWR) framework and proposes a unified and flexible modeling approach: the Geographically Weighted Poisson-Tweedie Model (GWPTM). This model is highly adaptable, capable of accommodating various degrees of dispersion and tail behaviors in count data, and allows coefficients to vary spatially, enabling the identification of spatial heterogeneity in the relationships between covariates and the response variable. This flexibility eliminates the need for labor-intensive and subjective model selection among multiple GW count models. 
Furthermore, recognizing that not all covariates exhibit spatial variation in practice, this study also proposes a semi-parametric version of the model, the semi-GWPTM, which classifies covariates as either global or local and employs a two-stage estimation procedure to enhance model stability and interpretability. Simulation results demonstrate that the proposed method yields robust estimation and excellent predictive performance across various count data scenarios. The empirical application fits the model to dengue fever case counts from 752 villages in Tainan City, Taiwan, in 2015. The results show that the proposed model effectively captures spatial variation in disease hotspots and associated risk factors, and that it outperforms the conventional Poisson-Tweedie model, Negative Binomial (NB) regression, and Geographically Weighted Negative Binomial Regression (GWNBR) in predictive accuracy and in correcting for spatial autocorrelation. These findings highlight the practical value and advantages of the GWPTM in analyzing spatially heterogeneous count data. |
Description: | Master's thesis, National Chengchi University, Department of Statistics, 112354011 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0112354011 |
Data Type: | thesis |
Appears in Collections: | [Department of Statistics] Theses
|
Files in This Item:
File | Size | Format |
401101.pdf | 27906Kb | Adobe PDF |
|
All items in 政大典藏 are protected by copyright, with all rights reserved.
|