National Chengchi University Institutional Repository (NCCUR): Item 140.119/146908

Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/146908

Title:	Predicting Stock Returns via Greedy Algorithm with Taiwanese News Data (Chinese title: 在臺灣新聞資料下透過貪婪演算法預測股票報酬)
Authors:	Cheng, Chang-Lei (程長磊)
Contributors:	Lin, Shi-Kui (林士貴); Weng, Chiu-Hsing (翁久幸); Cheng, Chang-Lei (程長磊)
Keywords:	Text mining; Statistical learning; News sentiment analysis; Stock return prediction; OGA; CGA
Date:	2023
Issue Date:	2023-09-01 14:58:16 (UTC+8)
Abstract:	With the development of fields such as big data and natural language processing, unstructured data, and textual data in particular, have become highly valuable for academic research. Many studies examine how textual information affects asset returns, making this one of the important research topics in finance. Textual data, however, are high dimensional (the number of distinct words in news articles far exceeds the number of articles themselves), so correctly analyzing the relationship between text and returns is a central issue in this line of research. News articles are the textual data investors encounter most often when trading, and unlike financial statements they provide no quantitative figures on which to base an investment decision. This study therefore uses two high-dimensional model selection methods, the Orthogonal Greedy Algorithm (OGA) proposed by Ing and Lai (2011) and the Chebyshev Greedy Algorithm (CGA) as refined by Chen, Dai, Ing, and Lai (2019), as a text-mining approach that selects frequently used words from the news in order to quantify the sentiment score of each article. It then estimates the relationship between the news sentiment factor and firm returns after controlling for firm return factors, and compares the returns of portfolios built from the news sentiment scores under two assumptions on the dependent variable (returns): linear and nonlinear.
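To make the greedy selection step concrete, here is a minimal Python/NumPy sketch in the spirit of the Chebyshev greedy algorithm (Temlyakov, 2015): at each step it adds the word whose coordinate of the loss gradient is largest in absolute value, then refits the loss on all selected words. With squared loss the refit is a least-squares projection, so when the columns of the document-term matrix have equal norms the procedure coincides with OGA's selection rule. The logistic loss used for the nonlinear case (an up/down label), the small ridge term, the fixed number of steps, and the names `greedy_select` and `_refit` are illustrative assumptions, not the thesis's OGA Predict or CGA Predict models; Ing and Lai (2011) also pair OGA with an HDIC stopping rule, which this sketch omits.

```python
import numpy as np


def _refit(X_s, y, loss, lam=1e-4, n_iter=25):
    """Fully corrective step: minimise the loss over the selected columns only."""
    if loss == "squared":
        # Least squares on the active set, i.e. project y onto span(X_s) as OGA does.
        beta, *_ = np.linalg.lstsq(X_s, y, rcond=None)
        return beta
    # Assumed nonlinear case: mean logistic loss (labels in {0, 1}) plus a tiny
    # ridge term for numerical stability, minimised by Newton's method from zero.
    n, k = X_s.shape
    beta = np.zeros(k)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(X_s @ beta)))
        grad = X_s.T @ (prob - y) / n + lam * beta
        hess = (X_s * (prob * (1.0 - prob))[:, None]).T @ X_s / n + lam * np.eye(k)
        beta = beta - np.linalg.solve(hess, grad)
    return beta


def greedy_select(X, y, n_steps, loss="squared"):
    """Hypothetical helper: pick n_steps columns (words); return indices and weights."""
    n = X.shape[0]
    selected, beta = [], np.zeros(0)
    for _ in range(n_steps):
        eta = X[:, selected] @ beta if selected else np.zeros(n)  # current fit
        if loss == "squared":
            grad = -X.T @ (y - eta) / n  # gradient of ||y - X b||^2 / (2n)
        else:
            grad = X.T @ (1.0 / (1.0 + np.exp(-eta)) - y) / n  # mean logistic loss
        grad[selected] = 0.0  # never re-select an active word
        j = int(np.argmax(np.abs(grad)))  # greedy choice: steepest coordinate
        selected.append(j)
        beta = _refit(X[:, selected], y, loss)
    return selected, beta
```

Under the same assumptions, a fitted sentiment score for a new article with word-count row `x_new` would be `x_new[selected] @ beta`.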
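Under the linear assumption, in which the return is treated as a continuous dependent variable, OGA is used and extended into the OGA Predict model; under the nonlinear assumption, CGA is used and extended into the CGA Predict model. Both model selection methods are thus applied in a novel way to textual analysis in finance. We find that the CGA Predict model earns higher excess returns than the OGA Predict model. Performance evaluation further shows that the effect of news sentiment in the Taiwanese market, which is dominated by retail investors, differs significantly from its effect in the US market, which is dominated by institutional investors, a result consistent with economic intuition about the Taiwanese stock market.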
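The portfolio comparison can be illustrated with a minimal long-short sketch: in each period, buy the stocks with the highest sentiment scores, short those with the lowest, and average the spread over periods. The quantile cut-off `q`, equal weighting, and the helper name `long_short_return` are assumptions for illustration only; the thesis's actual portfolio rule and return adjustments are not given in this record.

```python
import numpy as np


def long_short_return(scores, returns, q=0.2):
    """Hypothetical helper: mean per-period return of top-q minus bottom-q by score."""
    scores = np.asarray(scores, dtype=float)    # shape: (periods, stocks)
    returns = np.asarray(returns, dtype=float)  # next-period returns, same shape
    spreads = []
    for s, r in zip(scores, returns):
        order = np.argsort(s)                   # ascending sentiment score
        k = max(1, int(len(s) * q))             # number of stocks in each leg
        # Equal-weighted long leg (highest scores) minus short leg (lowest scores).
        spreads.append(r[order[-k:]].mean() - r[order[:k]].mean())
    return float(np.mean(spreads))
```

Calling it once with OGA-based scores and once with CGA-based scores on the same return panel gives the kind of average-return comparison the abstract describes; it does not reproduce the thesis's reported results.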
Reference:
1. 郭亭佑 (2021). Predicting Taiwan stock returns through text mining (透過文字探勘預測台股報酬). Master's thesis, Department of Money and Banking, National Chengchi University.
2. Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
3. Chen, Y. L., Dai, C. S., and Ing, C. K. (2019). High dimensional model selection via Chebyshev greedy algorithms. Working paper.
4. Fan, J., Xue, L., and Zhou, Y. (2021). How much can machines learn finance from Chinese text data? Working paper.
5. Gentzkow, M., Kelly, B., and Taddy, M. (2019). Text as data. Journal of Economic Literature, 57(3), 535-574.
6. Henry, E. (2008). Are investors influenced by how earnings press releases are written? The Journal of Business Communication, 45(4), 363-407.
7. Ing, C. K., and Lai, T. L. (2011). A stepwise regression method and consistent model selection for high-dimensional sparse linear models. Statistica Sinica, 1473-1513.
8. Jegadeesh, N., and Wu, D. (2013). Word power: A new approach for content analysis. Journal of Financial Economics, 110(3), 712-729.
9. Ke, Z. T., Kelly, B. T., and Xiu, D. (2019). Predicting returns with text data. Working paper.
10. Loughran, T., and McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. The Journal of Finance, 66(1), 35-65.
11. Manela, A., and Moreira, A. (2017). News implied volatility and disaster concerns. Journal of Financial Economics, 123(1), 137-162.
12. Temlyakov, V. N. (2015). Greedy approximation in convex optimization. Constructive Approximation, 41(2), 269-296.
13. Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168.
14. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
15. You, J., Zhang, B., and Zhang, L. (2018). Who captures the power of the pen? Review of Financial Studies, 31(1), 43-96.
Description:	Master's thesis, Department of Statistics, National Chengchi University, 110354030
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0110354030
Data Type:	thesis
Appears in Collections:	[Department of Statistics] Theses

Files in This Item:

403001.pdf (3303 KB, Adobe PDF)

All items in the NCCU Institutional Repository are protected by copyright, with all rights reserved.

Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial purposes such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. Every effort has been made in building this website to avoid infringing the rights of copyright owners. If you believe that any digital content on the website infringes copyright, please notify our staff (nccur@nccu.edu.tw), who will promptly take remedial measures such as removing the work from the repository.