    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/136324


    Title: 透過文字探勘預測台股報酬
    Predicting Taiwan Stocks Returns with Text Data
    Authors: 郭亭佑
    Kuo, Ting-You
    Contributors: 翁久幸
    林士貴

    Weng, Chiu-Hsing
    Lin, Shih-Kuei

    郭亭佑
    Kuo, Ting-You
    Keywords: Unstructured Data
    Text Mining
    Stock News
    Machine Learning
    Predict Stock Returns
    Sentiment Analysis
    Efficient-Market Hypothesis
    Abnormal Returns
    Date: 2021
    Issue Date: 2021-08-04 14:43:11 (UTC+8)
    Abstract: Unstructured data has grown rapidly in recent years, prompting many scholars to study how news media affect stock returns. News articles are the most common and accessible "public information" that ordinary investors encounter when they trade. Unlike financial reports, however, news articles do not provide explicit numerical data that investors can analyze and use as a basis for investment decisions. This study uses text mining to extract sentiment information from Taiwan stock news and uses the resulting news sentiment scores to predict Taiwan stock returns. Following the text-mining methodology introduced by Ke, Kelly & Xiu (2019), we construct a Taiwan stock news sentiment model (Taiwan Stocks Sentiment Extraction via Screening and Topic Modeling, Taiwan SESTM). We find this methodology particularly suitable for analyzing the relationship between news articles and stock price movements, so we extend it to the Taiwan stock market and use it to test the efficient-market hypothesis in Taiwan empirically. We find that a portfolio trading strategy built on the news sentiment scores estimated by Taiwan SESTM also yields large economic gains in the Taiwan stock market, and that the sentiment score has significant predictive and explanatory power for individual stock returns. Comparing the performance of the United States and Taiwan SESTM trading strategies, we find that Taiwan SESTM has higher predictive ability for stock returns before news is published. We also find that, although Taiwan SESTM predicts stock returns significantly well, the portfolio performance evaluation shows that the influence of news on the decisions of Taiwanese investors differs significantly from that in the United States. These results are consistent with our economic intuition about the Taiwan stock market. We hope that the Taiwan SESTM constructed in this study can help establish a research foundation for financial text mining in Taiwan.
    Reference: 1. 李昱穎. (2019). 新聞輿情分析在台灣股票市場之應用: 文字轉向量與動能策略. 政治大學金融學系學位論文, 1-40.
    2. 陳信宏, 陳昱志,& 鄭舜仁.(2006). 以時間數列模型檢定台灣股票市場弱式效率性之研究. 管理科學與統計決策, 3(4), 8-17.
    3. 鍾任明, 李維平, & 吳澤民. (2005). 運用文字探勘於日內股價漲跌趨勢預測之研究 (Doctoral dissertation, 撰者).
    4. Azar, P. D., & Lo, A. W. (2016). The wisdom of Twitter crowds: Predicting stock market reactions to FOMC meetings via Twitter feeds. The Journal of Portfolio Management, 42(5), 123-134.
    5. Alvarez-Ramirez, J., Rodriguez, E., & Espinosa-Paredes, G. (2012). Is the US stock market becoming weakly efficient over time? Evidence from 80-year-long data. Physica A: Statistical Mechanics and its Applications, 391(22), 5643-5647.
    6. Bernard, V. L., & Thomas, J. K. (1990). Evidence that stock prices do not fully reflect the implications of current earnings for future earnings. Journal of Accounting and Economics, 13(4), 305-340.
    7. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493-2537.
    8. Cowles 3rd, A. (1933). Can stock market forecasters forecast?. Econometrica: Journal of the Econometric Society, 309-324.
    9. Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q. V., & Salakhutdinov, R. (2019). Transformer-XL: Attentive language models beyond a fixed-length context. arXiv preprint arXiv:1901.02860.
    10. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
    11. Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383-417.
    12. Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(5), 849-911.
    13. Gehring, J., Auli, M., Grangier, D., Yarats, D., & Dauphin, Y. N. (2017, July). Convolutional sequence to sequence learning. In International Conference on Machine Learning (pp. 1243-1252). PMLR.
    14. Heston, S. L., & Sinha, N. R. (2017). News vs. sentiment: Predicting stock returns from news stories. Financial Analysts Journal, 73(3), 67-83.
    15. Hutchins, R. M. (1954). Great books. Western World.
    16. Jegadeesh, N., & Titman, S. (1993). Returns to buying winners and selling losers: Implications for stock market efficiency. The Journal of Finance, 48(1), 65-91.
    17. Jegadeesh, N., & Wu, D. (2013). Word power: A new approach for content analysis. Journal of Financial Economics, 110(3), 712-729.
    18. Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188.
    19. Ke, Z. T., Kelly, B. T., & Xiu, D. (2019). Predicting returns with text data (No. w26186). National Bureau of Economic Research.
    20. Lakonishok, J., & Vermaelen, T. (1990). Anomalous price behavior around repurchase tender offers. The Journal of Finance, 45(2), 455-477.
    21. Le, Q., & Mikolov, T. (2014, June). Distributed representations of sentences and documents. In International Conference on Machine Learning (pp. 1188-1196). PMLR.
    22. Loper, E., & Bird, S. (2002). NLTK: the natural language toolkit. arXiv preprint cs/0205028.
    23. Loughran, T., & McDonald, B. (2011). When is a liability not a liability? Textual analysis, dictionaries, and 10‐Ks. The Journal of Finance, 66(1), 35-65.
    24. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 3111-3119.
    25. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
    26. Ritter, J. R. (1991). The long‐run performance of initial public offerings. The Journal of Finance, 46(1), 3-27.
    27. Spiess, D. K., & Affleck-Graves, J. (1995). Underperformance in long-run stock returns following seasoned equity offerings. Journal of Financial Economics, 38(3), 243-267.
    28. Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. arXiv preprint arXiv:1409.3215.
    29. Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. The Journal of Finance, 62(3), 1139-1168.
    30. Tetlock, P. C. (2014). Information transmission in finance. Annual Review of Financial Economics, 6(1), 365-384.
    31. Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460.
    32. Wilson, D. S. (1975). A theory of group selection. Proceedings of the National Academy of Sciences, 72(1), 143-146.
    33. Yang, B., Yih, W. T., He, X., Gao, J., & Deng, L. (2014). Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
    34. Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. arXiv preprint arXiv:1802.05365.
    35. Zhang, Y., & Wallace, B. (2015). A sensitivity analysis of (and practitioners' guide to) convolutional neural networks for sentence classification. arXiv preprint arXiv:1510.03820.
    Description: Master's thesis
    National Chengchi University
    Department of Statistics
    108354023
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0108354023
    Data Type: thesis
    DOI: 10.6814/NCCU202101087
    Appears in Collections: [Department of Statistics] Theses

    Files in This Item:

    File: 402301.pdf
    Size: 3083Kb
    Format: Adobe PDF


    All items in the NCCU Institutional Repository are protected by copyright, with all rights reserved.

