政大機構典藏-National Chengchi University Institutional Repository(NCCUR):Item 140.119/141558


Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/141558

Title:	Anomaly Detection on System Log with Language Modeling (於系統日誌使用語言模型的異常分析)
Authors:	曾志中 Tseng, Chih-Chung
Contributors:	蕭舜文 Hsiao, Shun-Wen 曾志中 Tseng, Chih-Chung
Keywords:	log data analysis; anomaly detection; deep learning
Date:	2022
Issue Date:	2022-09-02 14:48:21 (UTC+8)
Abstract:	System logs are widely present in software applications to help operators manage service quality. Misbehavior and bugs in a system can introduce vulnerabilities and put services at risk, so operators commonly adopt anomaly detection to discover unusual events in system logs in a timely manner. With the development of deep learning models in Natural Language Processing (NLP), recent research on log analysis has begun to use language representation models so that the detection model also accounts for the semantics behind the log. This approach improves the adaptability of an anomaly detection model to log events whose formats keep changing. We propose BERT-based One-class classification with an explicit Reconstruction Gate (BORG), which learns the benign session behavior of system logs at different levels. Our method integrates the anomaly detection training objective with language representation and uses a composite malicious score in the detection phase to reflect subtle abnormalities in sequences of events. We evaluate the approach on two log data sets with contrasting statistical properties. The results show that the method is robust to challenging log data, and we explain the outcomes through experiments and statistical analysis of the log sequences.
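The idea of a composite score that mixes a one-class objective with a reconstruction signal can be illustrated with a small sketch. This is not the thesis's BORG implementation: the function names, the weighting parameter `alpha`, the use of a squared distance to a mean "center" (in the style of Deep SVDD), and the negative log-likelihood of masked tokens as the reconstruction term are all assumptions chosen for illustration. The embeddings and token probabilities are toy values standing in for the outputs of a language model.

```python
import math
from typing import List, Sequence


def center(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Mean of benign session embeddings, used as the one-class center (assumed)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def one_class_distance(vec: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance from a session embedding to the center."""
    return sum((a - b) ** 2 for a, b in zip(vec, c))


def reconstruction_error(token_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood the model assigned to the true masked tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def composite_score(vec: Sequence[float], c: Sequence[float],
                    token_probs: Sequence[float], alpha: float = 0.5) -> float:
    """Weighted sum of both signals; alpha is a hypothetical mixing weight."""
    return (alpha * one_class_distance(vec, c)
            + (1 - alpha) * reconstruction_error(token_probs))


# Toy data: two benign session embeddings define the center.
benign_center = center([[0.0, 0.0], [2.0, 2.0]])

# A session near the center whose tokens were predicted confidently.
benign_score = composite_score([1.0, 1.0], benign_center, [0.9, 0.8])

# A session far from the center whose tokens were predicted poorly.
anomalous_score = composite_score([4.0, 5.0], benign_center, [0.1, 0.2])

print(benign_score < anomalous_score)
```

In this sketch a session is flagged when its score exceeds a threshold chosen on held-out benign data; how the thesis sets its threshold and weights is not stated in the abstract.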
Description:	Master's thesis, National Chengchi University, Department of Management Information Systems, 109356019
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0109356019
Data Type:	thesis
DOI:	10.6814/NCCU202201200
Appears in Collections:	[Department of Management Information Systems] Theses

Files in This Item:

File	Description	Size	Format
601901.pdf		986Kb	Adobe PDF

All items in 政大典藏 are protected by copyright, with all rights reserved.
