National Chengchi University Institutional Repository (NCCUR): Item 140.119/115202

Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/115202

Title:	A Study of Two-tier Filtering Schemes for Anti-spam (兩階層式垃圾郵件過濾機制之研究)
Authors:	葉生正; 蘇民揚; 張僩鈞
Keywords:	Support Vector Machine (SVM); Naive Bayes; Information Gain
Date:	2006
Issue Date:	2017-12-18 17:38:26 (UTC+8)
Abstract:	Spam is rampant today, and many countermeasures have emerged in response. Among content-filtering approaches, the machine-learning methods Support Vector Machine (SVM) and Naive Bayes perform best. Building on the fast hyperplane-based classification of SVM and the flexible threshold setting of Naive Bayes, this paper proposes a two-tier spam filtering scheme that combines SVM with a modified Naive Bayes model. The experiments use 1,000 training messages and 200 test messages each for Chinese and English mail. After Chinese word segmentation and English tokenization, Information Gain scores determine the keywords used to build the SVM training vectors. The paper then defines four kinds of margin distance around the hyperplane and uses them to pick out the test messages whose SVM result falls in the ambiguous zone; these messages are scored in the second tier by the proposed modified Bayesian probability model to decide their class. The experimental results show that all four margin settings improve accuracy, by about 1% to 4%, with Maximum Distance and Average Distance giving the most significant gains. Under the optimized model, the overall classification accuracy for both the Chinese and the English samples exceeds 97%, which supports the feasibility and contribution of the proposed two-tier filtering scheme and modified Naive Bayes model.
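The pipeline the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the four margin definitions or the modified Bayesian model, so this sketch assumes a single fixed `margin` threshold on a signed SVM decision value and a standard Bernoulli Naive Bayes with Laplace smoothing. All function names are hypothetical, messages are assumed to be already tokenized into sets of terms, and the SVM score is assumed to come from a separately trained first-tier classifier.

```python
import math
from collections import Counter


def entropy(pos, neg):
    """Entropy in bits of a set holding `pos` spam and `neg` non-spam messages."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels, term):
    """Information Gain of the feature 'term occurs in the message'.

    docs: list of token sets; labels: 1 for spam, 0 for non-spam.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    h_before = entropy(sum(labels), n - sum(labels))
    h_after = sum(
        len(part) / n * entropy(sum(part), len(part) - sum(part))
        for part in (with_term, without)
    )
    return h_before - h_after


def train_bernoulli_nb(docs, labels, vocab):
    """Standard Bernoulli Naive Bayes with Laplace smoothing (assumed stand-in
    for the paper's modified model, which the abstract does not describe)."""
    n_spam = sum(labels)
    n_ham = len(labels) - n_spam
    df_spam = Counter(t for d, y in zip(docs, labels) if y == 1 for t in d if t in vocab)
    df_ham = Counter(t for d, y in zip(docs, labels) if y == 0 for t in d if t in vocab)
    model = {
        t: ((df_spam[t] + 1) / (n_spam + 2), (df_ham[t] + 1) / (n_ham + 2))
        for t in vocab
    }
    prior = math.log((n_spam + 1) / (n_ham + 1))
    return model, prior


def nb_log_odds(doc, model, prior):
    """Log-odds that `doc` is spam; a positive value means spam."""
    score = prior
    for term, (p_spam, p_ham) in model.items():
        if term in doc:
            score += math.log(p_spam / p_ham)
        else:
            score += math.log((1 - p_spam) / (1 - p_ham))
    return score


def two_tier_classify(svm_score, doc, model, prior, margin=1.0):
    """Tier 1: trust the SVM when its score lies outside the margin.
    Tier 2: otherwise fall back to the Naive Bayes log-odds.

    svm_score: signed decision value from the first tier (positive = spam).
    margin: assumed fixed threshold, not one of the paper's four definitions.
    """
    if abs(svm_score) >= margin:
        return svm_score > 0
    return nb_log_odds(doc, model, prior) > 0
```

In this sketch, only messages with an SVM score inside `(-margin, margin)` reach the second tier, so the cost of the Bayesian scoring is paid only for the ambiguous cases.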
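Relation:	TANET 2006 台灣網際網路研討會 (Taiwan Internet Symposium) proceedings; track: Information and Communication Security, Prevention of Inappropriate Information
Data Type:	conference
Appears in Collections:	[TANET 台灣網際網路研討會 (Taiwan Internet Symposium)] Conference Papers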

Files in This Item:

File	Description	Size	Format
659.pdf		363Kb	Adobe PDF

All items in NCCUR are protected by copyright, with all rights reserved.

Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial purposes such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. This website has been built with every effort to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will promptly remove the work from the repository and investigate your claim.