    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/30952


    Title: Model Selection in Data Mining (資料採礦中之模型選取)
    Authors: 孫莓婷
    Contributors: 鄭宇庭; 謝邦昌; 孫莓婷
    Keywords: Data Mining; Imputation Method; Sampling; Model Selection
    Date: 2003
    Issue Date: 2009-09-14
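    Abstract: With the help of computers, enterprises and organizations store ever more data internally, and the volume grows ever faster. Yet a large amount of data does not necessarily bring a large amount of knowledge: even with a powerful database system, a database whose contents are not meaningfully analyzed and reasoned about is merely storage space. In the past, enterprises and organizations treated their databases only as query systems, unaware that valuable information could be extracted from them; the completeness of the database contents matters even more. Because enterprise databases are not necessarily complete, a large database may still hold insufficient information. We argue that database value-adding methods, namely imputation, sampling, and model evaluation, can expand the available information and should be able to enrich a database without altering the original data structure.
    This study examines whether data that have been value-added at different stages remain consistent with the original data structure. The research framework consists of three main workflows: a regression model, a logistic regression model, and the C5.0 decision tree. After value-adding at the different stages, we conclude the following. Under the regression workflow, regression-based imputation brings the value-added database closest to the original data; if sampling is then used to reduce the data volume, systematic sampling gives better results than simple random sampling. Under the C5.0 decision tree workflow, using a neural network algorithm as the main imputation method both increases the amount of information and brings the imputed data closer to the original data. For the logistic regression workflow, the class proportions of the discrete variables were too imbalanced for this workflow to yield a valid conclusion.
    The empirical analysis shows that the best-performing database value-adding technique differs from one modeling approach to another. Compared with the non-imputed database, however, value-adding techniques do increase the amount of information and bring the resulting virtual database closer to the original data structure.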
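The abstract reports that, under the regression workflow, regression-based imputation brought the value-added database closest to the original. The thesis abstract does not give the exact imputation model, so the following is only a minimal sketch assuming a single predictor `x`, an ordinary least-squares fit, and missing responses marked as `None`; the function names `fit_simple_regression` and `regression_impute` are hypothetical.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b).

    Assumes at least two distinct x values, otherwise sxx is zero.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def regression_impute(xs, ys):
    """Replace missing y values (None) with regression predictions.

    The regression is fitted only on rows where y is observed;
    observed values are returned unchanged.
    """
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b = fit_simple_regression([x for x, _ in observed],
                                 [y for _, y in observed])
    return [a + b * x if y is None else y for x, y in zip(xs, ys)]


# Toy usage (hypothetical data): the observed points lie on y = 2x.
filled = regression_impute([1, 2, 3, 4, 5], [2.0, 4.0, None, 8.0, None])
print(filled)
```

Deterministic regression imputation like this places every filled value exactly on the fitted line, which preserves the mean relationship but understates variance; a real study might add residual noise or use several predictors.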
    With the fast pace of advancement in computer technology, computers can now store huge amounts of data. An abundance of data, without proper treatment, does not necessarily mean having valuable information at hand; a large database system may serve merely as a means of storage and access. With this in mind, we focus on the completeness of the database. We adopt methods that impute missing values and add them to the database while leaving the data structure unmodified.

    This paper examines data that have been value-added under three modeling workflows, namely regression analysis, logistic regression analysis, and the C5.0 decision tree, and asks which value-adding approach yields a database most consistent with, and most similar to, the original. The results are as follows. Under the regression workflow, regression-based imputation produced the database structure closest to the original. When the data volume is large and a smaller data set is desired, systematic sampling gives a better outcome than simple random sampling.
    Under the C5.0 decision tree workflow, imputation based on a neural network algorithm increased the amount of information while bringing the imputed data closer to the original. Finally, for the logistic regression workflow, the class proportions of the discrete variables were so imbalanced that no reasonable conclusion could be drawn.
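The abstract compares systematic sampling with simple random sampling for shrinking a large value-added database. As a minimal sketch of the two designs (not the thesis's own code; the function names and the `seed` parameter are hypothetical), systematic sampling picks a random start within the first interval of length k = N // n and then takes every k-th record, while simple random sampling draws n records without replacement:

```python
import random


def simple_random_sample(records, n, seed=None):
    """Draw n records without replacement; every n-subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(records, n)


def systematic_sample(records, n, seed=None):
    """Take every k-th record after a random start, with k = len(records) // n.

    The largest index used is start + (n - 1) * k <= n * k - 1 <= len(records) - 1,
    so the selection never runs past the end of the list.
    """
    k = len(records) // n
    if k < 1:
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start in [0, k)
    return [records[start + i * k] for i in range(n)]


# Toy usage: reduce 100 hypothetical record IDs to a sample of 10.
population = list(range(100))
print(systematic_sample(population, 10, seed=1))
print(simple_random_sample(population, 10, seed=1))
```

Because the systematic sample is spread evenly through the file order, it can represent the data better than a simple random sample when records are sorted by a variable related to the analysis target; this is one plausible reason for the result reported above, not a claim the sketch itself demonstrates.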

    Taken together, although the best-performing value-adding technique differs across the three workflows, one finding stands out: compared with the non-imputed database, value-adding techniques do increase the amount of information and bring the value-added database closer to the structure of the original data.
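Both abstracts judge a value-added database by how closely it matches the original data structure, but they do not name the comparison metric. The sketch below assumes, purely for illustration, that closeness is measured by the absolute gaps in mean and population standard deviation of one column; `summary_gap` is a hypothetical helper, and all values are toy numbers, not results from the thesis.

```python
from statistics import mean, pstdev


def summary_gap(reference, candidate):
    """Return (|mean difference|, |population std-dev difference|)
    between a candidate column and the reference column."""
    return (abs(mean(candidate) - mean(reference)),
            abs(pstdev(candidate) - pstdev(reference)))


# Toy, hypothetical data (not from the thesis).
original = [2.0, 4.0, 6.0, 8.0, 10.0]
observed = [2.0, 4.0, None, 8.0, None]                   # two values missing
complete_case = [v for v in observed if v is not None]   # non-imputed: missing rows dropped
imputed = [2.0, 4.0, 6.5, 8.0, 9.0]                      # hypothetical imputed column

gap_complete = summary_gap(original, complete_case)
gap_imputed = summary_gap(original, imputed)
print("complete-case gap:", gap_complete)
print("imputed gap:", gap_imputed)
```

A smaller gap means the summary statistics of the candidate column sit closer to those of the original; a fuller comparison would also look at correlations between columns and at the fitted model itself.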
    Description: Master's thesis
    National Chengchi University
    Graduate Institute of Statistics
    92354024
    92
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0923540243
    Data Type: thesis
    Appears in Collections: [Department of Statistics] Theses



    All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.

