    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/50810


    Title: 工商及服務業普查資料品質之研究
    Data quality research of industry and commerce census
    Authors: 邱詠翔
    Contributors: 鄭宇庭
    蔡紋琦

    邱詠翔
    Keywords: Data Quality
    Post-Stratified Sampling
    Industrial Innovation Survey
    Industry and Commerce Census
    Data Cleaning and Consolidation
    Date: 2010
    Issue Date: 2011-09-29 16:46:18 (UTC+8)
    Abstract: Data quality affects the quality of decisions and the outcomes of the actions based on them, so it has received increasing attention in recent years. This study uses two databases: the Industrial Innovation Survey database and the 2006 (ROC year 95) Industry and Commerce Census database. Data quality matters for any database. Databases often contain erroneous records, and such records bias analysis results, so data cleaning and consolidation is a necessary step before analysis.

    Comparing the population and sample distributions shows that, before cleaning and consolidation, the average number of employees was 92.08 in the innovation survey data and 135.54 in the census data. After cleaning and consolidation, we compared the correlation, similarity, and distance of the employee counts in the two databases. The results show that the two databases are highly consistent: the average number of employees is 39.01 in the innovation survey data and 42.12 in the census data, both closer to the population average of 7.05 employees. This also demonstrates the importance of data cleaning.

    The method used in this study is post-stratified sampling. The main objective is to use the Industrial Innovation Survey sample to estimate the accuracy of the population data of the 2006 Industry and Commerce Census. The innovation survey sample overestimates both the number of employees and the operating revenue of the census population. We suspect the cause is a difference in sampling frames: the innovation survey draws on the directory of the 5,000 largest enterprises published by China Credit Information Service, whereas the census frame covers enterprises in general. We therefore validated the result using the census records that correspond to the innovation survey sample, and found that the 2006 census sample and the innovation survey sample are highly consistent.
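The post-stratified estimate of a population mean weights each stratum's sample mean by that stratum's share of the population. The following is a minimal Python sketch of that idea, not the thesis's actual procedure. The strata ("large", "small"), the employee counts, and the population sizes are hypothetical values chosen for illustration, and the function name `post_stratified_mean` is an assumption.

```python
from collections import defaultdict


def post_stratified_mean(sample, population_counts):
    """Estimate a population mean by post-stratification.

    sample: iterable of (stratum, value) pairs observed in the sample.
    population_counts: dict mapping stratum -> number of population units N_h.
    Returns sum over strata of (N_h / N) * (sample mean within stratum h).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in sample:
        sums[stratum] += value
        counts[stratum] += 1

    total_units = sum(population_counts.values())
    estimate = 0.0
    for stratum, n_h in population_counts.items():
        if counts[stratum] == 0:
            # A stratum with no sampled units has no usable sample mean.
            raise ValueError(f"no sample units in stratum {stratum!r}")
        stratum_mean = sums[stratum] / counts[stratum]
        estimate += (n_h / total_units) * stratum_mean
    return estimate


# Hypothetical data: a sample that over-represents large enterprises.
sample = [("large", 200), ("large", 300), ("small", 5), ("small", 7), ("small", 3)]
population_counts = {"large": 100, "small": 9900}

naive_mean = sum(value for _, value in sample) / len(sample)
ps_mean = post_stratified_mean(sample, population_counts)
print(naive_mean, ps_mean)
```

In this toy example the unweighted sample mean is far above the post-stratified mean because large enterprises make up 40% of the sample but only 1% of the assumed population. Reweighting corrects for that imbalance only within the strata the sample actually covers.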
    Description: Master's
    National Chengchi University
    Graduate Institute of Statistics
    98354020
    99
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0098354020
    Data Type: thesis
    Appears in Collections: [Department of Statistics] Theses


