    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/119801


    Title: 基於i-Vector 特徵之聲音風格分析
    Analysis of Voice Styles Using i-Vector Features
    Authors: 高文聰
    Kao, Wen-Tsung
    Contributors: 廖文宏
    Liao, Wen-Hung
    高文聰
    Kao, Wen-Tsung
    Keywords: Sound style
    Machine learning
    Pattern recognition
    i-Vector
    ALIZE
    Date: 2018
    Issue Date: 2018-08-29 16:04:21 (UTC+8)
    Abstract: Many adjectives are commonly used to describe voice styles, yet such styles are hard to define precisely with quantitative measures. In this thesis, we approach the voice style classification problem from the viewpoint of speaker recognition. Specifically, we use the i-Vector, a feature vector widely adopted in speaker identification, together with a support vector machine (SVM) for style classification. To test whether i-Vectors are usable for describing voice styles, we first conducted a series of validation experiments: basic speaker recognition, minimum input voice duration, the effect of white noise on speaker verification, dependence on spoken content, sensitivity to different sampling rates, and style tests on voice actors speaking in different tones. After confirming the relevance of the feature, we selected eight voice style types commonly encountered in daily life, classified them, and analyzed whether the results were consistent. The results indicate that a speaker recognition system can also be used to classify voice styles effectively.
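The pipeline described in the abstract (a fixed-length vector per utterance, then an SVM over those vectors) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the vectors are synthetic stand-ins rather than i-Vectors extracted with ALIZE, the three style labels are hypothetical (the thesis uses eight styles that are not listed here), length normalization is assumed as a preprocessing step, and the SVM is a small hand-written linear one trained by Pegasos-style subgradient descent in a one-vs-rest arrangement.

```python
import math
import random

# Hypothetical labels for illustration; the thesis's eight styles are not listed here.
STYLES = ["style_a", "style_b", "style_c"]


def length_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def make_synthetic_vectors(n_per_style, dim=6, noise=0.1, seed=0):
    """Stand-in for real i-Vectors: one noisy cluster per style (an assumption)."""
    rng = random.Random(seed)
    data = []
    for k, style in enumerate(STYLES):
        for _ in range(n_per_style):
            vec = [(1.0 if j == k else 0.0) + rng.gauss(0.0, noise) for j in range(dim)]
            data.append((length_normalize(vec), style))
    return data


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_binary_svm(samples, labels, lam=0.01, epochs=50, seed=0):
    """Linear SVM (labels +1/-1) trained with Pegasos-style subgradient steps."""
    rng = random.Random(seed)
    augmented = [x + [1.0] for x in samples]  # constant feature plays the role of a bias
    w = [0.0] * len(augmented[0])
    order = list(range(len(augmented)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x, y = augmented[i], labels[i]
            violated = y * dot(w, x) < 1.0  # hinge loss is active for this sample
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if violated:
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def train_one_vs_rest(data):
    """Train one binary SVM per style: that style against all others."""
    samples = [vec for vec, _ in data]
    return {
        style: train_binary_svm(samples, [1 if s == style else -1 for _, s in data])
        for style in STYLES
    }


def predict(models, vec):
    """Pick the style whose binary SVM gives the highest score."""
    x = vec + [1.0]
    return max(STYLES, key=lambda style: dot(models[style], x))


def run_demo():
    """Train on one synthetic set and return accuracy on a separate one."""
    train = make_synthetic_vectors(n_per_style=20, seed=1)
    test = make_synthetic_vectors(n_per_style=10, seed=2)
    models = train_one_vs_rest(train)
    correct = sum(predict(models, vec) == style for vec, style in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"held-out accuracy: {run_demo():.2f}")
```

With real data, `make_synthetic_vectors` would be replaced by i-Vectors extracted from labeled recordings, and a library SVM would normally be preferred over the hand-written trainer, which is kept here only so the sketch has no external dependencies.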
    Description: Master's thesis
    National Chengchi University
    Executive Master's Program, Department of Computer Science
    103971014
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0103971014
    Data Type: thesis
    DOI: 10.6814/THE.NCCU.EMCS.007.2018.B02
    Appears in Collections: [Executive Master's Program, Department of Computer Science] Theses

    Files in This Item:
    File: 101401.pdf
    Size: 8950 Kb
    Format: Adobe PDF

