    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/125641


Title: Generating Piano Music for Videos Using a Convolutional Transformer Language Model
    InverseMV: Composing Piano Scores with a Convolutional Video-Music Transformer
    Authors: 林鑫彤
    Lin, Chin-Tung
    Contributors: 沈錳坤
    Shan, Man-Kwan
    林鑫彤
    Lin, Chin-Tung
Keywords: Music generation for videos
Music generation
Convolutional Transformer model
Piano score generation
Video soundtracks
    Video-Music Transformer
    VMT
    InverseMV
    VMT Model
    Convolutional Video-Music Transformer
    Date: 2019
    Issue Date: 2019-09-05 16:14:39 (UTC+8)
Abstract: In recent years, smartphone camera technology has matured, and with the rise of social networking sites such as Facebook and Instagram, users can easily shoot high-quality photos and videos on their phones and share them online. A high-traffic video is usually paired with well-matched music, yet most people are not professional music supervisors: limited by the music material they can collect and by their musical sensitivity, they often struggle to choose a soundtrack for a video. Using existing recordings as a soundtrack is further constrained by copyright, so automatic music generation for video soundtracks is emerging as a new research direction.
With the recent rapid development of neural networks (NN), many studies have tried to use neural network models to generate symbolic music, but to the best of our knowledge no prior work has attempted to generate music for video. Lacking an existing dataset, we manually collected and annotated a pop-music dataset as training data for our model. Motivated by the success of the Transformer on natural language processing (NLP) problems, and since generating symbolic music has much in common with generating language, this study proposes VMT (Video-Music Transformer), a model that automatically composes a soundtrack for a video: it takes the video's frame sequence as input and generates the corresponding symbolic piano music. Our experiments also show that VMT outperforms a sequence-to-sequence model in both musical smoothness and video relevance.
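The record contains no code, but the abstract's notion of "symbolic piano music" as a language-model target can be made concrete. The following Python sketch serializes MIDI-style notes into discrete events of the kind a Transformer can generate token by token; the event names and the 10 ms time quantization are assumptions modeled on common performance-event encodings, not the thesis's actual representation.

```python
# Hypothetical sketch: serialize (pitch, start, end) notes into event tokens.
# Vocabulary and quantization are illustrative assumptions.
NOTE_ON, NOTE_OFF, TIME_SHIFT = "NOTE_ON", "NOTE_OFF", "TIME_SHIFT"

def notes_to_events(notes):
    """notes: list of (pitch, start_sec, end_sec); returns event tokens."""
    boundaries = []
    for pitch, start, end in notes:
        boundaries.append((start, NOTE_ON, pitch))
        boundaries.append((end, NOTE_OFF, pitch))
    boundaries.sort()
    events, now = [], 0.0
    for time, kind, pitch in boundaries:
        if time > now:  # advance time in quantized 10 ms steps
            steps = round((time - now) / 0.01)
            events.append(f"{TIME_SHIFT}_{steps}")
            now = time
        events.append(f"{kind}_{pitch}")
    return events

# A C4 quarter note followed by an overlapping E4:
print(notes_to_events([(60, 0.0, 0.5), (64, 0.25, 0.75)]))
# ['NOTE_ON_60', 'TIME_SHIFT_25', 'NOTE_ON_64', 'TIME_SHIFT_25',
#  'NOTE_OFF_60', 'TIME_SHIFT_25', 'NOTE_OFF_64']
```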
With the wide popularity of social media, including Facebook, Twitter, Instagram, and YouTube, and the modernization of mobile photography, users on social media tend to watch and share videos rather than text. People want their videos to attract a high click-through rate. However, such videos require great editing skill and well-matched music, which is difficult for most people. On top of that, people creating soundtracks often lack ownership of the musical pieces they use; music generated by a model, rather than existing recordings, helps avoid copyright infringement.
The rise of deep learning has produced much work that uses neural-network-based models to generate symbolic music. However, to the best of our knowledge, no prior work attempts to compose music for video, and no dataset of paired video and music exists. We therefore release a new dataset comprising over seven hours of piano scores finely aligned between pop music videos and MIDI files. We propose VMT (Video-Music Transformer), a model that generates piano scores from video frames, evaluate it against a sequence-to-sequence (seq2seq) baseline, and obtain better musical smoothness and video relevance.
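To make the described architecture concrete, here is a minimal, hypothetical PyTorch sketch of a video-to-music Transformer along the lines the abstract outlines: a convolutional frame encoder feeding a Transformer that decodes music-event tokens. All module choices, dimensions, and the vocabulary size are illustrative assumptions; this is not the thesis's actual VMT implementation.

```python
# Minimal sketch of a convolutional video-music Transformer.
# Architecture details below are assumptions, not the published VMT model.
import torch
import torch.nn as nn

class VideoMusicTransformerSketch(nn.Module):
    def __init__(self, vocab_size=388, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        # Convolutional encoder: each RGB frame -> one d_model embedding.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, d_model),
        )
        # Embedding for MIDI-like music event tokens
        # (e.g. note-on/off and time-shift events).
        self.token_embedding = nn.Embedding(vocab_size, d_model)
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True,
        )
        self.output_head = nn.Linear(d_model, vocab_size)

    def forward(self, frames, tokens):
        # frames: (batch, n_frames, 3, H, W); tokens: (batch, seq_len)
        b, t = frames.shape[:2]
        memory = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        tgt = self.token_embedding(tokens)
        # Causal mask so each music token attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        out = self.transformer(memory, tgt, tgt_mask=mask)
        return self.output_head(out)  # logits over the event vocabulary

# Smoke test with random data.
model = VideoMusicTransformerSketch()
frames = torch.randn(2, 8, 3, 64, 64)    # 8 frames per clip
tokens = torch.randint(0, 388, (2, 16))  # 16 music-event tokens
print(model(frames, tokens).shape)       # torch.Size([2, 16, 388])
```

Training such a model would minimize cross-entropy between the predicted and ground-truth next event, with the frame embeddings serving as the encoder memory that conditions generation on the video.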
Description: Master's thesis
National Chengchi University
Department of Computer Science
105753023
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0105753023
    Data Type: thesis
    DOI: 10.6814/NCCU201901153
Appears in Collections: [Department of Computer Science] Theses

    Files in This Item:

File  Size  Format
302301.pdf  3086 KB  Adobe PDF

