National Chengchi University Institutional Repository (NCCUR): Item 140.119/136317

Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/136317

Title:	On learning rate and convergence of stochastic gradient descent methods (original title: 隨機梯度下降法的學習率與收斂探討)
Authors:	陳建佑
Contributors:	翁久幸; 林士貴; 陳建佑
Keywords:	Stochastic Gradient Descent; Average Stochastic Gradient Descent; Mini-Batch Stochastic Gradient Descent; Linear Model; Ordinal Regression; Matrix Factorization
Date:	2021
Issue Date:	2021-08-04 14:41:46 (UTC+8)
Abstract:	Stochastic gradient descent (SGD) is widely used for parameter estimation in big-data and deep-learning models. It is appealing because it requires only the first derivatives of the objective function, which makes each update simple and fast to compute. Because the performance of SGD depends heavily on the learning rate, numerous studies have addressed how to set it. In this thesis, we use simulation experiments to examine how various learning-rate settings relate to convergence in linear models and ordinal regression models. The simulations show that a poorly chosen learning rate can prevent satisfactory convergence, so we propose a new learning rate that yields stable results and performs well in linear models. Subsequent simulations also indicate that SGD with a steadily decaying learning rate usually performs better. Based on these results, we select appropriate learning rates and apply them to a recommender system model that combines ordinal regression with matrix factorization. Finally, we test this learning-rate setting on a real dataset and obtain reasonably good results.
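The abstract's central point, that SGD needs only first derivatives and that its behaviour depends on how the learning rate decays, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the thesis: the function names `make_data` and `sgd_linear`, the simulated data, and the polynomial schedule lr_t = lr0 / (1 + decay * t) ** power, which is a common textbook choice and not the learning rate the thesis proposes. The optional averaging step follows the general idea of averaged SGD listed in the keywords.

```python
import random


def make_data(n, true_w, noise=0.1, seed=0):
    """Simulate n observations from a linear model y = <true_w, x> + noise.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x)) + rng.gauss(0.0, noise)
        data.append((x, y))
    return data


def sgd_linear(data, dim, lr0=0.1, decay=0.01, power=0.75, average=False):
    """One pass of SGD on the squared-error loss of a linear model.

    Assumed schedule (not the thesis's proposal):
        lr_t = lr0 / (1 + decay * t) ** power
    If average=True, return the running mean of the iterates
    (averaged SGD) instead of the last iterate.
    """
    w = [0.0] * dim
    w_bar = [0.0] * dim
    for t, (x, y) in enumerate(data, start=1):
        lr = lr0 / (1.0 + decay * t) ** power
        # Gradient of 0.5 * (w.x - y)^2 with respect to w is (w.x - y) * x.
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        # Running mean of the iterates w_1, ..., w_t.
        w_bar = [wb + (wi - wb) / t for wb, wi in zip(w_bar, w)]
    return w_bar if average else w


if __name__ == "__main__":
    true_w = [2.0, -1.0]
    data = make_data(5000, true_w)
    print("last iterate:", sgd_linear(data, dim=2))
    print("averaged    :", sgd_linear(data, dim=2, average=True))
```

Changing `lr0`, `decay`, or `power` in this sketch is one way to see the sensitivity the abstract describes: a rate that stays too large keeps the estimate noisy or unstable, while one that shrinks too quickly stops the estimate before it reaches the true coefficients.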
Reference:	[1] 陳冠廷 (2020). A convergence study of stochastic gradient descent estimation for ordinal regression models, with an application to recommender systems [in Chinese]. Master's thesis, Department of Statistics, National Chengchi University, Taipei. Retrieved from https://hdl.handle.net/11296/4c3be8
[2] Agresti, A. (2010). Analysis of ordinal categorical data (Vol. 656). John Wiley & Sons.
[3] Amari, S. I., Park, H., & Fukumizu, K. (2000). Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 12(6), 1399-1409.
[4] Dean, J., Corrado, G. S., Monga, R., Chen, K., Devin, M., Le, Q. V., ... & Ng, A. Y. (2012). Large scale distributed deep networks.
[5] Funk, S. (2006). Netflix update: Try this at home. Retrieved from https://sifter.org/simon/journal/20061211.html
[6] Koren, Y. (2008, August). Factorization meets the neighborhood: a multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 426-434).
[7] Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30-37.
[8] Koren, Y., & Sill, J. (2011, October). OrdRec: an ordinal model for predicting personalized item rating distributions. In Proceedings of the Fifth ACM Conference on Recommender Systems (pp. 117-124).
[9] Kiefer, J., & Wolfowitz, J. (1952). Stochastic estimation of the maximum of a regression function. The Annals of Mathematical Statistics, 462-466.
[10] McCullagh, P. (1980). Regression models for ordinal data. Journal of the Royal Statistical Society: Series B (Methodological), 42(2), 109-127.
[11] Bottou, L., & Bousquet, O. (2008). The tradeoffs of large scale learning. In J. C. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in Neural Information Processing Systems 20 (pp. 161-168). MIT Press, Cambridge, MA.
[12] Polyak, B. T., & Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4), 838-855.
[13] Robbins, H., & Monro, S. (1951). A stochastic approximation method. The Annals of Mathematical Statistics, 400-407.
[14] Toulis, P., & Airoldi, E. M. (2017). Asymptotic and finite-sample properties of estimators based on stochastic gradients. Annals of Statistics, 45(4), 1694-1727.
[15] Xu, W. (2011). Towards optimal one pass large scale learning with averaged stochastic gradient descent. arXiv preprint arXiv:1107.2490.
[16] Zhang, T. (2004, July). Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proceedings of the Twenty-First International Conference on Machine Learning (p. 116).
Description:	Master's thesis; National Chengchi University; Department of Statistics; 108354011
Source URI:	http://thesis.lib.nccu.edu.tw/record/#G0108354011
Data Type:	thesis
DOI:	10.6814/NCCU202100823
Appears in Collections:	[Department of Statistics] Theses

Files in This Item:

File	Description	Size	Format
401101.pdf		1468Kb	Adobe PDF

All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.

Copyright Announcement

1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial uses such as academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

2. The NCCU Institutional Repository has made every effort to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.
