| Reference: | [1] Abien Fred Agarap. Deep learning using rectified linear units (relu). arXiv preprint arXiv:1803.08375, 2018. [2] Léon Bottou. Stochastic gradient descent tricks. In Neural networks: Tricks of the trade, pages 421–436. Springer, 2012. [3] Léon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for largescale machine learning. Siam Review, 60(2):223–311, 2018. [4] Chris Chatfield and Mohammad Yar. Holtwinters forecasting: some practical issues. Journal of the Royal Statistical Society: Series D (The Statistician), 37(2):129–140, 1988. [5] J. X. Chen. The evolution of computing: Alphago. Computing in Science Engineering, 18(4):4–7, 2016. [6] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning: data mining, inference, and prediction. Springer Science & Business Media, 2009. [7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delving deep into rectifiers: Surpassing humanlevel performance on imagenet classification, 2015. [8] Mikael Henaff, Arthur Szlam, and Yann LeCun. Recurrent orthogonal networks and long memory tasks. arXiv preprint arXiv:1602.06662, 2016. [9] Geoffrey E Hinton, Simon Osindero, and YeeWhye Teh. A fast learning algorithm for deep belief nets. Neural computation, 18(7):1527–1554, 2006. [10] Sepp Hochreiter and Jürgen Schmidhuber. Long shortterm memory. Neural computation, 9(8):1735–1780, 1997. [11] ChihWeiHsu,ChihChungChang,ChihJenLin,etal.Apracticalguidetosupportvector classification, 2003. [12] Norden Eh Huang. HilbertHuang transform and its applications, volume 16. World Scientific, 2014. [13] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International conference on machine learning, pages 448–456. PMLR, 2015. [14] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An introduction to statistical learning, volume 112. Springer, 2013. [15] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25:1097–1105, 2012. [16] Guohui Li, Zhichao Yang, and Hong Yang. Noise reduction method of underwater acoustic signals based on uniform phase empirical mode decomposition, amplitudeaware permutation entropy, and pearson correlation coefficient. Entropy, 20(12), 2018. [17] KR Muller, Sebastian Mika, Gunnar Ratsch, Koji Tsuda, and Bernhard Scholkopf. An introduction to kernelbased learning algorithms. IEEE transactions on neural networks, 12(2):181–201, 2001. [18] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by backpropagating errors. nature, 323(6088):533–536, 1986. [19] Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE transactions on Signal Processing, 45(11):2673–2681, 1997. [20] Ohad Shamir and Tong Zhang. Stochastic gradient descent for nonsmooth optimization: Convergence results and optimal averaging schemes. In International conference on machine learning, pages 71–79. PMLR, 2013. [21] Alex J Smola and Bernhard Schölkopf. A tutorial on support vector regression. Statistics and computing, 14(3):199–222, 2004. [22] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958, 2014. [23] EugeneVorontsov,ChihebTrabelsi,SamuelKadoury,andChrisPal.Onorthogonalityand learning recurrent networks with long term dependencies. In International Conference on Machine Learning, pages 3570–3578. PMLR, 2017. [24] Xing Wan. Influence of feature scaling on convergence of gradient iterative algorithm. In Journal of Physics: Conference Series, volume 1213, page 032021. IOP Publishing, 2019. [25] Zhaohua Wu and Norden E Huang. Ensemble empirical mode decomposition: a noise assisted data analysis method. Advances in adaptive data analysis, 1(01):1–41, 2009. |