>100 Views

September 16, 16

スライド概要

Daichi Kitamura, Nobutaka Ono, "Efficient initialization for nonnegative matrix factorization based on nonnegative independent component analysis," The 15th International Workshop on Acoustic Signal Enhancement (IWAENC 2016), Xi'an, China, September 2016.

http://d-kitamura.net/links_en.html

1.

IWAENC 2016, Sept. 16, 08:30 - 10:30, Session SPS-II - Student paper competition 2 SPC-II-04 Efficient initialization for NMF based on nonnegative ICA Daichi Kitamura (SOKENDAI, Japan) Nobutaka Ono (NII/SOKENDAI, Japan)

2.

Research background: what is NMF? • Nonnegative matrix factorization (NMF) [Lee, 1999] Amplitude Frequency Frequency – Dimensionality reduction with nonnegative constraint – Unsupervised learning extracting meaningful features – Sparse decomposition (implicitly) Time Input data matrix (power spectrogram) Time Amplitude Basis matrix (spectral patterns) Activation matrix (time-varying gains) : # of rows : # of columns : # of bases 2/19

3.

Research background: how to optimize? • Optimization in NMF – Define a cost function (data fidelity) and minimize it – No closed-form solution for and – Efficient iterative optimization • Multiplicative update rules (auxiliary function technique) [Lee, 2001] (when the cost function is a squared Euclidian distance) – Initial values for all the variables are required. 3/19

4.

Problem and motivation • Results of all applications using NMF always depend the initialization of and . More than 1 dB 10 8 6 4 Different random seeds Rand10 Rand9 Rand8 Rand7 Rand6 Rand5 Rand4 0 Rand3 2 Rand2 Poor 12 Rand1 Good SDR improvement [dB] – Ex. source separation via full-supervised NMF [Smaragdis, 2007] • Motivation: Initialization method that always gives us a good performance is desired. 4/19

5.

Conventional NMF initialization techniques • With random values (not focused here) – Directly use random values – Search good values via genetic algorithm [Stadlthanner, 2006], [Janecek, 2011] – Clustering-based initialization [Zheng, 2007], [Xue, 2008], [Rezaei, 2011] • Cluster input data into initial basis vectors. clusters, and set the centroid vectors to • Without random values – PCA-based initialization [Zhao, 2014] • Apply PCA to input data , extract orthogonal bases and coefficients, and set their absolute values to the initial bases and activations. – SVD-based initialization [Boutsidis, 2008] • Apply a special SVD (nonnegative double SVD) to input data and set nonnegative left and right singular vectors to the initial values. 5/19

6.

Bases orthogonality? • Are orthogonal bases really better for NMF? – PCA and SVD are orthogonal decompositions. – A geometric interpretation of NMF [Donoho, 2003] • The optimal bases in NMF are “along the edges of a convex cone” that includes all the observed data points. Convex cone Meaningless areas Data points Edge Optimal bases Orthogonal bases satisfactory for representing all the data points have a risk to represent a meaningless area Tight bases cannot represent all the data points – Orthogonality might not be a good initial value for NMF. 6/19

7.

Proposed method: utilization of ICA • What can we do from only the input data ? – Independent component analysis (ICA) [Comon, 1994] – ICA extracts non-orthogonal bases • that maximize a statistical independence between sources. – ICA estimates sparse sources • when we assume a super-Gaussian prior. – Objectives: • 1. Deeper minimization • 2. Faster convergence • 3. Better performance Value of cost function in NMF • Propose to use ICA bases and estimated sources as initial NMF values Deeper minimization Faster convergence Number of update iterations in NMF 7/19

8.

Proposed method: concept • The input data matrix is a mixture of some sources. are mixed via Input data matrix Mixing matrix ICA … bases , then observed as Source matrix … sources in … – – ICA can estimate a demixing matrix independent sources . Input data matrix PCA NICA Nonnegativization PCA matrix for dimensionality reduction Mutually independent and the Initial values • PCA for only the dimensionality reduction in NMF • Nonnegative ICA for taking nonnegativity into account • Nonnegativization for ensuring complete nonnegativity NMF 8/19

9.

Nonnegative constrained ICA • Nonnegative ICA (NICA) [Plumbley, 2003] – estimates demixing matrix so that all of the separated sources become nonnegative. – finds rotation matrix for pre-whitened mixtures . Observed Pre-whitened Whitening w/o centering Separated Rotation (demixing) – Steepest gradient descent for estimating Cost function: where 9/19

10.

Combine PCA for dimensionality reduction • Dimensionality reduction via PCA ICA bases Sources Eigenvalues High Low Rows are eigenvectors of has top- eigenvectors Zero matrix • NMF variables obtained from the estimates of NICA – Support that – then we have , Rotation matrix estimated by NICA Basis matrix Activation matrix 10/19

11.

Nonnegativization • Even if we use NICA, there is no guarantee that – obtained (sources) becomes completely nonnegative because of the dimensionality reduction by PCA. – As for the obtained basis (ICA bases), nonnegativity is not assumed in NICA. • Take a “nonnegativization” for obtained – Method 1: – Method 2: – Method 3: and : Correlation between and Correlation between and • where and are scale fitting coefficient that depend on a divergence of following NMF 11/19

12.

Experiment: conditions • Power spectrogram of mixture with Vo. and Gt. Frequency [kHz] – Song: “Actions – One Minute Smile” from SiSEC2015 – Size of power spectrogram: 2049 x 1290 (60 sec.) – Number of bases: Time [s] 12/19

13.

Experiment: results of NICA • Convergence of cost function in NICA Value of cost function in NICA 0.6 0.5 Steepest gradient descent 0.4 0.3 0.2 0.1 0.0 0 500 1500 1000 Number of iterations 2000 13/19

14.

Experiment: results of Euclidian NMF • Convergence of EU-NMF Rand1~10 are based on random initialization with different seeds. Processing time for initialization NICA: 4.36 s PCA: 0.98 s SVD: 2.40 s EU-NMF: 12.78 s (for 1000 iter.) 14/19

15.

Experiment: results of Kullback-Leibler NMF • Convergence of KL-NMF Rand1~10 are based on random initialization with different seeds. Processing time for initialization NICA: 4.36 s PCA: 0.98 s SVD: 2.40 s PCA KL-NMF: 48.07 s (for 1000 iter.) Proposed methods 15/19

16.

Experiment: results of Itakura-Saito NMF • Convergence of IS-NMF x106 Rand1~10 are based on random initialization with different seeds. Processing time for initialization NICA: 4.36 s PCA: 0.98 s SVD: 2.40 s PCA IS-NMF: 214.26 s (for 1000 iter.) Proposed methods 16/19

17.

Experiment: full-supervised source separation • Full-supervised NMF [Smaragdis, 2007] – Simply use pre-trained sourcewise bases for separation Training stage , Initialized by conventional or proposed method Cost functions: Pre-trained bases (fixed) Separation stage Initialized based on the correlations between and or Cost function: 17/19

18.

Experiment: results of separation • Two sources separation using full-supervised NMF – SiSEC2015 MUS dataset (professionally recorded music) – Averaged SDR improvements of 15 songs Conv. 10 8 6 4 0 NICA1 NICA2 NICA3 PCA SVD Rand1 Rand2 Rand3 Rand4 Rand5 Rand6 Rand7 Rand8 Rand9 Rand10 2 Separation performance for source 1 5 Prop. Conv. 4 3 2 1 0 NICA1 NICA2 NICA3 PCA SVD Rand1 Rand2 Rand3 Rand4 Rand5 Rand6 Rand7 Rand8 Rand9 Rand10 Prop. SDR improvement [dB] SDR improvement [dB] 12 Separation performance for source 2 18/19

19.

Conclusion • Proposed efficient initialization method for NMF • Utilize statistical independence for obtaining nonorthogonal bases and sources – The orthogonality may not be preferable for NMF. • The proposed initialization gives – deeper minimization – faster convergence – better performance for full-supervised source separation Thank you for your attention! 19/19