
Slide overview

Presented at IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013) (international conference)

Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano, Yu Takahashi, Kazunobu Kondo, "Robust music signal separation based on supervised nonnegative matrix factorization with prevention of basis sharing," Proceedings of IEEE International Symposium on Signal Processing and Information Technology (ISSPIT 2013), pp.392-397, Athens, Greece, December 2013.

http://d-kitamura.net/links_en.html

1.

IEEE International Symposium on Signal Processing and Information Technology
December 12-15, 2013 - Athens, Greece
Session T.B3: Speech - Audio - Music
Robust Music Signal Separation Based on Supervised Nonnegative Matrix Factorization with Prevention of Basis Sharing
Daichi Kitamura, Hiroshi Saruwatari, Kosuke Yagi, Kiyohiro Shikano (Nara Institute of Science and Technology, Japan)
Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan)

2.

Outline
• 1. Research background
• 2. Conventional method
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Problem of conventional method: basis sharing
• 3. Proposed method
  – Penalized supervised nonnegative matrix factorization
    • Orthogonality penalty
    • Maximum-divergence penalty
• 4. Experiments
  – Two-source case
  – Four-source case
• 5. Conclusions

3.

Outline
• 1. Research background
• 2. Conventional method
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Problem of conventional method: basis sharing
• 3. Proposed method
  – Penalized supervised nonnegative matrix factorization
    • Orthogonality penalty
    • Maximum-divergence penalty
• 4. Experiments
  – Two-source case
  – Four-source case
• 5. Conclusions

4.

Background
• Sound signal separation
  – decomposes the target source from an observed mixed signal.
  – Examples: speech and noise, a specific instrumental sound, etc.
• Typical methods for sound signal separation operate in the time-frequency domain (spectrogram).
[Figure: spectrogram of a mixture separated into a first tone and a second tone]

5.

Outline
• 1. Research background
• 2. Conventional method
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Problem of conventional method: basis sharing
• 3. Proposed method
  – Penalized supervised nonnegative matrix factorization
    • Orthogonality penalty
    • Maximum-divergence penalty
• 4. Experiments
  – Two-source case
  – Four-source case
• 5. Conclusions

6.

Nonnegative matrix factorization [Lee, et al., 2001]
• Nonnegative matrix factorization (NMF)
  – is a sparse representation algorithm.
  – can extract significant features from an observed matrix.
• NMF decomposes the observed matrix (spectrogram) into a basis matrix (spectral patterns) and an activation matrix (time-varying gains).
  – Ω: number of frequency bins, 𝑇: number of time frames, 𝐾: number of bases
• It is difficult to cluster the bases into specific sources.
[Figure: spectrogram ≈ basis matrix × activation matrix]
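As an illustration of the factorization described on this slide, here is a minimal NMF sketch using the classic Lee-Seung multiplicative updates for the squared Euclidean distance (an assumption for simplicity; the deck itself later works with the β-divergence family). All names are illustrative.

```python
import numpy as np

def nmf(Y, K, n_iter=500, seed=0):
    """Factorize a nonnegative matrix Y (F x T) into W (F x K) and H (K x T)
    with Lee-Seung multiplicative updates for squared Euclidean distance."""
    rng = np.random.default_rng(seed)
    F, T = Y.shape
    eps = 1e-9  # avoids division by zero; keeps entries strictly positive
    W = rng.random((F, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        # Multiplicative updates preserve nonnegativity automatically
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W *= (Y @ H.T) / (W @ H @ H.T + eps)
    return W, H

# A nonnegative matrix of rank 2 is recovered almost exactly with K = 2.
Y = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 1.0, 1.0]])
W, H = nmf(Y, K=2)
print(np.linalg.norm(Y - W @ H))  # small reconstruction error
```

Each column of `W` plays the role of one spectral basis and each row of `H` its time-varying gain, matching the decomposition pictured on the slide.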

7.

Supervised NMF (SNMF) [Smaragdis, et al., 2007]
• SNMF utilizes sample sounds of the target source.
  – Training process: optimize the supervised basis matrix (spectral dictionary) of the target sound from sample sounds (e.g., a musical scale).
  – Separation process: decompose the mixed signal into the target signal, using the fixed supervised bases, and the other signal.
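A sketch of the supervised variant under the same Euclidean-update assumption as above: the supervised basis columns (trained beforehand) are held fixed during separation, and only the remaining bases and all activations are updated. Function names and the toy data are illustrative, not from the paper.

```python
import numpy as np

def snmf_separate(Y, W_sup, K_other, n_iter=300, seed=0):
    """Decompose Y ~= [W_sup, W_other] @ H with the supervised bases W_sup
    held fixed; return the target and residual source estimates."""
    eps = 1e-9
    rng = np.random.default_rng(seed)
    F, T = Y.shape
    K_sup = W_sup.shape[1]
    W = np.hstack([W_sup, rng.random((F, K_other)) + eps])
    H = rng.random((K_sup + K_other, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
        W_new = W * (Y @ H.T) / (W @ H @ H.T + eps)
        W_new[:, :K_sup] = W_sup          # supervised bases stay fixed
        W = W_new
    target = W[:, :K_sup] @ H[:K_sup, :]  # target source estimate
    other = W[:, K_sup:] @ H[K_sup:, :]   # residual source estimate
    return target, other

# Toy mixture of two "spectra": target [1, 0] and another source [0, 1]
W_sup = np.array([[1.0], [0.0]])
Y = np.array([[1.0, 2.0, 3.0],   # target activations over time
              [3.0, 2.0, 1.0]])  # other-source activations over time
target, other = snmf_separate(Y, W_sup, K_other=1)
print(np.round(target + other, 2))  # the two estimates reconstruct Y
```

Note that nothing here stops the free bases from also capturing target-like patterns, which is exactly the basis sharing problem introduced on the next slides.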

8.

Problem of SNMF
• Basis sharing problem in SNMF
  – There is no constraint between the supervised basis matrix and the other basis matrix.
  – The other bases may therefore also acquire the target spectral patterns.
  – If the other bases contain target bases, the estimated target signal loses some of the target components.
  – The cost function is defined only as the distance between the observed spectrogram and its reconstruction.

9.

Basis sharing problem: example of SNMF
[Figure: spectrograms of the mixed signal, the signal separated by SNMF, and the oracle target-only signal]


11.

Basis sharing problem: example of SNMF
[Figure: spectrogram of the separated (estimated) signal]
• The estimated signal loses some of the target components because of the basis sharing problem.

12.

Outline
• 1. Research background
• 2. Conventional method
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Problem of conventional method: basis sharing
• 3. Proposed method
  – Penalized supervised nonnegative matrix factorization
    • Orthogonality penalty
    • Maximum-divergence penalty
• 4. Experiments
  – Two-source case
  – Four-source case
• 5. Conclusions

13.

Proposed method
• In SNMF, the other basis matrix may contain the same spectral patterns as the supervised basis matrix: the basis sharing problem.
• We propose making the other basis matrix as different as possible from the supervised basis matrix by introducing a penalty term into the cost function: penalized SNMF (PSNMF).
• The mixed signal is decomposed into the target signal, using the fixed supervised bases, and the other signal, whose bases are optimized to be as different as possible from the supervised bases.

14.

Decomposition model and cost function
• Decomposition model: the observed spectrogram is approximated by the product of the supervised basis matrix (fixed) and its activations, plus the product of the other basis matrix and its activations.
• Cost function in SNMF: the generalized divergence between the observed spectrogram and the decomposition model.
• Generalized divergence function: β-divergence [Eguchi, et al., 2001]
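The equations on this slide did not survive transcription. A common formulation of the SNMF model and its β-divergence cost, using assumed notation F (fixed supervised bases), G, H (other bases), and U, is:

```latex
% Decomposition model: supervised bases F (fixed) and other bases H
Y \simeq FG + HU
% Cost function: beta-divergence between observation and model
J_{\mathrm{SNMF}} = D_\beta\!\left(Y \,\middle\|\, FG + HU\right)
% Entrywise beta-divergence (beta not 0 or 1), with KL divergence (beta = 1)
% and Itakura-Saito divergence (beta = 0) obtained as limits:
d_\beta(y \,\|\, x) = \frac{y^{\beta}}{\beta(\beta-1)}
  + \frac{x^{\beta}}{\beta}
  - \frac{y\, x^{\beta-1}}{\beta-1}
```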

15.

Decomposition model and cost function
• Decomposition model and SNMF cost function: as on the previous slide.
• Cost function in PSNMF: the SNMF cost function plus a penalty term.
• We propose two types of penalty terms.
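Schematically, with the same assumed notation as above and an assumed weight symbol μ, the penalized cost adds a penalty P on the other bases H:

```latex
% PSNMF cost: SNMF cost plus a weighted penalty on the other bases H
J_{\mathrm{PSNMF}} = D_\beta\!\left(Y \,\middle\|\, FG + HU\right) + \mu\, P(H)
```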

16.

Orthogonality penalty
• The orthogonality penalty optimizes the other basis matrix so as to minimize the inner products between the supervised bases and the other bases.
  – If the other basis matrix includes a basis similar to a supervised basis, the penalty becomes larger.
• All the bases are normalized to unit norm.
• A weighting parameter controls the strength of the penalty.
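The exact penalty expression was lost in transcription; one plausible form consistent with the slide's description (minimizing the inner products between the supervised bases F and the other bases H, both column-normalized) is:

```latex
% Orthogonality penalty: penalize correlation between supervised and other bases
P_{\mathrm{O}}(H) = \left\| F^{\mathsf{T}} H \right\|_{F}^{2}
% Each entry of F^T H is the inner product of one supervised basis
% with one other basis; it vanishes when the bases are orthogonal.
```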

17.

Maximum-divergence penalty
• The maximum-divergence penalty optimizes the other basis matrix so as to maximize the divergence between the supervised basis matrix and the other basis matrix.
  – If the other basis matrix includes a basis similar to a supervised basis, the divergence becomes smaller.
• All the bases are normalized to unit norm.
• A weighting parameter and a sensitivity parameter are introduced.
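Again the exact expression was lost; a schematic sketch consistent with the description (a penalty that decreases as the divergence between the supervised bases F and the other bases H grows, shaped by an assumed sensitivity parameter ρ) is:

```latex
% Schematic maximum-divergence penalty: small when D(F || H) is large,
% with rho controlling how sharply the penalty reacts to similarity
P_{\mathrm{D}}(H) = \exp\!\big( -\rho\, D_\beta(F \,\|\, H) \big)
```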

18.

Derivation of optimal variables in PSNMF
• Derive the optimal variables (the other bases and all activations).
• Auxiliary function method
  – An optimization scheme that uses an upper-bound function.
  – Design auxiliary (upper-bound) functions for the cost functions.
  – Minimize the original cost functions indirectly by minimizing the auxiliary functions.
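The auxiliary function (majorization-minimization) principle named on this slide can be stated compactly; θ and the auxiliary variable θ̄ are generic symbols assumed here:

```latex
% J^+ upper-bounds J and touches it at the current estimate:
J(\theta) \le J^{+}(\theta, \bar{\theta}), \qquad
J(\theta) = \min_{\bar{\theta}} J^{+}(\theta, \bar{\theta})
% Alternating minimization of J^+ monotonically decreases J:
\bar{\theta} \leftarrow \arg\min_{\bar{\theta}} J^{+}(\theta, \bar{\theta}),
\qquad
\theta \leftarrow \arg\min_{\theta} J^{+}(\theta, \bar{\theta})
```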

19.

Derivation of optimal variables in PSNMF
• The second and third terms of the cost function become convex or concave functions depending on the value of β.
  – Convex terms are upper-bounded using Jensen's inequality.
  – Concave terms are upper-bounded using the tangent-line inequality.
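The two inequalities invoked on this slide, written out in their standard form:

```latex
% Jensen's inequality for a convex f, with weights lambda_k >= 0, sum_k lambda_k = 1:
f\!\Big( \sum_k \lambda_k x_k \Big) \le \sum_k \lambda_k\, f(x_k)
% Tangent-line inequality for a concave (differentiable) f:
f(x) \le f(\bar{x}) + f'(\bar{x})\,(x - \bar{x})
```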

20.

Derivation of optimal variables in PSNMF
• An auxiliary variable is introduced.
• The resulting term is always a convex function, so it is upper-bounded using Jensen's inequality.

21.

Derivation of optimal variables in PSNMF
• Auxiliary functions are designed for each cost function.
• The update rules for optimization are obtained by setting the partial derivatives of the auxiliary functions with respect to each variable to zero.

22.

Update rules for optimization of PSNMF
• Multiplicative update rules with the orthogonality penalty

23.

Update rules for optimization of PSNMF
• Multiplicative update rules with the maximum-divergence penalty

24.

Outline
• 1. Research background
• 2. Conventional method
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Problem of conventional method: basis sharing
• 3. Proposed method
  – Penalized supervised nonnegative matrix factorization
    • Orthogonality penalty
    • Maximum-divergence penalty
• 4. Experiments
  – Two-source case
  – Four-source case
• 5. Conclusions

25.

Experimental conditions
• Produced four melodies using a MIDI synthesizer.
• Used MIDI sounds of the same target instrument, containing two octaves of notes, as the supervision sound.
  – Training sound: two octaves of notes that cover all the notes of the target signal.
• Evaluation in a two-source case and a four-source case.
  – There are 12 combinations in the two-source case and 4 patterns in the four-source case.

26.

Experimental conditions
• Observed signal: mixture of 2 or 4 signals at equal power
• Training signal: MIDI sounds of the same target instrument, containing two octaves of notes
• Divergence criteria: all combinations
• Number of bases: 100 supervised bases, 50 other bases
• Parameters: experimentally determined
• Methods: conventional SNMF, proposed PSNMF
• Evaluation score [Vincent, 2006]
  – Source-to-distortion ratio (SDR): indicates the total quality of the separated signal.
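For reference, the SDR used here is the standard measure from the BSS Eval framework [Vincent, 2006], where the estimate is decomposed into a target component and interference, noise, and artifact error terms:

```latex
\mathrm{SDR} = 10 \log_{10}
\frac{\left\| s_{\mathrm{target}} \right\|^{2}}
     {\left\| e_{\mathrm{interf}} + e_{\mathrm{noise}} + e_{\mathrm{artif}} \right\|^{2}}
```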

27.

Experimental results: two-source case
• Average SDR scores over the 12 combinations
[Figure: bar charts of SDR [dB] for conventional SNMF, PSNMF (Ortho.), and PSNMF (Max.)]
  – Conventional SNMF cannot achieve high separation accuracy because of the basis sharing problem.
  – The proposed method outperforms conventional SNMF.

28.

Experimental results: four-source case
• Average SDR scores over the 4 combinations
[Figure: bar charts of SDR [dB] for conventional SNMF, PSNMF (Ortho.), and PSNMF (Max.)]
  – PSNMF outperforms the conventional method.

29.

Example of separation (Cello & Oboe)
[Figure/audio: the mixed signal, the cello signal, and the separation results of SNMF and PSNMF (Ortho.)]

30.

Conclusions
• Conventional supervised NMF suffers from a basis sharing problem that degrades the separation performance.
• We proposed adding a penalty term to the cost function that forces the other bases to become uncorrelated with the supervised bases: penalized supervised NMF.
• Penalized supervised NMF achieves high separation accuracy.
Thank you for your attention!