Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration

March 18, 2015

Slide overview

Presented at Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014, international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Hybrid multichannel signal separation using supervised nonnegative matrix factorization with spectrogram restoration," Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference 2014 (APSIPA 2014), Siem Reap, Cambodia, December 2014 (invited paper).

http://d-kitamura.net/links_en.html

Text of each page
1.

Asia-Pacific Signal and Information Processing Association ASC 2014
Special session – Recent Advances in Audio and Acoustic Signal Processing
Hybrid Multichannel Signal Separation Using Supervised Nonnegative Matrix Factorization
Daichi Kitamura (The Graduate University for Advanced Studies, Japan)
Hiroshi Saruwatari (The University of Tokyo, Japan)
Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
Yu Takahashi (Yamaha Corporation, Japan)
Kazunobu Kondo (Yamaha Corporation, Japan)
Hirokazu Kameoka (The University of Tokyo, Japan)

2.

Outline
• 1. Research background
• 2. Conventional methods
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Multichannel NMF
• 3. Proposed method
  – SNMF with spectrogram restoration and its hybrid method
• 4. Experiments
  – Closed data experiment
  – Open data experiment
• 5. Conclusions

4.

Research background
• Signal separation has received much attention.
  – Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area.
• Supervised NMF (SNMF) achieves the highest separation performance.
• To improve its performance, an SNMF-based multichannel signal separation method is required.
• Goal: separate the target signal from multichannel signals with high accuracy.

6.

NMF [Lee, et al., 2001]
• NMF can extract significant spectral patterns.
[Figure: the observed matrix (spectrogram, frequency × time) is approximated by the product of a basis matrix (spectral patterns, frequency × basis) and an activation matrix (time-varying gains, basis × time).]
  – Ω: number of frequency bins
  – T: number of time frames
  – K: number of bases
  – The basis matrix holds the spectral patterns that appear frequently in the observed matrix.
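
The factorization described on this slide can be sketched with the standard multiplicative updates for the KL-divergence. This is a minimal illustration, not the authors' code: the names `nmf_kl`, `kl_divergence`, `Y_toy`, the iteration count, and the random initialization are all assumptions made for the example.

```python
import numpy as np

def nmf_kl(Y, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Approximate a nonnegative matrix Y (freq x time) by F @ G.

    F (freq x n_bases) plays the role of the basis matrix and
    G (n_bases x time) the activation matrix. The updates are the
    standard multiplicative rules for the KL-divergence.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, n_bases)) + eps
    G = rng.random((n_bases, n_frames)) + eps
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        X = F @ G + eps                      # current model of Y
        F *= ((Y / X) @ G.T) / (ones @ G.T + eps)
        X = F @ G + eps                      # refresh after updating F
        G *= (F.T @ (Y / X)) / (F.T @ ones + eps)
    return F, G

def kl_divergence(Y, X, eps=1e-12):
    """Generalized KL-divergence between nonnegative arrays Y and X."""
    return np.sum(Y * np.log((Y + eps) / (X + eps)) - Y + X)

# Toy "spectrogram" built from 3 hidden spectral patterns (synthetic data).
rng = np.random.default_rng(1)
Y_toy = rng.random((16, 3)) @ rng.random((3, 40))
F_toy, G_toy = nmf_kl(Y_toy, n_bases=3)
```

Multiplicative updates keep both matrices nonnegative as long as the initial values are positive, which is why no projection step is needed.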

7.

Supervised NMF [Smaragdis, et al., 2007]
• SNMF
  – Supervised spectral separation method
• Training process: sample sounds of the target signal are decomposed to obtain the supervised basis matrix (spectral dictionary).
• Separation process: the mixed signal is decomposed into the target signal and the other signals. The supervised basis matrix is kept fixed and the remaining matrices are optimized.
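
The two stages above can be sketched as follows, again with KL-divergence multiplicative updates. This is only an illustration under assumed names (`train_basis`, `snmf_separate`, `F_sup`) and synthetic data; the number of bases and iterations are arbitrary choices, not values from the slides.

```python
import numpy as np

EPS = 1e-12

def train_basis(Y_train, n_bases, n_iter=200, seed=0):
    """Training stage: learn a supervised basis matrix from target samples."""
    rng = np.random.default_rng(seed)
    F = rng.random((Y_train.shape[0], n_bases)) + EPS
    G = rng.random((n_bases, Y_train.shape[1])) + EPS
    ones = np.ones_like(Y_train)
    for _ in range(n_iter):
        F *= ((Y_train / (F @ G + EPS)) @ G.T) / (ones @ G.T + EPS)
        G *= (F.T @ (Y_train / (F @ G + EPS))) / (F.T @ ones + EPS)
    return F

def snmf_separate(Y_mix, F, n_other, n_iter=200, seed=0):
    """Separation stage: model Y_mix as F @ G + H @ U with F kept fixed."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y_mix.shape
    G = rng.random((F.shape[1], n_frames)) + EPS   # target activations
    H = rng.random((n_freq, n_other)) + EPS        # bases for other sources
    U = rng.random((n_other, n_frames)) + EPS      # their activations
    ones = np.ones_like(Y_mix)
    for _ in range(n_iter):
        R = Y_mix / (F @ G + H @ U + EPS)
        G *= (F.T @ R) / (F.T @ ones + EPS)
        R = Y_mix / (F @ G + H @ U + EPS)
        H *= (R @ U.T) / (ones @ U.T + EPS)
        R = Y_mix / (F @ G + H @ U + EPS)
        U *= (H.T @ R) / (H.T @ ones + EPS)
    return F @ G, H @ U   # estimated target and other spectrograms

# Synthetic example: the target uses 2 spectral patterns, the interference 2 others.
rng = np.random.default_rng(2)
F_true = rng.random((16, 2))
Y_train = F_true @ rng.random((2, 30))
Y_mix = F_true @ rng.random((2, 40)) + rng.random((16, 2)) @ rng.random((2, 40))
F_sup = train_basis(Y_train, n_bases=2)
target_est, other_est = snmf_separate(Y_mix, F_sup, n_other=2)
```

Only `G`, `H`, and `U` are updated during separation; `F` is never written to, which is what makes the method supervised.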

8.

Problems of SNMF
• SNMF handles only a single-channel signal.
  – For a multichannel signal, SNMF cannot use the information between channels.
• When many interference sources exist, the separation performance of SNMF degrades markedly.
[Figure: separation result in which residual components of the interference remain.]

9.

Multichannel NMF [Sawada, et al., 2013]
• Multichannel NMF
  – is a natural extension of NMF to a multichannel signal.
  – uses spatial information to cluster the bases, which achieves unsupervised separation.
[Figure: sources observed by a microphone array.]
• Problem: multichannel NMF depends strongly on its initial values and lacks robustness.

10.

Outline
• 1. Research background
• 2. Conventional methods
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Multichannel NMF
• 3. Proposed method
  – Motivation and strategy
  – SNMF with spectrogram restoration and its hybrid method
• 4. Experiments
  – Closed data experiment
  – Open data experiment
• 5. Conclusions

11.

Motivation and strategy
• Sawada's multichannel NMF
  – is a unified method that solves the spatial and spectral separations together.
  – maximizes a likelihood. [Equation not recovered. Its annotated parts: spatial direction of the target signal, source components (target and other), observed spectrograms of all signals.]
  – In the supervised situation, the target spectral patterns are given.
  – The problem is too difficult to solve (lacks robustness).
  – It is computationally inefficient (long computation time).

12.

Motivation and strategy
• Proposed hybrid method
  – approximates the unified problem by dividing it into two subproblems:
    unsupervised spatial separation, handled by classical D.O.A. estimation, and
    supervised spectral separation, handled by an SNMF-based method.
  – The spatial separation should be carried out with classical D.O.A. estimation methods.
    • These methods are very efficient and stable.
  – This is a divide-and-conquer approach.

13.

Directional clustering [Araki, et al., 2007]
• Directional clustering
  – Unsupervised spatial separation method
  – k-means clustering (fast and stable)
[Figure: each time-frequency bin of the stereo input spectrogram is labelled Left (L), Center (C), or Right (R). The labels give a binary mask (1 for the selected direction, 0 elsewhere), and the entry-wise product of the mask and the spectrogram yields the separated signal, shown here for the center direction.]
• Problem
  – Artificial distortion arises owing to the binary masking.
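
A much simplified sketch of the binary masking step is given below. It is not the feature design of the cited method: as an assumption for illustration, each time-frequency bin is described only by a panning angle computed from the left and right magnitudes, and a small hand-written 1-D k-means groups the angles. The names `directional_clustering`, `L_toy`, and `R_toy` are hypothetical.

```python
import numpy as np

def directional_clustering(L_mag, R_mag, n_clusters=3, n_iter=20):
    """Cluster time-frequency bins by panning angle and return binary masks.

    L_mag, R_mag: magnitude spectrograms of the left and right channels.
    Returns one binary mask per cluster (ordered from left to right at
    initialization) and the final cluster centroids in radians.
    """
    theta = np.arctan2(R_mag, L_mag)                  # 0 = hard left, pi/2 = hard right
    centroids = np.linspace(0.0, np.pi / 2, n_clusters)
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every bin.
        labels = np.argmin(np.abs(theta[..., None] - centroids), axis=-1)
        # Update step: move each centroid to the mean angle of its bins.
        for c in range(n_clusters):
            if np.any(labels == c):
                centroids[c] = theta[labels == c].mean()
    labels = np.argmin(np.abs(theta[..., None] - centroids), axis=-1)
    masks = [(labels == c).astype(float) for c in range(n_clusters)]
    return masks, centroids

# Three bins: one dominated by the left channel, one balanced, one by the right.
L_toy = np.array([[1.0, 1.0, 0.01]])
R_toy = np.array([[0.01, 1.0, 1.0]])
masks, centroids = directional_clustering(L_toy, R_toy)
# Entry-wise product of the center mask and a mono mix of the two channels.
center_cluster = masks[1] * 0.5 * (L_toy + R_toy)
```

The zeros in `center_cluster` are the spectral holes that binary masking leaves behind: every bin assigned to another direction is discarded entirely, even if the target contributed energy to it.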

14.

Proposed method: hybrid separation
• Hybrid separation method
  1. Input stereo signal (L, R)
  2. Spatial separation method (directional clustering)
  3. SNMF-based separation method (SNMF with spectrogram restoration)
  4. Separated signal

15.

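SNMF with spectrogram restoration
• The cluster separated by directional clustering contains spectral holes (lost components).
[Figure: spectrogram of the separated cluster (frequency × time) with the holes marked.]
• The proposed SNMF treats these holes as unseen observations.
• The best-fitting bases from the supervised basis (the dictionary of the target signal) are extrapolated to fix up the holes.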

16.

[Figure: processing flow. Directional clustering applies binary masking to the observed spectrogram and yields the separated cluster. SNMF with spectrogram restoration then extrapolates the supervised spectral bases to produce the reconstructed data. Three panels plot the frequency of source components against direction (left, center, right): (a) input signal, with the target and the interference; (b) after directional clustering; (c) after superresolution-based SNMF, showing the extrapolated components.]

20.

Decomposition model and cost function
• Decomposition model: [equation not recovered]. The supervised bases are fixed.
• Cost function: [equation not recovered]. Annotated parts: binary index to exclude the holes; penalty term; regularization term [Kitamura, et al. 2014].
• Symbols defined on the slide (glyphs not recovered):
  – binary masking matrix obtained from directional clustering
  – entries of the matrices
  – binary complement of the mask entries
  – Frobenius norm
  – weighting parameters
• The divergence is defined at all time-frequency grid points except the holes by using the binary mask matrix.
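
The equations on this slide were images and are not in the extracted text. One form consistent with the annotations above is sketched here. All symbols are assumptions chosen for illustration: Y is the spectrogram of the separated cluster, F the fixed supervised bases, G their activations, H and U the bases and activations of the remaining components, i and its complement the binary mask entries, and λ and μ the weighting parameters. It should be checked against the paper before being relied on.

```latex
% Decomposition model (assumed notation)
\mathbf{Y} \approx \mathbf{F}\mathbf{G} + \mathbf{H}\mathbf{U}

% Cost function: masked main divergence + regularization on the holes + penalty
\mathcal{J} =
  \sum_{\omega,t} i_{\omega,t}\,
    \mathcal{D}_{\beta}\!\left( y_{\omega,t} \,\middle\|\,
      \sum_{k} f_{\omega,k}\, g_{k,t} + \sum_{l} h_{\omega,l}\, u_{l,t} \right)
  + \lambda \sum_{\omega,t} \bar{i}_{\omega,t}\,
    \mathcal{D}_{\beta_{\mathrm{reg}}}\!\left( 0 \,\middle\|\,
      \sum_{k} f_{\omega,k}\, g_{k,t} \right)
  + \mu \left\lVert \mathbf{F}^{\mathsf{T}} \mathbf{H} \right\rVert_{\mathrm{Fr}}^{2}
```

In this reading the first term fits the model only where the mask is 1, the second keeps the extrapolated target components in the holes from growing without bound, and the third discourages the free bases H from duplicating the supervised bases F.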

21.

Generalized divergence: β-divergence
• β-divergence [Eguchi, et al., 2001]
  – β = 2: EUC-distance
  – β = 1: KL-divergence. The best criterion for signal separation [Kitamura, et al., 2014]
  – β = 0: IS-divergence
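
The slide's formula did not survive extraction. The sketch below uses the definition of the β-divergence that is common in the NMF literature, with the two limiting cases handled separately; the function name `beta_divergence` and the small `eps` offset are assumptions of this example.

```python
import numpy as np

def beta_divergence(y, x, beta, eps=1e-12):
    """Sum of element-wise beta-divergences D_beta(y || x).

    beta = 2 gives half the squared Euclidean distance,
    beta = 1 the generalized KL-divergence,
    beta = 0 the Itakura-Saito (IS) divergence.
    """
    y = np.asarray(y, dtype=float) + eps   # offset avoids log(0) and 0/0
    x = np.asarray(x, dtype=float) + eps
    if beta == 0:
        ratio = y / x
        return np.sum(ratio - np.log(ratio) - 1.0)
    if beta == 1:
        return np.sum(y * np.log(y / x) - y + x)
    return np.sum(
        y**beta / (beta * (beta - 1))
        + x**beta / beta
        - y * x ** (beta - 1) / (beta - 1)
    )

# For beta = 2 the value equals 0.5 * sum((y - x)**2).
d_euc = beta_divergence([1.0, 2.0], [2.0, 4.0], beta=2)
```

Values of β between and beyond these three cases interpolate between the criteria, which is what the experiments later in the deck sweep over.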

22.

Decomposition model and cost function
• Decomposition model: [equation not recovered]. The supervised bases are fixed.
• Cost function: [equation not recovered].
• We used two β-divergences with separate β values, one for the main cost and one for the regularization cost.

23.

Update rules
• Update rules can be obtained for optimizing the three variable matrices.
[Update-rule equations not recovered.]
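
Since the update rules themselves are missing, the following is not the authors' derivation. It is a sketch for the Euclidean special case only (β = 2 for both the main and the regularization cost), using the usual gradient-split multiplicative updates on an assumed cost of the form: masked squared error, plus λ times the squared target model inside the holes, plus μ times the squared Frobenius norm of Fᵀ H. The names `snmf_restore`, `cost`, `I_mask`, and the default weights are hypothetical.

```python
import numpy as np

EPS = 1e-12

def cost(Y, I, F, G, H, U, lam, mu):
    """Assumed Euclidean cost: masked fit + hole regularization + penalty."""
    X = F @ G + H @ U
    main = 0.5 * np.sum(I * (Y - X) ** 2)
    reg = 0.5 * lam * np.sum((1.0 - I) * (F @ G) ** 2)
    pen = mu * np.sum((F.T @ H) ** 2)
    return main + reg + pen

def snmf_restore(Y, I, F, n_other, lam=0.1, mu=0.1, n_iter=100, seed=0):
    """Fit Y ~ F @ G + H @ U on unmasked bins only, with F fixed.

    I is the binary mask (1 = observed bin, 0 = hole). F @ G is defined
    on every bin, so it also supplies values inside the holes.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    Ibar = 1.0 - I
    G = rng.random((F.shape[1], n_frames)) + EPS
    H = rng.random((n_freq, n_other)) + EPS
    U = rng.random((n_other, n_frames)) + EPS
    history = []
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ (I * Y)) / (F.T @ (I * X) + lam * (F.T @ (Ibar * (F @ G))) + EPS)
        X = F @ G + H @ U
        H *= ((I * Y) @ U.T) / ((I * X) @ U.T + 2.0 * mu * (F @ (F.T @ H)) + EPS)
        X = F @ G + H @ U
        U *= (H.T @ (I * Y)) / (H.T @ (I * X) + EPS)
        history.append(cost(Y, I, F, G, H, U, lam, mu))
    return G, H, U, history

# Synthetic cluster: target plus interference, with about 30 % of bins masked out.
rng = np.random.default_rng(0)
F_sup = rng.random((16, 3))
target = F_sup @ rng.random((3, 30))
other = rng.random((16, 2)) @ rng.random((2, 30))
I_mask = (rng.random((16, 30)) > 0.3).astype(float)
Y_obs = I_mask * (target + other)
G_est, H_est, U_est, history = snmf_restore(Y_obs, I_mask, F_sup, n_other=2)
restored_target = F_sup @ G_est   # includes extrapolated values in the holes
```

Each update divides the negative part of the gradient by its positive part, so the matrices stay nonnegative. Extending this to general β values requires the divergence-specific rules from the paper.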

25.

Experimental condition
• The mixed signal includes four melodies (sources).
• Three compositions of instruments
  – We evaluated the average score of 36 patterns.

  Dataset  Melody 1  Melody 2  Midrange     Bass
  No. 1    Oboe      Flute     Piano        Trombone
  No. 2    Trumpet   Violin    Harpsichord  Fagotto
  No. 3    Horn      Clarinet  Piano        Cello

[Figure: spatial arrangement of the four sources (numbered 1 to 4) across left, center, and right, with the target source marked.]
• Supervision signal: 24 notes that cover all the notes in the target melody.

26.

Experimental result: closed data
• Signal-to-distortion ratio (SDR)
  – Total quality of the separation, which includes the degree of separation and the absence of artificial distortion.
[Figure: SDR [dB] (0 to 14, higher is better) plotted against β_NMF (0 to 4), with KL-divergence and EUC-distance marked on the β axis. Methods shown: proposed hybrid method, conventional SNMF (single-channel SNMF), directional clustering, supervised multichannel NMF [Sawada].]
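
The slides do not say how SDR was computed. As a rough illustration of what the measure expresses, the sketch below uses the plain energy ratio between a reference signal and the estimation error. This is an assumption: standard evaluation toolkits decompose the error more carefully, so the numbers are not comparable with the figure. The name `simple_sdr` is hypothetical.

```python
import numpy as np

def simple_sdr(reference, estimate, eps=1e-12):
    """Energy ratio in dB between the reference and the estimation error."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference**2) + eps) / (np.sum(error**2) + eps))

# An estimate that is the reference scaled by 0.9 leaves a 10 % amplitude error.
ref = np.array([1.0, 0.0, -1.0, 0.0])
sdr_db = simple_sdr(ref, 0.9 * ref)
```

Any deviation from the reference lowers this value, whether it is leftover interference or distortion introduced by masking, which is why SDR serves as an overall quality score.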

27.

SNMF with spectrogram restoration
• SNMF with spectrogram restoration has two tasks: source separation and basis extrapolation.
• The optimal divergence for source separation is the KL-divergence (β = 1).
• In contrast, a divergence with a higher β value is suitable for basis extrapolation.

28.

Trade-off: separation and restoration
• The optimal divergence for SNMF with spectrogram restoration and its hybrid method is determined by the trade-off between the separation and restoration abilities.
[Figure: performance against β (0 to 4), with curves for separation, restoration, and the total performance of the hybrid method. Two inset spectra, amplitude [dB] against frequency [kHz], are labelled "Sparseness: strong" and "Sparseness: weak".]

29.

Experimental condition
• Open data experiment
  – used different tone generators for the training and test signals.
  – Supervision signal: 24 notes that cover all the notes in the target melody, provided by tone generator A.
  – Test signal: provided by tone generator B (a more realistic sound), plus background noise (SNR = 10 dB).
[Figure: spatial arrangement of the four sources (numbered 1 to 4) across left, center, and right, with the target source marked.]

30.

Experimental result: open data
• Signal-to-distortion ratio (SDR)
  – Total quality of the separation, which includes the degree of separation and the absence of artificial distortion.
[Figure: SDR [dB] (-4 to 10, higher is better) plotted against β_NMF (0 to 4), with KL-divergence and EUC-distance marked on the β axis. Methods shown: proposed hybrid method, directional clustering, conventional SNMF (single-channel SNMF), supervised multichannel NMF [Sawada].]

31.

Conclusions
• We proposed a hybrid multichannel signal separation method that combines directional clustering and SNMF with spectrogram restoration.
• There is a trade-off between the separation and restoration abilities.
A demonstration is available on my website. Thank you for your attention!