Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation

March 18, 15

Slide overview

Presented at 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation," Proceedings of 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014), pp.92-96, Nancy, France, May 2014.

http://d-kitamura.net/links_en.html

Text of each page
1.

4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays
Oral session 2 – Microphone array processing
Divergence optimization in nonnegative matrix factorization with spectrogram restoration for multichannel signal separation
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan)
Hirokazu Kameoka (The University of Tokyo, Japan)

2.

Outline
• 1. Research background
• 2. Conventional methods
– Directional clustering
– Nonnegative matrix factorization
– Supervised nonnegative matrix factorization
– Hybrid method
• 3. Analysis of restoration ability
– Generalized cost function
– Analysis based on generation model
• 4. Experiments
• 5. Conclusions

4.

Research background
• Signal separation has received much attention.
– Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area.
• Supervised NMF (SNMF) achieves the highest separation performance.
• To improve its performance, an SNMF-based multichannel signal separation method is required.
We have proposed a new SNMF and its hybrid separation method for multichannel signals.

5.

Research background
• Our proposed hybrid method
[Figure: input stereo signal (L, R) → spatial separation method (directional clustering) → SNMF-based separation method (SNMF with spectrogram restoration) → separated signal]

6.

Research background
• The divergence criterion in SNMF strongly affects separation performance.
– Euclidean distance (EUC-distance)
– Kullback-Leibler divergence (KL-divergence)
– Itakura-Saito divergence (IS-divergence)
• The optimal divergence for SNMF with spectrogram restoration is not apparent.
We extend our new SNMF to a more generalized form.
We give a theoretical analysis for the optimization of the divergence.

7.

Outline
• 1. Research background
• 2. Conventional methods
– Directional clustering
– NMF
– Supervised NMF
– Hybrid method
• 3. Analysis of restoration ability
– Generalized cost function
– Analysis based on generation model
• 4. Experiments
• 5. Conclusions
[Figure: hybrid method. Stereo signal → spatial separation → spectral separation → separated signal]

8.

Directional clustering [Araki, et al., 2007]
• Directional clustering
– Unsupervised spatial separation method
[Figure: each time-frequency bin of the input stereo spectrogram is labeled left (L), center (C), or right (R). The entry-wise product of the spectrogram and the binary mask of one direction (center in the example) gives the separated signal.]
• Problems
– Cannot separate sources in the same direction.
– Artificial distortion arises owing to the binary masking.
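The binary-masking step described above can be sketched as follows. This is only an illustration and not the clustering of [Araki, et al., 2007]: it assumes that the direction of each time-frequency bin is summarized by a panning angle computed from the left and right magnitudes, and that the three direction centers are known in advance. The function name `directional_masks` and the `centers` values are hypothetical.

```python
import numpy as np

def directional_masks(left_mag, right_mag, centers=(0.0, np.pi / 4, np.pi / 2)):
    """Label each time-frequency bin with the nearest direction and build binary masks.

    Assumption: direction is summarized by the panning angle atan2(|R|, |L|),
    and `centers` are hypothetical angles for left, center, and right.
    """
    angle = np.arctan2(right_mag, left_mag)          # 0 = left only, pi/2 = right only
    dist = np.abs(angle[..., None] - np.asarray(centers))
    labels = np.argmin(dist, axis=-1)                # 0: left, 1: center, 2: right
    return [(labels == k).astype(float) for k in range(len(centers))]

# Toy magnitude spectrograms (frequency bins x time frames).
rng = np.random.default_rng(0)
left_mag = rng.random((6, 5))
right_mag = rng.random((6, 5))

mask_l, mask_c, mask_r = directional_masks(left_mag, right_mag)
# Entry-wise product: bins assigned to other directions become holes (zeros).
center_spec = mask_c * (left_mag + right_mag) / 2
```

Every bin belongs to exactly one direction, so the zeros in `center_spec` are the holes that a later restoration stage would have to fill.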

9.

NMF [Lee, et al., 2001]
• NMF can extract significant spectral patterns.
– The observed matrix (spectrogram) is approximated by the product of a basis matrix (spectral patterns) and an activation matrix (time-varying gains).
– Ω: number of frequency bins, 𝑇: number of time frames, 𝐾: number of bases
– The basis matrix contains the frequently appearing spectral patterns in the observed matrix.

10.

Divergence criterion in NMF
• Cost function in NMF: a divergence between the observed matrix and the product of the variable matrices, summed over all entries.
– Euclidean distance (EUC-distance)
– Kullback-Leibler divergence (KL-divergence)
– Itakura-Saito divergence (IS-divergence)
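For reference, the three criteria listed above are usually written as follows. The symbols are an assumption, since the slide's own notation did not survive extraction: y is an entry of the observed matrix and x is the corresponding entry of the model.

```latex
\begin{align}
D_{\mathrm{EUC}}(y \,\|\, x) &= (y - x)^2, \\
D_{\mathrm{KL}}(y \,\|\, x) &= y \log\frac{y}{x} - y + x, \\
D_{\mathrm{IS}}(y \,\|\, x) &= \frac{y}{x} - \log\frac{y}{x} - 1 .
\end{align}
```

Some conventions put a factor 1/2 in front of the EUC-distance. All three vanish when y = x and are otherwise positive.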

11.

Supervised NMF [Smaragdis, et al., 2007]
• SNMF
– Supervised spectral separation method
• Training process: the supervised basis matrix (spectral dictionary) is optimized using sample sounds of the target signal.
• Separation process: the mixed signal is decomposed into the target signal and the other signal while the supervised basis matrix is kept fixed.
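The two processes can be sketched with the standard multiplicative updates for KL-divergence NMF. This is a minimal sketch on toy data, not the authors' implementation; the function `kl_nmf`, its parameters, and the matrix sizes are all hypothetical.

```python
import numpy as np

EPS = 1e-12

def kl_nmf(Y, K, n_iter=200, F_fixed=None, seed=0):
    """KL-divergence NMF, Y ~= [F_fixed, H] @ A, by multiplicative updates.

    If F_fixed is given, its columns stay fixed and only K extra bases are learned.
    """
    rng = np.random.default_rng(seed)
    n_fixed = 0 if F_fixed is None else F_fixed.shape[1]
    W = rng.random((Y.shape[0], n_fixed + K)) + EPS
    if F_fixed is not None:
        W[:, :n_fixed] = F_fixed
    A = rng.random((n_fixed + K, Y.shape[1])) + EPS
    ones = np.ones_like(Y)
    for _ in range(n_iter):
        A *= (W.T @ (Y / (W @ A + EPS))) / (W.T @ ones + EPS)
        W_new = W * ((Y / (W @ A + EPS)) @ A.T) / (ones @ A.T + EPS)
        W[:, n_fixed:] = W_new[:, n_fixed:]   # supervised columns are left untouched
    return W, A

# Toy data: a rank-3 "target" and a rank-2 "other" source.
rng = np.random.default_rng(1)
Y_train = rng.random((20, 3)) @ rng.random((3, 40))   # sample sound of the target
Y_other = rng.random((20, 2)) @ rng.random((2, 30))
Y_mix = Y_train[:, :30] + Y_other                     # mixed signal

F, _ = kl_nmf(Y_train, K=3)                           # training process
W, A = kl_nmf(Y_mix, K=2, F_fixed=F)                  # separation process, F fixed
target_est = F @ A[:F.shape[1]]                       # target part of the model
```

Only the activations and the extra bases are updated during separation, so the part of the model built from `F` is taken as the target estimate.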

12.

Hybrid method [Kitamura, et al., 2013]
• We have proposed a new SNMF, called SNMF with spectrogram restoration, and its hybrid method.
[Figure: hybrid method. Stereo input (L, R) → spatial separation (directional clustering) → spectral separation (SNMF with spectrogram restoration)]

13.

SNMF with spectrogram restoration
• SNMF with spectrogram restoration can separate the target and restore the spectrogram simultaneously.
[Figure: the spectrogram after directional clustering contains target components, non-target components, and holes. After SNMF with spectrogram restoration, which uses the supervised bases (dictionary of the target), the target is separated from the non-target and the holes are restored.]

14.

Decomposition model and cost function
• Decomposition model: the supervised bases are fixed.
• Cost function: a divergence term, a regularization term, and a penalty term.
• Symbols defined on the slide: the binary masking matrix obtained from directional clustering and its binary complement, the entries of the matrices, the Frobenius norm, and the weighting parameters.
• The divergence is defined at all grids except for the holes by using the binary mask matrix.
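The equations of this slide were lost in extraction. One form that is consistent with the description above is sketched below; every symbol in it is an assumption. Y is the spectrogram after directional clustering, F the fixed supervised bases, G their activations, H and U the bases and activations of the other sources, i and its complement the entries of the binary mask, R an unspecified regularizer acting on the holes, and λ, μ the weighting parameters. Which term the slide calls the penalty and which the regularization is also assumed.

```latex
\begin{align}
\boldsymbol{Y} &\simeq \boldsymbol{F}\boldsymbol{G} + \boldsymbol{H}\boldsymbol{U}, \\
\mathcal{J} &= \sum_{\omega,t} i_{\omega,t}\,
  D\!\left( y_{\omega,t} \,\Big\|\, \sum_{k} f_{\omega,k}\, g_{k,t} + \sum_{l} h_{\omega,l}\, u_{l,t} \right)
  + \lambda \sum_{\omega,t} \bar{i}_{\omega,t}\, R\!\left( \sum_{k} f_{\omega,k}\, g_{k,t} \right)
  + \mu \left\| \boldsymbol{F}^{\mathsf{T}} \boldsymbol{H} \right\|_{\mathrm{Fr}}^{2}
\end{align}
```

In this reading, the first term fits the model only where the mask is 1, the second term constrains the supervised part of the model inside the holes, and the third term discourages the other bases from duplicating the supervised ones.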

16.

Generalized divergence: β-divergence
• β-divergence [Eguchi, et al., 2001]
– EUC-distance (β = 2)
– KL-divergence (β = 1)
– IS-divergence (β = 0)
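The special cases can be checked numerically. The sketch below uses one common parameterization of the β-divergence; the slide's exact formula was not preserved, so treat this form as an assumption. The function name `beta_divergence` is hypothetical.

```python
import numpy as np

def beta_divergence(y, x, beta):
    """Entry-wise beta-divergence d_beta(y || x) for positive y and x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if beta == 0:                      # Itakura-Saito divergence
        return y / x - np.log(y / x) - 1.0
    if beta == 1:                      # Kullback-Leibler divergence
        return y * np.log(y / x) - y + x
    # General case; beta = 2 gives half the squared Euclidean distance.
    return (y**beta + (beta - 1) * x**beta - beta * y * x**(beta - 1)) / (beta * (beta - 1))

d_euc = beta_divergence(3.0, 1.0, 2)            # 0.5 * (3 - 1)^2
d_kl = beta_divergence(3.0, 1.0, 1)
d_near_kl = beta_divergence(3.0, 1.0, 1 + 1e-6) # general formula close to beta = 1
d_same = beta_divergence(2.0, 2.0, 3)           # zero when y equals x
```

The general formula approaches the KL and IS cases as β tends to 1 and 0, which is why those two values are handled as separate branches.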

17.

Decomposition model and cost function
• Decomposition model and cost function (supervised bases fixed)
• We introduced the β-divergence to extend the cost function to a generalized form.

18.

Update rules
• We can obtain the update rules for the optimization of the variable matrices.
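The slide's update rules were not preserved. As a rough illustration of the idea only, the sketch below applies the widely used heuristic multiplicative updates for β-divergence NMF, weighted by a binary mask so that the holes do not contribute to the cost. It leaves out the supervised bases, the regularization term, and the penalty term, so it is not the authors' algorithm. The function name `masked_beta_nmf` and all sizes are hypothetical.

```python
import numpy as np

EPS = 1e-12

def masked_beta_nmf(Y, mask, K, beta, n_iter=300, seed=0):
    """Fit Y ~= W @ A on entries where mask == 1 (heuristic beta-divergence updates)."""
    rng = np.random.default_rng(seed)
    W = rng.random((Y.shape[0], K)) + 0.1
    A = rng.random((K, Y.shape[1])) + 0.1
    for _ in range(n_iter):
        X = W @ A + EPS
        A *= (W.T @ (mask * Y * X**(beta - 2))) / (W.T @ (mask * X**(beta - 1)) + EPS)
        X = W @ A + EPS
        W *= ((mask * Y * X**(beta - 2)) @ A.T) / ((mask * X**(beta - 1)) @ A.T + EPS)
    return W, A

# Toy low-rank data with randomly placed holes.
rng = np.random.default_rng(2)
Y = rng.random((20, 2)) @ rng.random((2, 30)) + 0.01
mask = (rng.random(Y.shape) > 0.3).astype(float)     # 1 = observed, 0 = hole

def masked_sq_error(W, A):
    """Squared error counted only on the observed entries."""
    return float(np.sum(mask * (Y - W @ A) ** 2))

err_start = masked_sq_error(*masked_beta_nmf(Y, mask, K=2, beta=2, n_iter=1))
W, A = masked_beta_nmf(Y, mask, K=2, beta=2, n_iter=300)
err_fit = masked_sq_error(W, A)
restored = mask * Y + (1 - mask) * (W @ A)           # holes filled by the model
```

Because the bases are fitted on the observed entries only, the product `W @ A` extrapolates into the holes, which is the restoration effect discussed in the slides.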

19.

SNMF with spectrogram restoration
• This SNMF has two tasks: source separation and basis extrapolation.
• The optimal divergence for source separation has been investigated.
– KL-divergence (β = 1) is suitable for source separation.
• The optimal divergence for basis extrapolation has not been investigated.
• We analyze the optimal divergence for basis extrapolation based on a generation model in NMF.

20.

Analysis of extrapolation ability
• The decomposition of NMF is equivalent to a maximum likelihood estimation, which implicitly assumes a generation model of the input data.
• Cost function in NMF and the corresponding generation model:
– IS-divergence: exponential distribution
– KL-divergence: Poisson distribution
– EUC-distance: Gaussian distribution
• The slide also defines a symbol for the maximum of the data.
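The correspondence can be verified by writing the negative log-likelihood of one observed entry y under each distribution with mean x. The notation is an assumption: σ is a fixed Gaussian standard deviation, D_EUC(y‖x) = (y − x)², D_KL(y‖x) = y log(y/x) − y + x, D_IS(y‖x) = y/x − log(y/x) − 1, and "const." collects terms that do not depend on x.

```latex
\begin{align}
\text{Gaussian:}\quad & -\log p(y \mid x) = \frac{(y - x)^2}{2\sigma^2} + \mathrm{const.}
  = \frac{1}{2\sigma^2}\, D_{\mathrm{EUC}}(y \,\|\, x) + \mathrm{const.}, \\
\text{Poisson:}\quad & -\log p(y \mid x) = x - y \log x + \log \Gamma(y + 1)
  = D_{\mathrm{KL}}(y \,\|\, x) + \mathrm{const.}, \\
\text{Exponential:}\quad & -\log p(y \mid x) = \log x + \frac{y}{x}
  = D_{\mathrm{IS}}(y \,\|\, x) + \mathrm{const.}
\end{align}
```

Minimizing each divergence over x therefore maximizes the likelihood under the matching distribution.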

21.

Analysis of extrapolation ability
• To compare the net extrapolation ability, we generate random data that obey each generation model.
• We also prepare binary-masked random data and attempt to restore them.
[Figure: training, in which 100 bases are created, followed by restoration]

22.

Analysis of extrapolation ability
• The binary mask was randomly generated.
– We generate two types of binary mask whose densities of holes are 75% and 98%.
[Figure: input random data → binary masking → binary-masked data → restoration → restored data]
• SAR [dB] indicates the accuracy of restoration. Its definition on the slide uses the entry-wise square.
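The data preparation for such a simulation can be sketched as follows. Several details are assumptions and are not taken from the slides: the matrix size, the constant mean, taking the absolute value of the Gaussian samples to keep them nonnegative, and the SAR definition, which is written here as the ratio of reference energy to restoration-error energy. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.full((64, 200), 5.0)                  # hypothetical model mean

# Random data obeying each generation model (keyed by the matching divergence).
data = {
    "exponential (IS)": rng.exponential(scale=mean),
    "poisson (KL)": rng.poisson(lam=mean).astype(float),
    "gaussian (EUC)": np.abs(rng.normal(loc=mean, scale=1.0)),  # abs: assumption
}

def random_hole_mask(shape, hole_density, rng):
    """Return 1 for kept entries and 0 for holes."""
    return (rng.random(shape) >= hole_density).astype(float)

def sar_db(reference, restored):
    """Assumed definition: reference energy over restoration-error energy, in dB."""
    err = reference - restored
    return 10.0 * np.log10(np.sum(reference**2) / np.sum(err**2))

mask = random_hole_mask(mean.shape, 0.75, rng)  # 75% of the entries become holes
masked = mask * data["poisson (KL)"]
baseline = sar_db(data["poisson (KL)"], masked) # score when holes are left at zero
```

A restoration method would replace the zeros in `masked` with model values, and its score can then be compared against `baseline`.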

23.

Results of restoration analysis
• Simulated result of the restoration ability
[Figure: two panels, for 75%-binary-masked and 98%-binary-masked data. Horizontal axis: β_NMF (0 to 4). Vertical axis: SAR [dB], where higher is better. One curve each for β_reg = 0, 1, 2, 3. The optimal divergence for source separation (KL-divergence) is marked for comparison.]
• The optimal divergence for the basis extrapolation (restoration) is around β = 3.

24.

Trade-off between separation and restoration
• The optimal divergence for SNMF with spectrogram restoration and its hybrid method is based on the trade-off between separation and restoration abilities.
[Figure: separation performance, restoration performance, and total performance of the hybrid method plotted against β (0 to 4). Two example spectra, amplitude [dB] versus frequency [kHz], are labeled "Sparseness: strong" and "Sparseness: weak".]

26.

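Experimental condition
• The mixed signal includes four melodies (sources).
• Three compositions of instruments
– Dataset No. 1: oboe (melody 1), flute (melody 2), piano (midrange), trombone (bass)
– Dataset No. 2: trumpet (melody 1), violin (melody 2), harpsichord (midrange), fagotto (bass)
– Dataset No. 3: horn (melody 1), clarinet (melody 2), piano (midrange), cello (bass)
– We evaluated the average score of 36 patterns.
• Supervision signal: 24 notes that cover all the notes in the target melody
[Figure: spatial arrangement of the four sources, numbered 1 to 4, over the left, center, and right directions, with the target source marked]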

27.

Experimental result
• Signal-to-distortion ratio (SDR): total quality of the separation, which includes the degree of separation and the absence of artificial distortion.
[Figure: SDR [dB], where higher is better, for directional clustering, multichannel NMF [Sawada], conventional SNMF, and the proposed hybrid method, grouped into unsupervised and supervised methods. Horizontal axis: β_NMF from 0 to 4, where 1 is the KL-divergence and 2 is the EUC-distance.]
• Multichannel NMF is an integrated method.

28.

Experiment for real-recorded signal
• We recorded a binaural signal using a dummy head.
– Reverberation time: 200 ms
• The other conditions are the same as those in the previous experiment with the instantaneous mixture signal.
[Figure: recording layout with the dummy head and sources 1 to 4 placed at the left, center, and right, with the target signal marked. The labeled distances are 2.5 m and 1.5 m.]

29.

Experimental result
• Result for real-recorded signals
[Figure: SDR [dB], where higher is better, for directional clustering, multichannel NMF [Sawada], conventional SNMF, and the proposed hybrid method, grouped into unsupervised and supervised methods. Horizontal axis: β_NMF from 0 to 4, where 1 is the KL-divergence and 2 is the EUC-distance.]
• Multichannel NMF is an integrated method.

30.

Conclusions
• Restoration requires an anti-sparse criterion (β = 3).
• There is a trade-off between separation and restoration abilities.
• The optimal divergence is the EUC-distance for SNMF with spectrogram restoration,
– whereas the KL-divergence is the best for conventional SNMF.
Thank you for your attention!