Online divergence switching for superresolution-based nonnegative matrix factorization

March 18, 15

#nmf #source separation #direction of arrivals #music #Audio signal processing #Nonnegative Matrix Factorization #Supervised Nonnegative Matrix Factorization #Directional Clustering #Hybrid Method

Slide overview

Presented at 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014) (international conference)
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura, Yu Takahashi, Kazunobu Kondo, Hirokazu Kameoka, "Online divergence switching for superresolution-based nonnegative matrix factorization," Proceedings of 2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing (NCSP 2014), pp.485-488, Hawaii, USA, March 2014 (Student Paper Award).

Daichi Kitamura

@d-kitamura

http://d-kitamura.net/links_en.html

Text of each page

2014 RISP International Workshop on Nonlinear Circuits, Communications and Signal Processing
Speech Analysis (2), 2PM2-2
Online Divergence Switching for Superresolution-Based Nonnegative Matrix Factorization
Daichi Kitamura, Hiroshi Saruwatari, Satoshi Nakamura (Nara Institute of Science and Technology, Japan)
Yu Takahashi, Kazunobu Kondo (Yamaha Corporation, Japan)
Hirokazu Kameoka (The University of Tokyo, Japan)

Outline
• 1. Research background
• 2. Conventional methods
  – Nonnegative matrix factorization
  – Supervised nonnegative matrix factorization
  – Directional clustering
  – Hybrid method
• 3. Proposed method
  – Online divergence switching for hybrid method
• 4. Experiments
• 5. Conclusions

Research background
• Music signal separation technologies have received much attention.
  – Applications: automatic music transcription, 3D audio systems, etc.
• Music signal separation based on nonnegative matrix factorization (NMF) is a very active research area.
• The separation performance of supervised NMF (SNMF) degrades markedly when the mixture contains many sources.
• We have proposed a new hybrid separation method for stereo music signals.

Research background
• Our proposed hybrid method
[Figure: the input stereo signal (L, R) passes through a spatial separation method (directional clustering) and then an SNMF-based separation method (superresolution-based SNMF) to give the separated signal.]

Research background
• The optimal divergence criterion in superresolution-based SNMF depends on the spatial conditions of the input signal.
• Our aim in this presentation: we propose a new optimal separation scheme for this hybrid method that separates the target signal with high accuracy under any spatial condition.

NMF [Lee, et al., 2001]
• NMF
  – is a sparse representation algorithm.
  – can extract significant features from the observed matrix.
[Figure: the observed matrix (spectrogram, frequency × time) is approximated by the product of a basis matrix (spectral patterns, frequency × basis) and an activation matrix (time-varying gains, basis × time).]
Ω: Number of frequency bins
𝑇: Number of time frames
𝐾: Number of bases

Optimization in NMF
• The basis matrix and the activation matrix are optimized by minimizing the divergence between the observed matrix and their product.
  – Cost function: the sum, over all entries, of the divergence between each entry of the observed matrix and the corresponding entry of the product.
• Euclidean distance (EUC-distance) and Kullback-Leibler divergence (KL-divergence) are often used as the divergence in the cost function.
• In NMF-based separation, a cost function based on KL-divergence achieves high separation performance.
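
The optimization described on this slide can be sketched with the standard multiplicative update rules for the two divergences. This is a minimal illustration, not the authors' code: the symbols `Y` (observed matrix), `F` (basis matrix), and `G` (activation matrix), as well as the function name `nmf` and its parameters, are assumptions introduced here because the slide's own symbols did not survive extraction.

```python
import numpy as np

def nmf(Y, n_bases, divergence="kl", n_iter=200, seed=0, eps=1e-12):
    """Factorize a nonnegative matrix Y (freq x time) as F @ G.

    F (freq x n_bases) holds spectral patterns, G (n_bases x time) holds
    time-varying gains. Names are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    F = rng.random((n_freq, n_bases)) + eps
    G = rng.random((n_bases, n_frames)) + eps
    for _ in range(n_iter):
        if divergence == "euc":
            # Multiplicative updates for the squared Euclidean distance.
            F *= (Y @ G.T) / (F @ G @ G.T + eps)
            G *= (F.T @ Y) / (F.T @ F @ G + eps)
        elif divergence == "kl":
            # Multiplicative updates for the generalized KL divergence.
            F *= ((Y / (F @ G + eps)) @ G.T) / (G.sum(axis=1) + eps)
            G *= (F.T @ (Y / (F @ G + eps))) / (F.sum(axis=0)[:, None] + eps)
        else:
            raise ValueError("divergence must be 'euc' or 'kl'")
    return F, G
```

Both branches keep `F` and `G` nonnegative because every update multiplies by a ratio of nonnegative quantities.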

SNMF [Smaragdis, et al., 2007]
• SNMF utilizes sample sounds of the target.
  – Construct the trained basis matrix of the target sound.
  – Decompose the mixture into the target signal and the other signal.
• Training process: sample sounds of the target signal (e.g., a musical scale) are decomposed to optimize the supervised basis matrix (spectral dictionary).
• Separation process: the mixed signal is decomposed into the target signal, whose supervised bases are kept fixed, and the other signal.
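
The two stages of SNMF can be sketched as follows, assuming KL-divergence multiplicative updates in both stages. The function names `train_basis` and `separate` and the symbols `F`, `G`, `H`, `U` (supervised bases, their activations, and the bases and activations of the other signal) are hypothetical names chosen for this sketch.

```python
import numpy as np

EPS = 1e-12

def train_basis(Y_train, n_bases, n_iter=100, seed=0):
    """Training process: learn the supervised basis matrix F from sample sounds."""
    rng = np.random.default_rng(seed)
    F = rng.random((Y_train.shape[0], n_bases)) + EPS
    G = rng.random((n_bases, Y_train.shape[1])) + EPS
    for _ in range(n_iter):
        F *= ((Y_train / (F @ G + EPS)) @ G.T) / (G.sum(axis=1) + EPS)
        G *= (F.T @ (Y_train / (F @ G + EPS))) / (F.sum(axis=0)[:, None] + EPS)
    return F

def separate(Y_mix, F, n_other, n_iter=100, seed=1):
    """Separation process: F stays fixed; only G, H, and U are updated."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y_mix.shape
    G = rng.random((F.shape[1], n_frames)) + EPS
    H = rng.random((n_freq, n_other)) + EPS
    U = rng.random((n_other, n_frames)) + EPS
    for _ in range(n_iter):
        G *= (F.T @ (Y_mix / (F @ G + H @ U + EPS))) / (F.sum(axis=0)[:, None] + EPS)
        H *= ((Y_mix / (F @ G + H @ U + EPS)) @ U.T) / (U.sum(axis=1) + EPS)
        U *= (H.T @ (Y_mix / (F @ G + H @ U + EPS))) / (H.sum(axis=0)[:, None] + EPS)
    # Estimated spectrograms of the target signal and the other signal.
    return F @ G, H @ U
```

Keeping `F` fixed during separation is what ties the first component to the trained target sound; the free bases `H` absorb whatever the dictionary cannot explain.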

Problem of SNMF
• The separation performance of SNMF degrades markedly when many interference sources exist.
[Figure: separation in a two-source case and in a five-source case; residual components remain in the five-source case.]

Directional clustering [Araki, et al., 2007]
• Directional clustering
  – utilizes differences between channels as a separation cue.
  – is equivalent to binary masking in the spectrogram domain.
[Figure: each time-frequency bin of the stereo input spectrogram is labeled left (L), center (C), or right (R); the entry-wise product of the spectrogram and the binary mask for the center direction (1 where the label is C, 0 elsewhere) gives the separated signal.]
• Problems
  – Sources in the same direction cannot be separated.
  – Artificial distortion arises owing to the binary masking.
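
The idea of clustering time-frequency bins by direction and then applying a binary mask can be illustrated with a deliberately simplified sketch. It assumes amplitude panning only and clusters a single feature, the panning angle `arctan2(|R|, |L|)`, with a one-dimensional k-means; the cited method is more general, and the function names here are hypothetical.

```python
import numpy as np

def directional_clustering(L, R, n_clusters=3, n_iter=20):
    """Label each time-frequency bin by its panning direction.

    L, R: left and right spectrograms (real or complex) of equal shape.
    Returns integer labels (0 = leftmost cluster) and the cluster centroids.
    """
    angle = np.arctan2(np.abs(R), np.abs(L))  # 0 = hard left, pi/2 = hard right
    centroids = np.linspace(0.0, np.pi / 2, n_clusters)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(angle[..., None] - centroids), axis=-1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centroids[c] = angle[labels == c].mean()
    labels = np.argmin(np.abs(angle[..., None] - centroids), axis=-1)
    return labels, centroids

def binary_mask(labels, target_cluster):
    """Binary mask that keeps only the bins assigned to target_cluster."""
    return (labels == target_cluster).astype(float)
```

Multiplying a spectrogram entry-wise by `binary_mask(labels, 1)` would keep only the center cluster; the zeroed bins are the source of the distortion listed under "Problems".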

Hybrid method [D. Kitamura, et al., 2013]
• We have proposed a new SNMF called superresolution-based SNMF and a hybrid method that uses it.
• The hybrid method consists of directional clustering and superresolution-based SNMF.
[Figure: the stereo input (L, R) passes through spatial separation (directional clustering) and then spectral separation (superresolution-based SNMF).]

Superresolution-based SNMF
• This SNMF reconstructs the spectrogram obtained from directional clustering using supervised basis extrapolation.
[Figure: directional clustering splits the input spectrogram into target-direction and other-direction components, leaving chasms in the separated cluster; superresolution-based SNMF then produces the reconstructed spectrogram.]

Superresolution-based SNMF
• Spectral chasms arise in the separated cluster owing to directional clustering.
• These chasms are treated as unseen observations, and the best-fitting supervised bases are extrapolated into them.

[Figure: directional clustering applies binary masking to the observed spectrogram to obtain the separated cluster; superresolution-based SNMF extrapolates the supervised spectral bases to obtain the reconstructed data. Panels show the frequency of source components versus direction (left, center, right) for the target and the interference: (a) input signal, (b) after directional clustering, (c) after superresolution-based SNMF, with the extrapolated components.]

Decomposition model and cost function
• Decomposition model: the sum of a target component built from the supervised bases (fixed) and a component for the other signal.
• Cost function: a divergence term plus a regularization term and a penalty term, defined using
  – the index matrix obtained from directional clustering and its binary complement,
  – the entries of the matrices,
  – the weighting parameters, and
  – the Frobenius norm.
• The divergence is defined at all grid points except the chasms by using the index matrix.
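
The equations on this slide did not survive extraction. One form that is consistent with the labels above is written out below as a sketch. All symbols are assumptions introduced here: $y_{\omega,t}$ for the observed entries, $f_{\omega,k}$ and $g_{k,t}$ for the fixed supervised bases and their activations, $h_{\omega,l}$ and $u_{l,t}$ for the other signal, $i_{\omega,t}$ for the index matrix with binary complement $\bar{i}_{\omega,t}$, and $\lambda$, $\mu$ for the weighting parameters. The squared form of the second term and the assignment of the names "regularization" and "penalty" to the last two terms are also assumptions.

```latex
% Decomposition model (F fixed):
Y \simeq FG + HU

% Cost function: divergence on observed bins only, plus two weighted terms
\mathcal{J} = \sum_{\omega,t} i_{\omega,t}\,
    \mathcal{D}\!\left( y_{\omega,t} \,\middle\|\,
    \sum_{k} f_{\omega,k}\, g_{k,t} + \sum_{l} h_{\omega,l}\, u_{l,t} \right)
  + \lambda \sum_{\omega,t} \bar{i}_{\omega,t}
    \left( \sum_{k} f_{\omega,k}\, g_{k,t} \right)^{2}
  + \mu \left\| F^{\mathsf{T}} H \right\|_{\mathrm{Fr}}^{2}
```

In this reading, $i_{\omega,t} = 0$ at the chasms, so the divergence ignores them, while the second term keeps the extrapolated target component bounded there and the third discourages the free bases $H$ from duplicating the supervised bases $F$.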

Update rules
• Update rules for optimizing the three variable matrices can be obtained from the cost function.
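
Since the update rules themselves are missing from the extracted text, the following is only a sketch of how such rules might look for the EUC-distance case. It assumes a cost of the form: masked squared error on observed bins, plus `lam` times the squared target estimate inside the chasms, plus `mu` times the squared Frobenius norm of `F.T @ H`. The cost form, the symbols (`I` for the index matrix, `F`, `G`, `H`, `U` for the factors), and the heuristic multiplicative updates derived from it are assumptions, not the rules shown on the slide.

```python
import numpy as np

def cost_euc(Y, I, F, G, H, U, lam, mu):
    """Assumed cost: masked squared error + chasm regularizer + basis penalty."""
    X = F @ G + H @ U
    return (np.sum(I * (Y - X) ** 2)
            + lam * np.sum((1 - I) * (F @ G) ** 2)
            + mu * np.sum((F.T @ H) ** 2))

def superres_snmf_euc(Y, I, F, n_other, lam=0.1, mu=1.0,
                      n_iter=100, seed=0, eps=1e-12):
    """Update G, H, U with F fixed; I is 1 on observed bins, 0 on chasms."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = Y.shape
    G = rng.random((F.shape[1], n_frames)) + eps
    H = rng.random((n_freq, n_other)) + eps
    U = rng.random((n_other, n_frames)) + eps
    Ic = 1 - I  # binary complement: 1 on chasms
    history = [cost_euc(Y, I, F, G, H, U, lam, mu)]
    for _ in range(n_iter):
        X = F @ G + H @ U
        G *= (F.T @ (I * Y)) / (F.T @ (I * X) + lam * (F.T @ (Ic * (F @ G))) + eps)
        X = F @ G + H @ U
        H *= ((I * Y) @ U.T) / ((I * X) @ U.T + mu * (F @ (F.T @ H)) + eps)
        X = F @ G + H @ U
        U *= (H.T @ (I * Y)) / (H.T @ (I * X) + eps)
        history.append(cost_euc(Y, I, F, G, H, U, lam, mu))
    return G, H, U, history
```

After the loop, `F @ G` is defined on every bin, including the chasms, which is the extrapolation of the supervised bases into the masked regions.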

Consideration for optimal divergence
• In conventional SNMF, KL-divergence gives better separation performance than EUC-distance.
• However, for superresolution-based SNMF, neither KL-divergence nor EUC-distance is always better.
  – The optimal divergence depends on the amount of spectral chasms.

Consideration for optimal divergence
• Superresolution-based SNMF has two tasks: signal separation and basis extrapolation.
• Abilities of each divergence

Task                  KL-divergence   EUC-distance
Signal separation     Very good       Good
Basis extrapolation   Poor            Good

Consideration for optimal divergence
• A spectrum decomposed by NMF with KL-divergence tends to be sparser than one decomposed by NMF with EUC-distance.
[Figure: example basis spectra for EUC-distance and for KL-divergence; amplitude [dB] (0 to -10) versus frequency [kHz] (0 to 5).]
• A sparse basis is not suitable for extrapolation from the observable data.

Consideration for optimal divergence
• The optimal divergence for superresolution-based SNMF depends on the amount of spectral chasms because of the trade-off between separation and extrapolation abilities.
[Figure: separation, extrapolation, and total performance plotted against sparseness, from KL-divergence (sparse, strong sparseness) to EUC-distance (anti-sparse, weak sparseness), with an example spectrum for each; amplitude [dB] versus frequency [kHz].]

Consideration for optimal divergence
• The optimal divergence for superresolution-based SNMF depends on the amount of spectral chasms.
  – If there are many chasms, the extrapolation ability is required, so EUC-distance should be used.
  – If there are no chasms, the separation ability is required, so KL-divergence should be used.

Hybrid method for online input data
• When we consider applying the hybrid method to online input data...
[Figure: directional clustering applies a binary mask to the observed spectrogram, producing an online binary-masked spectrogram.]

Hybrid method for online input data
• We divide the online spectrogram into several blocks.
• Superresolution-based SNMF is applied to the blocks in parallel.

Online divergence switching
• We calculate the rate of chasms in each block.
  – If the rate is below the threshold value (there are not many chasms), superresolution-based SNMF with KL-divergence is used.
  – If the rate is above the threshold value (there are many chasms), superresolution-based SNMF with EUC-distance is used.
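
The switching rule can be sketched in a few lines. This assumes that the rate of chasms is the fraction of zero entries of the binary index matrix within a block and that blocks are consecutive groups of time frames; the function name, the `block_len` parameter, and the handling of a rate exactly equal to the threshold are assumptions, and the threshold value itself is not given in the extracted text.

```python
import numpy as np

def choose_divergences(index_matrix, block_len, threshold):
    """Pick a divergence per block from the rate of chasms.

    index_matrix: binary matrix (freq x time), 1 = observed bin, 0 = chasm.
    Returns a list of (divergence, chasm_rate) tuples, one per block.
    """
    index_matrix = np.asarray(index_matrix, dtype=float)
    choices = []
    for start in range(0, index_matrix.shape[1], block_len):
        block = index_matrix[:, start:start + block_len]
        chasm_rate = 1.0 - block.mean()
        # Many chasms: extrapolation matters, so use EUC-distance.
        # Few chasms: separation matters, so use KL-divergence.
        divergence = "euc" if chasm_rate > threshold else "kl"
        choices.append((divergence, chasm_rate))
    return choices
```

Each block can then be passed to superresolution-based SNMF with the selected divergence, independently of the other blocks.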

Procedure of proposed method

Experimental conditions
• We used stereo-panning signals.
• The mixture consists of four instruments generated by a MIDI synthesizer.
• We used the same type of MIDI sounds of the target instruments as supervision for the training process.
• Supervision sound: two octaves of notes that cover all the notes of the target signal.
[Figure: panning positions of the four sources (numbered 1 to 4) between left, center, and right, with the target source marked.]

Experimental conditions
• We compared three methods.
  – Hybrid method using only EUC-distance-based SNMF (Conventional method 1)
  – Hybrid method using only KL-divergence-based SNMF (Conventional method 2)
  – Proposed hybrid method that switches the divergence to the optimal one (Proposed method)
• We used the signal-to-distortion ratio (SDR) as the evaluation score.
  – SDR indicates the total separation accuracy, which includes both the quality of the separated target signal and the degree of separation.
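
As a rough illustration of the evaluation score, the sketch below computes a simplified SDR in which all of the estimation error counts as distortion. SDR in source separation work is usually computed with a dedicated evaluation toolkit that first decomposes the error into several components; the slides do not say which implementation was used, so this simplified definition and the function name are assumptions.

```python
import numpy as np

def simple_sdr_db(reference, estimate, eps=1e-12):
    """Simplified SDR in dB: reference energy over total error energy."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = reference - estimate
    return 10.0 * np.log10((np.sum(reference ** 2) + eps)
                           / (np.sum(error ** 2) + eps))
```

A higher value means the separated signal is closer to the true target signal.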

Experimental result
• Average SDR scores for each method, where the four instruments are shuffled in 12 combinations.
[Figure: bar chart of SDR [dB] (axis from 8.0 to 10.0; higher is better) for conventional method 1, conventional method 2, and the proposed method.]
• The proposed method outperforms the other methods.

Conclusions
• We propose a new divergence switching scheme for superresolution-based SNMF.
• The method separates online input signals using the optimal divergence in NMF.
• The proposed method can be used for any spatial condition of the sources and separates the target signal with high accuracy.
Thank you for your attention!