---
title: A Study on Improving the Accuracy of Sound-Based Activity Recognition Models for Real-World Environments
tags: 
author: [custard](https://www.docswell.com/user/custard-1855)
site: [Docswell](https://www.docswell.com/)
thumbnail: https://bcdn.docswell.com/page/LE1Y4L557G.jpg?width=480
description: Slides summarizing my bachelor's thesis, as presented at the IEICE Technical Committee on Network Systems (NS) meeting.
published: April 13, 26
canonical: https://www.docswell.com/s/custard-1855/5MQXRE-sed_4_har
---
# Page. 1

![Page Image](https://bcdn.docswell.com/page/LE1Y4L557G.jpg)

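A Study on Improving the Accuracy of Sound-Based Activity Recognition Models for Real-World Environments
竹本 志恩, 川原 亮一
Department of Information Networking for Innovation and Design, Faculty of Information Networking for Innovation and Design, Toyo University
This work was presented at the conference reported in [1]. Among the figures in these slides, the model architecture diagram and the accuracy table are taken from Fig. 3 and Table 3 of [1], respectively.
[1] 竹本志恩, 川原亮一, "実環境を見据えた音による行動認識モデルの精度向上に関する一検討" [A study on improving the accuracy of sound-based activity recognition models for real-world environments], IEICE Technical Report (信学技報), vol. 125, no. 385, NS2025-226, pp. 31-36, Mar. 2026.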


# Page. 2

![Page Image](https://bcdn.docswell.com/page/GEWGXV5WJ2.jpg)

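Background: Monitoring Older Adults
Problem
• The declining birthrate and aging population are increasing the burden of caregiving
Solution
• Support independent living for older adults through activity recognition
Figure: Overall flow (sensor → input → activity recognition → application → independent living)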


# Page. 3

![Page Image](https://bcdn.docswell.com/page/47ZL64N2J3.jpg)

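Underlying Technology: Activity Recognition
Activity recognition
• Estimates a person's activities from sensor data
Envisioned applications
• Understand daily activity patterns from the recognition results
• Use time-series forecasting to detect deviations from these patterns
Figure: Overall flow (sensor → input → activity recognition → results → applications: understanding activity patterns, anomaly detection, lifestyle improvement)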


# Page. 4

![Page Image](https://bcdn.docswell.com/page/YJ6W2396JV.jpg)

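Why Microphones?
Main reasons
• Complement other sensors
• Enable activity recognition that also takes sound into account
Use cases
• Anomaly detection in places such as bathrooms and washrooms
• Aimed at places where camera monitoring is difficult
• Offers an alternative to wearable devices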


# Page. 5

![Page Image](https://bcdn.docswell.com/page/GJ5M2LN5J4.jpg)

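Research Goals
Overall: supporting independent living
• Build an anomaly detection system for the home
• Inference and training on edge devices are essential
This study: improving accuracy without adding parameters
• Restrict the system to a single pretrained model
• Improve accuracy without increasing the number of model parameters
• Propose modifications to semi-supervised learning and domain generalization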


# Page. 6

![Page Image](https://bcdn.docswell.com/page/9E294L5G7R.jpg)

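Underlying Technology: Activity Recognition
Activity recognition
• Estimates activities from sensor data
• Two mainstream sensor types: wearables and cameras

Pros and cons of each sensor type

| Sensor | Advantages | Disadvantages |
| --- | --- | --- |
| Wearable | Can monitor at all times | Cognitive concerns (forgetting to wear or charge the device); reluctance to wear a device |
| Camera | Rich information | Feeling of being watched; lower accuracy in blind spots and in the dark |
| Microphone | Resolves the drawbacks of the sensors above | Lower accuracy due to environmental factors |
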


# Page. 7

![Page Image](https://bcdn.docswell.com/page/D7Y4MQKGEM.jpg)

Underlying Technology: Activity Recognition
Wearables
• Prior work [2][3] detects falls using inertial sensors
Table: Pros and cons of each sensor type (same as page 6)
[2] S. Badgujar and A. S. Pillai, "Fall Detection for Elderly People using Machine Learning," in 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2020.
[3] M. Nouredanesh, K. Gordt, M. Schwenk, and J. Tung, "Automated Detection of Multidirectional Compensatory Balance Reactions: A Step Towards Tracking Naturally Occurring Near Falls," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 28, no. 2, pp. 478-487, Feb. 2020.


# Page. 8

![Page Image](https://bcdn.docswell.com/page/VENYWKR8J8.jpg)

Underlying Technology: Activity Recognition
Cameras
• Variants include RGB cameras, depth sensors, and combinations of both
• Prior work [4] estimates human actions (walking, crouching, …) from visual information
Table: Pros and cons of each sensor type (same as page 6)
[4] N. Zerrouki, F. Harrou, Y. Sun, and A. Houacine, "Vision-Based Human Action Classification Using Adaptive Boosting Algorithm," IEEE Sensors Journal, vol. 18, no. 12, pp. 5115-5121, Jun. 2018.


# Page. 9

![Page Image](https://bcdn.docswell.com/page/Y79PXVMXE3.jpg)

Underlying Technology: Activity Recognition
Microphones
Prior work [5] analyzes bathroom activities from sound
• Trains and evaluates on the sounds of faucets, showers, and flushing toilets
• Inference in two stages: (1) water or not; (2) if water, which of the three sources
Table: Pros and cons of each sensor type (same as page 6)
[5] S. H. Hyun, "Sound-event detection of water-usage activities using transfer learning," Sensors, vol. 24, no. 1, p. 22, 2023.


# Page. 10

![Page Image](https://bcdn.docswell.com/page/G78D2PL97D.jpg)

Underlying Technology: Activity Recognition
Microphones
Differences between prior work [5] and this study
• Target classes: [5] is limited to water sounds
• Environment: [5] is limited to bathrooms
• This study considers a wider range of uses
  • The goal is activity recognition that takes the context of sounds into account


# Page. 11

![Page Image](https://bcdn.docswell.com/page/L7LM21LMJR.jpg)

Underlying Technology: Activity Recognition
Challenges in real environments
• Accuracy drops depending on noise and the acoustic conditions
• The concrete task addressed is sound event detection
Table: Pros and cons of each sensor type (same as page 6)


# Page. 12

![Page Image](https://bcdn.docswell.com/page/4EMY8WM6EW.jpg)

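Underlying Technology: Sound Event Detection
Sound event detection
Activity recognition via sound event detection (SED)
Estimates the following from audio:
• The event class
• The onset and offset times
Figure: Example of detected events on a timeline, showing event classes (door opening/closing, running water) with their onsets and offsets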
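
An SED model typically outputs a probability for every class at every frame; the event class and its onset/offset times come from post-processing those scores. A minimal sketch for one class, assuming a fixed decision threshold and a fixed frame hop (the function and its parameters are hypothetical; practical systems usually also smooth the scores, e.g. with a median filter, which is omitted here):

```python
from typing import List, Tuple


def frames_to_events(
    probs: List[float], threshold: float, hop_seconds: float
) -> List[Tuple[float, float]]:
    """Convert frame-wise probabilities of ONE class into (onset, offset) pairs in seconds.

    A frame is 'active' when its probability is >= threshold; each run of
    consecutive active frames becomes one event.
    """
    events: List[Tuple[float, float]] = []
    start = None  # index of the first frame in the current active run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            events.append((start * hop_seconds, i * hop_seconds))
            start = None
    if start is not None:  # event still active at the end of the clip
        events.append((start * hop_seconds, len(probs) * hop_seconds))
    return events


# Hypothetical scores for a "running water" class, one frame every 0.5 s
print(frames_to_events([0.1, 0.8, 0.9, 0.2, 0.7], threshold=0.5, hop_seconds=0.5))
# → [(0.5, 1.5), (2.0, 2.5)]
```

The class comes from which output is thresholded, and the times come from where each run of active frames starts and ends.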


# Page. 13

![Page Image](https://bcdn.docswell.com/page/PER95YVNJ9.jpg)

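Underlying Technology: Activity Recognition
Example of a real-world challenge
• Activities must be estimated from input containing several sounds that occur at the same time
• In the figure below, the sound of a ventilation fan lowers inference accuracy
Figure: Effect of noise on accuracy (input: target sound (footsteps) plus noise (ventilation fan) → model → output: "footsteps?")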


# Page. 14

![Page Image](https://bcdn.docswell.com/page/P7XQKGZ8EX.jpg)

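Prior Implementation: Baseline
Prior implementation
• DCASE¹ is an initiative aimed at SED in real-world environments
• The DCASE 2024 Task 4 baseline model² serves as the prior implementation for this study
Figure: Model architecture. Input audio goes to BEATs (T_P × 768), whose output is pooled/interpolated to T_CNN × 768, and to a CNN (T_CNN × 128); the two are concatenated (T_CNN × 128) and passed through a BiGRU (T_CNN × 384) and a linear layer (T_CNN × 27). T: number of frames.
1. Detection and Classification of Acoustic Scenes and Events, a challenge series whose tasks include SED
2. https://github.com/DCASE-REPO/DESED_task/tree/master/recipes/dcase2024_task4_baseline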


# Page. 15

![Page Image](https://bcdn.docswell.com/page/37K9548L7D.jpg)

Prior Implementation: Baseline
Architecture of the prior implementation
• Adopts a CRNN (CNN + RNN)
• Uses the pretrained model BEATs [6] for feature extraction
  • A framework built on a Transformer for audio
  • Learns the meaning and context of sounds from large amounts of unlabeled data
Figure: Model architecture (same diagram as on page 14)
[6] S. Chen, Y. Wu, C. Wang, S. Liu, D. Tompkins, Z. Chen, and F. Wei, "BEATs: Audio pre-training with acoustic tokenizers," arXiv:2212.09058, 2022.


# Page. 16

![Page Image](https://bcdn.docswell.com/page/LJ3WKL26J5.jpg)

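Prior Implementation: Mean Teacher
Mean Teacher (MT) [7]
• Semi-supervised learning that makes use of unlabeled data
• The teacher is an exponential moving average (EMA) of the regular (student) model:

$$\theta'_t = \alpha\,\theta'_{t-1} + (1-\alpha)\,\theta_t$$

(t: training step; α: smoothing coefficient)

Figure: MT training flow. Labeled and unlabeled data are fed to the student model (weights θ_t) and the teacher (weights θ′_t, an EMA of the student). The student's predictions are compared with the labels (supervised loss) and with the teacher's pseudo-labels (consistency loss).
[7] A. Tarvainen and H. Valpola, "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results," in Advances in Neural Information Processing Systems 30, 2017.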
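
The teacher update above is a one-line rule per weight. A minimal sketch, assuming the weights are flattened into a plain list (`ema_update` is a hypothetical name; deep-learning frameworks apply the same rule tensor by tensor):

```python
from typing import List


def ema_update(teacher: List[float], student: List[float], alpha: float) -> List[float]:
    """One Mean Teacher step: theta'_t = alpha * theta'_{t-1} + (1 - alpha) * theta_t."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of weights")
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


teacher = [0.0, 1.0]
student = [1.0, 1.0]  # hypothetical student weights after one optimizer step
teacher = ema_update(teacher, student, alpha=0.9)
print(teacher)  # the first weight moves only 10% of the way toward the student
```

With α close to 1 (values such as 0.99 or 0.999 are typical), the teacher changes slowly and averages out the noise of individual student updates, which is why its predictions are used as targets.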


# Page. 17

![Page Image](https://bcdn.docswell.com/page/8JDK3NZMEG.jpg)

Prior Implementation: Mean Teacher
Mean Teacher (MT) [7]
• The quality of the teacher's predictions directly determines the quality of training
Figure: MT training flow (same diagram as on page 16)


# Page. 18

![Page Image](https://bcdn.docswell.com/page/VEPK45DQ78.jpg)

Prior Work: Confident Mean Teacher
Confident Mean Teacher (CMT) [8]
• Uses only teacher predictions with y > 0.5
• Improved accuracy in [8], but lowered some scores in our setup
Figure: MT training flow in which only teacher predictions above 0.5 are used as pseudo-labels for the consistency loss
[8] S. Xiao, X. Zhang, and P. Zhang, "Multi-dimensional frequency dynamic convolution with confident mean teacher for sound event detection," in Proc. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1-5.


# Page. 19

![Page Image](https://bcdn.docswell.com/page/27VVXP1P7Q.jpg)

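Proposed Method: Improved CMT
Problem with CMT
• Predictions with y ≤ 0.5 are not used
• This makes it hard to learn "no event"
Introduce two thresholds
• Use predictions with y ≥ 0.7 (as positives) and y ≤ 0.3 (as negatives)
Figure: Probability axis from 0 to 1; predictions in [0, 0.3] are treated as negatives and those in [0.7, 1] as positives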
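
The difference between CMT's single threshold and the proposed two-threshold rule can be sketched as follows. This assumes the teacher emits one probability per frame and class, that selected entries become hard 0/1 targets, and that the consistency loss is a masked mean squared error; these choices and the function names are illustrative assumptions, not the exact implementation in [8] or [1].

```python
from typing import List, Optional, Tuple


def select_pseudo_labels(
    teacher_probs: List[float], pos_thr: float, neg_thr: Optional[float] = None
) -> Tuple[List[float], List[float]]:
    """Turn teacher probabilities into hard targets plus a mask of entries to use.

    prob >= pos_thr             -> target 1.0, used
    prob <= neg_thr (if given)  -> target 0.0, used
    otherwise                   -> ignored (mask 0.0)
    """
    targets, mask = [], []
    for p in teacher_probs:
        if p >= pos_thr:
            targets.append(1.0)
            mask.append(1.0)
        elif neg_thr is not None and p <= neg_thr:
            targets.append(0.0)
            mask.append(1.0)
        else:
            targets.append(0.0)  # placeholder; removed by the mask
            mask.append(0.0)
    return targets, mask


def masked_consistency_mse(
    student_probs: List[float], targets: List[float], mask: List[float]
) -> float:
    """Mean squared error over the selected entries only (0.0 if nothing is selected)."""
    used = sum(mask)
    if used == 0:
        return 0.0
    total = sum(m * (s - t) ** 2 for s, t, m in zip(student_probs, targets, mask))
    return total / used


teacher = [0.9, 0.6, 0.4, 0.1]  # hypothetical teacher outputs
student = [0.8, 0.5, 0.5, 0.3]  # hypothetical student outputs
# CMT-style: confident positives only (the slide's "y > 0.5"; >= differs only at exactly 0.5)
cmt_targets, cmt_mask = select_pseudo_labels(teacher, pos_thr=0.5)
# Proposal: confident positives AND confident negatives
prop_targets, prop_mask = select_pseudo_labels(teacher, pos_thr=0.7, neg_thr=0.3)
print(cmt_mask, masked_consistency_mse(student, cmt_targets, cmt_mask))
print(prop_mask, masked_consistency_mse(student, prop_targets, prop_mask))
```

With the same teacher outputs, the CMT rule keeps only the positives (0.9 and 0.6), while the two-threshold rule also keeps the confident negative (0.1) and discards the ambiguous 0.6 and 0.4, so the student receives an explicit "no event" signal.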


# Page. 20

![Page Image](https://bcdn.docswell.com/page/5JGLVP2Q7L.jpg)

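Prior Work: MixStyle
MixStyle [9]
• A form of data augmentation
• Prevents accuracy degradation caused by environmental factors
Overview of the processing
• Compute the mean μ and standard deviation σ of each input and mix them according to λ
• In this study, the statistics are computed along the frequency axis
Figure: The statistics of input 1 ($\mu_1, \sigma_1$) and input 2 ($\mu_2, \sigma_2$) are mixed as

$$\tilde{\mu} = \lambda\mu_1 + (1-\lambda)\mu_2,\qquad \tilde{\sigma} = \lambda\sigma_1 + (1-\lambda)\sigma_2$$

and input 1 is re-styled as $\tilde{\sigma}\,(x_1-\mu_1)/\sigma_1 + \tilde{\mu}$.
[9] K. Zhou, Y. Yang, Y. Qiao, and T. Xiang, "Domain generalization with MixStyle," arXiv:2104.02008, 2021.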
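
A minimal sketch of frequency-wise MixStyle on single-channel spectrograms stored as `[frequency][time]` lists. It assumes that "computing along the frequency axis" means one (μ, σ) pair per frequency bin, pooled over time, and draws λ from Beta(α, α) as in [9]; the function names and the default α are illustrative assumptions. In practice the second input is another example from the same mini-batch, and MixStyle is applied only with some probability during training.

```python
import random
from typing import List, Optional, Tuple

Spec = List[List[float]]  # [frequency bin][time frame]


def _bin_stats(row: List[float], eps: float) -> Tuple[float, float]:
    """Mean and standard deviation of one frequency bin over time."""
    mu = sum(row) / len(row)
    var = sum((v - mu) ** 2 for v in row) / len(row)
    return mu, (var + eps) ** 0.5


def freq_mixstyle(
    x1: Spec, x2: Spec, alpha: float = 0.1, lam: Optional[float] = None, eps: float = 1e-6
) -> Spec:
    """Re-style x1 with (mu, sigma) mixed from x1 and x2, separately for every frequency bin."""
    if lam is None:
        lam = random.betavariate(alpha, alpha)  # lambda ~ Beta(alpha, alpha)
    out: Spec = []
    for row1, row2 in zip(x1, x2):
        mu1, sigma1 = _bin_stats(row1, eps)
        mu2, sigma2 = _bin_stats(row2, eps)
        mu_mix = lam * mu1 + (1 - lam) * mu2
        sigma_mix = lam * sigma1 + (1 - lam) * sigma2
        # normalize with x1's own statistics, then apply the mixed ones
        out.append([sigma_mix * (v - mu1) / sigma1 + mu_mix for v in row1])
    return out


x1 = [[1.0, 2.0, 3.0], [0.0, 0.0, 4.0]]     # hypothetical input 1
x2 = [[10.0, 12.0, 14.0], [5.0, 5.0, 5.0]]  # hypothetical input 2
print(freq_mixstyle(x1, x2))                # random lambda
print(freq_mixstyle(x1, x2, lam=0.0))       # x1's pattern with x2's per-bin statistics
```

Only the per-bin statistics are exchanged, so the temporal pattern of `x1` is kept while its spectral "style", such as an overall tilt caused by the room or microphone, moves toward that of `x2`.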


# Page. 21

![Page Image](https://bcdn.docswell.com/page/47QY62PWEP.jpg)

Prior Work: Frequency-Weighted MixStyle
Prior work [10] integrates MixStyle with an attention mechanism
1. Input: multiple feature maps are fed in
2. Attention: obtains the importance of each frequency
3. MixStyle: mixes the feature maps
[10] Y. Xiao, H. Yin, J. Bai, and R. K. Das, "FMSG-JLESS submission for DCASE 2024 Task 4 on sound event detection with heterogeneous training dataset and potentially missing labels," arXiv:2407.00291, 2024.


# Page. 22

![Page Image](https://bcdn.docswell.com/page/KE4W4LY1J1.jpg)

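Proposed Method: Frequency Weighting with a CNN
Key ideas of the proposal
• Simplify the attention mechanism into a CNN
• Prevent information loss by adding features together
1. Input: multiple feature maps are fed in
2. CNN: obtains the importance along the frequency axis
3. MixStyle: mixes the feature maps
4. Residual connection: adds the input and the processed features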
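
One possible reading of steps 1-4 is sketched below: a tiny stand-in for the CNN turns the time-averaged frequency profile into one weight per frequency bin, the MixStyle output is scaled by those weights, and the input is added back through the residual path. Everything here (the single 1-D kernel, the sigmoid, where the weights are applied, and the function names) is a hypothetical simplification; the placement of the weights in [1] may differ.

```python
import math
from typing import List

Spec = List[List[float]]  # [frequency bin][time frame]


def frequency_weights(x: Spec, kernel: List[float], bias: float = 0.0) -> List[float]:
    """Stand-in for the CNN: a 1-D convolution (cross-correlation, as in CNN layers) over the
    time-averaged frequency profile, zero-padded, odd-length kernel, then a sigmoid per bin."""
    profile = [sum(row) / len(row) for row in x]
    half = len(kernel) // 2
    weights = []
    for f in range(len(profile)):
        acc = bias
        for k, w in enumerate(kernel):
            idx = f + k - half
            if 0 <= idx < len(profile):  # zero padding at the edges
                acc += w * profile[idx]
        weights.append(1.0 / (1.0 + math.exp(-acc)))
    return weights


def weighted_residual(x: Spec, mixed: Spec, kernel: List[float], bias: float = 0.0) -> Spec:
    """out[f][t] = x[f][t] + w[f] * mixed[f][t]: weighted MixStyle output plus the residual input."""
    w = frequency_weights(x, kernel, bias)
    return [
        [a + wf * b for a, b in zip(row_x, row_m)]
        for wf, row_x, row_m in zip(w, x, mixed)
    ]


x = [[1.0, 3.0], [2.0, 2.0]]      # hypothetical input features: 2 bins x 2 frames
mixed = [[2.0, 2.0], [4.0, 0.0]]  # hypothetical MixStyle output for the same input
print(weighted_residual(x, mixed, kernel=[0.0, 0.0, 0.0]))  # all weights are 0.5
# → [[2.0, 4.0], [4.0, 2.0]]
```

The residual term `x` lets the original features pass through unchanged even when the weights are close to zero, which is the "prevent information loss" idea on this slide.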


# Page. 23

![Page Image](https://bcdn.docswell.com/page/L71Y4L65JG.jpg)

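Summary of the Proposal
Where each component is applied
• MixStyle: early in the pipeline (input and first convolutional layers)
• CMT: from loss computation through the parameter update
1. Input: apply MixStyle
2. CRNN: apply MixStyle in the 1st and 2nd convolutional layers
3. Loss computation: computed for both labeled and unlabeled data
4. Update: update both the teacher and the student models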


# Page. 24

![Page Image](https://bcdn.docswell.com/page/G7WGXVWWE2.jpg)

Evaluation Metrics
What is evaluated
• 21-class classification
• Time estimation (onset and offset)
Figure: Example of event classes (door opening/closing, running water) with their onset and offset times
Datasets and metrics (following DCASE 2024 Task 4)
• DESED¹ [11, 12]: PSDS [13]
• MAESTRO Real [14]: mpAUC
1. Domestic Environment Sound Event Detection Dataset
[11] N. Turpault, R. Serizel, A. P. Shah, and J. Salamon, "Sound event detection in domestic environments with weakly labeled data and soundscape synthesis," in Proc. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.
[12] R. Serizel, N. Turpault, A. Shah, and J. Salamon, "Sound event detection in synthetic domestic environments," in Proc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 86-90.
[13] Ç. Bilen, G. Ferroni, F. Tuveri, J. Azcarreta, and S. Krstulović, "A framework for the robust evaluation of sound event detection," in Proc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 61-65.
[14] I. Martín-Morató and A. Mesaros, "Strong labeling of sound events using crowdsourced weak labels and annotator competence estimation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 902-914, 2023.


# Page. 25

![Page Image](https://bcdn.docswell.com/page/4JZL6452E3.jpg)

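Evaluation Metrics
Confusion matrix
• The table shows an example of detecting dog barks (class c)
• Performance is evaluated using the rates of TPs and FPs
  • TPR (true positive rate): $\mathrm{TPR} = \dfrac{TP}{TP + FN}$
  • FPR (false positive rate): $\mathrm{FPR} = \dfrac{FP}{FP + TN}$

| Ground truth (rows) / prediction (columns) | Dog | Not dog |
| --- | --- | --- |
| Dog | TP (true positive) | FN (false negative) |
| Not dog | FP (false positive) | TN (true negative) |
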
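
The two rates can be computed directly from per-clip decisions. A minimal sketch for one class c, assuming one hypothetical true/false decision per clip (real SED evaluation works on time segments or events rather than whole clips; the function names are illustrative):

```python
from typing import List, Tuple


def confusion_counts(y_true: List[bool], y_pred: List[bool]) -> Tuple[int, int, int, int]:
    """Count (TP, FN, FP, TN) for one class c; True means 'class c present / predicted'."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    return tp, fn, fp, tn


def tpr_fpr(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float]:
    """TPR = TP / (TP + FN), FPR = FP / (FP + TN); 0.0 when a denominator is zero."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Five hypothetical clips: is a dog barking (truth) / did the model say so (prediction)?
truth = [True, True, False, False, False]
pred = [True, False, True, False, False]
print(tpr_fpr(*confusion_counts(truth, pred)))  # one of two barks found, one of three non-dog clips flagged
```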


# Page. 26

![Page Image](https://bcdn.docswell.com/page/YE6W23V6EV.jpg)

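Evaluation Metrics
ROC curve
• From which predicted probability should a detection count as positive?
• Visualizes the TP/FP rates at each threshold
  • Lenient threshold: risk of false alarms
  • Strict threshold: risk of missed detections
• Ideal: low FPR with high TPR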


# Page. 27

![Page Image](https://bcdn.docswell.com/page/GE5M2LD5E4.jpg)

Evaluation Metrics
AUC (Area Under the Curve)
The area under the ROC curve; the closer it is to 1.0, the better
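
Sweeping the threshold and integrating the resulting curve gives the ROC curve and its AUC. A minimal sketch with hypothetical scores and labels for one class; ties are handled by treating every score at or above the threshold as positive, and the function names are illustrative:

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the threshold from strict to lenient."""
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under the piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6]  # hypothetical "dog" probabilities for four clips
labels = [1, 0, 1, 0]          # 1 = a dog actually barks in the clip
print(roc_points(scores, labels))
print(auc(roc_points(scores, labels)))  # → 0.75
```

Here one negative clip (0.8) outranks one positive clip (0.7), so the AUC is 0.75 rather than 1.0; equivalently, 3 of the 4 positive/negative pairs are ranked correctly.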


# Page. 28

![Page Image](https://bcdn.docswell.com/page/97294L8GJR.jpg)

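Evaluation Metrics
mpAUC (macro-averaged partial AUC)
• Partial: restricts the FPR range according to practical requirements
  • Suppresses false alarms
  • DCASE 2024 uses the FPR range 0.0-0.1
  • For evaluation, the area is normalized by 0.1
• Macro-averaged:
  • Average over all classes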
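
The "partial" and "macro-averaged" parts can be sketched as below, following the slide's description (integrate TPR over FPR in [0, 0.1], divide by 0.1, then average over classes). The ROC points are hypothetical and the function names illustrative; the official DCASE 2024 evaluation computes these scores with its own toolkit, which this sketch does not reproduce.

```python
from typing import List, Tuple


def partial_auc(points: List[Tuple[float, float]], max_fpr: float = 0.1) -> float:
    """Area under the ROC curve for FPR in [0, max_fpr], divided by max_fpr.

    `points` are (FPR, TPR) pairs sorted by FPR and starting at (0, 0).
    """
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # clip the last segment at max_fpr by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area / max_fpr


def macro_average(per_class: List[float]) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(per_class) / len(per_class)


# Hypothetical ROC curves for two classes
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
slow_start = [(0.0, 0.0), (0.0, 0.5), (0.2, 1.0), (1.0, 1.0)]
per_class = [partial_auc(perfect), partial_auc(slow_start)]
print(per_class, macro_average(per_class))
```

The `slow_start` class has a full-range AUC of 0.95 but a normalized partial AUC of only 0.625, because its TPR reaches 1.0 only after the false-alarm rate exceeds 0.1; restricting the FPR range is meant to capture exactly this.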


# Page. 29

![Page Image](https://bcdn.docswell.com/page/DJY4MQ6G7M.jpg)

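Evaluation Metrics
PSDS (Polyphonic Sound Detection Score)
• For predictions of class c:
  • TP or FP is decided by the ratio of temporal overlap between the ground-truth label and the prediction
• TP: the overlap ratio exceeds a threshold (otherwise the prediction is an FP)
• CT (cross-trigger): an FP that overlaps a ground-truth label of another class ĉ
  • Example: mistaking a dog bark for a siren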
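
The overlap-based decision for a single detection can be sketched as follows. It covers only the detection-side test (DTC, detection tolerance criterion) and the cross-trigger test (CTTC) against one other class ĉ; the ground-truth-side test (GTC), which decides whether each ground-truth event counts as detected, is omitted. Function names and threshold values are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds


def overlap(a: Interval, b: Interval) -> float:
    """Length of the time span shared by two intervals (0.0 if they are disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def classify_detection(
    det: Interval,
    same_class_gts: List[Interval],
    other_class_gts: List[Interval],
    dtc: float,
    cttc: float,
) -> str:
    """Label one detection of class c as 'TP', 'CT' or 'FP'.

    Assumes the ground-truth intervals within each list do not overlap one another.
    'CT' is still a false positive; it is additionally counted as a cross-trigger.
    """
    length = det[1] - det[0]
    if length <= 0:
        raise ValueError("detection must have a positive duration")
    if sum(overlap(det, g) for g in same_class_gts) / length >= dtc:
        return "TP"
    if sum(overlap(det, g) for g in other_class_gts) / length >= cttc:
        return "CT"
    return "FP"


# A hypothetical "dog" detection at 0-2 s; the dog truly barks at 1-3 s, a siren sounds at 0-2 s
print(classify_detection((0.0, 2.0), [(1.0, 3.0)], [(0.0, 2.0)], dtc=0.7, cttc=0.3))  # → CT
print(classify_detection((0.0, 2.0), [(1.0, 3.0)], [(0.0, 2.0)], dtc=0.1, cttc=0.3))  # → TP
```

Only half of the detection overlaps the true bark, so it fails a strict criterion (0.7) but passes a loose one (0.1); this is the lever that distinguishes the two PSDS scenarios.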


# Page. 30

![Page Image](https://bcdn.docswell.com/page/V7NYWKZ8E8.jpg)

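Evaluation Metrics
PSDS (Polyphonic Sound Detection Score)
• Effective FPR (eFPR) of class c at a given threshold:

$$e^*_c \triangleq R^*_{FP,c} + \alpha_{CT}\,\frac{1}{|C|-1}\sum_{\hat{c}\in C,\ \hat{c}\neq c} R^*_{CT,c,\hat{c}}$$

• $R^*_{FP,c}$: number of FPs per unit of total dataset duration
• $R^*_{CT,c,\hat{c}}$: number of CTs per unit of total duration of the ground-truth labels of the other class $\hat{c}$
• $\alpha_{CT}$: penalty weight on CTs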
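
The eFPR formula can be evaluated directly once the FP and CT rates are known. A minimal sketch with hypothetical rates (per hour of audio) for a three-class example; the function name and data layout are assumptions:

```python
from typing import Dict, List, Tuple


def effective_fpr(
    c: str,
    classes: List[str],
    fp_rate: Dict[str, float],
    ct_rate: Dict[Tuple[str, str], float],
    alpha_ct: float,
) -> float:
    """e*_c = R*_FP,c + alpha_CT * (1 / (|C| - 1)) * sum over c_hat != c of R*_CT,c,c_hat."""
    others = [k for k in classes if k != c]
    mean_ct = sum(ct_rate.get((c, k), 0.0) for k in others) / len(others)
    return fp_rate[c] + alpha_ct * mean_ct


classes = ["dog", "siren", "speech"]
fp_rate = {"dog": 10.0}                                     # hypothetical FPs per hour
ct_rate = {("dog", "siren"): 4.0, ("dog", "speech"): 2.0}   # hypothetical CT rates
print(effective_fpr("dog", classes, fp_rate, ct_rate, alpha_ct=0.0))  # FP term only
print(effective_fpr("dog", classes, fp_rate, ct_rate, alpha_ct=0.5))  # adds the CT penalty
```

With α_CT = 0 the cross-triggers are ignored; with α_CT = 0.5 the average CT rate over the other classes (3.0 here) is added at half weight, giving 11.5.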


# Page. 31

![Page Image](https://bcdn.docswell.com/page/YJ9PXVZX73.jpg)

評価指標
PSDS (Polyphonic Sound Detection Score)
•
任意のeFPR (横軸の位置) について
•
クラスごとの精度で有効TP率 (eTPR)を算出
r(e) ≜ μTP(e) − αST * σTP(e)
•
μTP : 平均TPR
•
σTP : 検出精度のクラス間のばらつき (標準偏差)
•
αST : 特定クラスの精度が極端に低いとペナルティ
31
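
A minimal sketch of the eTPR at one operating point, assuming σ_TP is the population standard deviation across classes (our assumption) and using hypothetical per-class TPRs:

```python
from statistics import mean, pstdev
from typing import List


def effective_tpr(per_class_tpr: List[float], alpha_st: float) -> float:
    """r(e) = mu_TP(e) - alpha_ST * sigma_TP(e); sigma is the population std (assumption)."""
    return mean(per_class_tpr) - alpha_st * pstdev(per_class_tpr)


balanced = [0.7, 0.7]    # hypothetical per-class TPRs at some eFPR value e
unbalanced = [0.9, 0.5]  # same mean, but one class is detected poorly
print(effective_tpr(balanced, alpha_st=1.0))
print(effective_tpr(unbalanced, alpha_st=1.0))
```

Both systems have the same mean TPR (0.7), but the one that neglects a class loses its standard deviation (0.2) through α_ST.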


# Page. 32

![Page Image](https://bcdn.docswell.com/page/GJ8D2P19JD.jpg)

Evaluation Metrics
PSDS (Polyphonic Sound Detection Score)
• The figure on the right shows a PSD-ROC curve
• The system's performance is evaluated by computing the area under this curve (AUC)
https://github.com/j-bernardi/psds_eval
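
The score itself is the area under the PSD-ROC curve up to a maximum eFPR e_max, normalized by e_max so that it lies between 0 and 1 when the eTPR does. A minimal sketch, assuming the curve is given as hypothetical (eFPR, eTPR) points and is held constant between points (a step function); the function name is illustrative, and the psds_eval repository linked above is the reference implementation.

```python
from typing import List, Tuple


def psds_from_curve(points: List[Tuple[float, float]], e_max: float) -> float:
    """Normalized area under a step-wise PSD-ROC curve, from eFPR 0 up to e_max.

    `points` are (eFPR, eTPR) pairs sorted by eFPR and starting at eFPR 0;
    each eTPR value holds until the next point's eFPR.
    """
    area = 0.0
    for i, (e, r) in enumerate(points):
        if e >= e_max:
            break
        e_next = points[i + 1][0] if i + 1 < len(points) else e_max
        area += (min(e_next, e_max) - e) * r
    return area / e_max


curve = [(0.0, 0.5), (50.0, 0.8), (150.0, 0.9)]  # hypothetical; eFPR in FPs per hour
print(psds_from_curve(curve, e_max=100.0))  # (50 * 0.5 + 50 * 0.8) / 100
```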


# Page. 33

![Page Image](https://bcdn.docswell.com/page/LJLM21QMER.jpg)

評価指標
PSDS (Polyphonic Sound Detection Score)
シナリオ1と2の違い:
•
1は時間的正確性を重視
•
•
正解と予測の時間的重複の判定が厳密
2は誤報を抑止
•
時間的な判定は緩め
•
誤検出のペナルティ (αCT) が重い
33
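
For reference, the parameters that DCASE Task 4 has commonly used for the two scenarios since 2021 are collected below as a plain Python dictionary. The values reflect our understanding of the challenge settings and should be checked against the challenge page for a given year; the key names are our own.

```python
# PSDS settings commonly used in DCASE Task 4 (2021 onward); key names are our own.
# max_efpr (e_max) is expressed in false positives per hour.
PSDS_SCENARIOS = {
    "scenario1": {  # temporal accuracy: strict overlap criteria, no CT penalty
        "dtc_threshold": 0.7,
        "gtc_threshold": 0.7,
        "cttc_threshold": None,  # unused because alpha_ct = 0
        "alpha_ct": 0.0,
        "alpha_st": 1.0,
        "max_efpr": 100,
    },
    "scenario2": {  # false-alarm suppression: loose overlap criteria, CT penalty
        "dtc_threshold": 0.1,
        "gtc_threshold": 0.1,
        "cttc_threshold": 0.3,
        "alpha_ct": 0.5,
        "alpha_st": 1.0,
        "max_efpr": 100,
    },
}
```

The two bullets above map onto these fields: the overlap thresholds (DTC/GTC) are stricter in scenario 1, and only scenario 2 applies a non-zero α_CT.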


# Page. 34

![Page Image](https://bcdn.docswell.com/page/47MY8W167W.jpg)

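Experiment: CMT
• The proposal maintains PSDS1 and improves PSDS2
• Using negative examples improves recognition in segments without events
• mpAUC decreases
Figure: Bar chart of PSDS1, PSDS2, and mpAUC (y-axis 0-0.8) for the prior implementation, prior work, and the proposed method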


# Page. 35

![Page Image](https://bcdn.docswell.com/page/P7R95YLNE9.jpg)

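Experiment: CMT Per-Class Accuracy (Improvements)
• Selected classes whose scores changed by about 5%
• Scores improve for long, sustained sounds
Figure: Per-class scores (y-axis 0-1) for Blender and wind_blowing under PSDS2 and mpAUC, comparing the prior implementation, prior work, and the proposed method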


# Page. 36

![Page Image](https://bcdn.docswell.com/page/PJXQKGW87X.jpg)

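Experiment: CMT Per-Class Accuracy (Decreases)
• Selected classes whose scores changed by about 5%
• Scores decrease for short, momentary sounds
Figure: Per-class scores (y-axis 0-0.9) for Dishes, Car horn, and Metro approaching under PSDS2 and mpAUC, comparing the prior implementation, prior work, and the proposed method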


# Page. 37

![Page Image](https://bcdn.docswell.com/page/3JK954VLJD.jpg)

Experiment 1: Per-Class Accuracy
Changes of more than 5%
• Improvements in blue, decreases in red


# Page. 38

![Page Image](https://bcdn.docswell.com/page/LE3WKLM6E5.jpg)

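Experiment: MixStyle
• Adding the CNN improves accuracy
• Cause: the residual connection strengthens the consistency constraint
• The frequency weighting did not take effect because of an implementation defect
Figure: Bar chart of PSDS1, PSDS2, and mpAUC (y-axis 0-0.8) without and with the CNN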


# Page. 39

![Page Image](https://bcdn.docswell.com/page/8EDK3N6M7G.jpg)

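Experiment: CMT + MixStyle
• The proposal is more accurate than the prior implementation
• The difference from prior work is 0.004-0.006
• MixStyle may have a stronger effect than CMT
Figure: Bar chart of PSDS1, PSDS2, and mpAUC (y-axis 0-0.8) for the prior implementation, prior work, and the proposed method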


# Page. 40

![Page Image](https://bcdn.docswell.com/page/V7PK45YQJ8.jpg)

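Summary
The proposals improve accuracy
• Improved CMT
• Integration of a CNN into MixStyle
Open issues
• Convergence of training with CMT + MixStyle
• Making the model lighter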


# Page. 41

![Page Image](https://bcdn.docswell.com/page/2JVVXPGPJQ.jpg)

References
[1] 竹本志恩, 川原亮一, "実環境を見据えた音による行動認識モデルの精度向上に関する一検討" [A study on improving the accuracy of sound-based activity recognition models for real-world environments], IEICE Technical Report (信学技報), vol. 125, no. 385, NS2025-226, pp. 31-36, Mar. 2026.
[2] S. Badgujar and A. S. Pillai, "Fall Detection for Elderly People using Machine Learning," in 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 2020.
[3] M. Nouredanesh, K. Gordt, M. Schwenk, and J. Tung, "Automated Detection of Multidirectional Compensatory Balance Reactions: A Step Towards Tracking Naturally Occurring Near Falls," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 28, no. 2, pp. 478-487, Feb. 2020.
[4] N. Zerrouki, F. Harrou, Y. Sun, and A. Houacine, "Vision-Based Human Action Classification Using Adaptive Boosting Algorithm," IEEE Sensors Journal, vol. 18, no. 12, pp. 5115-5121, Jun. 2018.
[5] S. H. Hyun, "Sound-event detection of water-usage activities using transfer learning," Sensors, vol. 24, no. 1, p. 22, 2023.
[6] S. Chen, Y. Wu, C. Wang, S. Liu, D. Tompkins, Z. Chen, and F. Wei, "BEATs: Audio pre-training with acoustic tokenizers," arXiv:2212.09058, 2022.
[7] A. Tarvainen and H. Valpola, "Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results," in Advances in Neural Information Processing Systems 30, 2017.
[8] S. Xiao, X. Zhang, and P. Zhang, "Multi-dimensional frequency dynamic convolution with confident mean teacher for sound event detection," in Proc. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023, pp. 1-5.
[9] K. Zhou, Y. Yang, Y. Qiao, and T. Xiang, "Domain generalization with MixStyle," arXiv:2104.02008, 2021.
[10] Y. Xiao, H. Yin, J. Bai, and R. K. Das, "FMSG-JLESS submission for DCASE 2024 Task 4 on sound event detection with heterogeneous training dataset and potentially missing labels," arXiv:2407.00291, 2024.
[11] N. Turpault, R. Serizel, A. P. Shah, and J. Salamon, "Sound event detection in domestic environments with weakly labeled data and soundscape synthesis," in Proc. Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.
[12] R. Serizel, N. Turpault, A. Shah, and J. Salamon, "Sound event detection in synthetic domestic environments," in Proc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 86-90.
[13] Ç. Bilen, G. Ferroni, F. Tuveri, J. Azcarreta, and S. Krstulović, "A framework for the robust evaluation of sound event detection," in Proc. 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020, pp. 61-65.
[14] I. Martín-Morató and A. Mesaros, "Strong labeling of sound events using crowdsourced weak labels and annotator competence estimation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 902-914, 2023.


# Page. 42

![Page Image](https://bcdn.docswell.com/page/5EGLVP8QJL.jpg)

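Appendix 1: Activity Estimation from Sound
Envisioned monitoring
• Anomaly detection in places such as bathrooms and washrooms
• Aimed at places where camera monitoring is difficult
• Offers an alternative to wearable devices
Accuracy of the proposal
• From Appendix 1, the proposal is generally more accurate than CMT
• Accuracy is particularly good for running water, footsteps, and wind blowing
• Promising as the front end of an anomaly detection system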


# Page. 43

![Page Image](https://bcdn.docswell.com/page/4JQY629W7P.jpg)

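Appendix 2: Considerations for Edge Devices
Reducing parameters
• Semi-supervised learning methods other than MT
• Changing the task: classify event types only
  • Delegate time estimation to another layer
• Model compression: distillation, quantization, etc.
More efficient inference and training
• Choice of runtime environment
• Optimization with machine learning compilers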


# Page. 44

![Page Image](https://bcdn.docswell.com/page/K74W4LX1E1.jpg)

Appendix 3: Experimental Environment
• Ubuntu 24.04.3 LTS
• Kernel: 6.14.0-36-generic
• Python 3.12.11
• CPU: AMD Ryzen 7 5700X 8-Core Processor, x86_64
• GPU: NVIDIA GeForce RTX 4070, 12282 MiB
• RAM: 15 GiB