[Solution]Tweet Sentiment Extraction

July 20, 21

#AI #Tweet Sentiment Extraction #Model Comparison #BERT #RoBERTa

Slide overview

A summary of solutions to Kaggle's Tweet Sentiment Extraction competition (https://www.kaggle.com/c/tweet-sentiment-extraction/overview/evaluation).

加藤まる

@marbou090

Third-year undergraduate, Complex Systems, Future University Hakodate

Text of each page

Tweet Sentiment Extraction Solution comparison B3 加藤まる 2021/5/31 FUN AI

Level of understanding: everything about BERT is skipped

Contents
1. Competition overview
2. First level models
3. Second level models

Competition overview: Tweet Sentiment Extraction

Task & Data
Task: given a tweet and its sentiment, extract the part of the tweet that reflects that sentiment.
Data (number of tweets)
● train: 27k
● Public test: 4k
● Private test: 8k

Data

Evaluation metric
def jaccard(str1, str2):
    a = set(str1.lower().split())
    b = set(str2.lower().split())
    c = a.intersection(b)
    return float(len(c)) / (len(a) + len(b) - len(c))

Evaluation metric

Jaccard index
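For reference, the word-level Jaccard index computed by the metric can be written as below. The symbols A, B, and n are notation introduced here: A and B are the sets of lowercased, whitespace-separated words of the two strings, and n is the number of tweets. The competition score is the mean of this value over all tweets.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|}
        = \frac{|A \cap B|}{|A| + |B| - |A \cap B|},
\qquad
\text{score} = \frac{1}{n} \sum_{i=1}^{n} J(\text{gt}_i, \text{pred}_i)
```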

10.

Problem with labels

11.

Problem with labels

12.

First level models

13.

Heartkilla’s model
● Models: RoBERTa-base-squad2, RoBERTa-large-squad2, DistilRoBERTa-base, XLNet-base-cased
● Multi Sample Dropout
● AdamW with linear warmup schedule
● Custom loss: Jaccard-based Soft Labels
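A minimal sketch of what a linear warmup schedule might look like. The slide only names the schedule, so the linear decay after warmup, the function name `linear_warmup_factor`, and the values of `base_lr`, `warmup_steps`, and `total_steps` are all assumptions made for illustration.

```python
def linear_warmup_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given step.

    Rises linearly from 0 to 1 during warmup, then (assumption) decays
    linearly back to 0 at total_steps.
    """
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


base_lr = 3e-5  # hypothetical base learning rate
schedule = [base_lr * linear_warmup_factor(s, 100, 1000) for s in range(1001)]
```

The learning rate peaks at `base_lr` when warmup ends (step 100 here) and reaches 0 at the final step.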

14.

Heartkilla’s model

15.

Dropout (image source)

16.

Multi Sample Dropout. Survey in Japanese: https://github.com/AillisInc/paper_survey/issues/4
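The idea can be sketched in plain Python: the same feature vector is passed through several independent dropout masks, each masked copy goes through one shared linear head, and the results are averaged. This sketch averages the outputs; averaging the per-sample losses is another common formulation. All names and default values here are illustrative assumptions, not taken from any of the solutions.

```python
import random


def dropout(features, p, rng):
    """Inverted dropout: zero each feature with probability p, rescale the rest."""
    keep = 1.0 - p
    return [x / keep if rng.random() < keep else 0.0 for x in features]


def linear(features, weights, bias):
    """Shared linear head: one output per weight row."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def multi_sample_dropout(features, weights, bias, p=0.5, n_samples=5, rng=None):
    """Average the head's outputs over n_samples independent dropout masks."""
    rng = rng or random.Random(0)
    outputs = [linear(dropout(features, p, rng), weights, bias)
               for _ in range(n_samples)]
    return [sum(col) / n_samples for col in zip(*outputs)]
```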

17.

Adam (image source)

18.

AdamW (image source)
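A rough single-parameter sketch of an AdamW update, showing how weight decay is applied directly to the weight instead of being added to the gradient as in Adam with L2 regularization. The hyperparameter defaults are hypothetical, and library implementations may order the operations slightly differently.

```python
import math


def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One AdamW update for a scalar weight w at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    # Decoupled weight decay: the decay term is not scaled by the
    # adaptive denominator sqrt(v_hat) + eps.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```

With a zero gradient the weight still shrinks by `lr * weight_decay * w`, which is the decoupled decay acting on its own.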

19.

Use Normal Jaccard

20.

Jaccard-based Soft Labels
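One way such soft labels could be built is sketched below. This is an illustration under stated assumptions and not Heartkilla's exact formula: each candidate start (or end) position is scored by the token-level Jaccard between the span it would produce and the true span, the scores are normalized, and the result is mixed with the one-hot label. The mixing weight `alpha` and all function names are hypothetical.

```python
def span_jaccard(start_a, end_a, start_b, end_b):
    """Token-level Jaccard between two inclusive index spans."""
    if start_a > end_a or start_b > end_b:
        return 0.0
    inter = max(0, min(end_a, end_b) - max(start_a, start_b) + 1)
    union = (end_a - start_a + 1) + (end_b - start_b + 1) - inter
    return inter / union


def _mix(one_hot_index, scores, alpha):
    """Blend a one-hot target with normalized Jaccard scores."""
    total = sum(scores)  # > 0 because the true position scores 1.0
    return [(1 - alpha) * (1.0 if i == one_hot_index else 0.0) + alpha * s / total
            for i, s in enumerate(scores)]


def soft_start_labels(n_tokens, start, end, alpha=0.3):
    """Soft target over start positions, keeping the true end fixed."""
    scores = [span_jaccard(i, end, start, end) for i in range(n_tokens)]
    return _mix(start, scores, alpha)


def soft_end_labels(n_tokens, start, end, alpha=0.3):
    """Soft target over end positions, keeping the true start fixed."""
    scores = [span_jaccard(start, j, start, end) for j in range(n_tokens)]
    return _mix(end, scores, alpha)
```

Positions close to the true boundary receive more probability mass than distant ones, so a near miss is penalized less than a far miss.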

21.

Hikkiiii’s model
● Models: 5fold-roberta-base-squad2 (CV 0.712), 5fold-roberta-large-squad2 (CV 0.714)
● Append sentiment token to the end of the text
● CNN + Linear layer on the concatenation of the last 3 hidden states
● Standard cross-entropy loss

22.

Cross-entropy loss (source)
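A small sketch of cross-entropy computed from raw logits, plus how it is typically applied to span extraction as the sum of a start-position loss and an end-position loss. The function names are illustrative, and the start/end sum is an assumption about the usual setup and not a detail from the slides.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def span_loss(start_logits, end_logits, start, end):
    """Assumed span-extraction loss: start CE plus end CE."""
    return cross_entropy(start_logits, start) + cross_entropy(end_logits, end)
```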

23.

Theo’s model
● Models: bert-base-uncased (CV 0.710), bert-large-uncased-wwm (CV 0.710), distilbert (CV 0.705), albert-large-v2 (CV 0.711)
● MSD on the concatenation of the last 8 hidden states
● Smoothed categorical cross-entropy
● Discriminative learning rate
● Sequence bucketing to speed up the training
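Sequence bucketing can be sketched as follows: examples of similar length are grouped into the same batch so that less padding is processed. The helper names and the toy lengths are hypothetical. Real pipelines usually also shuffle the order of the batches, which this sketch leaves out.

```python
def bucketed_batches(token_lengths, batch_size):
    """Group example indices into batches of similar sequence length."""
    order = sorted(range(len(token_lengths)), key=lambda i: token_lengths[i])
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


def padded_tokens(token_lengths, batches):
    """Total tokens processed when each batch is padded to its longest example."""
    return sum(len(b) * max(token_lengths[i] for i in b) for b in batches)


lengths = [5, 40, 7, 38, 6, 41]           # toy token counts per tweet
naive = [[0, 1], [2, 3], [4, 5]]          # batches in original order
bucketed = bucketed_batches(lengths, 2)   # batches of similar length
```

On this toy input, bucketing reduces the padded token count from 238 to 170.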

24.

Label-smoothed cross-entropy loss (source)
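A minimal sketch of label smoothing, assuming the common variant that spreads a fraction `epsilon` of the probability mass uniformly over all classes. The value `epsilon=0.1` and the function names are illustrative, not taken from the solutions.

```python
import math


def log_softmax(logits):
    """Numerically stable log of softmax(logits)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def smoothed_targets(n_classes, target, epsilon=0.1):
    """One-hot target with epsilon of its mass spread uniformly over all classes."""
    off = epsilon / n_classes
    return [1.0 - epsilon + off if i == target else off for i in range(n_classes)]


def smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy between the smoothed target and softmax(logits)."""
    q = smoothed_targets(len(logits), target, epsilon)
    return -sum(qi * li for qi, li in zip(q, log_softmax(logits)))
```

Setting `epsilon=0` recovers the standard cross-entropy.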

25.

Cl_ev’s model
● Models: roberta-base (CV 0.715), Bertweet
● MSD on the concatenation of the last 8 hidden states
● Smoothed categorical cross-entropy
● Discriminative learning rate
● Custom merges.txt file for RoBERTa

26.

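Problems
● Transformers operate at the token level, so they are weak against noisy patterns
● Transformers cannot easily be made character-level
● Models with different tokenizations cannot simply be blended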

27.

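Solutions
● After a Transformer outputs token probabilities, assign them to each character (to counter the noise)
● Use stacking (an ensemble technique) to feed the outputs of multiple Transformers into character-level NNs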
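The token-to-character step might look like the sketch below. It assumes each tokenizer reports, for every token, a character span `(start, end)` with an exclusive end; the function names and the toy example are hypothetical. Once every model's output lives on the same character axis, the per-model values can be stacked as input features for a character-level model, whatever tokenizer each model used.

```python
def token_probs_to_char_probs(text_length, offsets, token_probs):
    """Give every character the probability of the token that covers it."""
    char_probs = [0.0] * text_length
    for (start, end), p in zip(offsets, token_probs):
        for i in range(start, end):
            char_probs[i] = p
    return char_probs


def stack_char_features(per_model_char_probs):
    """One feature vector per character, with one entry per first-level model."""
    return [list(col) for col in zip(*per_model_char_probs)]


text = "so happy!!"
# Two hypothetical tokenizers that split the same text differently.
model_a = token_probs_to_char_probs(len(text), [(0, 2), (2, 8), (8, 10)], [0.1, 0.8, 0.1])
model_b = token_probs_to_char_probs(len(text), [(0, 2), (3, 8), (8, 9), (9, 10)], [0.2, 0.7, 0.05, 0.05])
features = stack_char_features([model_a, model_b])
```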

28.

Second level models

29.

Ensemble learning: combining models that were each trained separately as individual learners, in order to improve predictive performance on unseen data.

30.

Stacking (image source)

31.

Stacking with OOF (out-of-fold) predictions (image source)
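A minimal sketch of how out-of-fold predictions are produced for stacking: each training example is predicted by a model that never saw it, and those predictions become the training features of the second-level model. The fold assignment, the `fit`/`predict` callables, and the toy mean predictor are all stand-ins chosen for illustration.

```python
def kfold_indices(n_samples, n_folds):
    """Simple interleaved fold assignment (no shuffling)."""
    return [list(range(f, n_samples, n_folds)) for f in range(n_folds)]


def out_of_fold_predictions(X, y, fit, predict, n_folds=5):
    """Predict every example with a model trained on the other folds only."""
    oof = [None] * len(X)
    for valid_idx in kfold_indices(len(X), n_folds):
        valid = set(valid_idx)
        train_idx = [i for i in range(len(X)) if i not in valid]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in valid_idx:
            oof[i] = predict(model, X[i])
    return oof


# Toy first-level "model": always predicts the mean of its training targets.
fit_mean = lambda X, y: sum(y) / len(y)
predict_mean = lambda model, x: model

oof = out_of_fold_predictions(list(range(6)), [0, 1, 2, 3, 4, 5],
                              fit_mean, predict_mean, n_folds=3)
```

Because no example contributes to its own prediction, the second-level model is trained on features that behave like test-time predictions.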

32.

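Pseudo-labeling
1. Train a model on the labeled data
2. Use the model from step 1 to predict on the unlabeled data
3. Treat the predictions from step 2 as pseudo-labels, mix the pseudo-labeled data into the labeled data, and train again
(reference)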

33.

Pseudo-labeling
● "Leakless" pseudo-labels, an approach taken from the Google Quest Q&A 1st place solution
● Confidence score: (start_probas.max() + end_probas.max()) / 2
● Threshold = 0.35 cutoff
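The confidence filter can be sketched as below, using plain lists in place of arrays. The dictionary layout, the function names, and the choice of `>=` at the threshold are assumptions made for illustration. Only the score formula and the 0.35 cutoff come from the slide.

```python
def confidence(start_probas, end_probas):
    """Mean of the highest start probability and the highest end probability."""
    return (max(start_probas) + max(end_probas)) / 2


def select_pseudo_labels(predictions, threshold=0.35):
    """Keep only the unlabeled examples whose prediction is confident enough."""
    return [p for p in predictions
            if confidence(p["start_probas"], p["end_probas"]) >= threshold]


# Hypothetical predictions on two unlabeled tweets.
predictions = [
    {"id": "a", "start_probas": [0.75, 0.25], "end_probas": [0.5, 0.5]},
    {"id": "b", "start_probas": [0.25, 0.25, 0.25, 0.25],
     "end_probas": [0.25, 0.25, 0.25, 0.25]},
]
kept = select_pseudo_labels(predictions)
```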

34.

RNN

35.

BiLSTM (image source)

36.

Softmax (image source)

37.

Softmax (image source)
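For reference, a small numerically stable softmax in plain Python. Subtracting the maximum logit before exponentiating does not change the result but avoids overflow. In this task, it is the function that turns per-position scores into start and end probabilities.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```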

38.

CNN

39.

Batch Normalization (image source)

40.

WaveNet

41.

Char-level NNs
● Adam optimizer
● Linear learning rate decay without warmup
● Smoothed Cross Entropy Loss
● Stochastic Weighted Average
● Select the whole text if predicted start_idx > end_idx
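The last rule in the list can be sketched as a decoding function over character-level probabilities. Taking the argmax of each distribution and treating `end_idx` as inclusive are assumptions. The function name and the toy probabilities are hypothetical.

```python
def decode_span(text, start_probas, end_probas):
    """Pick the most likely start and end characters and return that substring.

    Falls back to the whole text when the predicted start comes after the end.
    """
    start_idx = max(range(len(start_probas)), key=start_probas.__getitem__)
    end_idx = max(range(len(end_probas)), key=end_probas.__getitem__)
    if start_idx > end_idx:
        return text
    return text[start_idx:end_idx + 1]


text = "so happy"
start_probas = [0.0, 0.0, 0.1, 0.7, 0.1, 0.1, 0.0, 0.0]
end_probas = [0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.1, 0.7]
selected = decode_span(text, start_probas, end_probas)
```

Falling back to the full tweet is a cheap safeguard for inconsistent predictions, since the full text always shares at least the target words with the ground truth.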

42.

Ensemble

43.