CommonLit Competition Overview


January 13, 2024

Slide Overview

An introduction to the Kaggle CommonLit competition.
This version omits some slides that could not be made public.


Text of Each Page
1.

CommonLit Competition Overview

2.

The Competition ◼ CommonLit - Evaluate Student Summaries
⚫ A competition to predict scores for texts written by students
✓ Provided data: a source passage, its title, a question about the passage, students' answers to the question, and the scores of those answers
• There are 4 distinct passage/title/question sets
• About 7,000 answer texts in total
• 1,000-2,000 answers per passage
• Each answer has two scores, content and wording, and both are prediction targets
• Content: quality of the content; Wording: quality of the writing
• Evaluation metric: MCRMSE (mean column-wise root mean squared error)
CommonLit itself is a U.S. nonprofit organization that provides teaching materials (reading passages and the like) for elementary, middle, and high-school students
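The MCRMSE metric named above is simple to compute: take the RMSE of each target column separately, then average across columns. A minimal NumPy sketch (the function name is my own):

```python
import numpy as np

def mcrmse(y_true, y_pred):
    """Mean column-wise RMSE: RMSE per target column, averaged."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    col_rmse = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
    return float(np.mean(col_rmse))

# Toy example with two target columns (content, wording):
truth = [[1.0, 0.0], [0.0, 1.0]]
pred = [[1.0, 0.0], [0.0, 0.0]]
score = mcrmse(truth, pred)  # column RMSEs are 0 and sqrt(0.5)
```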

3.
Data Example
Title

On Tragedy

Passage

"Chapter 13
As the sequel to what has already been said, we must proceed to consider what the poet should aim at, and what he should avoid, in constructing his plots; and by what means the specific effect of Tragedy will be produced.
A perfect tragedy should, as we have seen, be arranged not on the simple but on the complex plan. It should, moreover, imitate actions which excite pity and fear, this being the distinctive mark of tragic imitation. It follows plainly,
in the first place, that the change of fortune presented must not be the spectacle of a virtuous man brought from prosperity to adversity: for this moves neither pity nor fear; it merely shocks us. Nor, again, that of a bad man
passing from adversity to prosperity: for nothing can be more alien to the spirit of Tragedy; it possesses no single tragic quality; it neither satisfies the moral sense nor calls forth pity or fear. Nor, again, should the downfall of the
utter villain be exhibited. A plot of this kind would, doubtless, satisfy the moral sense, but it would inspire neither pity nor fear; for pity is aroused by unmerited misfortune, fear by the misfortune of a man like ourselves. Such an
event, therefore, will be neither pitiful nor terrible. There remains, then, the character between these two extremes — that of a man who is not eminently good and just, yet whose misfortune is brought about not by vice or
depravity, but by some error of judgement or frailty. He must be one who is highly renowned and prosperous — a personage like Oedipus, Thyestes, or other illustrious men of such families.
A well-constructed plot should, therefore, be single in its issue, rather than double as some maintain. The change of fortune should be not from bad to good, but, reversely, from good to bad. It should come about as the result not
of vice, but of some great error or frailty, in a character either such as we have described, or better rather than worse. The practice of the stage bears out our view. At first the poets recounted any legend that came in their way.
Now, the best tragedies are founded on the story of a few houses — on the fortunes of Alcmaeon, Oedipus, Orestes, Meleager, Thyestes, Telephus, and those others who have done or suffered something terrible. A tragedy, then,
to be perfect according to the rules of art, should be of this construction. Hence they are in error who censure Euripides just because he follows this principle in his plays, many of which end unhappily. It is, as we have said, the
right ending. The best proof is that on the stage and in dramatic competition, such plays, if well worked out, are the most tragic in effect; and Euripides, faulty though he may be in the general management of his subject, yet is felt
to be the most tragic of the poets.
In the second rank comes the kind of tragedy which some place first. Like the Odyssey, it has a double thread of plot, and also an opposite catastrophe for the good and for the bad. It is accounted the best because of the weakness
of the spectators; for the poet is guided in what he writes by the wishes of his audience. The pleasure, however, thence derived is not the true tragic pleasure. It is proper rather to Comedy, where those who, in the piece, are the
deadliest enemies — like Orestes and Aegisthus — quit the stage as friends at the close, and no one slays or is slain."

Question

"Summarize at least 3 elements of an ideal tragedy, as described by Aristotle."

Sample Answer (1)

"The three elements of an ideal tragedy are: Having a character that isn't bad have misfortune befall them., Having no subplots, and ending in death."
Content: -0.970236693352702
Wording: -0.417058297304168
Sample Answer (2)

"One element of a perfect tragery in Aristotle's opinion is a plot that is arranged "...not on the simple but on the complex..." In other words, Aristotle believes good trageries have complex
plotlines. He also thinks they need to have actions that "...excite pity and fear...the distinctive mark of tragin imitation." Aristotle believes good trageries incite negative emotions which cause the
events of the plot. Finally, Aristitle believes the perfect tragedy should have a downfall caused by "...some error of judgement or fraility." The character should not be neither good nor evil, and
should somply have made a mistake."
Content: 1.19964242006474
Wording: 1.07890647099528

Title + passage + question + answer text
(or some subset of these)

Score prediction model
(this is what we build)

Content: 〇.〇〇
Wording: △.△△

4.

Base Approach (used by many participants) 1. Prediction with a language model such as BERT or DeBERTa
Title + passage + question + answer text (or some subset) → Tokenizer → regression with the language model → Content: 〇.〇〇 Wording: △.△△
Various Transformer-based language models (DeBERTa V3 was the most widely used in this competition)
DeBERTa: has no hard upper limit on token length, which suits this competition's data (over 1,000 tokens once the passage is included)
※ Many other models cap input at 512 tokens
Example of joining two texts: [question tokens] + [SEP] + [answer tokens]
※ [SEP] is the separator token marking the boundary between texts
https://vitalflux.com/encoder-only-transformer-models-examples/
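The [SEP] concatenation above can be illustrated without any model library. In this sketch, `SEP_ID`, `PAD_ID`, `build_input`, and the tiny token ids are all made up for illustration; a real pipeline would take them from a Hugging Face DeBERTa tokenizer:

```python
# Join two token-id sequences as [question] + [SEP] + [answer],
# then truncate and pad to a fixed length, with a matching attention mask.
SEP_ID = 2  # separator token id (assumed)
PAD_ID = 0  # padding token id (assumed)

def build_input(question_ids, answer_ids, max_len=16):
    """Concatenate with [SEP], then truncate/pad to max_len."""
    ids = question_ids + [SEP_ID] + answer_ids
    ids = ids[:max_len]
    attention_mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [PAD_ID] * (max_len - len(ids))
    return ids, attention_mask

ids, mask = build_input([5, 6, 7], [8, 9])
# ids starts [5, 6, 7, 2, 8, 9] then padding; mask has six 1s then 0s
```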

5.

Base Approach (used by many participants) 2. Language-model predictions + text features → LightGBM etc.
Title + passage + question + answer text (or some subset) → regression with the language model → Content: 〇.〇〇 Wording: △.△△ → feature creation → LightGBM etc. → Content: ◇.◇◇ Wording: □.□□
Features used in public notebooks with many upvotes
Various libraries compute such text metrics (e.g. textstat)
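As a rough illustration of the "feature creation" step, here are a few simple hand-crafted text features of the kind stacked with the model predictions and fed to a gradient-boosting model. The function name and feature set are my own; the public notebooks used richer sets, e.g. via the textstat library:

```python
def text_features(text):
    """Compute a few basic surface features of an answer text."""
    words = text.split()
    # naive sentence split on ., !, ? (good enough for a sketch)
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return {
        "n_words": len(words),
        "n_unique_words": len(set(w.lower() for w in words)),
        "n_sentences": len(sentences),
        "avg_word_len": sum(len(w) for w in words) / max(len(words), 1),
    }

feats = text_features(
    "A perfect tragedy should be complex. It should excite pity and fear."
)
```

These per-row features would then be joined with the language-model Content/Wording predictions as the input table for LightGBM.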

6.

Top Solutions (1st place) ◼ Generated data with an LLM and used it for training (model: DeBERTa v3 large)
1. Train 2 epochs on the pseudo-labeled generated data only, validating on the training data
2. Train the resulting model a further 2-3 epochs on the training data
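The two-stage schedule can be sketched with a toy stand-in model. `MeanModel` (a running-mean "regressor") and `fit_step` are purely illustrative placeholders for DeBERTa fine-tuning; only the schedule mirrors the solution:

```python
class MeanModel:
    """Toy stand-in: a single scalar prediction nudged toward label means."""
    def __init__(self):
        self.value = 0.0
    def fit_step(self, labels):
        # one "epoch": move the prediction halfway toward the label mean
        target = sum(labels) / len(labels)
        self.value += 0.5 * (target - self.value)
    def predict(self, n):
        return [self.value] * n

model = MeanModel()

# Stage 1: pseudo-label the LLM-generated data with the current model,
# then train 2 epochs on the generated data only (validation would be
# run on the real training data).
pseudo_labels = model.predict(4)  # labels for 4 generated texts
for _ in range(2):
    model.fit_step(pseudo_labels)

# Stage 2: continue training 2-3 epochs on the real training data.
real_labels = [1.0, 1.0, 1.0, 1.0]
for _ in range(3):
    model.fit_step(real_labels)
```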

7.

Top Solutions (2nd place) ◼ Prompt engineering
'Think through this step by step : ' + prompt_question + [SEP] + 'Pay attention to the content and wording : ' + text + [SEP] + prompt_text
◼ Head mask
⚫ Normal attention mask
✓ With a model that accepts 1,024 tokens and an input of 524 tokens, the trailing 500 positions are masked with 0
⚫ The 2nd-place mask (head section only) → everything outside 'Think through this step by step : ' + prompt_question + [SEP] + 'Pay attention to the content and wording : ' + text is masked
◼ Plus various other techniques
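The difference between the two masks can be shown with plain 0/1 lists. The function names and the head length of 100 are my own; a real implementation would build tensors consumed by the model's attention or pooling:

```python
def padding_mask(n_real_tokens, max_len):
    """Normal attention mask: 1 over real tokens, 0 over padding."""
    return [1] * n_real_tokens + [0] * (max_len - n_real_tokens)

def head_mask(head_len, max_len):
    """Head mask: 1 only over the leading (head) section,
    0 everywhere else, including the remaining real tokens."""
    return [1] * head_len + [0] * (max_len - head_len)

pad = padding_mask(524, 1024)  # trailing 500 positions are 0
head = head_mask(100, 1024)    # only the first 100 positions are 1
```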

8.

My Result ◼ 8th on the Public LB, 17th on the Private LB
⚫ A solid shake-down, so no gold medal...
(Screenshots of the Public and Private leaderboards)

9.

My Solution ◼ An ensemble of 4 models

10.

My Solution ◼ What I tried
⚫ What worked
✓ Mask Augmentation
• Replace some input tokens with the [MASK] token → the training loss curves became somewhat more stable, making experiments easier to control
✓ Freezing Layers
• Do not update the weights of the DeBERTa layers closest to the input → suppresses overfitting
⚫ What did not work
✓ AWP (Adversarial Weight Perturbation)
• Said to be commonly used in NLP competitions
• During training, perturb the model weights in the direction that degrades performance, to improve robustness
✓ Text Cleaning
• Remove symbols and the like in text preprocessing
✓ Back-translation augmentation
• A form of augmentation: run the text through a translation round-trip such as English → German → English to generate similar-but-different text
(Figure: an example, unrelated to AWP itself, of an input perturbation that degrades performance; including such degraded examples in training improves model robustness)
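Mask augmentation as described above amounts to randomly swapping tokens for [MASK]. A minimal sketch; working on whitespace tokens and the 15% default rate are my assumptions (in an actual pipeline this would typically operate on tokenizer ids):

```python
import random

MASK_TOKEN = "[MASK]"

def mask_augment(tokens, p=0.15, rng=None):
    """Replace each token with [MASK] independently with probability p."""
    rng = rng or random.Random(0)  # fixed seed here for reproducibility
    return [MASK_TOKEN if rng.random() < p else t for t in tokens]

tokens = "the best tragedies are founded on the story of a few houses".split()
augmented = mask_augment(tokens, p=0.3)
```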

11.

Summary and Impressions ◼ This was my first NLP competition, so I learned a lot
⚫ How to use DeBERTa and similar models via Transformers (Hugging Face)
✓ Once you build a training pipeline you can reuse it → a smooth start in other LLM competitions and the like
⚫ Images have many augmentation options; what about text?
✓ Mask augmentation, among others
◼ "Trust CV" is hard
⚫ I cannot help focusing on the Public LB
✓ I did select submissions with high CV, but the direction of my experiments was still driven by how the Public LB moved
⚫ CV and test scores are not always correlated, so Trust CV is not absolute either, which makes it even harder
⚫ It was some mental relief that even a few Grandmaster teams shook down
◼ Top teams try many things
⚫ They (probably) win not by being clever but by the sheer number of trials
⚫ They properly apply techniques used in other competitions, such as pseudo labeling
✓ → I was too lazy to do that myself