Slide Overview
Chihiro Yamasaki, Kai Sugahara, Kazushi Okamoto: Knowledge-augmented relation learning for complementary recommendation with large language models, The 2nd Workshop on Generative AI for E-Commerce 2025 in conjunction with the 19th ACM Conference on Recommender Systems (RecSys 2025), 2025.9, Prague, Czech Republic.
Data Science Research Group, The University of Electro-Communications
Knowledge-Augmented Relation Learning for Complementary Recommendation with Large Language Models
Chihiro Yamasaki, Kai Sugahara, Kazushi Okamoto
The University of Electro-Communications
2025.09.22, Workshop on Generative AI for E-Commerce 2025
Complementary Recommendation
Use Case of Complementary Relationships
Item selection: manual configuration based on purchase logs
- For a large number of item pairs, coverage is insufficient
→ This highlights the need for automation via machine learning with complementary labels.
BBLs: Behavior-Based Labels [McAuley+, 2015]
Many previous studies define co-purchased (co-viewed) items as complementary (substitute) items (sketched below)
- Often include noise, such as the classic "diapers and beer" example
- Not necessarily consistent with true complementarity [Sugahara+, 2024], [Papso, 2023], [Li+, 2024]
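For concreteness, here is a minimal sketch of how BBL-style labels are typically derived from co-purchase logs. The actual construction in [McAuley+, 2015] infers a network rather than thresholding counts, and `orders` and `min_count` are illustrative names, not the paper's.

```python
from collections import Counter
from itertools import combinations

def behavior_based_labels(orders, min_count=5):
    """Derive BBL 'complementary' pairs from co-purchase logs.

    orders: iterable of baskets, each a list of item IDs.
    Any pair co-purchased at least `min_count` times is labeled
    complementary -- which is exactly where noise such as the
    classic "diapers and beer" pair slips in.
    """
    co_counts = Counter()
    for basket in orders:
        for a, b in combinations(sorted(set(basket)), 2):
            co_counts[(a, b)] += 1
    return {pair for pair, n in co_counts.items() if n >= min_count}

# A frequently co-purchased but functionally unrelated pair still
# gets labeled complementary under this rule:
print(behavior_based_labels([["diapers", "beer"]] * 5))  # {('beer', 'diapers')}
```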
FBLs: Function-Based Labels [Sugahara+, 2024], [Yamasaki+, 2025]
Defined by 9 functional categories regarding complementary relationships
Strengths
- Independent of users' browsing or purchase logs → less noise (than BBLs)
- Clearly defined categories → high interpretability
Challenges
- Requires human annotation
- Costly, especially for large-scale e-commerce
- Limits the model's ability to generalize across diverse items
Beyond the Cost of FBLs with LLMs
Cost Issues of FBLs: FBLs require human annotation
↓
LLMs as Annotators: LLMs are effective FBL annotators; GPT-4o-mini reached macro-F1 0.849 (3-class) [Yamasaki+, 2025]
↓
Remaining Cost of LLMs: LLMs can cover large-scale e-commerce items moderately well, but not as cheaply as needed
↓
Hypothesis: Only a small fraction of item pairs are truly complementary
→ We should efficiently annotate only the samples useful for training the classification model
Reducing Labeling Cost via Active Learning
Efficient accuracy gains by labeling only the samples the model is uncertain about (sketched below)
- Human-in-the-loop active learning [Settles, 2009], [Tharwat+, 2023]
- LLM-in-the-loop active learning [Zhang+, 2023], [Kholodna+, 2024]
LLM-in-the-loop: lower cost and automatable compared to manual annotation
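As a concrete example of the selection step, here is a minimal sketch of margin-based uncertainty sampling, one of the strategies evaluated later in the deck. It assumes a scikit-learn-style classifier exposing `predict_proba`; the function names are illustrative.

```python
import numpy as np

def margin_scores(model, X_pool):
    """Margin sampling score: gap between the top-2 predicted class
    probabilities. A small margin means the model is uncertain."""
    proba = model.predict_proba(X_pool)     # shape: (n_samples, n_classes)
    top2 = np.sort(proba, axis=1)[:, -2:]   # two largest per row, ascending
    return top2[:, 1] - top2[:, 0]

def select_uncertain(model, X_pool, k=100):
    """Indices of the k pool samples with the smallest margin."""
    return np.argsort(margin_scores(model, X_pool))[:k]
```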
Proposal: KARL (Knowledge-Augmented Relation Learning)
A framework designed to accurately and effectively classify complementary relationships
- under constrained annotation resources
- under limited training datasets regarding FBLs
Key Idea (a minimal sketch of this loop follows below)
1. Begin with a small seed of expert-labeled FBLs
2. Let an ML model learn from them
3. The model points out what it is uncertain about
4. An LLM steps in to annotate those pairs
5. This loop gradually grows the dataset while keeping costs low
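A minimal skeleton of the loop above, under stated assumptions: `train`, `score`, and `llm_annotate` are placeholder callables standing in for KARL's actual components, and `score` follows the smaller-is-more-uncertain convention of the margin score sketched earlier.

```python
import numpy as np

def karl_loop(seed_X, seed_y, pool, train, score, llm_annotate, n_loops=5, k=100):
    """Skeleton of the KARL loop (steps 1-5 on this slide).

    seed_X, seed_y : small expert-labeled FBL seed set          (step 1)
    pool           : unlabeled candidate item pairs
    train          : (X, y) -> fitted classifier                (step 2)
    score          : (model, pool) -> per-sample uncertainty,
                     smaller = more uncertain                   (step 3)
    llm_annotate   : pairs -> FBL labels, e.g. via GPT-4o-mini  (step 4)
    """
    X, y = list(seed_X), list(seed_y)
    pool = list(pool)
    for _ in range(n_loops):
        model = train(X, y)
        picked_idx = set(np.argsort(score(model, pool))[:k].tolist())
        picked = [pool[i] for i in sorted(picked_idx)]
        X.extend(picked)
        y.extend(llm_annotate(picked))                # step 5: grow the dataset
        pool = [p for i, p in enumerate(pool) if i not in picked_idx]
    return train(X, y)
```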
Component of KARL - Uncertain Pairs
Candidate Sampling
- Select a manageable subset of item pairs using item categories
- Apply category-aware selection to ensure diversity
Uncertainty Scoring (see the sketch below)
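Since the slide itself gives no formulas, here is a hedged sketch of both parts: category-aware bucketing for candidate sampling, and vote entropy as one standard query-by-committee (QBC) uncertainty score. The paper's exact selection and scoring rules may differ.

```python
import numpy as np
from collections import defaultdict

def category_aware_candidates(pairs, categories, per_bucket=50, seed=0):
    """Candidate sampling: bucket item pairs by their category
    combination and draw evenly from each bucket, so no single
    category pair dominates the candidate pool."""
    rng = np.random.default_rng(seed)
    buckets = defaultdict(list)
    for a, b in pairs:
        buckets[tuple(sorted((categories[a], categories[b])))].append((a, b))
    out = []
    for members in buckets.values():
        take = min(per_bucket, len(members))
        out.extend(members[i] for i in rng.choice(len(members), take, replace=False))
    return out

def qbc_vote_entropy(committee, X):
    """QBC uncertainty: entropy of the committee's hard votes per
    sample; higher = more disagreement. (Negate if plugging into a
    smaller-is-more-uncertain selector like the KARL skeleton above.)"""
    votes = np.stack([m.predict(X) for m in committee])   # (n_models, n_samples)
    scores = np.empty(votes.shape[1])
    for j in range(votes.shape[1]):
        _, counts = np.unique(votes[:, j], return_counts=True)
        p = counts / counts.sum()
        scores[j] = -(p * np.log(p)).sum()
    return scores
```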
Component of KARL - LLMs as Annotator
*Our experiments used GPT-4o-mini as the annotator
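A minimal sketch of the annotator call using the OpenAI chat completions API. The prompt text is illustrative (the paper's actual FBL prompt is not shown on this slide); the 3-class label set matches the dataset table on the next slide.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative prompt -- not the paper's actual FBL prompt.
FBL_PROMPT = """Given the two product descriptions below, classify the
relationship of the pair as exactly one of:
complementary, substitute, unrelated.

Item A: {desc_a}
Item B: {desc_b}
Label:"""

def llm_annotate_pair(desc_a: str, desc_b: str) -> str:
    """Ask GPT-4o-mini (the annotator used in the experiments) for a
    3-class judgment on one item pair."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # deterministic labels for reproducibility
        messages=[{"role": "user",
                   "content": FBL_PROMPT.format(desc_a=desc_a, desc_b=desc_b)}],
    )
    return resp.choices[0].message.content.strip().lower()
```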
Research Questions
RQ1: Does KARL improve accuracy in in-distribution (ID) feature spaces?
RQ2: Does KARL improve accuracy in out-of-distribution (OOD) feature spaces?
RQ3: How does training data diversity impact accuracy in both ID and OOD?
Experimental Setup
*: https://github.com/okamoto-lab/fbl_dataset
Dataset: We conducted experiments on the ASKUL* dataset.
- Constructed product information (name, category, description, brand, etc.) and FBLs for item pairs
- ID is used for training and testing (RQ1); OOD is used for testing (RQ2)
ML Model: Logistic Regression (sketched below)
- Input: pair-wise content-based feature vector; similarity and match flag of the descriptions of an item pair
LLM Model: GPT-4o-mini
- Input: FBLs prompt + descriptions of an item pair

Setting  Total  Compl.  Subst.  Unrel.
ID       2,625    591     410   1,624
OOD      2,790    375   2,024     391
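A minimal sketch of the ML-model side of this setup. The slide names a "similarity" and a "match flag" of the descriptions without defining them, so cosine similarity over assumed content embeddings and an exact-match flag stand in for the paper's definitions here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

def pair_features(emb_a, emb_b, desc_a: str, desc_b: str) -> np.ndarray:
    """Pair-wise content-based feature vector for one item pair:
    both items' (assumed) content embeddings, plus a description
    similarity score and a description match flag."""
    sim = cosine_similarity(emb_a.reshape(1, -1), emb_b.reshape(1, -1))[0, 0]
    match = float(desc_a == desc_b)  # stand-in for the paper's match flag
    return np.concatenate([emb_a, emb_b, [sim, match]])

# X: stacked pair_features rows; y: FBL classes (compl./subst./unrel.)
clf = LogisticRegression(max_iter=1000)
# clf.fit(X_train, y_train); clf.predict_proba(X_test)
```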
RQ1: In-distribution Accuracy
Result
- The baseline (Loop 0) already achieved high accuracy (0.822).
- Additional training with KARL yielded only minor improvements (0.5%), and accuracy dropped afterward.
Finding
- QBC and Margin sampling often selected ambiguous pairs, which disrupted the learned distribution and led to an accuracy decline.
RQ2: Out-of-distribution Accuracy
Result
- KARL achieved substantial gains: macro-F1 improved by up to +37% over the baseline (Loop 0).
- QBC and Margin converged faster and more accurately than Random.
Finding
- Actively incorporating diverse data helps acquire knowledge in OOD feature spaces.
RQ3: Diversity-Accuracy Relation
*: Gain is measured against the baseline (Loop 0)
Result
- ID setting: increasing diversity did not improve accuracy, and sometimes even reduced it.
- OOD setting: higher diversity led to clear macro-F1 gains.
Finding
- Active learning strategies must be adaptively tuned to the data setting (ID vs. OOD). (One possible diversity proxy is sketched below.)
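The slide does not define how diversity is measured, so here is one plausible proxy under that caveat: Shannon entropy over the category combinations present in the training pairs. The paper's actual metric may differ.

```python
import numpy as np
from collections import Counter

def category_entropy(train_pairs, categories):
    """One plausible diversity proxy (the paper's metric is not
    given on this slide): Shannon entropy over the category
    combinations present in the training pairs."""
    keys = [tuple(sorted((categories[a], categories[b]))) for a, b in train_pairs]
    counts = np.array(list(Counter(keys).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())
```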
Conclusion
Summary
- Proposed KARL: combines active learning with LLMs to expand FBLs at lower cost.
- OOD: clear accuracy improvement. ID: only small gains.
- Data diversity can help, but the effect varies with training conditions.
Future Work
- Apply to larger datasets.
- Improve models and explore different LLMs.
- Create adaptive methods that switch strategies by ID/OOD.
References (1 / 2)
[Li+, 2024] L. Li, Z. Du: Complementary Recommendation in E-commerce: Definition, Approaches, and Future Directions, arXiv:2403.16135, 2024.
[McAuley+, 2015] J. McAuley, R. Pandey, J. Leskovec: Inferring Networks of Substitutable and Complementary Products, Proc. 21st ACM SIGKDD, 785-794, 2015.
[Sugahara+, 2024] K. Sugahara, C. Yamasaki, K. Okamoto: Is It Really Complementary? Revisiting Behavior-based Labels, Proc. 18th RecSys, 1091-1095, 2024.
[Papso, 2023] R. Papso: Complementary Product Recommendation for Long-tail Products, Proc. 17th RecSys, 1305-1311, 2023.
[Li+, 2024] Z. Li, Y. Liang, M. Wang, S. Yoon, J. Shi, X. Shen, X. He, C. Zhang, W. Wu, H. Wang, J. Li, J. Chan, Y. Zhang: Explainable and Coherent Complement Recommendation with LLMs, Proc. 33rd ACM CIKM, 2024.
[Yamasaki+, 2025] C. Yamasaki, K. Sugahara, Y. Nagi, K. Okamoto: Function-based Labels for Complementary Recommendation: Definition, Annotation, and LLM-as-a-Judge, arXiv:2507.03945, 2025.
References (2 / 2)
[Settles, 2009] B. Settles: Active Learning Literature Survey, Univ. of Wisconsin-Madison, Tech. Rep., 2009.
[Tharwat+, 2023] A. Tharwat, W. Schenck: A Survey on Active Learning, Mathematics, 11(4), 2023.
[Zhang+, 2023] R. Zhang, Y. Li, Y. Ma, M. Zhou, L. Zou: LLMaAA: Making LLMs Active Annotators, EMNLP 2023.
[Kholodna+, 2024] N. Kholodna, S. Julka, M. Khodadadi, M. N. Gumus, M. Granitzer: LLMs in the Loop, ECML PKDD 2024.