November 07, 25
Slide overview
Madoka Hagiri, Kazushi Okamoto, Koki Karube, Kei Harada, Atsushi Shibata: Generation and annotation of item usage scenarios in e-commerce using large language models, Proceedings of the 26th International Symposium on Advanced Intelligent Systems (ISIS 2025), 287-290, 2025.11, Cheongju, Republic of Korea.
Data Science Research Group, The University of Electro-Communications
Generation and annotation of item usage scenarios in e-commerce using large language models
Madoka Hagiri, Kazushi Okamoto, Koki Karube, Kei Harada, Atsushi Shibata
The University of Electro-Communications
2025.11.07 ISIS2025
Introduction
Complementary recommendations
Recommend item combinations that enhance the user experience when used together [Li+, 2024].
[Figure: example of complementary items recommended after a purchase]
Challenges of Complementary Recommendation
Prior research defines complementarity as frequent co-purchase [McAuley+, 2015] [Hao+, 2020].
[Figure: items co-purchased with the query item “Baby wipes” in historical data, which include both complementary items and unrelated items]
Challenge: Purchase history data may include co-occurrences that are unrelated to complementarity (e.g., “Diapers and Beer”).
→ Hard to identify real complementary relationships [Xu+, 2020] [Sugahara+, 2024].
Research Idea
How do people identify complementary items?
Hypothesis: People regard an item as complementary when it is perceived as necessary in an imagined usage scenario.
Query item → Usage scenario → “Necessary” item ≈ “Complementary”?
By modeling this human selection process computationally, we can estimate complementary relationships more accurately.
Large language models (LLMs) can reproduce human-like reasoning and context understanding.
Research Question
Scenario-based complementary recommendation process
[Figure: Query item (e.g., Baby wipes) → Usage scenario (RQ1) → Necessary item (Complementary) (RQ2); RQ3 relates the necessary item back to the query item]
RQ1: Can an LLM generate reasonable usage scenarios using only category information? (Target of this study)
RQ2: Can an LLM generate necessary items based on the usage scenarios?
RQ3: Are the items generated from usage scenarios perceived as complementary to the query item by humans?
Category-based Usage Scenario Generation
Generation strategy
Product-level scenario generation → impractical (too many items)
New approach: Generate scenarios based on product categories
・Limited number and standardized format
・Hierarchical structure can be used as input information.
→ We use category data from ASKUL (a major Japanese e-commerce site).
Generation experiment setup
・About 9,000 categories in total
・Some categories are industry-specific (e.g., Laboratory Equipment & Supplies).
→ Manual evaluation across all categories is difficult.
Target categories: 300 randomly sampled from 1,221 categories under “Household Goods / Kitchenware”
LLM: GPT-4o-mini (temperature: 0.6)
Prompt Design
*All prompts and responses were in Japanese.
# Instructions
Provide information on product categories. The categories are listed from left to right in order of increasing detail. Please answer the questions accurately and specifically.
# Question
Please list as many specific scenarios as possible for using the specified category.
# Target Category
- Box Tissues
# Category Hierarchical Path
- Household Goods / Kitchenware > Tissues / Toilet Paper / Paper Towels / Daily Necessities > Tissues > Box Tissues
Example scenarios (Box tissues)
・Wiping nose or sneezing during a cold or flu
・Cleaning hands or mouth during meals
etc.
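The prompt layout above can be assembled mechanically from a category's hierarchical path. The following is a minimal sketch, assuming the path is available as a list of category names ordered from broad to specific. The function name `build_scenario_prompt` is hypothetical, and the English wording is the slide's translation, not the original Japanese prompt. No LLM call is made here.

```python
def build_scenario_prompt(category_path):
    """Return a usage-scenario prompt for the last (most specific) category.

    category_path: list of category names, ordered from broad to specific.
    The wording follows the English translation shown on the slide
    (assumption: the real prompt was written in Japanese).
    """
    if not category_path:
        raise ValueError("category_path must not be empty")
    target = category_path[-1]
    hierarchy = " > ".join(category_path)
    lines = [
        "# Instructions",
        "Provide information on product categories. "
        "The categories are listed from left to right in order of increasing detail. "
        "Please answer the questions accurately and specifically.",
        "# Question",
        "Please list as many specific scenarios as possible for using the specified category.",
        "# Target Category",
        f"- {target}",
        "# Category Hierarchical Path",
        f"- {hierarchy}",
    ]
    return "\n".join(lines)


# Category path taken from the slide's Box Tissues example.
path = [
    "Household Goods / Kitchenware",
    "Tissues / Toilet Paper / Paper Towels / Daily Necessities",
    "Tissues",
    "Box Tissues",
]
prompt = build_scenario_prompt(path)
print(prompt)
```

Passing the full path rather than only the leaf name mirrors the slide's point that the hierarchical structure can be used as input information.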
Evaluation Experiment
Human evaluation setup & method
Evaluators: 15 participants (2 faculty members, 13 undergraduate/graduate students)
→ Randomly divided into 5 groups (3 evaluators per group).
Evaluation method: Each group evaluated the usage scenarios of 60 categories.
・Each scenario was reviewed by 3 evaluators.
・Total: 300 categories, 2,925 scenarios evaluated
Evaluation criteria (reasonableness):
・The scenario is realistic and reasonable.
・The product category is used appropriately within the scenario.
Ex: Oil-based marker
・Making signs or decorations for events → Reasonable
・Writing on a whiteboard → Marked as “Unreasonable”
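The group setup above (15 evaluators in 5 groups of 3, each group covering 60 of the 300 categories) could be reproduced along the following lines. This is only a sketch: the slide states that evaluators were divided randomly, whereas the random assignment of categories to groups, the fixed seed, and the name `assign_groups` are assumptions made for illustration.

```python
import random


def assign_groups(evaluators, categories, n_groups=5, seed=0):
    """Split evaluators and categories into n_groups equally sized groups.

    Assumption: categories are also shuffled before being split; the slide
    only states that evaluators were divided randomly.
    """
    evaluators = list(evaluators)
    categories = list(categories)
    if len(evaluators) % n_groups or len(categories) % n_groups:
        raise ValueError("evaluators and categories must divide evenly into groups")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(evaluators)
    rng.shuffle(categories)
    e_size = len(evaluators) // n_groups
    c_size = len(categories) // n_groups
    return [
        {
            "evaluators": evaluators[i * e_size:(i + 1) * e_size],
            "categories": categories[i * c_size:(i + 1) * c_size],
        }
        for i in range(n_groups)
    ]


# Evaluators labeled A..O and 300 placeholder category IDs (hypothetical data).
groups = assign_groups("ABCDEFGHIJKLMNO", range(300))
for number, group in enumerate(groups, start=1):
    print(number, group["evaluators"], len(group["categories"]))
```

With this split every scenario in a category is seen by exactly the 3 evaluators of one group, which matches the "3 evaluators per scenario" setup.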
Results: Human Evaluation
Total scenarios evaluated: 2,925 (by 3 evaluators each)
[Figure: share of scenarios by number of “Unreasonable” votes. Votes: 0, 82.9%; Votes: 1, 14.0%; Votes: 2, 2.8%; Votes: 3, 0.2%]
Scenarios rated as reasonable by all evaluators: 2,426 cases (82.9%)
Scenarios rated as unreasonable by majority (≥2 evaluators): 89 cases (3.0%)
The LLM tended to generate reasonable usage scenarios based solely on category information.
Results: Human Evaluation
Distribution of “Unreasonable” votes (15 evaluators / 5 groups)
[Figure: proportion of votes cast as unreasonable [%] for each evaluator, Groups 1 to 5]
Votes were mostly consistent, but evaluator O cast far more “unreasonable” votes than the others.
→ To reduce subjective bias, “unreasonable scenarios” were determined by majority vote (≥2 evaluators) within each group.
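The majority-vote rule can be written down in a few lines. The sketch below assumes each scenario carries three boolean votes (True meaning “unreasonable”). The scenario names and votes are toy data, not the study's results, and the helper names are hypothetical.

```python
from collections import Counter


def unreasonable_votes(votes):
    """Count how many evaluators marked the scenario as unreasonable."""
    return sum(1 for vote in votes if vote)


def is_unreasonable(votes, threshold=2):
    """Apply the majority rule: unreasonable if at least `threshold` votes say so."""
    return unreasonable_votes(votes) >= threshold


# Toy data: three votes per scenario, True = "unreasonable".
toy_votes = {
    "scenario A": [False, False, False],
    "scenario B": [False, True, False],
    "scenario C": [True, True, False],
    "scenario D": [True, True, True],
}

# Number of scenarios at each vote count (0 to 3), as in the vote distribution.
distribution = Counter(unreasonable_votes(v) for v in toy_votes.values())
flagged = [name for name, v in toy_votes.items() if is_unreasonable(v)]
print(dict(distribution))
print(flagged)
```

A single strict evaluator, such as evaluator O, can add one vote to many scenarios, but cannot flag a scenario alone under this rule.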
Qualitative Analysis of Reasonable Scenarios
The LLM generated scenarios that reflected real consumer behavior.
Examples of reasonable scenarios
Category: Lap blanket
Part of the usage scenarios generated by the LLM:
・Giving as a gift for friends or family
・Counteracting office air conditioning
The generated scenarios were consistent with purchase purposes.
Analysis of Unreasonable Scenarios
We manually categorized the 89 unreasonable scenarios.
Unreasonable scenarios = judged unreasonable by ≥2 evaluators
Category (cases): Description
・Inappropriate use (35): Scenario conflicts with the item’s intended function or common use.
・Not a usage scenario (21): Describes logistics (e.g., purchase, maintenance) rather than actual use.
・Suboptimal item for the scenario (15): Item is usable but not the best fit for the scenario.
・Indirect or abstract use (10): Item is used indirectly or is not central to the task.
・Unrealistic scenario (8): Describes highly improbable situations.
The 89 failures fell into two main groups:
・Clear errors (incorrect use/task deviation)
・Unnatural scenarios (suboptimal/unrealistic cases)
Analysis of Unreasonable Scenarios
Distribution of unreasonable scenario ratio by category
Unreasonable scenario ratio = (Unreasonable scenarios) / (Total count of scenarios in the category)
Unreasonable scenario ratio: Category count
・0.0%: 248
・10.0%: 26
・11.1%: 2
・12.5%: 4
・20.0%: 10
・22.2%: 2
・25.0%: 1
・37.5%: 1
・40.0%: 4
・50.0%: 1
・87.5%: 1
・Total: 300
52 categories (17.3%) contained at least one unreasonable scenario.
Even among the failing categories, most errors were small.
→ Typically, only 1 in 10 scenarios was unreasonable.
However...
One category showed a very high ratio of unreasonable scenarios.
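The summary figures on this slide follow directly from the ratio table. The short check below copies the table's values and recomputes the totals. The variable and function names are hypothetical, and the 1-in-10 call is only an illustration of the ratio definition.

```python
# Unreasonable scenario ratio [%] -> number of categories, copied from the slide.
ratio_to_category_count = {
    0.0: 248, 10.0: 26, 11.1: 2, 12.5: 4, 20.0: 10, 22.2: 2,
    25.0: 1, 37.5: 1, 40.0: 4, 50.0: 1, 87.5: 1,
}


def unreasonable_ratio(n_unreasonable, n_total):
    """Unreasonable scenarios divided by all scenarios in one category."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_unreasonable / n_total


total_categories = sum(ratio_to_category_count.values())
failing_categories = sum(
    count for ratio, count in ratio_to_category_count.items() if ratio > 0
)
failing_share = round(100 * failing_categories / total_categories, 1)

print(total_categories, failing_categories, failing_share)
print(unreasonable_ratio(1, 10))  # illustrative: 1 unreasonable scenario out of 10
```

Of the 52 failing categories, 26 sit at exactly 10.0%, which is the basis for the "only 1 in 10" remark.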
Analysis of the Outlier Category
Category: Wraps / Aluminum Foil / Kitchen Paper > Kitchen Paper Holder
Scenarios (partial):
・Wiping moisture during cooking
・Cleaning up spilled drinks on the table
・Using kitchen paper to absorb food moisture
・Wiping kitchen utensils
These scenarios describe the use of kitchen paper itself, not the holder.
The LLM likely over-associated with the parent category (“Kitchen Paper”), ignoring the child’s specific function (“Holder”).
This case suggests that ambiguous category names can mislead the LLM.
Tendency in the Position of Unreasonable Scenarios
[Figure: Unreasonable Scenarios by Generation Order (number of unreasonable scenarios at each position in the output)]
Unreasonable scenarios were concentrated in the latter half of the LLM's output sequence.
Discarding the latter half of the generated scenarios reduced unreasonable cases by 68.5%.
Metric: Using all scenarios / Using top-5 scenarios / Reduction rate
・# of unreasonable scenarios: 89 cases (3.0%) / 28 cases (0.96%) / 68.5%
・# of unreasonable categories: 52 categories (17.3%) / 19 categories (6.3%) / 63.5%
Using only the first-half scenarios is a simple yet effective strategy to improve quality.
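The truncation strategy and the reported reduction rates can be sketched as follows. `keep_top_k` and `reduction_rate` are hypothetical helper names, the scenario list is placeholder data, and the counts fed into `reduction_rate` are the ones stated on the slide.

```python
def keep_top_k(scenarios, k=5):
    """Keep only the first k scenarios in generation order."""
    return list(scenarios[:k])


def reduction_rate(before, after):
    """Relative reduction from `before` to `after`, as a fraction."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# Placeholder output of 10 scenarios for one category.
generated = [f"scenario {i}" for i in range(1, 11)]
print(keep_top_k(generated))

# Counts taken from the slide: all scenarios vs. top-5 scenarios.
scenario_reduction = round(100 * reduction_rate(89, 28), 1)
category_reduction = round(100 * reduction_rate(52, 19), 1)
print(scenario_reduction, category_reduction)
```

The filter needs no extra model call or human label, only the order in which the LLM listed the scenarios.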
Conclusion
Summary
・82.9% of item usage scenarios generated from category information were judged reasonable by all evaluators.
・Although 17.3% of categories (52 / 300) contained unreasonable scenarios, in most of them only about 1 in 10 scenarios was unreasonable.
・Unreasonable scenarios tended to appear in the later part of the LLM's output.
Future work
・Identify complementary items from usage scenarios.
・Evaluate human perception of complementarity.
Thank you for your kind attention.
References
[Li+, 2024] Li, L., and Du, Z., “Complementary Recommendation in E-commerce: Definition, Approaches, and Future Directions”, arXiv, arXiv:2403.16135, 2024.
[Xu+, 2020] Xu, D., Ruan, C., Cho, J., Korpeoglu, E., Kumar, S., and Achan, K., “Knowledge-aware Complementary Product Representation Learning”, In Proc. 13th Int. Conf. Web Search Data Min., 681-689, 2020.
[Sugahara+, 2024] Sugahara, K., Yamasaki, C., and Okamoto, K., “Is It Really Complementary? Revisiting Behavior-based Labels for Complementary Recommendation”, In Proc. 18th ACM Conf. Recomm. Syst., 1091-1095, 2024.
[McAuley+, 2015] McAuley, J., Pandey, R., and Leskovec, J., “Inferring Networks of Substitutable and Complementary Products”, In Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discovery Data Min., 785-794, 2015.
[Hao+, 2020] Hao, J., Zhao, T., Li, J., Dong, X.L., Faloutsos, C., Sun, Y., and Wang, W., “P-Companion: A Principled Framework for Diversified Complementary Product Recommendation”, In Proc. 29th ACM Int. Conf. Inf. Knowl. Manag., 2517-2524, 2020.
Appendix: Example of a System Design
Example of a Complementary Recommendation System
Query item → Usage scenario → Necessary item (Complementary)
Appendix: Unreasonable Scenarios
We manually categorized the 89 unreasonable scenarios.
Inappropriate use (35 cases): The proposed scenario contradicts the item's intended function, properties, or common-sense usage.
Category: Kitchen duckboards and mats
Scenario: Draining food ingredients in commercial kitchens
Appendix: Unreasonable Scenarios
Not an item usage scenario (21 cases): These scenarios describe item-related logistics (e.g., purchasing, maintenance) rather than actual usage.
Category: Guinea pig supplies
Scenario: Scheduling regular veterinary visits for a guinea pig’s health
Suboptimal item for the scenario (15 cases): The item is not the best fit for the proposed scenario, even if physically possible.
Category: Plastic bag
Scenario: For temporary document storage at school or work
Appendix: Unreasonable Scenarios
Use is indirect, auxiliary, or excessively abstract (10 cases): This category included scenarios where the item was technically used but not central to the task.
Category: Pot lid
Scenario: For efficient use when cooking large quantities at once
Unrealistic scenario (8 cases): A small number of scenarios involved highly improbable situations.
Category: Hair removal products
Scenario: Hosting a self hair-removal event at home