September 22, 2025
Slide overview
Kaito Terasaki, Taketo Yoneda, Kiyotakashi Takagawa, Hayato Maruyama, Yongzhi Jin, Hibiki Ayabe, Kei Harada, Kazushi Okamoto: Heterogeneous feature integration for behavioral profiles, Proceedings of the Recommender Systems Challenge 2025 (RecSysChallenge 2025), 36-40, 2025.9, Prague, Czech Republic.
Data Science Research Group, The University of Electro-Communications
Heterogeneous Feature Integration for Behavioral Profiles
Kaito Terasaki, Taketo Yoneda, Kiyotakashi Takagawa, Hayato Maruyama, Yongzhi Jin, Hibiki Ayabe, Kei Harada, Kazushi Okamoto
The University of Electro-Communications
2025.09.22 RecSys Challenge 2025 1 / 21
RecSys Challenge 2025 Overview
- Aim: Develop Universal Behavioral Profiles, i.e., user representations that generalize across tasks
- Data: Logs of various user actions (product_buy, page_visit, search_query, etc.)
- Evaluation: Based on performance across multiple downstream tasks
This Study
- Present a solution developed by UEC_bootcamp_2025, awarded the 2nd Academic Prize
- Construct Universal Behavioral Profiles from diverse user features
Overview of Our Approach
[Figure: the event logs (product_buy, add_to_cart, remove_from_cart, search_query, page_visit) feed four feature families: Sequence-based Embeddings (01, 06, 08), Graph-based Embeddings (07), Text-based Embeddings (10), and Statistical Features (02, 03, 04, 05, 09). A log transformation is applied where indicated, each feature is L2-normalized and weighted, and the results are combined into the Universal Behavioral Profiles.]
Action2Vec (Feature 01) - Action Embedding
- Obtain action embeddings with Word2Vec [Mikolov+, 2013] using all user behaviors
- Treat user actions as "words", in the same way Word2Vec treats words in a sentence
- Word2Vec input format: one sequence of actions per user
[Figure: each user's interactions (product_buy, add_to_cart, remove_from_cart, page_visit, search_query) are fed to Word2Vec, which outputs the action embeddings.]
Action2Vec (Feature 01) - User Embedding
- For each user, compute the average embedding per action type
- Concatenate the five embeddings (product_buy, etc.)
[Figure: User 1's actions are mapped to action embeddings; the averaged product_buy, add_to_cart, remove_from_cart, page_visit, and search_query embeddings are concatenated into a 250-dim user embedding.]
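The averaging and concatenation step can be sketched as follows. This is a minimal illustration, not the authors' code: the token names, the random vectors standing in for trained Word2Vec output, the 50-dim block size (inferred from 250 / 5 action types), and the zero block for users with no events of a given type are all assumptions.

```python
import numpy as np

ACTION_TYPES = ["product_buy", "add_to_cart", "remove_from_cart",
                "page_visit", "search_query"]
DIM = 50  # assumed per-action-type size: 250 dims / 5 action types


def user_embedding(events, token_vectors, dim=DIM):
    """events: list of (action_type, token); token_vectors: token -> vector."""
    parts = []
    for action in ACTION_TYPES:
        vecs = [token_vectors[tok] for act, tok in events
                if act == action and tok in token_vectors]
        # Assumption: a user with no events of this type gets a zero block.
        parts.append(np.mean(vecs, axis=0) if vecs else np.zeros(dim))
    return np.concatenate(parts)


rng = np.random.default_rng(0)
# Hypothetical vocabulary standing in for trained Word2Vec vectors.
token_vectors = {t: rng.normal(size=DIM) for t in
                 ["buy:sku_1", "cart:sku_1", "cart:sku_2", "visit:url_9"]}
events = [("add_to_cart", "cart:sku_1"), ("add_to_cart", "cart:sku_2"),
          ("product_buy", "buy:sku_1"), ("page_visit", "visit:url_9")]
profile = user_embedding(events, token_vectors)
print(profile.shape)  # (250,)
```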
Propensity-oriented add_to_cart profile (Feature 02)
- Goal: Reflect user interests at category and product levels
- Target events: add_to_cart
- Method:
  i. Merge add_to_cart logs with product_properties using SKU
  ii. Filter by predefined target categories and SKUs
  iii. For each user, count add_to_cart actions per target category and per target SKU
  iv. Concatenate both vectors into a 200-dim profile
[Figure: example row for User A with counts over Cat(1) ... Cat(100) and SKU(1) ... SKU(100) (e.g., 10, 1, 4, 0), 200 dimensions in total.]
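The counting steps above can be sketched in a few lines. The toy `sku_to_category` mapping stands in for product_properties, and the three-element target lists stand in for the 100 target categories and 100 target SKUs; how the targets were chosen is not stated in the slides, so they are simply passed in.

```python
from collections import Counter


def propensity_profile(cart_skus, sku_to_category, target_categories, target_skus):
    """cart_skus: SKUs one user added to cart (one entry per event).
    Returns counts per target category followed by counts per target SKU."""
    sku_counts = Counter(cart_skus)
    # The "merge" step: look up each SKU's category in product_properties.
    cat_counts = Counter(sku_to_category[s] for s in cart_skus
                         if s in sku_to_category)
    return ([cat_counts[c] for c in target_categories]
            + [sku_counts[s] for s in target_skus])


sku_to_category = {101: 7, 102: 7, 103: 9}   # toy product_properties
target_categories = [7, 9, 11]                # 100 entries in the slides
target_skus = [101, 103, 999]                 # 100 entries in the slides
user_a = [101, 101, 102, 103]
row = propensity_profile(user_a, sku_to_category, target_categories, target_skus)
print(row)  # [3, 1, 0, 2, 1, 0]
```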
Weekly-daily action profile (Feature 03)
- Goal: Capture temporal patterns of user behavior
- Target events: add_to_cart, page_visit, product_buy, search_query
- Method:
  i. Convert timestamps into ISO week number + weekday
  ii. Aggregate event counts per user and time slot
  iii. 21 weeks (25–45) × 7 days = 140 slots
  iv. Concatenate the 4 action types into a 560-dim profile
[Figure: example row for User A indexed by (Week Number, Weekday) from (25, Mon) to (45, Sun) (e.g., 2, 5); 140 dimensions × {add_to_cart, page_visit, product_buy, search_query}.]
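A minimal sketch of the slot counting is given below. The slides report 140 slots, while a full grid over weeks 25 to 45 would contain 21 × 7 = 147 (week, weekday) pairs, so the exact slot set cannot be recovered from the slide; the sketch therefore takes the ordered slot list as a parameter. The toy dates and the two-week grid are assumptions for illustration only.

```python
from collections import Counter
from datetime import datetime

ACTIONS = ["add_to_cart", "page_visit", "product_buy", "search_query"]


def weekly_daily_profile(events, slots):
    """events: list of (action_type, datetime) for one user.
    slots: ordered list of (iso_week, iso_weekday) pairs.
    Returns len(ACTIONS) * len(slots) counts, one block per action type."""
    counts = Counter()
    for action, ts in events:
        _, week, weekday = ts.isocalendar()  # weekday: 1 = Mon ... 7 = Sun
        counts[(action, week, weekday)] += 1
    return [counts[(a, w, d)] for a in ACTIONS for (w, d) in slots]


# Toy slot grid covering two weeks (hypothetical, not the authors' slot list).
slots = [(w, d) for w in (25, 26) for d in range(1, 8)]
events = [("page_visit", datetime(2022, 6, 20, 9, 0)),    # ISO week 25, Monday
          ("page_visit", datetime(2022, 6, 20, 18, 30)),
          ("product_buy", datetime(2022, 6, 26, 12, 0))]  # ISO week 25, Sunday
profile = weekly_daily_profile(events, slots)
print(len(profile))  # 56
```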
Top-50 URL/query interaction profile (Feature 04)
- Goal: Represent user preferences at the URL and query levels
- Target events: page_visit, search_query
- Method:
  i. Identify the Top-50 most visited URLs by unique users
  ii. Identify the Top-50 most popular queries
  iii. Construct user–URL and user–query frequency matrices
  iv. Concatenate them into a 100-dim profile
[Figure: example row for User A with frequencies over URL(1) ... URL(50) and Query(1) ... Query(50) (e.g., 9, 1, 5, 1), 100 dimensions in total.]
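The URL half of this profile can be sketched as below; the query half would follow the same pattern, although the slides do not say whether query popularity is also measured in unique users. Breaking ties by item name is an assumption added only to make the ranking deterministic.

```python
from collections import Counter


def top_k_by_unique_users(events, k):
    """events: list of (user_id, item). Rank items by number of distinct users."""
    users_per_item = Counter(item for _, item in set(events))
    # Assumption: ties are broken by item name to keep the order deterministic.
    ranked = sorted(users_per_item, key=lambda i: (-users_per_item[i], i))
    return ranked[:k]


def frequency_row(user_items, top_items):
    """Count how often one user interacted with each top-ranked item."""
    counts = Counter(user_items)
    return [counts[i] for i in top_items]


visits = [("u1", "url_a"), ("u1", "url_a"), ("u1", "url_b"),
          ("u2", "url_a"), ("u2", "url_c"), ("u3", "url_c"), ("u3", "url_a")]
top_urls = top_k_by_unique_users(visits, k=2)   # k = 50 in the slides
u1_row = frequency_row([u for uid, u in visits if uid == "u1"], top_urls)
print(top_urls, u1_row)  # ['url_a', 'url_c'] [2, 0]
```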
Rule-based Features (Feature 05)
- Purchase Behavior
  - Total number of purchases
  - Total purchase amount
  - Average purchase date (per period)
  - Average item price
- Recency & Frequency
  - Days since last purchase
  - Repurchase rate of the same item
- Cart & Browsing
  - Average purchases after adding to cart
  - Average/variance of item prices
- Total feature dimension: 56
MacridVAE (Feature 06)
- The MACRo-mIcro Disentangled Variational Auto-Encoder (MacridVAE) disentangles macro-level intentions and micro-level preferences from user behavior [Ma+, 2019]
- For each user, the model generates 10 concept embeddings (each 32-dim) from their interactions
- The 10 vectors (each 32-dim) are concatenated to form a 320-dimensional user embedding
[Figure: user interactions (product_buy, add_to_cart, remove_from_cart, page_visit, search_query) are fed to MacridVAE, which outputs one embedding per macro concept for each user.]
ProNE (Feature 07)
ProNE [Zhang+, 2019] Architecture
- Build a co-occurrence matrix from user activity logs
- Apply sparse matrix factorization and spectral propagation
- Obtain n-dimensional distributed representations for each node (e.g., 32-dimensional vectors when n = 32)
- Aggregate the obtained vectors by client_id
ProNE (Feature 07)
Definition of Co-occurrence Sessions
- Sessions are defined as periods separated by at least 24h of inactivity
- Only pairs of SKUs and URLs appearing within product_buy, add_to_cart, or page_visit events are counted
- Each session must contain at least three events
- If a session exceeds 100 events, it is split into multiple sessions by iteratively slicing off the last 100 events
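The session rules above can be sketched as follows. The slides do not state whether the three-event minimum is checked before or after splitting long sessions; this sketch applies it after splitting, which is an assumption, and the toy event stream is hypothetical.

```python
from datetime import datetime, timedelta

GAP = timedelta(hours=24)
MIN_EVENTS, MAX_EVENTS = 3, 100


def split_sessions(events):
    """events: time-sorted list of (timestamp, item) for one user."""
    sessions, current = [], []
    for ts, item in events:
        # Start a new session after at least 24h of inactivity.
        if current and ts - current[-1][0] >= GAP:
            sessions.append(current)
            current = []
        current.append((ts, item))
    if current:
        sessions.append(current)

    result = []
    for s in sessions:
        # Slice off the last MAX_EVENTS events until the remainder fits.
        chunks = []
        while len(s) > MAX_EVENTS:
            chunks.append(s[-MAX_EVENTS:])
            s = s[:-MAX_EVENTS]
        chunks.append(s)
        # Assumption: the minimum-length filter is applied after splitting.
        result.extend(c for c in reversed(chunks) if len(c) >= MIN_EVENTS)
    return result


t0 = datetime(2022, 6, 1)
events = ([(t0 + timedelta(minutes=i), f"sku_{i}") for i in range(250)]
          + [(t0 + timedelta(days=3, minutes=i), f"url_{i}") for i in range(2)])
lengths = [len(s) for s in split_sessions(events)]
print(lengths)  # [50, 100, 100]
```

In the toy stream, the 250-event burst is cut into chunks of 50, 100, and 100 events, and the two-event session three days later is dropped.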
ProNE (Feature 07)
Client-wise Vector Aggregation
- Produce 32-dimensional embeddings for each SKU and URL via ProNE
- For each user, sum the embeddings of SKUs and URLs over five backward-looking windows (0–7, 7–14, 14–28, 28–56, and >56 days) and their cumulative totals, anchored at the most recent timestamp
- Result: nine 32-dimensional vectors per user, concatenated into one user profile
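One reading that yields nine vectors is five per-window sums plus four cumulative sums (0–14, 0–28, 0–56, and all days), since the first cumulative total would duplicate the 0–7 window. The sketch below follows that reading, which is an assumption, as are the half-open window boundaries, the integer day timestamps, and the toy node vectors; whether the anchor is per user or global is also not stated, so it is passed in.

```python
import numpy as np

DIM = 32
EDGES = [0, 7, 14, 28, 56, np.inf]  # window boundaries in days before the anchor


def windowed_profile(events, node_vectors, anchor_day):
    """events: list of (day, node_id) for one user, with day <= anchor_day."""
    windows = np.zeros((5, DIM))
    for day, node in events:
        age = anchor_day - day
        # Assumed half-open buckets [lo, hi): age 7 falls into the 7-14 window.
        idx = np.searchsorted(EDGES, age, side="right") - 1
        windows[idx] += node_vectors[node]
    # Assumed cumulative totals: 0-14, 0-28, 0-56, and all days.
    cumulative = np.cumsum(windows, axis=0)[1:]
    return np.concatenate([windows.ravel(), cumulative.ravel()])


# Hypothetical node vectors standing in for ProNE output.
node_vectors = {"sku_1": np.ones(DIM), "url_2": 2 * np.ones(DIM)}
events = [(100, "sku_1"), (95, "url_2"), (30, "sku_1")]
profile = windowed_profile(events, node_vectors, anchor_day=100)
print(profile.shape)  # (288,)
```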
Doc2Vec (Feature 08)
- Doc2Vec is an unsupervised algorithm that learns representations of texts [Le+, 2014]
- Each user's interaction history is treated as a document
- Captures higher-level semantics beyond local co-occurrence, unlike Action2Vec
- Total feature dimension: 50
[Figure: each user's interactions (product_buy, add_to_cart, remove_from_cart, page_visit, search_query) are fed to Doc2Vec, which outputs 50-dim user embeddings.]
RFM Feature (Feature 09)
- Inspired by Recency–Frequency–Monetary (RFM) analysis, widely used in marketing
  - Recency: The number of days since the most recent action and since the first action
  - Frequency: The total count of each action performed by the client
  - Monetary: The total monetary amount associated with purchases (for product_buy, add_to_cart, and remove_from_cart)
- Additionally, we include:
  - Number of unique SKUs, queries, categories, days, and weeks
  - Client ID
- To prevent distortion from feature normalization (e.g., layer normalization), we introduce a counterweight feature to maintain the overall scaling structure
- Total feature dimension: 29
SVD-based Text Embedding (Feature 10)
- Quantized cluster IDs are treated as "word tokens"
- Constructed a sparse client–cluster matrix and applied Singular Value Decomposition (SVD)
- Total feature dimension: 80
[Figure: User A's item name history (Buy / Add) and search query history are each given as 16 buckets of quantized cluster IDs (0–255). Item names are indexed as Name(Cluster ID, Position) from (0, 0) to (255, 15) and queries as Query(Cluster ID, Position) from (0, 16) to (255, 31). The per-user average frequency of each index (e.g., 2/3, 1/2, 0) gives an 8192-dimensional row, which SVD reduces to User A's 80-dimensional embedding.]
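The construction can be sketched with dense NumPy arrays on toy data. The column layout (position-major), the random code vectors, and the use of a full dense SVD are assumptions; with real data the client–cluster matrix is sparse, and a truncated sparse solver would be the natural choice.

```python
import numpy as np

N_CLUSTERS, N_POS = 256, 16   # cluster IDs 0-255, 16 buckets per text


def text_row(names, queries):
    """names / queries: lists of 16-int code vectors for one user.
    Returns an 8192-dim row of per-user average frequencies."""
    row = np.zeros(2 * N_POS * N_CLUSTERS)
    for offset, texts in ((0, names), (N_POS, queries)):
        for codes in texts:
            for pos, cluster in enumerate(codes):
                # Each text contributes 1 / (number of texts of that kind).
                row[(offset + pos) * N_CLUSTERS + cluster] += 1.0 / len(texts)
    return row


def svd_embed(matrix, k):
    """Project rows onto the top-k singular directions (U_k * S_k)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


rng = np.random.default_rng(0)
# Hypothetical users: 3 item names and 2 queries each, with random codes.
users = [([rng.integers(0, N_CLUSTERS, size=N_POS) for _ in range(3)],
          [rng.integers(0, N_CLUSTERS, size=N_POS) for _ in range(2)])
         for _ in range(4)]
X = np.vstack([text_row(names, queries) for names, queries in users])
emb = svd_embed(X, k=3)       # k = 80 in the slides
print(X.shape, emb.shape)     # (4, 8192) (4, 3)
```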
Aggregation of Embeddings
Steps
1. Missing values are imputed with the mean over observed users
2. For statistical features, the transformation log(x+1) + 0.1 is applied to reduce skewness
3. Each embedding is individually L2-normalized
4. Each feature is multiplied by its weight, and the weighted features are concatenated

ID  Feature Name                              Dim.  Weight
01  Action2Vec                                250   3
02  Propensity-oriented add_to_cart profile   200   1
03  Weekly-daily action profile               560   1
04  Top-50 URL/query interaction profile      100   1
05  Rule-based feature                        56    1
06  MacridVAE                                 320   1
07  ProNE                                     288   3
08  Doc2Vec                                   50    1
09  RFM feature                               29    3
10  SVD-based Text Embedding                  80    3
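The four steps can be sketched as one function over a list of feature blocks. This is a minimal illustration under stated assumptions: missing users are marked with NaN, the logarithm is the natural log, normalization is per user within each block, and the two toy blocks and the `is_statistical` flag are hypothetical.

```python
import numpy as np


def integrate(blocks):
    """blocks: list of (matrix, weight, is_statistical); rows are users,
    NaN marks users that are missing from a feature."""
    parts = []
    for x, weight, is_statistical in blocks:
        x = np.array(x, dtype=float)
        # 1. Impute missing entries with the column mean over observed users.
        col_mean = np.nanmean(x, axis=0)
        x = np.where(np.isnan(x), col_mean, x)
        # 2. Compress skewed count features (natural log is an assumption).
        if is_statistical:
            x = np.log(x + 1.0) + 0.1
        # 3. L2-normalize each user's block, guarding against zero norms.
        norm = np.linalg.norm(x, axis=1, keepdims=True)
        x = x / np.maximum(norm, 1e-12)
        # 4. Scale the block by its feature weight.
        parts.append(weight * x)
    return np.hstack(parts)


emb = np.array([[3.0, 4.0], [np.nan, np.nan], [1.0, 0.0]])    # e.g. weight 3
stats = np.array([[0.0, 9.0, 99.0], [1.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
profile = integrate([(emb, 3, False), (stats, 1, True)])
print(profile.shape)  # (3, 5)
```

Because every block has unit norm before weighting, a block with weight 3 contributes three times the norm of a block with weight 1 to the concatenated profile, regardless of its dimensionality.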
Experiment
Experimental Question (EQ)
- EQ: How does our approach perform compared to the baseline?
Baseline Features
- Statistical features: Per-user interaction counts by sku, category, and price over time windows (1/7/30 days). To keep vectors small, only the Top-10 values per column are used
- Query features: Per-user mean of the quantized query-embedding vectors from search_query
Results
- Most individual features outperformed the baseline
- Integrating all features outperformed each individual feature
- Weighted integration further improved performance
Conclusion
Summary
- We proposed a user representation learning approach that integrates heterogeneous features
- Our method outperformed the baseline across multiple tasks (EQ)
Future Work
- Conduct comprehensive ablation studies on feature combinations
- Analyze inter-feature correlations and marginal contributions to reduce redundancy
References
[Mikolov+, 2013] T. Mikolov, K. Chen, G. Corrado, and J. Dean: Efficient Estimation of Word Representations in Vector Space, arXiv:1301.3781, 2013.
[Ma+, 2019] J. Ma, C. Zhou, P. Cui, H. Yang, and W. Zhu: Learning Disentangled Representations for Recommendation, Proc. of NeurIPS 33, pp.5711–5722, 2019.
[Le+, 2014] Q. Le and T. Mikolov: Distributed Representations of Sentences and Documents, Proc. of ICML 31, pp.1188–1196, 2014.
[Zhang+, 2019] J. Zhang, Y. Dong, Y. Wang, J. Tang, and M. Ding: ProNE: Fast and Scalable Network Representation Learning, Proc. of IJCAI 28, pp.4278–4284, 2019.