Heterogeneous Feature Integration for Behavioral Profiles

September 22, 2025

Slide overview

Kaito Terasaki, Taketo Yoneda, Kiyotakashi Takagawa, Hayato Maruyama, Yongzhi Jin, Hibiki Ayabe, Kei Harada, Kazushi Okamoto: Heterogeneous feature integration for behavioral profiles, Proceedings of the Recommender Systems Challenge 2025 (RecSysChallenge 2025), 36-40, 2025.9, Prague, Czech Republic.

Data Science Research Group, The University of Electro-Communications

Text of each page
1.

Heterogeneous Feature Integration for Behavioral Profiles
Kaito Terasaki, Taketo Yoneda, Kiyotakashi Takagawa, Hayato Maruyama, Yongzhi Jin, Hibiki Ayabe, Kei Harada, Kazushi Okamoto
The University of Electro-Communications
2025.09.22, RecSys Challenge 2025

2.

RecSys Challenge 2025 Overview
- Aim: Develop Universal Behavioral Profiles, i.e., user representations that generalize across tasks
- Data: Logs of various user actions (product_buy, page_visit, search_query, etc.)
- Evaluation: Based on performance across multiple downstream tasks

This Study
- Present the solution developed by UEC_bootcamp_2025, which was awarded the 2nd Academic Prize
- Construct Universal Behavioral Profiles from diverse user features

3.

Overview of Our Approach
[Figure: Logs of five event types (product_buy, add_to_cart, remove_from_cart, search_query, page_visit) feed four feature groups: sequence-based embeddings (01, 06, 08), graph-based embeddings (07), text-based embeddings (10), and statistical features (02, 03, 04, 05, 09). Pipeline stages shown: log transformation, L2 normalization + weighting, and the resulting Universal Behavioral Profiles.]

4.

Action2Vec (Feature 01) - Action Embedding
- Obtain action embeddings with Word2Vec [Mikolov+, 2013] using all user behaviors
- Treat user actions as "words", in the same way Word2Vec treats the words of a text
- Word2Vec input format:
[Figure: the interactions of each user (product_buy, add_to_cart, remove_from_cart, page_visit, search_query) are fed to Word2Vec, which outputs action embeddings.]

5.

Action2Vec (Feature 01) - User Embedding
- For each user, compute the average embedding per action type
- Concatenate the five embeddings (product_buy, etc.)
[Figure: User 1's action embeddings are averaged into a product_buy embedding, an add_to_cart embedding, a remove_from_cart embedding, a page_visit embedding, and a search_query embedding, which are concatenated into a 250-dim user embedding.]
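The averaging and concatenation step could be sketched as follows. This is a minimal illustration, not the authors' code: the token names, the `token_vectors` lookup (standing in for a trained Word2Vec model), and the use of a zero vector for action types a user never performed are all assumptions. A per-type size of 50 would give the stated 250 dimensions, but that split is inferred from 250 / 5 rather than stated on the slide.

```python
from typing import Dict, List, Sequence, Tuple

ACTION_TYPES = (
    "product_buy", "add_to_cart", "remove_from_cart", "page_visit", "search_query",
)


def user_embedding(
    actions: Sequence[Tuple[str, str]],      # (action_type, token) pairs of one user
    token_vectors: Dict[str, List[float]],   # hypothetical token -> embedding lookup
    dim: int,
) -> List[float]:
    """Average the token embeddings per action type, then concatenate."""
    parts: List[float] = []
    for action_type in ACTION_TYPES:
        vecs = [
            token_vectors[token]
            for typ, token in actions
            if typ == action_type and token in token_vectors
        ]
        if vecs:
            mean = [sum(col) / len(vecs) for col in zip(*vecs)]
        else:
            # Assumption: users without this action type get a zero vector.
            mean = [0.0] * dim
        parts.extend(mean)
    return parts
```

The output length is always `len(ACTION_TYPES) * dim`, so every user gets a vector of the same size regardless of which actions they performed.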

6.

Propensity-oriented add_to_cart profile (Feature 02)
Goal: Reflect user interests at category and product levels
Target events: add_to_cart
Method:
i. Merge add_to_cart logs with product_properties using SKU
ii. Filter by predefined target categories and SKUs
iii. For each user:
    - Count add_to_cart actions per target category
    - Count add_to_cart actions per target SKU
iv. Concatenate both vectors into a 200-dim profile
Example (User A), 200 dimensions:
    Target Category: Cat(1) = 10, ..., Cat(100) = 1
    Target SKU:      SKU(1) = 4, ..., SKU(100) = 0
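Steps i to iv can be sketched for a single user with plain counters. The function name and the `sku_to_category` mapping (standing in for the join with product_properties) are hypothetical, and how the target categories and SKUs were chosen is not specified in the slides.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def propensity_profile(
    cart_skus: Sequence[Hashable],               # SKUs from one user's add_to_cart events
    sku_to_category: Dict[Hashable, Hashable],   # hypothetical SKU -> category lookup
    target_categories: Sequence[Hashable],
    target_skus: Sequence[Hashable],
) -> List[int]:
    """Count add_to_cart events per target category and per target SKU."""
    category_counts = Counter(
        sku_to_category[sku] for sku in cart_skus if sku in sku_to_category
    )
    sku_counts = Counter(cart_skus)
    # Filtering happens implicitly: only target entries are read out,
    # and a Counter returns 0 for anything the user never touched.
    return [category_counts[c] for c in target_categories] + [
        sku_counts[s] for s in target_skus
    ]
```

With 100 target categories and 100 target SKUs this yields the 200 dimensions shown in the example row.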

7.

Weekly-daily action profile (Feature 03)
Goal: Capture temporal patterns of user behavior
Target events: add_to_cart, page_visit, product_buy, search_query
Method:
i. Convert timestamps into ISO week number + weekday
ii. Aggregate event counts per user time slot
iii. 21 weeks (25–45) × 7 days = 140 slots
iv. Concatenate 4 action types into a 560-dim profile
Example (User A), slots indexed by (Week Number, Weekday):
    (25, Mon) = 2, ..., (45, Sun) = 5
    140 dimensions × {add_to_cart, page_visit, product_buy, search_query}
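A minimal sketch of the slot counting, assuming all events fall within a single ISO year. The function names and the `first_week` / `last_week` parameters are illustrative. Note that an inclusive range of weeks 25 to 45 produces 21 × 7 = 147 slots per action type in this sketch, whereas the slide states 140, so the exact week bounds used by the authors remain an assumption here.

```python
from datetime import date
from typing import Dict, Iterable, List

PROFILE_EVENTS = ("add_to_cart", "page_visit", "product_buy", "search_query")


def weekly_daily_counts(
    event_dates: Iterable[date], first_week: int = 25, last_week: int = 45
) -> List[int]:
    """Count events per (ISO week, weekday) slot for one action type."""
    counts = [0] * ((last_week - first_week + 1) * 7)
    for d in event_dates:
        _, week, weekday = d.isocalendar()  # weekday: 1 = Monday ... 7 = Sunday
        if first_week <= week <= last_week:
            counts[(week - first_week) * 7 + (weekday - 1)] += 1
    return counts


def weekly_daily_profile(dates_by_event: Dict[str, List[date]]) -> List[int]:
    """Concatenate the slot counts of the four target event types."""
    profile: List[int] = []
    for event in PROFILE_EVENTS:
        profile.extend(weekly_daily_counts(dates_by_event.get(event, [])))
    return profile
```

Events outside the configured week range are ignored rather than clipped into the first or last slot, which is one of several possible choices.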

8.

Top-50 URL/query interaction profile (Feature 04)
Goal: Represent user preferences at the URL and query levels
Target events: page_visit, search_query
Method:
i. Identify Top-50 most visited URLs by unique users
ii. Identify Top-50 most popular queries
iii. Construct user–URL and user–query frequency matrices
iv. Concatenate them into a 100-dim profile
Example (User A), 100 dimensions:
    URL (Rank):   URL(1) = 9, ..., URL(50) = 1
    Query (Rank): Query(1) = 5, ..., Query(50) = 1
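The URL half of this profile could be sketched as below: rank URLs by the number of distinct users who visited them, then count one user's visits to each ranked URL. Function names are hypothetical, and the alphabetical tie-break is an assumption added only to make the ranking deterministic. The slides do not say how query popularity is measured, so the same helper applies to queries only if popularity is also defined by unique users.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Sequence, Tuple


def top_k_by_unique_users(
    events: Iterable[Tuple[Hashable, str]], k: int = 50
) -> List[str]:
    """Rank items by how many distinct users interacted with them."""
    unique_pairs = set(events)  # collapse repeated (user, item) pairs
    user_counts = Counter(item for _, item in unique_pairs)
    ranked = sorted(user_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


def frequency_row(user_items: Sequence[str], top_items: Sequence[str]) -> List[int]:
    """One row of the user-item frequency matrix, restricted to the top items."""
    counts = Counter(user_items)
    return [counts[item] for item in top_items]
```

Concatenating the URL row and the query row for a user gives the 100-dim profile described above.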

9.

Rule-based Features (Feature 05)
Purchase Behavior
- Total number of purchases
- Total purchase amount
- Average purchase date (per period)
- Average item price
Recency & Frequency
- Days since last purchase
- Repurchase rate of the same item
Cart & Browsing
- Average purchases after adding to cart
- Average/variance of item prices
Total feature dimension: 56

10.

MacridVAE (Feature 06)
- The MACRo-mIcro Disentangled Variational Auto-Encoder (MacridVAE) disentangles macro-level intentions and micro-level preferences from user behavior [Ma+, 2019]
- For each user, the model generates 10 concept embeddings (each 32-dim) from their interactions
- The 10 vectors (each 32-dim) are concatenated to form a 320-dimensional user embedding
[Figure: user interactions (product_buy, add_to_cart, remove_from_cart, page_visit, search_query) are fed to MacridVAE, which outputs one embedding per macro concept for each user.]

11.

ProNE (Feature 07)
ProNE [Zhang+, 2019] Architecture
- Build a co-occurrence matrix from user activity logs
- Apply sparse matrix factorization and spectral propagation
- Obtain n-dimensional distributed representations for each node (e.g., 32-dimensional vectors when n = 32)
- Aggregate the obtained vectors by client_id

12.

ProNE (Feature 07)
Definition of Co-occurrence Sessions
- Sessions are defined as periods separated by at least 24h of inactivity
- Only pairs of SKUs and URLs appearing within product_buy, add_to_cart, or page_visit events are counted
- Each session must contain at least three events
- If a session exceeds 100 events, it is split into multiple sessions by iteratively slicing off the last 100 events
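The session rules above can be sketched on one user's timestamps as follows. The function name and parameters are illustrative, and the order of operations is an assumption: this sketch splits long sessions first and applies the three-event minimum afterwards, so a short leftover chunk is dropped. The slides do not say which order the authors used.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def split_sessions(
    timestamps: Sequence[datetime],
    gap: timedelta = timedelta(hours=24),
    min_events: int = 3,
    max_events: int = 100,
) -> List[List[datetime]]:
    """Group one user's event timestamps into co-occurrence sessions."""
    sessions: List[List[datetime]] = []
    current: List[datetime] = []
    for ts in sorted(timestamps):
        # A gap of at least 24h of inactivity closes the running session.
        if current and ts - current[-1] >= gap:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)

    result: List[List[datetime]] = []
    for session in sessions:
        chunks = []
        # Iteratively slice off the last `max_events` events.
        while len(session) > max_events:
            chunks.append(session[-max_events:])
            session = session[:-max_events]
        chunks.append(session)
        result.extend(reversed(chunks))  # restore chronological order
    return [s for s in result if len(s) >= min_events]
```

For example, a burst of 205 events becomes sessions of 5, 100, and 100 events, while an isolated single event is discarded.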

13.

ProNE (Feature 07)
Client-wise Vector Aggregation
- Produce 32-dimensional embeddings for each SKU and URL via ProNE
- For each user, sum the embeddings of SKUs and URLs over five backward-looking windows (0–7, 7–14, 14–28, 28–56, and >56 days) and their cumulative totals, anchored at the most recent timestamp
- Result: nine 32-dimensional vectors per user, concatenated into one user profile
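One way to arrive at nine vectors is sketched below, under assumptions the slides do not confirm: the five window sums are followed by four cumulative sums (0–14, 0–28, 0–56, and all days), since the cumulative total of the first window would duplicate the window itself. The function name, the `item_vectors` lookup (standing in for ProNE output), the explicit `anchor` argument, and the half-open window boundaries are all illustrative choices.

```python
from datetime import datetime
from typing import Dict, List, Sequence, Tuple

WINDOW_STARTS_DAYS = (7, 14, 28, 56)  # lower bounds of windows 2 to 5


def windowed_sums(
    events: Sequence[Tuple[datetime, str]],   # (timestamp, SKU or URL) of one user
    item_vectors: Dict[str, List[float]],     # hypothetical item -> embedding lookup
    dim: int,
    anchor: datetime,                          # most recent timestamp
) -> List[float]:
    """Sum item embeddings per backward-looking window, then add cumulative sums."""
    windows = [[0.0] * dim for _ in range(5)]
    for ts, item in events:
        vec = item_vectors.get(item)
        if vec is None:
            continue
        age_days = (anchor - ts).total_seconds() / 86400
        # Window index = number of lower bounds the event's age has reached.
        idx = sum(age_days >= start for start in WINDOW_STARTS_DAYS)
        windows[idx] = [a + b for a, b in zip(windows[idx], vec)]

    cumulative: List[List[float]] = []
    running = list(windows[0])
    for window in windows[1:]:
        running = [a + b for a, b in zip(running, window)]
        cumulative.append(list(running))

    return [x for vec in windows + cumulative for x in vec]
```

With `dim = 32` the result has 9 × 32 = 288 entries, which matches the dimension listed for ProNE in the aggregation table.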

14.

Doc2Vec (Feature 08)
- Doc2Vec is an unsupervised algorithm that learns representations of texts [Le+, 2014]
- Each user's interaction history is treated as a document
- Captures higher-level semantics beyond local co-occurrence, unlike Action2Vec
- Total feature dimension: 50
[Figure: each user's interactions (product_buy, add_to_cart, remove_from_cart, page_visit, search_query) are fed to Doc2Vec, which outputs 50-dim user embeddings.]

15.

RFM Feature (Feature 09)
Inspired by Recency–Frequency–Monetary (RFM) analysis, widely used in marketing
- Recency: Days since the most recent action and since the first action
- Frequency: Total count of each action performed by the client
- Monetary: Total monetary amount associated with purchases (for product_buy, add_to_cart, and remove_from_cart)
Additionally, we include:
- Number of unique SKUs, queries, categories, days, and weeks
- Client ID
To prevent distortion from feature normalization (e.g., layer normalization), we introduce a counterweight feature to maintain the overall scaling structure
Total feature dimension: 29

16.

SVD-based Text Embedding (Feature 10)
- Quantized cluster IDs are treated as "word tokens"
- Constructed a sparse client–cluster matrix and applied Singular Value Decomposition (SVD)
- Total feature dimension: 80
[Figure: User A's item name history (buy / add) and search query history are each encoded as sequences of quantized cluster IDs over 16 buckets. Tokens are indexed as Name(Cluster ID, Position), from (0, 0) to (255, 15), and Query(Cluster ID, Position), from (0, 16) to (255, 31). The per-user average frequencies (e.g., 2/3, 1/2) form an 8192-dimensional vector, which SVD reduces to User A's 80-dimensional embedding.]
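The construction of one row of the sparse client–cluster matrix might look like the sketch below. It covers only the step before SVD and rests on a reading of the figure that the slides do not spell out: each name or query is a sequence of 16 cluster IDs in the range 0 to 255, name tokens occupy positions 0 to 15 and query tokens positions 16 to 31, and each value is the share of the user's names (or queries) containing that token. The column layout `position * 256 + cluster_id` and the function name are assumptions.

```python
from typing import Dict, List, Sequence

N_CLUSTERS = 256   # cluster IDs 0..255
N_POSITIONS = 16   # buckets per name or query


def sparse_row(
    name_codes: Sequence[Sequence[int]],    # one cluster-ID sequence per item name
    query_codes: Sequence[Sequence[int]],   # one cluster-ID sequence per search query
) -> Dict[int, float]:
    """Per-user average frequency of each (cluster ID, position) token."""
    row: Dict[int, float] = {}
    # Names use positions 0..15, queries are shifted to positions 16..31.
    for offset, docs in ((0, name_codes), (N_POSITIONS, query_codes)):
        for codes in docs:
            for position, cluster_id in enumerate(codes):
                col = (offset + position) * N_CLUSTERS + cluster_id
                row[col] = row.get(col, 0.0) + 1.0 / len(docs)
    return row
```

Column indices range over 2 × 16 × 256 = 8192 values, matching the dimensionality shown in the figure. Stacking these rows for all users gives the matrix to which a truncated SVD with 80 components would then be applied.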

17.

Aggregation of Embeddings
Steps
1. Missing values are imputed with the mean of observed users
2. For statistical features, the transformation log(x+1) + 0.1 is applied to reduce skewness
3. Each embedding is individually L2-normalized
4. Weights are applied to each feature, and the features are concatenated

ID  Feature Name                              Dim.  Weight
01  Action2Vec                                250   3
02  Propensity-oriented add_to_cart profile   200   1
03  Weekly-daily action profile               560   1
04  Top-50 URL/query interaction profile      100   1
05  Rule-based feature                         56   1
06  MacridVAE                                 320   1
07  ProNE                                     288   3
08  Doc2Vec                                    50   1
09  RFM feature                                29   3
10  SVD-based Text Embedding                   80   3
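The four steps can be sketched for a single user in plain Python. The function names and the `(vector, weight, is_statistical)` block format are illustrative, not the authors' interface; the imputation helper takes the per-column mean over users whose feature is present, which is one reading of "mean of observed users".

```python
import math
from typing import List, Optional, Sequence, Tuple


def impute_with_mean(rows: Sequence[Optional[List[float]]]) -> List[List[float]]:
    """Step 1: replace missing feature vectors with the mean of observed users."""
    observed = [r for r in rows if r is not None]
    mean = [sum(col) / len(observed) for col in zip(*observed)]
    return [list(r) if r is not None else list(mean) for r in rows]


def log_transform(vec: Sequence[float]) -> List[float]:
    """Step 2: log(x + 1) + 0.1, applied to statistical features only."""
    return [math.log(x + 1.0) + 0.1 for x in vec]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Step 3: scale a vector to unit Euclidean length (zero vectors are kept)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def integrate(blocks: Sequence[Tuple[Sequence[float], float, bool]]) -> List[float]:
    """Steps 2 to 4 for one user: transform, normalize, weight, concatenate."""
    profile: List[float] = []
    for vec, weight, is_statistical in blocks:
        if is_statistical:
            vec = log_transform(vec)
        profile.extend(weight * x for x in l2_normalize(vec))
    return profile
```

Because every block is L2-normalized before weighting, each feature contributes a sub-vector whose norm equals its weight, so the weights in the table directly set the relative influence of each feature in the concatenated profile.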

18.

Experiment
Experimental Question (EQ)
- EQ: How does our approach perform compared to the baseline?
Baseline Features:
- Statistical features:
    - Per-user interaction counts by sku, category, and price over time windows (1/7/30 days)
    - To keep vectors small, use only Top-10 values per column
- Query features:
    - Per-user mean of the quantized query-embedding vectors from search_query

19.

Results
- Most individual features outperformed the baseline
- Integrating all features outperformed each individual feature
- Weighted integration further improved performance

20.

Conclusion
Summary
- We proposed a user representation learning approach that integrates heterogeneous features
- Our method outperformed the baseline across multiple tasks (EQ)
Future Work
- Conduct comprehensive ablation studies on feature combinations
- Analyze inter-feature correlations and marginal contributions to reduce redundancy

21.

References
[Mikolov+, 2013] T. Mikolov, K. Chen, G. Corrado, and J. Dean: Efficient Estimation of Word Representations in Vector Space, arXiv:1301.3781, 2013.
[Ma+, 2019] J. Ma, C. Zhou, P. Cui, H. Yang, and W. Zhu: Learning Disentangled Representations for Recommendation, Proc. of NeurIPS 33, pp.5711–5722, 2019.
[Le+, 2014] Q. Le and T. Mikolov: Distributed Representations of Sentences and Documents, Proc. of ICML 31, pp.1188–1196, 2014.
[Zhang+, 2019] J. Zhang, Y. Dong, Y. Wang, J. Tang, and M. Ding: ProNE: Fast and Scalable Network Representation Learning, Proc. of IJCAI 28, pp.4278–4284, 2019.