Guangyue Xu

I am currently a Senior Data Scientist at Search@Target. Previously, I pursued a Ph.D. at Michigan State University, where I focused on Machine Learning (ML) and Natural Language Processing (NLP). My research centers on pre-training large vision-language models and enhancing their generalization capabilities, with applications in e-commerce search.

I received my B.E. in Software Engineering from Jilin University and my M.S. in Computer Science from Tsinghua University. I also interned with MSRA's Web Search and Mining Group.

News
Posts
Beyond Text: How We Built Multimodal Retrieval for E-Commerce Search
Mar 10, 2026 · multimodal e-commerce
Most e-commerce search systems rely purely on text, but product images carry signals that text often misses. We propose a two-stage vision-language alignment framework achieving +5% Recall@100 over text-only baselines.
Multi-Channel Retrieval Fusion: From Heuristic Weights to Unified Ranking
May 2, 2025 · Search Learning-to-Rank
E-commerce search pulls candidates from text, semantic, and behavioral channels, then merges them with hand-tuned weights. We replace that with a unified LTR model, deployed at Target.com with +2.85% conversion lift.
View all posts →
Selected Publications
Multimodal Retrieval
Guangyue Xu*, Qujiaheng Zhang*, Fengjie Li  (* co-first authors)
arXiv preprint arXiv:2603.04836, 2026

We study unified text-image fusion for two-tower retrieval models in e-commerce, proposing a novel modality fusion network that captures cross-modal complementary information between product text and images.

Multi-Channel Retrieval
Guangyue Xu*, A. Gaydhani*, D. Kamath, A. Singh, A. Li  (* co-first authors)
arXiv preprint arXiv:2602.23530, 2026

A unified learning-to-rank framework that jointly optimizes ranking across multiple retrieval channels in large-scale e-commerce search, improving relevance and efficiency.

GIPCOL
Guangyue Xu, Parisa Kordjamshidi, Joyce Chai
WACV, 2024

We introduce a GNN into soft-prompting design to improve CLIP's compositional zero-shot learning ability.

MetaReVision
Guangyue Xu, Parisa Kordjamshidi, Joyce Chai
EMNLP-Findings, 2023

We meta-train vision-language models using retrieved items to obtain more generalizable token representations and improve compositional ability.

Prompting
Guangyue Xu, Parisa Kordjamshidi, Joyce Chai
arXiv, 2022

We systematically investigate various prompting techniques for CLIP in compositional zero-shot learning.

Full list on Google Scholar →
Service
Program Committee / Reviewer: EACL, EMNLP, ACM MM, ACL, WACV, LREC, ICLR, NeurIPS, ...