Ph.D. Student in Computer Science
Jan. 2024 to Jan. 2028 (expected)
Rensselaer Polytechnic Institute
M.Sc. in Multimedia Information Technology, Distinction
2022 to 2024
City University of Hong Kong
B.Eng. in Software Engineering - Systems and Technology
2018 to 2022
University of Electronic Science and Technology of China
My current research focuses on multimodal learning and reasoning, vision-language models (VLMs), multimodal retrieval-augmented generation (RAG), knowledge-based visual question answering (KB-VQA), and reliable foundation-model evaluation.
I am interested in how foundation models ground non-textual structure, visual entities, and external knowledge. Before moving toward multimodal RAG and KB-VQA, I worked on graph learning, graph self-supervised learning, graph foundation models, spatio-temporal forecasting, time-series modeling, and urban computing.
See my Google Scholar profile for updates.
Department of Computer Science
Rensselaer Polytechnic Institute
Troy, NY, United States