Welcome to Hui Wu's website

Research Scientist, IBM Research AI



About Hui Wu

I am a computer vision researcher in the computer vision and multimedia group at IBM Research AI. My current research interests include vision and language, interactive image retrieval, multi-modal dialogs and computer vision for fashion. Previously at IBM, I worked on food visual recognition, which was integrated into the Watson Visual Recognition Cloud API. Before joining IBM, I received my PhD in Computer Science from UNC Charlotte, with a thesis focused on manifold-learning-based machine learning techniques applied to image set analysis problems.

Contact me: wuhu AT us.ibm.com


Selected Projects

The Fashion IQ Dataset

The Fashion IQ Dataset, Retrieving Images by Combining Side Information and Relative Natural Language Feedback
Xiaoxiao Guo*, Hui Wu*, Yupeng Gao, Steven J. Rennie and Rogério S. Feris (* equal contribution)
arXiv 2019 [PDF] [Project Page]


Fashion IQ is a new dataset we contribute to the research community to facilitate research on natural-language-based interactive image retrieval. Despite decades of research and many great strides, high-fidelity interactive image retrieval remains a challenging problem. We believe that Fashion IQ can encourage further work on more natural, real-world conversational shopping assistants, and that it can serve as a new benchmark for composing natural language and images for image retrieval. To further research on natural language interaction for image retrieval, we are announcing the Fashion IQ Challenge at the ICCV 2019 workshop on Linguistics Meets Image and Video Retrieval.

Dialog-based Interactive Image Retrieval

Dialog-based Interactive Image Retrieval
Xiaoxiao Guo*, Hui Wu*, Yu Cheng, Steven J. Rennie, Gerald Tesauro and Rogério S. Feris (* equal contribution)
NeurIPS 2018 [PDF] [CODE] [DEMO] [Project Page]


We proposed a novel type of dialog agent for the task of interactive image retrieval. Recently, there has been a rapid rise of research interest in visually grounded conversational agents, driven by the progress of deep learning techniques for both image and natural language understanding. Recent work has explored several interesting application scenarios, such as collaborative drawing, visual dialog and object-guessing games. In this work, we tested the value of visually grounded dialog agents in a practical yet challenging context. Specifically, we proposed a novel image retrieval framework that learns to seek natural and expressive dialog feedback from the user and iteratively refines the retrieval results.
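To make the show, critique and refine cycle concrete, here is a toy sketch of an interactive retrieval loop. It is not the method from the paper: the system described above learns its components from data, whereas this sketch hard-codes everything. The 2-D feature vectors, the "more / less / same" sign vector standing in for a natural-language critique, the fixed step size, the simulated user, and all function names are hypothetical assumptions made for illustration.

```python
from typing import List, Sequence


def sq_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sign(x: float) -> int:
    """Return +1, -1 or 0 depending on the sign of x."""
    return (x > 0) - (x < 0)


def simulated_feedback(shown: Sequence[float], target: Sequence[float]) -> List[int]:
    """Hypothetical simulated user: for each attribute, asks for 'more' (+1),
    'less' (-1) or 'same' (0) relative to the image currently shown.
    A crude stand-in for a relative caption such as 'longer sleeves than this'."""
    return [sign(t - s) for s, t in zip(shown, target)]


def interactive_search(catalog: Sequence[Sequence[float]], target: int,
                       start: int, step: float = 1.0, max_turns: int = 10) -> List[int]:
    """Run the toy dialog loop and return the indices of the images shown."""
    history = [start]
    for _ in range(max_turns):
        current = history[-1]
        if current == target:  # the user found the image they wanted
            break
        feedback = simulated_feedback(catalog[current], catalog[target])
        # Move the query away from the shown image in the requested direction.
        query = [x + step * f for x, f in zip(catalog[current], feedback)]
        # Retrieve the nearest image the user has not been shown yet.
        candidates = [i for i in range(len(catalog)) if i not in history]
        if not candidates:
            break
        history.append(min(candidates, key=lambda i: sq_distance(query, catalog[i])))
    return history


if __name__ == "__main__":
    catalog = [(0.0, 0.0), (1.0, 0.2), (2.5, 0.4), (2.0, 2.0), (0.3, 1.9)]
    print(interactive_search(catalog, target=3, start=0))  # [0, 1, 3]
```

Excluding already-shown images keeps the loop from repeatedly suggesting the same item. In the run above, the target is reached on the second turn after one intermediate image.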

Pooling with Stochastic Spatial Sampling

S3Pool: Pooling with Stochastic Spatial Sampling
Shuangfei Zhai, Hui Wu, Abhishek Kumar, Yu Cheng, Yongxi Lu, Zhongfei Zhang and Rogério S. Feris
CVPR 2017 [PDF] [CODE] [Project Page]