|
Work Experience
|
|
Adobe Research
Research Scientist Intern
Summer 2025
Topics: Video Diffusion Models, RL Post-training, Unified Generative Models
|
|
Research
I'm interested in computer vision, multimodal learning, and generative models. Please see my Google Scholar profile for a full publication list.
(* indicates equal contribution)
|
|
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs
Xiyao Wang*,
Zhengyuan Yang*,
Chao Feng*,
Yongyuan Liang,
Yuhang Zhou,
Xiaoyu Liu,
Ziyi Zang,
Ming Li,
Chung-Ching Lin,
Kevin Lin,
Linjie Li,
Furong Huang,
Lijuan Wang,
NeurIPS, 2025
paper
|
|
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Xiyao Wang,
Zhengyuan Yang,
Chao Feng,
Hongjin Lu,
Linjie Li,
Chung-Ching Lin,
Kevin Lin,
Furong Huang,
Lijuan Wang,
NeurIPS, 2025 (Spotlight)
paper
|
|
Masked Diffusion Captioning for Visual Feature Learning
Chao Feng,
Zihao Wei,
Andrew Owens,
EMNLP, 2025 (Findings)
project page /
paper
We train image-conditioned diffusion language models to learn visual representations.
|
|
GPS as a Control Signal for Image Generation
Chao Feng,
Ziyang Chen,
Aleksander Holynski,
Alexei A. Efros,
Andrew Owens,
CVPR, 2025
project page /
paper
We train GPS-conditioned diffusion models to sample images.
|
|
This&That: Language-Gesture Controlled Video Generation for Robot Planning
Boyang Wang,
Nikhil Sridhar,
Chao Feng,
Mark Van der Merwe,
Adam Fishman,
Nima Fazeli,
Jeong Joon Park,
ICRA, 2025
project page /
paper
We introduce This&That, a framework that generates videos from text instructions and gestures for robot planning.
|
|
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang*,
Chao Feng*,
Ziyang Chen*,
Hyoungseob Park,
Daniel Wang,
Yiming Dou,
Ziyao Zeng,
Xien Chen,
Rit Gangopadhyay,
Andrew Owens,
Alex Wong,
CVPR, 2024
project page /
paper /
code
We introduce UniTouch, a unified tactile representation for vision-based tactile sensors that is aligned with multiple modalities. We show that powerful models trained on other modalities (e.g., CLIP, LLMs) can then perform tactile sensing tasks zero-shot.
|
|
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
Zhiyang Xu,
Chao Feng,
Rulin Shao,
Trevor Ashby,
Ying Shen,
Di Jin,
Yu Cheng,
Qifan Wang,
Lifu Huang,
ACL, 2024 (Findings)
project page /
paper
We construct Vision-Flan, the most diverse publicly available visual instruction tuning dataset to date.
|
|
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Chao Feng,
Ziyang Chen,
Andrew Owens,
CVPR, 2023 (Highlight)
project page /
arXiv /
code
We learn several feature sets in a self-supervised manner using an audio-visual synchronization task, then train an autoregressive model on each feature set to detect anomalies for video forensics.
|
|
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
Eric Zhongcong Xu,
Zeyang Song,
Satoshi Tsutsui,
Chao Feng,
Mang Ye,
Mike Zheng Shou,
ACM Multimedia, 2022
project page /
arXiv /
code
We create the AVA Audio-Visual Diarization (AVA-AVD) dataset to develop diarization methods for in-the-wild videos.
|
|
Service
Reviewer: CVPR 2022/2024, WACV 2023, ACM MM 2023, ICCV 2023, ECCV 2024, NeurIPS 2024, ICRA 2025, ICLR 2025, AISTATS 2025, TPAMI.
|
|