News
2024/02: Our paper "Binding Touch to Everything" is accepted to CVPR 2024.
2024/02: Check out our paper Vision-Flan, the most diverse publicly available visual instruction tuning dataset to date.
2023/12: Our report is accepted to the Responsible Language Models Workshop (ReLM) at AAAI 2024.
2023/03: Our paper "Self-Supervised Video Forensics by Audio-Visual Anomaly Detection" is selected as a highlight at CVPR 2023.
2023/02: Our paper "Self-Supervised Video Forensics by Audio-Visual Anomaly Detection" is accepted to CVPR 2023.
|
Research
I'm interested in computer vision and machine learning.
|
|
GPS-to-3D: Lifting Tourist Photos to 3D Using 2D GPS-Conditioned Diffusion
Chao Feng,
Ziyang Chen,
Aleksander Holynski,
Alexei A. Efros,
Andrew Owens,
In submission
We produce 3D reconstructions of landmarks from unordered collections of tourist photos using a GPS-conditioned diffusion model and score distillation sampling.
|
|
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang*,
Chao Feng*,
Ziyang Chen*,
Hyoungseob Park,
Daniel Wang,
Yiming Dou,
Ziyao Zeng,
Xien Chen,
Rit Gangopadhyay,
Andrew Owens,
Alex Wong,
CVPR, 2024
project page /
paper
We introduce UniTouch, a unified tactile representation for vision-based tactile sensors, aligned with multiple modalities. We show that powerful models trained on other modalities (e.g., CLIP, LLMs) can now be used to conduct tactile sensing tasks zero-shot.
|
|
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
Zhiyang Xu,
Chao Feng,
Rulin Shao,
Trevor Ashby,
Ying Shen,
Di Jin,
Yu Cheng,
Qifan Wang,
Lifu Huang,
In submission
project page /
paper
We construct Vision-Flan, the most diverse publicly available visual instruction tuning dataset to date.
|
|
Teaching LLMs to Search for Domain Knowledge from Knowledge Graphs
Chao Feng,
Xinyu Zhang,
Responsible Language Models Workshop (ReLM) at AAAI, 2024
We introduce a paradigm that teaches large language models (LLMs) to retrieve essential domain knowledge from external knowledge graphs (KGs), harnessing the models' strong reasoning capabilities.
|
|
Self-Supervised Video Forensics by Audio-Visual Anomaly Detection
Chao Feng,
Ziyang Chen,
Andrew Owens,
CVPR, 2023   (Highlight -- 2.5% acceptance rate)
project page /
arXiv /
code
We learn several feature sets in a self-supervised manner via an audio-visual synchronization task, then fit an autoregressive model to each feature set to perform anomaly detection for video forensics.
|
|
AVA-AVD: Audio-Visual Speaker Diarization in the Wild
Eric Zhongcong Xu,
Zeyang Song,
Satoshi Tsutsui,
Chao Feng,
Mang Ye,
Mike Zheng Shou,
ACM Multimedia, 2022
project page /
arXiv /
code
We create the AVA Audio-Visual Diarization (AVA-AVD) dataset to develop diarization methods for in-the-wild videos.
|
|