Yicong Hong

(Neo Orion)

Building embodied intelligence in real and virtual worlds.

About

I am a researcher at a Stealth Startup.

Recently, my work has focused on building interactive video world models, pursuing extended duration (Progressive Diffusion), long-horizon memory (Test-Time Training), consistent geometry (WorldCam), high modeling efficiency (Hydra-Transformer-Hybrid), and real-time control (RELIC).

See our latest work: the RELIC World Model!

I believe this direction paves the way for immersive virtual worlds and helps embodied agents learn and reason about the real world.

Prior to this, I created the Recurrent Transformers and the Large Reconstruction Models — two projects I'm particularly proud of.

I completed my Ph.D. (Dec 2023) at the College of Engineering, Computing and Cybernetics (CECC), Australian National University, advised by Prof. Stephen Gould, Prof. Qi Wu, and Prof. Lexing Xie.

Publications

Research Journey

2025
Preprint

RELIC: Interactive World Model

Yicong Hong, Yiqun Mei, Chongjian Ge, Yiran Xu, Yang Zhou, Sai Bi, Yannick Hold-Geoffroy, Mike Roberts, Matthew Fisher, Eli Shechtman, Kalyan Sunkavalli, Feng Liu, Zhengqi Li, Hao Tan

Preprint

WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation

Jisu Nam, Yicong Hong, Chun-Hao Paul Huang, Feng Liu, JoungBin Lee, Jiyoung Kim, Siyoon Jin, Yunsung Lee, Jaeyoon Jung, Suhwan Choi, Seungryong Kim, Yang Zhou

Preprint

Test-Time Training Done Right

Tianyuan Zhang, Sai Bi, Yicong Hong, Kai Zhang, Fujun Luan, Songlin Yang, Kalyan Sunkavalli, William T. Freeman, Hao Tan

Preprint

Hydra-Transformer-Hybrid for Efficient Generation

Yicong Hong*, Jiuxiang Gu*, Weidong Cai, Hao Tan

ICCV 2025

Long-LRM: Long-Sequence Large Reconstruction Model for Wide-Coverage Gaussian Splats

Chen Ziwen, Hao Tan, Kai Zhang, Sai Bi, Fujun Luan, Yicong Hong, Li Fuxin, Zexiang Xu

ICCV 2025

VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation

Shoubin Yu, Difan Liu, Ziqiao Ma, Yicong Hong, Yang Zhou, Hao Tan, Joyce Chai, Mohit Bansal

Preprint

Coarse-to-Real: Generative Rendering for Populated Dynamic Scenes

Gonzalo Gomez-Nogales, Yicong Hong, Chongjian Ge, Marc Comino-Trinidad, Dan Casas, Yi Zhou

Preprint

Rethinking Training Dynamics in Scale-wise Autoregressive Generation

Gengze Zhou, Chongjian Ge, Hao Tan, Feng Liu, Yicong Hong

ICCV 2025

SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Gengze Zhou, Yicong Hong, Zun Wang, Chongyang Zhao, Mohit Bansal, Qi Wu

2024
Preprint

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Zun Wang, Jialu Li, Yicong Hong, Songze Li, Kunchang Li, Shoubin Yu, Yi Wang, Yu Qiao, Yali Wang, Mohit Bansal, Limin Wang

ECCV 2024

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Gengze Zhou, Yicong Hong, Zun Wang, Xin Eric Wang, Qi Wu

CVPR 2025 Workshop

Progressive Autoregressive Video Diffusion Models

Desai Xie, Zhan Xu, Yicong Hong, Hao Tan, Difan Liu, Feng Liu, Arie Kaufman, Yang Zhou

ICLR 2024 · Oral

LRM: Large Reconstruction Model for Single Image to 3D

Yicong Hong, Kai Zhang, Jiuxiang Gu, Sai Bi, Yang Zhou, Difan Liu, Feng Liu, Kalyan Sunkavalli, Trung Bui, Hao Tan

ICLR 2024 · Poster

Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model

Jiahao Li, Hao Tan, Kai Zhang, Zexiang Xu, Fujun Luan, Yinghao Xu, Yicong Hong, Kalyan Sunkavalli, Greg Shakhnarovich, Sai Bi

2023
AAAI 2024

NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models

Gengze Zhou, Yicong Hong, Qi Wu

ICCV 2023 · Oral

Scaling Data Generation in Vision-and-Language Navigation

Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao

ICCV 2023

Learning Navigational Visual Representations with Semantic Map Supervision

Yicong Hong, Yang Zhou, Ruiyi Zhang, Franck Dernoncourt, Trung Bui, Stephen Gould, Hao Tan

2022
TPAMI 2022

HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and-Language Navigation

Yanyuan Qiao, Yuankai Qi, Yicong Hong, Zheng Yu, Peng Wang, Qi Wu

CVPR 2022

Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation

Yicong Hong, Zun Wang, Qi Wu, Stephen Gould

CVPR 2022

HOP: History-and-Order Aware Pre-training for Vision-and-Language Navigation

Yanyuan Qiao, Yuankai Qi, Yicong Hong, Peng Wang, Qi Wu

CVPR 2022 Workshop · 1st Place

1st Place Solutions for RxR-Habitat Vision-and-Language Navigation Competition

Dong An, Zun Wang, Yangguang Li, Yi Wang, Yicong Hong, Yan Huang, Liang Wang, Jing Shao

2021
CVPR 2021 · Oral

A Recurrent Vision-and-Language BERT for Navigation

Yicong Hong, Qi Wu, Yuankai Qi, Cristian Rodriguez-Opazo, Stephen Gould

ICCV 2021

The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation

Yuankai Qi, Zizheng Pan, Yicong Hong, Ming-Hsuan Yang, Anton van den Hengel, Qi Wu

2020
NeurIPS 2020

Language and Visual Entity Relationship Graph for Agent Navigation

Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould

EMNLP 2020

Sub-Instruction Aware Vision-and-Language Navigation

Yicong Hong, Cristian Rodriguez-Opazo, Qi Wu, Stephen Gould