Felix Wimbauer

I am a PhD student at the Computer Vision Group supervised by Prof. Dr. Daniel Cremers at the Technical University of Munich. Before that, I finished my M.Sc. in Computer Science at the University of Oxford, where I worked at VGG with Dr. Christian Rupprecht. During undergrad, I also spent a semester at the National University of Singapore.

Email  /  Scholar  /  Twitter  /  LinkedIn  /  Github  /  Group Website



February 2024 Cache Me if You Can, ControlRoom3D, and Multiview Behind the Scenes were accepted at CVPR 2024.

December 2023 I completed my internship at Meta, and the resulting paper Cache Me if You Can is out on arXiv.

October 2023 Our work S4C on self-supervised semantic scene completion was accepted to 3DV 2024.

August 2023 I started a 14-week internship at Meta GenAI with Jialiang Wang in Menlo Park.

June 2023 Check out our video presentation and demo video of Behind the Scenes.

February 2023 Behind the Scenes was accepted at CVPR 2023.


January 2023 The preprint of our new paper Behind the Scenes is online.

April 2022 Started Ph.D. in Computer Vision at the chair of Daniel Cremers.

March 2022 De-render3D was accepted at CVPR 2022.

September 2021 Finished M.Sc. Computer Science at University of Oxford.

March 2021 MonoRec was accepted at CVPR 2021. Check out our video and open-source code.

October 2020 Started M.Sc. Computer Science at University of Oxford.

October 2020 Finished ADL4CV. Check out our video presentation here.

March 2020 Finished my B.Sc. in Informatics at TUM.


My aim is to develop methods that understand the 3D world from images and videos. Specifically, I work on methods that can be trained with weak or even no supervision. Through this, I hope to enable more widespread applications of such methods and to improve accuracy by training on large, unlabeled datasets.

Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Felix Wimbauer, Bichen Wu, Edgar Schoenfeld, Xiaoliang Dai, Ji Hou, Zijian He, Artsiom Sanakoyeu, Peizhao Zhang, Sam Tsai, Jonas Kohler, Christian Rupprecht, Daniel Cremers, Peter Vajda, Jialiang Wang
CVPR, 2024
project page / arXiv

We reuse layer computations from previous timesteps to make image generation with diffusion models more efficient.

Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation
Keonhee Han*, Dominik Muhle*, Felix Wimbauer, Daniel Cremers
CVPR, 2024
arXiv (coming soon)

Leveraging multi-view supervision and distillation training to improve volumetric reconstruction from a single image.

ControlRoom3D: Room Generation using Semantic Proxy Rooms
Jonas Schult, Sam Tsai, Lukas Höllein, Bichen Wu, Jialiang Wang, Chih-Yao Ma, Kunpeng Li, Xiaofang Wang, Felix Wimbauer, Zijian He, Peizhao Zhang, Bastian Leibe, Peter Vajda, Ji Hou
CVPR, 2024
project page / arXiv / video

ControlRoom3D creates diverse and plausible 3D room meshes that align well with user-defined room layouts and textual descriptions of the room style.

S4C: Self-Supervised Semantic Scene Completion with Neural Fields
Adrian Hayler*, Felix Wimbauer*, Dominik Muhle, Christian Rupprecht, Daniel Cremers
3DV, 2024 (Spotlight)
project page / arXiv / code

A self-supervised method for semantic scene completion that rivals supervised approaches.

Behind the Scenes: Density Fields for Single View Reconstruction
Felix Wimbauer, Nan Yang, Christian Rupprecht, Daniel Cremers
CVPR, 2023
project page / arXiv / code / video

A self-supervised method for implicit volumetric reconstruction from a single image.

De-rendering 3D Objects in the Wild
Felix Wimbauer, Shangzhe Wu, Christian Rupprecht
CVPR, 2022
project page / arXiv / code / video

A self-supervised method for intrinsic image decomposition.

MonoRec: Semi-Supervised Dense Reconstruction in Dynamic Environments from a Single Moving Camera
Felix Wimbauer, Nan Yang, Lukas von Stumberg, Niclas Zeller, Daniel Cremers
CVPR, 2021
project page / arXiv / code / video

A state-of-the-art semi-supervised monocular dense reconstruction system that combines a multi-view stereo approach with a filter for moving objects to predict depth maps in dynamic environments.