Talking Papers Podcast

Itzik Ben-Shabat

🎙️ Welcome to the Talking Papers Podcast: Where Research Meets Conversation 🌟

Are you ready to explore the fascinating world of cutting-edge research in computer vision, machine learning, artificial intelligence, graphics, and beyond? Join us on this podcast by researchers, for researchers, as we venture into the heart of groundbreaking academic papers.

At Talking Papers, we've reimagined the way research is shared. In each episode, we engage in insightful discussions with the main authors of academic papers, offering you a unique opportunity to dive deep into the minds behind the innovation.

📚 Structure That Resembles a Paper 📝
Just like a well-structured research paper, each episode takes you on a journey through the academic landscape. We provide a concise TL;DR (abstract) to set the stage, followed by a thorough exploration of related work, approach, results, conclusions, and a peek into future work.

🔍 Peer Review Unveiled: "What Did Reviewer 2 Say?" 📢
But that's not all! We bring you an exclusive bonus section where authors candidly share their experiences in the peer review process. Discover the insights, challenges, and triumphs behind the scenes of academic publishing.

🚀 Join the Conversation 💬
Whether you're a seasoned researcher or an enthusiast eager to explore the frontiers of knowledge, Talking Papers Podcast is your gateway to in-depth, engaging discussions with the experts shaping the future of technology and science.

🎧 Tune In and Stay Informed 🌐
Don't miss out on the latest in research and innovation.
Subscribe and stay tuned for our enlightening episodes. Welcome to the future of research dissemination – welcome to Talking Papers Podcast!
Enjoy the journey! 🌠
#TalkingPapersPodcast #ResearchDissemination #AcademicInsights


HMD-NeMo - Sadegh Aliakbarian
Yesterday
🎙️ Join us on this exciting episode of the Talking Papers Podcast as we sit down with the talented Sadegh Aliakbarian to explore his ICCV 2023 paper "HMD-NeMo: Online 3D Avatar Motion Generation From Sparse Observations". Our guest will take us on a journey through this pivotal research, which addresses a crucial aspect of immersive mixed reality experiences.

🌟 The quality of these experiences hinges on generating plausible and precise full-body avatar motion, a challenge given the limited input signals provided by Head-Mounted Devices (HMDs), typically 6-DoF head and hand poses. While recent approaches have made strides in generating full-body motion from such inputs, they assume full hand visibility. This assumption does not hold in scenarios without motion controllers, which rely instead on egocentric hand tracking and can therefore suffer from partial hand visibility due to the HMD's field of view.

🧠 HMD-NeMo presents a unified approach to generating realistic full-body motion even when hands are only partially visible. This lightweight neural network operates in real time, incorporating a spatio-temporal encoder with adaptable mask tokens that ensure plausible motion in the absence of complete hand observations.

👤 Sadegh is currently a senior research scientist at the Microsoft Mixed Reality and AI Lab in Cambridge (UK), where he is at the forefront of Microsoft Mesh and avatar motion generation. He holds a PhD from the Australian National University, where he specialized in generative modeling of human motion. His research journey includes internships at Amazon AI, Five AI, and Qualcomm AI Research, focusing on generative models, representation learning, and adversarial examples.

🤝 We first crossed paths during our time at the Australian Centre for Robotic Vision (ACRV), where Sadegh was pursuing his PhD and I was embarking on my postdoctoral journey. During this time, I had the privilege of collaborating with another co-author of the paper, Fatemeh Saleh, who also happens to be Sadegh's life partner. It's been incredible to witness their continued growth.

🚀 Join us as we uncover the critical advancements brought by HMD-NeMo and their implications for the future of mixed reality experiences. Stay tuned for the episode release!

All links and resources are available in the blog post: https://www.itzikbs.com/hmdnemo
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
CC3D - Jeong Joon Park
2d ago
Join us on this exciting episode of the Talking Papers Podcast as we sit down with the brilliant Jeong Joon Park to explore his paper "CC3D: Layout-Conditioned Generation of Compositional 3D Scenes", just published at ICCV 2023.

Discover CC3D, a conditional generative model that redefines 3D scene synthesis. Unlike traditional 3D GANs, CC3D crafts complex scenes with multiple objects, guided by 2D semantic layouts. With a novel 3D field representation, CC3D delivers efficiency and superior scene quality. Get ready for a deep dive into the future of 3D scene generation.

My journey with Jeong Joon Park began with his influential DeepSDF paper at CVPR 2019. We met in person at CVPR 2022, thanks to Despoina Paschalidou, who was also a guest on our podcast. Now, as an Assistant Professor at the University of Michigan CSE, JJ leads research in realistic 3D content generation, offering opportunities for students to contribute to the frontiers of computer vision and AI.

Don't miss this insightful exploration of this ICCV 2023 paper and the future of 3D scene synthesis.

PAPER
CC3D: Layout-Conditioned Generation of Compositional 3D Scenes

AUTHORS
Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Xingguang Yan, Gordon Wetzstein, Leonidas Guibas, Andrea Tagliasacchi

ABSTRACT
In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their applicability to aligned single objects, we focus on generating complex scenes with multiple objects, by modeling the compositional nature of 3D scenes. By devising a 2D layout-based approach for 3D synthesis and implementing a new 3D field representation with a stronger geometric inductive bias, we have created a 3D GAN that is both efficient and of high quality, while allowing for a more controllable generation process. Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality in comparison to previous works.

All links and resources are available on the blog post: https://www.itzikbs.com/cc3d
Subscribe and stay tuned! 🚀🔍
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
NeRF-Det - Chenfeng Xu
Sep 6 2023
Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Chenfeng Xu to discuss his paper "NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection", which was published at ICCV 2023.

In recent times, NeRF has gained widespread prominence, and the field of 3D detection has encountered well-recognized challenges. The principal contribution of this study lies in its ability to address the detection task while simultaneously training a NeRF model and enabling it to generalize to previously unobserved scenes. Although the computer vision community has been actively addressing various tasks related to images and point clouds for an extended period, it is particularly invigorating to witness the application of the NeRF representation to this specific challenge.

Chenfeng is currently a Ph.D. candidate at UC Berkeley, collaborating with Prof. Masayoshi Tomizuka and Prof. Kurt Keutzer. His affiliations include Berkeley DeepDrive (BDD) and Berkeley AI Research (BAIR), along with the MSC lab and PALLAS. His research endeavors revolve around enhancing computational and data efficiency in machine perception, with a primary focus on temporal-3D scenes and their downstream applications. He brings together traditionally separate approaches from geometric computing and deep learning to establish both theoretical frameworks and practical algorithms for temporal-3D representations. His work spans a wide range of applications, including autonomous driving, robotics, and AR/VR, and consistently demonstrates remarkable efficiency through extensive experimentation. I am eagerly looking forward to seeing his upcoming research papers.

PAPER
NeRF-Det: Learning Geometry-Aware Volumetric Representation for Multi-View 3D Object Detection

AUTHORS
Chenfeng Xu, Bichen Wu, Ji Hou, Sam Tsai, Ruilong Li, Jialiang Wang, Wei Zhan, Zijian He, Peter Vajda, Kurt Keutzer, Masayoshi Tomizuka

ABSTRACT
NeRF-Det is a novel method for 3D detection with posed RGB images as input. Our method makes novel use of NeRF in an end-to-end manner to explicitly estimate 3D geometry, thereby improving 3D detection performance. Specifically, to avoid the significant extra latency associated with per-scene optimization of NeRF, we introduce sufficient geometry priors to enhance the generalizability of NeRF-MLP. We subtly connect the detection and NeRF branches through a shared MLP, enabling an efficient adaptation of NeRF to detection and yielding geometry-aware volumetric representations for 3D detection. As a result of our joint-training design, NeRF-Det is able to generalize well to unseen scenes for object detection, view synthesis, and depth estimation tasks without per-scene optimization.

All links and resources are available on the blog post: https://www.itzikbs.com/nerf-det
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
MagicPony - Tomas Jakab
Aug 9 2023
Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Tomas Jakab to discuss his paper "MagicPony: Learning Articulated 3D Animals in the Wild", which was published at CVPR 2023.

The motivation behind MagicPony stems from the scarcity of labeled data for real-world scenarios involving freely moving, articulated 3D animals. In response, the authors propose an innovative solution that addresses this issue. The approach takes an ordinary RGB image as input and produces a 3D model with detailed shape, texture, and lighting characteristics. The method's uniqueness lies in its ability to learn from diverse images captured in natural settings, effectively deciphering the inherent differences between them. This enables the system to establish a foundational average shape while accounting for specific deformations that vary from instance to instance. To achieve this, the researchers blend the strengths of two representations, radiance fields and meshes, which together contribute to a comprehensive representation of the object's attributes. Additionally, the method employs a strategic viewpoint sampling technique to enhance computational speed. While the current results may not be suitable for practical applications just yet, this work constitutes a substantial advancement in the field, as demonstrated by tangible quantitative and qualitative improvements.

AUTHORS
Shangzhe Wu*, Ruining Li*, Tomas Jakab*, Christian Rupprecht, Andrea Vedaldi

ABSTRACT
We consider the problem of learning a function that can estimate the 3D shape, articulation, viewpoint, texture, and lighting of an articulated animal like a horse, given a single test image. We present a new method, dubbed MagicPony, that learns this function purely from in-the-wild single-view images of the object category, with minimal assumptions about the topology of deformation. At its core is an implicit-explicit representation of articulated shape and appearance, combining the strengths of neural fields and meshes. In order to help the model understand an object's shape and pose, we distil the knowledge captured by an off-the-shelf self-supervised vision transformer and fuse it into the 3D model. To overcome common local optima in viewpoint estimation, we further introduce a new viewpoint sampling scheme that comes at no added training cost. Compared to prior works, we show significant quantitative and qualitative improvements on this challenging task. The model also demonstrates excellent generalisation in reconstructing abstract drawings and artefacts, despite the fact that it is only trained on real images.

RELATED PAPERS
📚 CMR
📚 Deep Marching Tetrahedra
📚 DINO-ViT

LINKS AND RESOURCES
📚 Paper
💻 Project page
💻 Code

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

All links are available in the blog post: https://www.itzikbs.com/magicpony
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
Word-As-Image - Shir Iluz
Jul 20 2023
Welcome to another exciting episode of the Talking Papers Podcast! In this installment, I had the pleasure of hosting Shir Iluz to discuss her paper "Word-As-Image for Semantic Typography", which won a SIGGRAPH 2023 Honorable Mention award.

This paper introduces an innovative approach for morphing text based on semantic context. Using Bézier curves with control points, a differentiable rasterizer, and a vector diffusion model, the authors transform words like "bunny" into captivating bunny-shaped letters. Their optimization-based method accurately conveys the word's meaning, and they address the readability-semantics balance with multiple loss functions that serve as "control knobs" for users to fine-tune results. The paper's compelling results are showcased in an impressive demo. Don't miss it!

Their work carries immense potential, promising to reshape the creative processes of artists and designers. Rather than starting from a blank canvas or a plain font, this approach lets individuals begin their logo design journey by transforming a word into a captivating image, opening up exciting new possibilities for visual communication and design aesthetics.

I am eagerly anticipating the next set of papers she will sketch out (pun intended).

AUTHORS
Shir Iluz, Yael Vinker, Amir Hertz, Daniel Berio, Daniel Cohen-Or, Ariel Shamir

ABSTRACT
A word-as-image is a semantic typography technique where a word illustration presents a visualization of the meaning of the word, while also preserving its readability. We present a method to create word-as-image illustrations automatically. This task is highly challenging as it requires semantic understanding of the word and a creative idea of where and how to depict these semantics in a visually pleasing and legible manner. We rely on the remarkable ability of recent large pretrained language-vision models to distill textual concepts visually. We target simple, concise, black-and-white designs that convey the semantics clearly. We deliberately do not change the color or texture of the letters and do not use embellishments. Our method optimizes the outline of each letter to convey the desired concept, guided by a pretrained Stable Diffusion model. We incorporate additional loss terms to ensure the legibility of the text and the preservation of the style of the font. We show high quality and engaging results on numerous examples and compare to alternative techniques.

RELATED PAPERS
📚 VectorFusion

LINKS AND RESOURCES
📚 Paper
💻 Project page
💻 Code
💻 Demo

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

All links are available in the blog post.
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
Panoptic Lifting - Yawar Siddiqui
Jul 10 2023
In this episode of the Talking Papers Podcast, I hosted Yawar Siddiqui to chat about his CVPR 2023 paper "Panoptic Lifting for 3D Scene Understanding with Neural Fields". All links are available in the blog post.

In this paper, they propose a method for "lifting" 2D panoptic segmentation into a 3D volume represented as neural fields, using images of in-the-wild scenes. While the semantic segmentation part is simply represented as an MLP, the instance indices are difficult to keep track of across the different frames. This is solved with the Hungarian algorithm (a linear assignment) and a set of custom losses. Yawar is currently a PhD student at the Technical University of Munich (TUM) under the supervision of Prof. Matthias Niessner. This work was done as part of his latest internship with Meta Zurich. It was a pleasure chatting with him and I can't wait to see what he cooks up next.

AUTHORS
Yawar Siddiqui, Lorenzo Porzi, Samuel Rota Bulò, Norman Müller, Matthias Nießner, Angela Dai, Peter Kontschieder

ABSTRACT
We propose Panoptic Lifting, a novel approach for learning panoptic 3D volumetric representations from images of in-the-wild scenes. Once trained, our model can render color images together with 3D-consistent panoptic segmentation from novel viewpoints. Unlike existing approaches which use 3D input directly or indirectly, our method requires only machine-generated 2D panoptic segmentation masks inferred from a pre-trained network. Our core contribution is a panoptic lifting scheme based on a neural field representation that generates a unified and multi-view consistent, 3D panoptic representation of the scene. To account for inconsistencies of 2D instance identifiers across views, we solve a linear assignment with a cost based on the model's current predictions and the machine-generated segmentation masks, thus enabling us to lift 2D instances to 3D in a consistent way. We further propose and ablate contributions that make our method more robust to noisy, machine-generated labels, including test-time augmentations for confidence estimates, segment consistency loss, bounded segmentation fields, and gradient stopping. Experimental results validate our approach on the challenging Hypersim, Replica, and ScanNet datasets, improving by 8.4, 13.8, and 10.6% in scene-level PQ over state of the art.

SPONSOR
This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted the 2022 best start-up to work for by Dun's 100. Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.
Visit YOOM. For job opportunities with YOOM visit https://www.yoom.com/careers/

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

This episode was recorded on July 6th, 2023.

#talkingpapers #CVPR2023 #PanopticLifting #NeRF #TensoRF #AI #Segmentation #DeepLearning #MachineLearning #research #artificialintelligence #podcasts
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
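Technical aside: to make the linear-assignment step above concrete, here is a minimal sketch of matching rendered 3D instance masks to machine-generated 2D instance masks with SciPy's Hungarian solver. The IoU-based cost and the mask shapes are illustrative assumptions, not the paper's exact cost or implementation.

```python
# Hedged sketch: matching machine-generated 2D instance masks to the model's
# 3D instance "slots" with the Hungarian algorithm (scipy's linear_sum_assignment).
# The IoU-based cost is an illustrative choice, not the paper's exact cost.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_instances(pred_masks: np.ndarray, gt_masks: np.ndarray):
    """pred_masks: (P, H, W) boolean masks rendered from the 3D instance fields.
    gt_masks: (G, H, W) boolean machine-generated 2D instance masks."""
    P, G = pred_masks.shape[0], gt_masks.shape[0]
    cost = np.zeros((P, G))
    for i in range(P):
        for j in range(G):
            inter = np.logical_and(pred_masks[i], gt_masks[j]).sum()
            union = np.logical_or(pred_masks[i], gt_masks[j]).sum() + 1e-8
            cost[i, j] = 1.0 - inter / union  # low cost = high overlap
    row_ind, col_ind = linear_sum_assignment(cost)
    # row_ind[k] (3D instance slot) is matched to col_ind[k] (2D instance id) in this frame
    return list(zip(row_ind.tolist(), col_ind.tolist()))
```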
MobileBrick - Kejie Li
Jun 14 2023
In this episode of the Talking Papers Podcast, I hosted Kejie Li to chat about his CVPR 2023 paper "MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices". All links are available in the blog post.

In this paper, they propose a new dataset and paradigm for evaluating 3D object reconstruction. It is very difficult to create an exact digital twin of a 3D object, even with expensive sensors. They therefore introduce a new RGBD dataset captured from a mobile device, and the nice trick for obtaining the ground truth is that they used LEGO models that have an exact CAD model. Kejie is currently a research scientist at ByteDance/TikTok. When writing this paper he was a postdoc at Oxford, and prior to this he obtained his PhD from the University of Adelaide. Although we hadn't crossed paths until this episode, we have some common ground in our CVs, having been affiliated with different nodes of the ACRV (Adelaide for him and ANU for me). I'm excited to see what he comes up with next and eagerly await his future endeavours.

AUTHORS
Kejie Li, Jia-Wang Bian, Robert Castle, Philip H.S. Torr, Victor Adrian Prisacariu

ABSTRACT
High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation. However, it is difficult to create a replica of an object in reality, and even 3D reconstructions generated by 3D scanners have artefacts that cause biases in evaluation. To address this issue, we introduce a novel multi-view RGBD dataset captured using a mobile device, which includes highly precise 3D ground-truth annotations for 153 object models featuring a diverse set of 3D structures. We obtain precise 3D ground-truth shape without relying on high-end 3D scanners by utilising LEGO models with known geometry as the 3D structures for image capture. The distinct data modality offered by high-resolution RGB images and low-resolution depth maps captured on a mobile device, when combined with precise 3D geometry annotations, presents a unique opportunity for future research on high-fidelity 3D reconstruction. Furthermore, we evaluate a range of 3D reconstruction algorithms on the proposed dataset.

RELATED PAPERS
📚 COLMAP
📚 NeRF
📚 NeuS
📚 CO3D

LINKS AND RESOURCES
📚 Paper
💻 Project page
💻 Code

SPONSOR
This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted the 2022 best start-up to work for by Dun's 100. Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.
Visit YOOM. For job opportunities with YOOM visit https://www.yoom.com/careers/

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

This episode was recorded on May 8th, 2023.

#talkingpapers #CVPR2023 #NeRF #Dataset #mobilebrick #ComputerVision #AI #NeuS #DeepLearning #MachineLearning #research #artificialintelligence #podcasts
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
IAW Dataset - Jiahao Zhang
May 17 2023
In this episode of the Talking Papers Podcast, I hosted Jiahao Zhang to chat about our CVPR 2023 paper "Aligning Step-by-Step Instructional Diagrams to Video Demonstrations". All links are available in the blog post.

In this paper, we take on the task of aligning video demonstrations with the step-by-step diagrams of a furniture assembly manual. To do that, we collected and annotated a brand new dataset, "IKEA Assembly in the Wild" (IAW), where we aligned YouTube videos with IKEA's instruction manuals. Our approach to addressing this task proposes several supervised contrastive losses that contrast between video and diagram, video and manual, and internal manual images.

Jiahao is currently a PhD student at the Australian National University. His research focus is on human action recognition and multi-modal representation alignment. We first met (virtually) when Jiahao did his Honours project, where he developed an amazing (and super useful) video annotation tool, ViDaT. His strong software engineering and web development background gives him a real advantage when working on his research projects. Even though we have never met in person (yet), we are actively collaborating and I already know what he is cooking up next. I hope to share it with the world soon.

AUTHORS
Jiahao Zhang, Anoop Cherian, Yanbin Liu, Yizhak Ben-Shabat, Cristian Rodriguez, Stephen Gould

RELATED PAPERS
📚 IKEA ASM Dataset
📚 CLIP
📚 SlowFast

LINKS AND RESOURCES
📚 Paper
💻 Project page
💻 Dataset page
💻 Code

SPONSOR
This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted the 2022 best start-up to work for by Dun's 100. Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.
Visit YOOM. For job opportunities with YOOM visit https://www.yoom.com/careers/

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

This episode was recorded on May 1st, 2023.

#talkingpapers #CVPR2023 #IAWDataset #ComputerVision #AI #ActionRecognition #DeepLearning #MachineLearning #research #artificialintelligence #podcasts
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
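Technical aside: the supervised contrastive losses mentioned above pair matching video clips and diagram images against non-matching ones. Below is a generic, symmetric InfoNCE-style sketch of one such loss; the batch construction, temperature, and encoder outputs are illustrative assumptions rather than the paper's exact formulation.

```python
# Hedged sketch of a symmetric InfoNCE-style contrastive loss between paired
# video-clip and diagram-image embeddings. This is a generic formulation for
# illustration; the paper combines several such losses (video-diagram,
# video-manual, intra-manual) with its own specifics.
import torch
import torch.nn.functional as F

def video_diagram_contrastive_loss(video_emb, diagram_emb, temperature=0.07):
    """video_emb, diagram_emb: (B, D) embeddings where row i of each is a matching pair."""
    v = F.normalize(video_emb, dim=-1)
    d = F.normalize(diagram_emb, dim=-1)
    logits = v @ d.t() / temperature            # (B, B) cosine-similarity logits
    targets = torch.arange(v.size(0), device=v.device)
    loss_v2d = F.cross_entropy(logits, targets)      # video -> matching diagram
    loss_d2v = F.cross_entropy(logits.t(), targets)  # diagram -> matching video
    return 0.5 * (loss_v2d + loss_d2v)
```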
INR2Vec - Luca De Luigi
Mar 29 2023
All links are available in the blog post: https://www.itzikbs.com/inr2vec/

In this episode of the Talking Papers Podcast, I hosted Luca De Luigi. We had a great chat about his paper "Deep Learning on Implicit Neural Representations of Shapes", AKA INR2Vec, published in ICLR 2023.

In this paper, they take implicit neural representations to the next level and use them as input signals for neural networks that solve multiple downstream tasks. The core idea was captured by one of the authors in a very catchy and concise tweet: "Signals are networks so networks are data and so networks can process other networks to understand and generate signals". Luca recently received his PhD from the University of Bologna and is currently working at eyecan.ai, a startup based in Bologna. His research focus is on neural representations of signals, especially for 3D geometry. To be honest, I knew I wanted to get Luca on the podcast the second I saw the paper on arXiv, because I had been working on a related topic but had to shelve it due to time constraints. This paper got me excited about that topic again. I didn't know Luca before recording the episode and it was a delight to get to know him and his work.

AUTHORS
Luca De Luigi, Adriano Cardace, Riccardo Spezialetti, Pierluigi Zama Ramirez, Samuele Salti, Luigi Di Stefano

ABSTRACT
When applied to 3D shapes, INRs allow to overcome the fragmentation and shortcomings of the popular discrete representations used so far. Yet, considering that INRs consist in neural networks, it is not clear whether and how it may be possible to feed them into deep learning pipelines aimed at solving a downstream task. In this paper, we put forward this research problem and propose inr2vec, a framework that can compute a compact latent representation for an input INR in a single inference pass. We verify that inr2vec can embed effectively the 3D shapes represented by the input INRs and show how the produced embeddings can be fed into deep learning pipelines to solve several tasks by processing exclusively INRs.

RELATED PAPERS
📚 SIREN
📚 DeepSDF
📚 PointNet

LINKS AND RESOURCES
📚 Paper
💻 Project page

SPONSOR
This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted the 2022 best start-up to work for by Dun's 100. Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.
Visit https://www.yoom.com/ For job opportunities with YOOM visit https://www.yoom.com/careers/

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

This episode was recorded on March 22, 2023.

#talkingpapers #ICLR2023 #INR2Vec #ComputerVision #AI #DeepLearning #MachineLearning #INR #ImplicitNeuralRepresentation #research #artificialintelligence #podcasts
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
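Technical aside: to make the "networks as data" idea tangible, here is a toy sketch of flattening an INR's weights and feeding them to an encoder that outputs a latent code. The tiny architectures are made up for illustration; inr2vec's actual encoder and training objective are more involved.

```python
# Hedged sketch: treating an INR's weights as the *input* to another network.
# Architecture sizes are made up for illustration; inr2vec's real encoder,
# trained so the embedding supports downstream tasks, differs from this toy.
import torch
import torch.nn as nn

class TinyINR(nn.Module):
    """A small coordinate MLP mapping 3D points to implicit (e.g. SDF) values."""
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )
    def forward(self, xyz):
        return self.net(xyz)

def flatten_weights(inr: nn.Module) -> torch.Tensor:
    """Concatenate all parameters of the INR into a single 1D vector."""
    return torch.cat([p.detach().flatten() for p in inr.parameters()])

inr = TinyINR()
w = flatten_weights(inr)                      # the INR itself is the data point
encoder = nn.Sequential(nn.Linear(w.numel(), 256), nn.ReLU(), nn.Linear(256, 128))
embedding = encoder(w)                        # 128-D latent code for downstream tasks
print(embedding.shape)                        # torch.Size([128])
```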
CLIPasso - Yael Vinker
Mar 13 2023
In this episode of the Talking Papers Podcast, I hosted Yael Vinker. We had a great chat about her paper "CLIPasso: Semantically-Aware Object Sketching", winner of the SIGGRAPH 2022 best paper award.

In this paper, they convert images into sketches with different levels of abstraction. They avoid the need for sketch datasets by using the well-known CLIP model to distil the semantic concepts from sketches and images. There is no network training here, just optimization of the control points of the Bézier curves that model the sketch strokes (initialized from a saliency map). How is this differentiable? They use a differentiable rasterizer. The degree of abstraction is controlled by the number of strokes. Don't miss the amazing demo they created.

Yael is currently a PhD student at Tel Aviv University. Her research focus is on computer vision, machine learning, and computer graphics, with a unique twist of combining art and technology. This work was done as part of her internship at EPFL.

AUTHORS
Yael Vinker, Ehsan Pajouheshgar, Jessica Y. Bo, Roman Bachmann, Amit Haim Bermano, Daniel Cohen-Or, Amir Zamir, Ariel Shamir

ABSTRACT
Abstraction is at the heart of sketching due to the simple and minimal nature of line drawings. Abstraction entails identifying the essential visual properties of an object or scene, which requires semantic understanding and prior knowledge of high-level concepts. Abstract depictions are therefore challenging for artists, and even more so for machines. We present an object sketching method that can achieve different levels of abstraction, guided by geometric and semantic simplifications. While sketch generation methods often rely on explicit sketch datasets for training, we utilize the remarkable ability of CLIP (Contrastive-Language-Image-Pretraining) to distil semantic concepts from sketches and images alike. We define a sketch as a set of Bézier curves and use a differentiable rasterizer to optimize the parameters of the curves directly with respect to a CLIP-based perceptual loss. The abstraction degree is controlled by varying the number of strokes. The generated sketches demonstrate multiple levels of abstraction while maintaining recognizability, underlying structure, and essential visual components of the subject drawn.

RELATED PAPERS
📚 CLIP: Connecting Text and Images
📚 Differentiable Vector Graphics Rasterization for Editing and Learning

LINKS AND RESOURCES
📚 Paper
💻 Project page

SPONSOR
This episode was sponsored by YOOM. YOOM is an Israeli startup dedicated to volumetric video creation. They were voted the 2022 best start-up to work for by Dun's 100. Join their team that works on geometric deep learning research, implicit representations of 3D humans, NeRFs, and 3D/4D generative models.
Visit YOOM.com.

CONTACT
If you would like to be a guest, sponsor or share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
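Technical aside: here is a rough sketch of the kind of optimization described above, differentiably sampling points along cubic Bézier strokes and updating their control points by gradient descent. The toy Chamfer-style target loss stands in for CLIPasso's differentiable rasterizer and CLIP-based perceptual loss, which are not reproduced here.

```python
# Hedged sketch: optimizing the control points of cubic Bézier strokes by
# gradient descent. A toy loss (pull sampled stroke points towards a target
# point set) stands in for CLIPasso's differentiable rasterizer + CLIP loss.
import torch

def sample_cubic_bezier(ctrl, n=32):
    """ctrl: (S, 4, 2) control points of S strokes -> (S, n, 2) sampled points."""
    t = torch.linspace(0.0, 1.0, n, device=ctrl.device).view(1, n, 1)
    p0, p1, p2, p3 = ctrl[:, 0:1], ctrl[:, 1:2], ctrl[:, 2:3], ctrl[:, 3:4]
    return ((1 - t) ** 3) * p0 + 3 * ((1 - t) ** 2) * t * p1 \
           + 3 * (1 - t) * (t ** 2) * p2 + (t ** 3) * p3

strokes = torch.rand(8, 4, 2, requires_grad=True)    # 8 strokes, 4 control points each
target = torch.rand(200, 2)                          # stand-in for the semantic target
opt = torch.optim.Adam([strokes], lr=1e-2)
for _ in range(100):
    pts = sample_cubic_bezier(strokes).reshape(-1, 2)
    # toy Chamfer-style loss: each sampled stroke point moves towards its nearest target point
    loss = torch.cdist(pts, target).min(dim=1).values.mean()
    opt.zero_grad(); loss.backward(); opt.step()
```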
Random Walks for Adversarial Meshes - Amir Belder
Dec 14 2022
All links are available in the blog post.

In this episode of the Talking Papers Podcast, we hosted Amir Belder. We had a great chat about his paper "Random Walks for Adversarial Meshes", published in SIGGRAPH 2022.

In this paper, they take on the task of creating adversarial attacks for triangle meshes. This is a non-trivial task since meshes are irregular. To handle the irregularity they operate on random walks over the mesh instead of the raw mesh. On top of that, they train an imitating network that mimics the predictions of the attacked network and use its gradients to perturb the input. Amir is currently a PhD student at the Computer Graphics and Multimedia Lab at the Technion, Israel Institute of Technology. His research focus is on computer graphics, geometric processing, and machine learning. We spend a lot of time together at the lab and often chat about science, papers, and where the field is headed. Having this paper published was a great opportunity to share one of these conversations with you.

AUTHORS
Amir Belder, Gal Yefet, Ran Ben-Itzhak, Ayellet Tal

ABSTRACT
A polygonal mesh is the most-commonly used representation of surfaces in computer graphics. Therefore, it is not surprising that a number of mesh classification networks have recently been proposed. However, while adversarial attacks are widely researched in 2D, the field of adversarial meshes is under explored. This paper proposes a novel, unified, and general adversarial attack, which leads to misclassification of several state-of-the-art mesh classification neural networks. Our attack approach is black-box, i.e. it has access only to the network's predictions, but not to the network's full architecture or gradients. The key idea is to train a network to imitate a given classification network. This is done by utilizing random walks along the mesh surface, which gather geometric information. These walks provide insight onto the regions of the mesh that are important for the correct prediction of the given classification network. These mesh regions are then modified more than other regions in order to attack the network in a manner that is barely visible to the naked eye.

RELATED PAPERS
📚 Explaining and Harnessing Adversarial Examples
📚 MeshWalker: Deep Mesh Understanding by Random Walks

LINKS AND RESOURCES
📚 Paper
💻 Code

To stay up to date with Amir's latest research, follow him on:
🐦 Twitter
👨🏻‍🎓 Google Scholar
👨🏻‍🎓 LinkedIn

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

This episode was recorded on November 23rd 2022.

#talkingpapers #SIGGRAPH2022 #RandomWalks #MeshWalker #AdversarialAttacks #Mesh #ComputerVision #AI #DeepLearning #MachineLearning #ComputerGraphics #research #artificialintelligence #podcasts
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
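Technical aside: the black-box recipe above (fit an imitating network to the victim's predictions, then use its gradients to craft the perturbation) can be sketched generically as below. The plain feature tensors and generic classifier are illustrative assumptions; the paper operates on random walks over the mesh surface rather than raw tensors.

```python
# Hedged sketch of a generic imitation-based black-box attack:
# 1) train a surrogate ("imitating") network to match the victim's predictions,
# 2) back-propagate through the surrogate to perturb the input.
# The paper applies this idea to meshes via random walks; here the input is a
# plain feature tensor purely for illustration.
import torch
import torch.nn.functional as F

def train_imitator(imitator, victim_predict, data_loader, epochs=5, lr=1e-3):
    opt = torch.optim.Adam(imitator.parameters(), lr=lr)
    for _ in range(epochs):
        for x in data_loader:
            with torch.no_grad():
                target = victim_predict(x)            # black-box: class probabilities only
            loss = F.kl_div(F.log_softmax(imitator(x), dim=-1), target,
                            reduction="batchmean")
            opt.zero_grad(); loss.backward(); opt.step()

def perturb(imitator, x, true_label, steps=10, step_size=1e-3):
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(imitator(x_adv), true_label)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + step_size * grad.sign()).detach().requires_grad_(True)
    return x_adv                  # goal: misclassification with a barely visible change
```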
SPSR - Silvia Sellán
Dec 6 2022
In this episode of the Talking Papers Podcast, I hosted Silvia Sellán. We had a great chat about her paper "Stochastic Poisson Surface Reconstruction", published in SIGGRAPH Asia 2022.

In this paper, they take on the task of surface reconstruction with a probabilistic twist. They take the well-known Poisson Surface Reconstruction algorithm and generalize it into a full statistical formalism. Essentially, their method quantifies the uncertainty of surface reconstruction from a point cloud. Instead of outputting an implicit function, they represent the shape as a modified Gaussian process. This unique perspective and interpretation enables statistical queries, for example: given a point, is it on the surface? Is it inside the shape?

Silvia is currently a PhD student at the University of Toronto. Her research focus is on computer graphics and geometric processing. She is a Vanier Doctoral Scholar, an Adobe Research Fellow, and the winner of the 2021 UofT FAS Dean's Doctoral Excellence Scholarship. I have been following Silvia's work for a while, and since I have some work on surface reconstruction myself, I knew I wanted to host her on the podcast when SPSR came out (and gladly she agreed). Silvia is currently looking for postdoc and faculty positions to start in the fall of 2024, and I am really looking forward to seeing which institute snatches her. In our conversation, I particularly liked her explanation of Gaussian processes with the example "how long does it take my supervisor to answer an email as a function of the time of day the email was sent". You can't read that in any book. We also took an unexpected pause from the usual episode structure to discuss the question of "papers" as a medium for disseminating research. Don't miss it.

AUTHORS
Silvia Sellán, Alec Jacobson

ABSTRACT
We introduce a statistical extension of the classic Poisson Surface Reconstruction algorithm for recovering shapes from 3D point clouds. Instead of outputting an implicit function, we represent the reconstructed shape as a modified Gaussian Process, which allows us to conduct statistical queries (e.g., the likelihood of a point in space being on the surface or inside a solid). We show that this perspective: improves PSR's integration into the online scanning process, broadens its application realm, and opens the door to other lines of research such as applying task-specific priors.

RELATED PAPERS
📚 Poisson Surface Reconstruction
📚 Geometric Priors for Gaussian Process Implicit Surfaces
📚 Gaussian Processes for Machine Learning

LINKS AND RESOURCES
📚 Paper
💻 Project page

To stay up to date with Silvia's latest research, follow her on:
🐦 Twitter
👨🏻‍🎓 Google Scholar

🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
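Technical aside: the statistical queries mentioned above ("is this point inside the shape?") follow naturally once the implicit function is a Gaussian process. The sketch below uses textbook GP regression with an RBF kernel and the normal CDF purely for illustration; the paper derives its Gaussian process from the Poisson reconstruction formulation rather than fitting a generic GP like this.

```python
# Hedged sketch: querying an implicit-surface Gaussian Process for the
# probability that a point lies inside a shape (implicit value < 0).
# Textbook GP regression with an RBF kernel; the paper's GP comes from the
# Poisson reconstruction machinery, not from a generic fit like this.
import numpy as np
from scipy.stats import norm

def rbf(a, b, lengthscale=0.2):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def inside_probability(query, X, y, noise=1e-4):
    """X: (N, 3) observed points, y: (N,) observed implicit values, query: (M, 3)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Kq = rbf(query, X)
    mean = Kq @ np.linalg.solve(K, y)                                 # posterior mean
    var = 1.0 - np.einsum('ij,ji->i', Kq, np.linalg.solve(K, Kq.T))   # posterior variance
    var = np.clip(var, 1e-10, None)                                   # (RBF kernel: k(x, x) = 1)
    # P(implicit value < 0) = probability the query point is inside the shape
    return norm.cdf(-mean / np.sqrt(var))
```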
Beyond Periodicity - Sameera Ramasinghe
Nov 15 2022
In this episode of the Talking Papers Podcast, I hosted Sameera Ramasinghe. We had a great chat about his paper "Beyond Periodicity: Towards a Unifying Framework for Activations in Coordinate-MLPs", published in ECCV 2022 as an oral presentation.

In this paper, they propose a new family of activation functions for coordinate MLPs and provide a theoretical analysis of their effectiveness. Their main proposition is that the stable rank is a good measure and design tool for such activation functions. They show that their proposed activations outperform the traditional ReLU and sine activations for image parametrization and novel view synthesis. They further show that while the proposed family of activations does not require positional encoding, it can still benefit from it by significantly reducing the number of layers.

Sameera is currently an applied scientist at Amazon and the CTO and co-founder of ConscientAI. His research focus is theoretical machine learning and computer vision. This work was done when he was a postdoc at the Australian Institute for Machine Learning (AIML). He completed his PhD at the Australian National University (ANU). We first met back in 2019 when I was a research fellow at ANU and he was still doing his PhD. I immediately noticed we share research interests, and after a short period of time I flagged him as a rising star in the field. It was a pleasure to chat with Sameera and I am looking forward to reading his future papers.

AUTHORS
Sameera Ramasinghe, Simon Lucey

RELATED PAPERS
📚 NeRF
📚 SIREN
📚 Fourier Features Let Networks Learn High-Frequency Functions in Low Dimensional Domains
📚 On the Spectral Bias of Neural Networks

LINKS AND RESOURCES
📚 Paper
💻 Code

To stay up to date with Sameera's latest research, follow him on:
🐦 Twitter
👨🏻‍🎓 Google Scholar
👨🏻‍🎓 LinkedIn

Recorded on November 14th 2022.

🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
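Technical aside: one concrete non-periodic activation studied in this line of work is a Gaussian bump. The sketch below is a minimal coordinate MLP using it, with made-up widths and bandwidth, and is not the authors' code.

```python
# Hedged sketch: a coordinate MLP with a Gaussian activation, one of the
# non-periodic activations studied in this line of work (widths and the
# bandwidth parameter `a` are illustrative, not the paper's settings).
import torch
import torch.nn as nn

class GaussianActivation(nn.Module):
    def __init__(self, a=0.1):
        super().__init__()
        self.a = a
    def forward(self, x):
        return torch.exp(-x ** 2 / (2 * self.a ** 2))

def coordinate_mlp(in_dim=2, hidden=256, out_dim=3, layers=4):
    """Maps pixel coordinates to RGB without any positional encoding."""
    mods, d = [], in_dim
    for _ in range(layers):
        mods += [nn.Linear(d, hidden), GaussianActivation()]
        d = hidden
    mods.append(nn.Linear(d, out_dim))
    return nn.Sequential(*mods)

model = coordinate_mlp()
rgb = model(torch.rand(1024, 2))   # (1024, 3) predicted colours for 1024 pixel coordinates
```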
KeypointNeRF - Marko Mihajlovic
Oct 19 2022
In this episode of the Talking Papers Podcast, I hosted Marko Mihajlovic. We had a great chat about his paper "KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints", published in ECCV 2022.

In this paper, they create a generalizable NeRF for virtual avatars. To get a high-fidelity reconstruction of humans from sparse observations, they leverage an off-the-shelf keypoint detector and condition the NeRF on a relative spatial encoding of the detected 3D keypoints.

Marko is a 2nd year PhD student at ETH, supervised by Siyu Tang. His research focuses on photorealistic reconstruction of static and dynamic scenes as well as modeling of parametric human bodies. This work was done mainly during his internship at Meta Reality Labs. Marko and I met at CVPR 2022.

AUTHORS
Marko Mihajlovic, Aayush Bansal, Michael Zollhoefer, Siyu Tang, Shunsuke Saito

ABSTRACT
Image-based volumetric humans using pixel-aligned features promise generalization to unseen poses and identities. Prior work leverages global spatial encodings and multi-view geometric consistency to reduce spatial ambiguity. However, global encodings often suffer from overfitting to the distribution of the training data, and it is difficult to learn multi-view consistent reconstruction from sparse views. In this work, we investigate common issues with existing spatial encodings and propose a simple yet highly effective approach to modeling high-fidelity volumetric humans from sparse views. One of the key ideas is to encode relative spatial 3D information via sparse 3D keypoints. This approach is robust to the sparsity of viewpoints and cross-dataset domain gap. Our approach outperforms state-of-the-art methods for head reconstruction. On human body reconstruction for unseen subjects, we also achieve performance comparable to prior work that uses a parametric human body model and temporal feature aggregation. Our experiments show that a majority of errors in prior work stem from an inappropriate choice of spatial encoding and thus we suggest a new direction for high-fidelity image-based human modeling.

RELATED PAPERS
📚 NeRF
📚 IBRNet
📚 PIFu

LINKS AND RESOURCES
💻 Project website
📚 Paper
💻 Code
🎥 Video

To stay up to date with Marko's latest research, follow him on:
👨🏻‍🎓 Personal Page
🐦 Twitter
👨🏻‍🎓 Google Scholar

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
BACON - David Lindell
Aug 9 2022
In this episode of the Talking Papers Podcast, I hosted David B. Lindell to chat about his paper "BACON: Band-Limited Coordinate Networks for Multiscale Scene Representation", published in CVPR 2022.

In this paper, they take on training coordinate networks by introducing a new type of neural network architecture that has an analytical Fourier spectrum. This enables multi-scale signal representation and gives an interpretable architecture with an explicitly controllable bandwidth.

David recently completed his postdoc at Stanford and will join the University of Toronto as an Assistant Professor. During our chat, I got to know a stellar academic with a unique view of the field and where it is going. We even got to meet in person at CVPR. I am looking forward to seeing what he comes up with next. It was a pleasure having him on the podcast.

AUTHORS
David B. Lindell, Dave Van Veen, Jeong Joon Park, Gordon Wetzstein

ABSTRACT
Coordinate-based networks have emerged as a powerful tool for 3D representation and scene reconstruction. These networks are trained to map continuous input coordinates to the value of a signal at each point. Still, current architectures are black boxes: their spectral characteristics cannot be easily analyzed, and their behavior at unsupervised points is difficult to predict. Moreover, these networks are typically trained to represent a signal at a single scale, so naive downsampling or upsampling results in artifacts. We introduce band-limited coordinate networks (BACON), a network architecture with an analytical Fourier spectrum. BACON has constrained behavior at unsupervised points, can be designed based on the spectral characteristics of the represented signal, and can represent signals at multiple scales without per-scale supervision. We demonstrate BACON for multiscale neural representation of images, radiance fields, and 3D scenes using signed distance functions and show that it outperforms conventional single-scale coordinate networks in terms of interpretability and quality.

RELATED PAPERS
📚 SIREN
📚 Multiplicative Filter Networks (MFN)
📚 Mip-NeRF
📚 Follow-up work: Residual MFN

LINKS AND RESOURCES
💻 Project website
📚 Paper
💻 Code
🎥 Video

To stay up to date with David's latest research, follow him on:
👨🏻‍🎓 Personal Page
🐦 Twitter
👨🏻‍🎓 Google Scholar
👨🏻‍🎓 LinkedIn

Recorded on June 15th 2022.

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
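Technical aside: BACON builds on multiplicative filter networks, where each layer multiplies a sine filter of the input coordinates with a linear map of the previous hidden state, which keeps the output's spectrum analyzable. The sketch below shows only that generic MFN structure; BACON's band-limited frequency sampling and multi-scale output heads are not reproduced, and all sizes are illustrative.

```python
# Hedged sketch of the multiplicative filter network (MFN) structure BACON
# builds on: hidden states are products of sine filters of the input with
# linear maps of the previous hidden state. Sizes/frequencies are illustrative;
# BACON additionally constrains filter frequencies to be band-limited and reads
# out intermediate outputs at multiple scales.
import torch
import torch.nn as nn

class SineFilter(nn.Module):
    def __init__(self, in_dim, out_dim, freq_scale=32.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        with torch.no_grad():
            self.linear.weight.mul_(freq_scale)   # scales the filter's frequencies
    def forward(self, x):
        return torch.sin(self.linear(x))

class TinyMFN(nn.Module):
    def __init__(self, in_dim=2, hidden=128, out_dim=1, n_layers=3):
        super().__init__()
        self.filters = nn.ModuleList(SineFilter(in_dim, hidden) for _ in range(n_layers))
        self.linears = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_layers - 1))
        self.out = nn.Linear(hidden, out_dim)
    def forward(self, x):
        z = self.filters[0](x)
        for filt, lin in zip(self.filters[1:], self.linears):
            z = filt(x) * lin(z)                  # multiplicative interaction
        return self.out(z)

values = TinyMFN()(torch.rand(4096, 2))           # e.g. a 2D image or distance field
```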
Lipschitz MLP - Hsueh-Ti Derek Liu
Jul 19 2022
In this episode of the Talking Papers Podcast, I hosted Hsueh-Ti Derek Liu to chat about his paper "Learning Smooth Neural Functions via Lipschitz Regularization", published in SIGGRAPH 2022.

In this paper, they take on the unique task of enforcing smoothness on neural fields (modelled as a neural network). They do this by introducing a regularization term that forces the Lipschitz constant of the network to be very small. They show the performance of their method on shape interpolation, extrapolation, and partial shape reconstruction from 3D point clouds. What I like most is that it can be implemented in only four lines of code.

Derek will soon complete his PhD at the University of Toronto and will start a research scientist position at Roblox Research. This work was done while he was interning at NVIDIA. During our chat, I had the pleasure of discovering that Derek is one of the few humans on the planet who can take a complicated idea and explain it in a simple, easy-to-follow way. His strong background in geometry processing makes this paper, which is well within the learning domain, very unique in the current paper landscape. It was a pleasure recording this episode with him.

AUTHORS
Hsueh-Ti Derek Liu, Francis Williams, Alec Jacobson, Sanja Fidler, Or Litany

ABSTRACT
Neural implicit fields have recently emerged as a useful representation for 3D shapes. These fields are commonly represented as neural networks which map latent descriptors and 3D coordinates to implicit function values. The latent descriptor of a neural field acts as a deformation handle for the 3D shape it represents. Thus, smoothness with respect to this descriptor is paramount for performing shape-editing operations. In this work, we introduce a novel regularization designed to encourage smooth latent spaces in neural fields by penalizing the upper bound on the field's Lipschitz constant. Compared with prior Lipschitz regularized networks, ours is computationally fast, can be implemented in four lines of code, and requires minimal hyperparameter tuning for geometric applications. We demonstrate the effectiveness of our approach on shape interpolation and extrapolation as well as partial shape reconstruction from 3D point clouds, showing both qualitative and quantitative improvements over existing state-of-the-art and non-regularized baselines.

RELATED PAPERS
📚 DeepSDF
📚 Neural Fields (collection of works)
📚 Sorting Out Lipschitz Function Approximation

LINKS AND RESOURCES
💻 Project website
📚 Paper
💻 Code

To stay up to date with Derek's latest research, follow him on:
👨🏻‍🎓 Personal Page
🐦 Twitter
👨🏻‍🎓 Google Scholar

Recorded on May 30th 2022.

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
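Technical aside: the sketch below shows a simplified variant of the idea, penalizing the product of per-layer weight-matrix norms, which upper-bounds the network's Lipschitz constant for 1-Lipschitz activations. The paper's actual scheme (per-layer trainable bounds with weight normalization) differs, and the architecture and regularization weight here are illustrative.

```python
# Hedged sketch of Lipschitz regularization on an MLP: the product of per-layer
# weight-matrix norms upper-bounds the network's Lipschitz constant (for
# 1-Lipschitz activations), so penalizing it encourages a smooth latent space.
# Simplified variant; the paper uses per-layer trainable bounds with weight
# normalization rather than a direct norm product.
import torch
import torch.nn as nn

def lipschitz_bound(mlp: nn.Sequential) -> torch.Tensor:
    """Product of per-layer infinity-norms of the weight matrices."""
    bound = torch.ones(())
    for m in mlp:
        if isinstance(m, nn.Linear):
            bound = bound * m.weight.abs().sum(dim=1).max()   # ||W||_inf
    return bound

mlp = nn.Sequential(nn.Linear(3 + 32, 256), nn.Tanh(),
                    nn.Linear(256, 256), nn.Tanh(),
                    nn.Linear(256, 1))        # latent code (32-D) + xyz -> implicit value
task_loss = torch.zeros(())                   # placeholder for the reconstruction loss
alpha = 1e-6                                  # regularization weight (illustrative)
loss = task_loss + alpha * lipschitz_bound(mlp)
loss.backward()
```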
DiGS - Chamin Hewa Koneputugodage
Jun 14 2022
In this episode of the Talking Papers Podcast, I hosted Chamin Hewa Koneputugodage to chat about OUR paper "DiGS: Divergence guided shape implicit neural representation for unoriented point clouds", published in CVPR 2022.

In this paper, we took on the task of surface reconstruction using a novel divergence-guided approach. Unlike previous methods, we do not use normal vectors for supervision. To compensate for that, we add a divergence minimization loss as a regularizer to get a coarse shape, and then anneal it as training progresses to recover finer detail. Additionally, we propose two new geometric initializations for SIREN-based networks that enable learning shape spaces.

PAPER TITLE
"DiGS: Divergence guided shape implicit neural representation for unoriented point clouds"

AUTHORS
Yizhak Ben-Shabat, Chamin Hewa Koneputugodage, Stephen Gould

ABSTRACT
Shape implicit neural representations (INR) have recently shown to be effective in shape analysis and reconstruction tasks. Existing INRs require point coordinates to learn the implicit level sets of the shape. When a normal vector is available for each point, a higher fidelity representation can be learned, however normal vectors are often not provided as raw data. Furthermore, the method's initialization has been shown to play a crucial role for surface reconstruction. In this paper, we propose a divergence guided shape representation learning approach that does not require normal vectors as input. We show that incorporating a soft constraint on the divergence of the distance function favours smooth solutions that reliably orients gradients to match the unknown normal at each point, in some cases even better than approaches that use ground truth normal vectors directly. Additionally, we introduce a novel geometric initialization method for sinusoidal INRs that further improves convergence to the desired solution. We evaluate the effectiveness of our approach on the task of surface reconstruction and shape space learning and show SOTA performance compared to other unoriented methods.

RELATED PAPERS
📚 DeepSDF
📚 SIREN

LINKS AND RESOURCES
💻 Project Page
💻 Code
🎥 5 min video

To stay up to date with Chamin's latest research, follow him on:
🐦 Twitter
👨🏻‍🎓 LinkedIn

Recorded on April 1st 2022.

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

SUBSCRIBE AND FOLLOW
🎧 Subscribe on your favourite podcast app
📧 Subscribe to our mailing list
🐦 Follow us on Twitter
🎥 YouTube Channel

#talkingpapers #CVPR2022 #DiGS #NeuralImplicitRepresentation #SurfaceReconstruction #ShapeSpace #3DVision #ComputerVision #AI #DeepLearning #MachineLearning
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
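Technical aside: here is a rough, minimal sketch of the divergence term, computing the gradient of the implicit function with autograd and penalizing the divergence of that gradient field, with its weight annealed over training. It is written from the description above rather than taken from our released code, and the surrounding losses are only indicated in comments.

```python
# Hedged sketch of a divergence regularizer for an implicit network f(x):
# compute grad f with autograd, then the divergence of that gradient field
# (i.e. the Laplacian of f), and penalize its magnitude. The network and the
# annealing schedule are illustrative; this is not the released DiGS code.
import torch

def gradient(y, x):
    return torch.autograd.grad(y, x, grad_outputs=torch.ones_like(y),
                               create_graph=True)[0]

def divergence_loss(model, x):
    x = x.clone().requires_grad_(True)
    y = model(x)                       # (N, 1) implicit values
    g = gradient(y, x)                 # (N, 3) gradient field
    div = sum(gradient(g[:, i:i + 1], x)[:, i] for i in range(x.shape[1]))
    return div.abs().mean()

# During training, anneal the weight of this term from high to low, e.g.:
# loss = manifold_loss + eikonal_loss + lambda_div(step) * divergence_loss(model, sample_pts)
```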
Dejan Azinović - Neural RGBD Surface Reconstruction
May 6 2022
In this episode of the Talking Papers Podcast, I hosted Dejan Azinović to chat about his paper "Neural RGB-D Surface Reconstruction", published in CVPR 2022.

In this paper, they take on the task of RGB-D surface reconstruction by using novel view synthesis. They incorporate depth measurements into the radiance field formulation by learning a neural network that stores a truncated signed distance field. This formulation is particularly useful in regions where depth is missing and the color information can help fill in the gaps.

PAPER TITLE
"Neural RGB-D Surface Reconstruction"

AUTHORS
Dejan Azinović, Ricardo Martin-Brualla, Dan B Goldman, Matthias Nießner, Justus Thies

ABSTRACT
In this work, we explore how to leverage the success of implicit novel view synthesis methods for surface reconstruction. Methods which learn a neural radiance field have shown amazing image synthesis results, but the underlying geometry representation is only a coarse approximation of the real geometry. We demonstrate how depth measurements can be incorporated into the radiance field formulation to produce more detailed and complete reconstruction results than using methods based on either color or depth data alone. In contrast to a density field as the underlying geometry representation, we propose to learn a deep neural network which stores a truncated signed distance field. Using this representation, we show that one can still leverage differentiable volume rendering to estimate color values of the observed images during training to compute a reconstruction loss. This is beneficial for learning the signed distance field in regions with missing depth measurements. Furthermore, we correct for misalignment errors of the camera, improving the overall reconstruction quality. In several experiments, we showcase our method and compare to existing works on classical RGB-D fusion and learned representations.

RELATED PAPERS
📚 NeRF
📚 BundleFusion

LINKS AND RESOURCES
💻 Project Page
💻 Code

To stay up to date with Dejan's latest research, follow him on:
👨🏻‍🎓 Dejan's personal page
🎓 Google Scholar
🐦 Twitter
👨🏻‍🎓 LinkedIn

Recorded on April 4th 2022.

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

SUBSCRIBE AND FOLLOW
#talkingpapers #CVPR2022 #NeuralRGBDSurfaceReconstruction #SurfaceReconstruction #NeRF #3DVision #ComputerVision #AI #DeepLearning #MachineLearning #neuralnetworks #research #artificialintelligence
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
Yuliang Xiu - ICON
Apr 19 2022
In this episode of the Talking Papers Podcast, I hosted Yuliang Xiu to chat about his paper "ICON: Implicit Clothed humans Obtained from Normals", published in CVPR 2022.

In this paper, they exploit the SMPL(-X) body model to infer clothed humans, conditioned on the body's normals. Additionally, they propose an inference-time feedback loop that alternates between refining the body's normals and the shape.

PAPER TITLE
"ICON: Implicit Clothed humans Obtained from Normals" https://bit.ly/3uXe6Yw

AUTHORS
Yuliang Xiu, Jinlong Yang, Dimitrios Tzionas, Michael J. Black

ABSTRACT
Current methods for learning realistic and animatable 3D clothed avatars need either posed 3D scans or 2D images with carefully controlled user poses. In contrast, our goal is to learn an avatar from only 2D images of people in unconstrained poses. Given a set of images, our method estimates a detailed 3D surface from each image and then combines these into an animatable avatar. Implicit functions are well suited to the first task, as they can capture details like hair and clothes. Current methods, however, are not robust to varied human poses and often produce 3D surfaces with broken or disembodied limbs, missing details, or non-human shapes. The problem is that these methods use global feature encoders that are sensitive to global pose. To address this, we propose ICON ("Implicit Clothed humans Obtained from Normals"), which, instead, uses local features. ICON has two main modules, both of which exploit the SMPL(-X) body model. First, ICON infers detailed clothed-human normals (front/back) conditioned on the SMPL(-X) normals. Second, a visibility-aware implicit surface regressor produces an iso-surface of a human occupancy field. Importantly, at inference time, a feedback loop alternates between refining the SMPL(-X) mesh using the inferred clothed normals and then refining the normals. Given multiple reconstructed frames of a subject in varied poses, we use SCANimate to produce an animatable avatar from them. Evaluation on the AGORA and CAPE datasets shows that ICON outperforms the state of the art in reconstruction, even with heavily limited training data. Additionally, it is much more robust to out-of-distribution samples, e.g., in-the-wild poses/images and out-of-frame cropping. ICON takes a step towards robust 3D clothed human reconstruction from in-the-wild images. This enables creating avatars directly from video with personalized and natural pose-dependent cloth deformation.

RELATED PAPERS
📚 Monocular Real-Time Volumetric Performance Capture https://bit.ly/3L2S4JF
📚 PIFu https://bit.ly/3rBsrYN
📚 PIFuHD https://bit.ly/3rymDiE

LINKS AND RESOURCES
💻 Project Page https://icon.is.tue.mpg.de/
💻 Code https://github.com/yuliangxiu/ICON

To stay up to date with Yuliang's latest research, follow him on:
👨🏻‍🎓 Yuliang's personal page: https://bit.ly/3jQb16n
🎓 Google Scholar: https://bit.ly/3JW25ae
🐦 Twitter: https://twitter.com/yuliangxiu
👨🏻‍🎓 LinkedIn: https://www.linkedin.com/in/yuliangxiu/

Recorded on March 11th 2022.

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

SUBSCRIBE AND FOLLOW
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP
Itai Lang - SampleNet
Mar 28 2022
In this episode of the Talking Papers Podcast, I hosted Itai Lang to chat about his paper "SampleNet: Differentiable Point Cloud Sampling", published in CVPR 2020.

In this paper, they propose a point soft-projection that allows differentiating through the sampling operation and enables learning task-specific point sampling. Combined with their regularization and task-specific losses, they can reduce the number of points to 3% of the original samples with very little impact on task performance.

I met Itai for the first time at CVPR 2019. Being a point-cloud guy myself, I have been following his research work ever since. It is amazing how much progress he has made and I can't wait to see what he comes up with next. It was a pleasure hosting him on the podcast.

PAPER TITLE
"SampleNet: Differentiable Point Cloud Sampling" https://bit.ly/3wMFwll

AUTHORS
Itai Lang, Asaf Manor, Shai Avidan

ABSTRACT
Classic point cloud sampling approaches, such as farthest point sampling, do not consider the downstream task. A recent work showed that learning a task-specific sampling can improve results significantly; however, it did not handle the non-differentiability of the sampling operation and offered a workaround instead. We introduce a novel differentiable relaxation for point cloud sampling that approximates sampled points as a mixture of points in the primary input cloud. Our approximation scheme leads to consistently good results on classification and geometry reconstruction applications. We also show that the proposed sampling method can be used as a front to a point cloud registration network. This is a challenging task since sampling must be consistent across two different point clouds for a shared downstream task. In all cases, our approach outperforms existing non-learned and learned sampling alternatives. Our code is publicly available.

RELATED PAPERS
📚 Learning to Sample https://bit.ly/3vd1FZd
📚 Farthest Point Sampling (FPS) https://bit.ly/3Lkcyx9

LINKS AND RESOURCES
💻 Code https://bit.ly/3NoS0pb

To stay up to date with Itai's latest research, follow him on:
🎓 Google Scholar: https://bit.ly/3wCMY2u
🐦 Twitter: https://twitter.com/ItaiLang

Recorded on February 15th 2022.

CONTACT
If you would like to be a guest, sponsor or just share your thoughts, feel free to reach out via email: talking.papers.podcast@gmail.com

SUBSCRIBE AND FOLLOW
🎧 Subscribe on your favourite podcast app: https://talking.papers.podcast.itzikbs.com
📧 Subscribe to our mailing list: http://eepurl.com/hRznqb
🐦 Follow us on Twitter: https://twitter.com/talking_papers
🎥 YouTube Channel: https://bit.ly/3eQOgwP

#talkingpapers #SampleNet #LearnToSample #CVPR2020 #3DVision #ComputerVision #AI #DeepLearning #MachineLearning #neuralnetworks #research #artificialintelligence
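Technical aside: the soft projection can be sketched as replacing each generated point with a temperature-weighted average of its k nearest neighbours in the input cloud, which keeps the whole pipeline differentiable. The defaults below (k, temperature) are illustrative, and this is a generic re-implementation of the idea rather than the authors' code.

```python
# Hedged sketch of differentiable "soft projection": each generated sample is
# replaced by a softmax-weighted average of its k nearest neighbours in the
# input cloud, so gradients flow through the sampling step. Defaults (k,
# temperature) are illustrative; see the authors' released code for the real thing.
import torch

def soft_project(generated, cloud, k=8, temperature=0.01):
    """generated: (M, 3) points produced by the sampling network,
    cloud: (N, 3) original input point cloud -> (M, 3) projected points."""
    dist = torch.cdist(generated, cloud)                 # (M, N) pairwise distances
    knn_dist, knn_idx = dist.topk(k, dim=1, largest=False)
    weights = torch.softmax(-knn_dist ** 2 / temperature, dim=1)   # (M, k)
    neighbours = cloud[knn_idx]                          # (M, k, 3)
    return (weights.unsqueeze(-1) * neighbours).sum(dim=1)
```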