
I am an ELLIS PhD student in the Computer Vision Lab at the University of Amsterdam, advised by Prof. Dimitris Tzionas. My research focuses on 3D Human-Object Interaction (HOI) synthesis, and I am also interested in reconstructing 4D HOIs from videos. Before joining UvA, I had the great opportunity to spend four months as a research intern at Simon Fraser University, working with Prof. Manolis Savva. Prior to that, I completed my Master's at the University of Patras, collaborating with Prof. Emmanouil Psarakis, while also working as a Lead Quality Assurance Engineer in the Hellenic Air Force. I am a passionate windsurfer; when the sea and wind are not cooperating, I enjoy running or going to the gym.
Publications

Recovering 3D object pose and shape from a single image is a challenging and ill-posed problem. This is due to strong (self-)occlusions, depth ambiguities, the vast intra- and inter-class shape variance, and the lack of 3D ground truth for natural images. Existing deep-network methods are trained on synthetic datasets to predict 3D shapes, so they often struggle to generalize to real-world images. Moreover, they lack an explicit feedback loop for refining noisy estimates, and primarily focus on geometry without directly considering pixel alignment. To tackle these limitations, we develop a novel render-and-compare optimization framework, called SDFit. This has three key innovations: First, it uses a learned category-specific and morphable signed-distance-function (mSDF) model, and fits this to an image by iteratively refining both 3D pose and shape. The mSDF robustifies inference by constraining the search on the manifold of valid shapes, while allowing for arbitrary shape topologies. Second, SDFit retrieves an initial 3D shape that likely matches the image, by exploiting foundational models for efficient look-up into 3D shape databases. Third, SDFit initializes pose by establishing rich 2D-3D correspondences between the image and the mSDF through foundational features. We evaluate SDFit on three image datasets, i.e., Pix3D, Pascal3D+, and COMIC. SDFit performs on par with SotA feed-forward networks for unoccluded images and common poses, but is uniquely robust to occlusions and uncommon poses. Moreover, it requires no retraining for unseen images. Thus, SDFit contributes new insights for generalizing in the wild.
@inproceedings{antic2025sdfit,
  title     = {{SDFit}: {3D} Object Pose and Shape by Fitting a Morphable {SDF} to a Single Image},
  author    = {Anti\'{c}, Dimitrije and Paschalidis, Georgios and Tripathi, Shashank and Gevers, Theo and Dwivedi, Sai Kumar and Tzionas, Dimitrios},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2025},
}
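
To make the render-and-compare idea concrete, here is a minimal, self-contained Python/PyTorch sketch; it is not SDFit's code. A one-parameter "shape latent" (a circle radius) and a 2D translation are optimized so that a soft silhouette rendered from a toy SDF matches a target mask. Every function below is an illustrative stand-in: the actual method fits a learned morphable SDF in 3D, with retrieval-based shape initialization and 2D-3D correspondences for pose.

# Toy render-and-compare loop (illustrative stand-in, not SDFit's API).
import torch

H = W = 64
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")

def sdf_circle(cx, cy, radius):
    # Stand-in for the morphable SDF: signed distance to a circle.
    return torch.sqrt((xs - cx) ** 2 + (ys - cy) ** 2) - radius

def render_silhouette(sdf, sharpness=50.0):
    # Differentiable "renderer": soft inside/outside test on the SDF.
    return torch.sigmoid(-sharpness * sdf)

# The target mask plays the role of the observed image.
target = render_silhouette(sdf_circle(0.3, -0.2, 0.4)).detach()

# Rough initialization (SDFit instead retrieves a shape and matches features).
pose = torch.zeros(2, requires_grad=True)         # translation
shape = torch.tensor([0.25], requires_grad=True)  # shape latent (here: a radius)
opt = torch.optim.Adam([pose, shape], lr=0.05)

for step in range(300):                           # iterative refinement loop
    opt.zero_grad()
    pred = render_silhouette(sdf_circle(pose[0], pose[1], shape[0]))
    loss = ((pred - target) ** 2).mean()          # pixel-aligned "compare" step
    loss.backward()
    opt.step()

print(pose.detach().tolist(), shape.item())       # approaches (0.3, -0.2) and 0.4

Restricting shape to a low-dimensional latent is what keeps the optimization on a manifold of valid shapes, which is the role the mSDF plays in the paper.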

Synthesizing 3D whole bodies that realistically grasp objects is useful for animation, mixed reality, and robotics. This is challenging, because the hands and body need to look natural w.r.t. each other, the grasped object, as well as the local scene (i.e., a receptacle supporting the object). Moreover, training data for this task is really scarce, while capturing new data is expensive. Recent work goes beyond finite datasets via a divide-and-conquer approach; it first generates a “guiding” right-hand grasp, and then searches for bodies that match this. However, the guiding-hand synthesis lacks controllability and receptacle awareness, so it likely has an implausible direction (i.e., a body can’t match this without penetrating the receptacle) and needs corrections through major post-processing. Moreover, the body search needs exhaustive sampling and is expensive. These are strong limitations. We tackle these with a novel method called CWGrasp. Our key idea is that performing geometry-based reasoning “early on,” instead of “too late,” provides rich “control” signals for inference. To this end, CWGrasp first samples a plausible reaching-direction vector (used later for both the arm and hand) from a probabilistic model built via ray-casting from the object and collision checking. Then, it generates a reaching body with a desired arm direction, as well as a “guiding” grasping hand with a desired palm direction that complies with the arm's direction. Finally, CWGrasp refines the body to match the “guiding” hand, while plausibly contacting the scene. Notably, generating already-compatible “parts” greatly simplifies the “whole”. Moreover, CWGrasp uniquely tackles both right- and left-hand grasps. We evaluate on the GRAB and ReplicaGrasp datasets. CWGrasp outperforms baselines, at lower runtime and budget, while all components help performance. Code and models are available for research.
@inproceedings{paschalidis2025cwgrasp,
  title     = {{3D} {W}hole-Body Grasp Synthesis with Directional Controllability},
  author    = {Paschalidis, Georgios and Wilschut, Romana and Anti\'{c}, Dimitrije and Taheri, Omid and Tzionas, Dimitrios},
  booktitle = {{International Conference on 3D Vision (3DV)}},
  year      = {2025},
}
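
The reaching-direction model is the part most easily illustrated in isolation. Below is a small NumPy sketch, not the authors' code: candidate directions around the object are ray-cast against a stand-in receptacle (a table plane), blocked ones are discarded, and one direction is sampled from the survivors with a toy clearance weighting. All geometry and names here are assumptions for illustration; CWGrasp ray-casts against the actual receptacle mesh and builds a proper probabilistic model over directions.

# Toy reaching-direction sampler (illustrative stand-in, not CWGrasp's code).
import numpy as np

rng = np.random.default_rng(0)
obj_center = np.array([0.0, 0.0, 0.95])  # object resting just above a table top

def blocked_by_receptacle(origin, direction, table_z=0.9, eps=1e-8):
    # Stand-in collision check: does the ray hit the table plane z = table_z?
    # A real system would ray-cast against the receptacle mesh instead.
    if abs(direction[2]) < eps:
        return False
    t = (table_z - origin[2]) / direction[2]
    return t > 1e-4  # the plane lies in front of the ray origin

# Uniformly sample candidate directions on the unit sphere.
candidates = rng.normal(size=(2000, 3))
candidates /= np.linalg.norm(candidates, axis=1, keepdims=True)

# Keep only unblocked directions, then weight them by a toy "clearance"
# score (alignment with the upward hemisphere) to form a distribution.
valid = np.array([d for d in candidates if not blocked_by_receptacle(obj_center, d)])
weights = np.clip(valid[:, 2], 1e-3, None)
weights /= weights.sum()

reach_dir = valid[rng.choice(len(valid), p=weights)]
print("sampled reaching direction:", reach_dir)

Committing to this vector first is the “early” geometric reasoning the abstract describes: the arm direction and the guiding hand's palm direction are then generated to comply with it, rather than being corrected after the fact.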