DiffCloud: Real-to-Sim from Point Clouds with Differentiable Simulation and Rendering of Deformable Objects

Stanford University

TLDR: We introduce DiffCloud, which combines differentiable simulation with differentiable rendering for real-to-sim parameter estimation of deformable objects from point clouds.



Abstract

Research in manipulation of deformable objects is typically conducted on a limited range of scenarios, because handling each scenario on hardware takes significant effort. Realistic simulators with support for various types of deformations and interactions have the potential to speed up experimentation with novel tasks and algorithms. However, for highly deformable objects it is challenging to align the output of a simulator with the behavior of real objects. Manual tuning is not intuitive, hence automated methods are needed. We view this alignment problem as a joint perception-inference challenge and demonstrate how to use recent neural network architectures to successfully perform simulation parameter inference from real point clouds. We analyze the performance of various architectures, comparing their data and training requirements. Furthermore, we propose to leverage differentiable point cloud sampling and differentiable simulation to significantly reduce the time to achieve the alignment. We employ an efficient way to propagate gradients from point clouds to simulated meshes and further through to the physical simulation parameters, such as mass and stiffness. Experiments with highly deformable objects show that our method can achieve comparable or better alignment with real object behavior, while reducing the time needed to achieve this by more than an order of magnitude.


Method Overview

DiffCloud combines differentiable point cloud sampling with differentiable simulation to propagate gradients from observed point clouds, through the simulated mesh, to physical simulation parameters such as mass and stiffness. This allows the simulator to be aligned with real object behavior by direct gradient-based optimization, without offline dataset generation or network training.

Experiments

We evaluate DiffCloud on its ability to recover the parameters of both real and simulated deformable objects. Our method directly infers deformable simulation parameters—mass and stiffness—from point cloud sequences.


We benchmark against inverse models that treat the simulator as a non-differentiable black box—used solely as a data engine to generate synthetic point cloud sequences paired with ground-truth parameters. These models require offline dataset generation and supervised training to learn to infer object parameters.
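This black-box pipeline can be sketched with a toy example. Here `toy_simulate` is a hypothetical linear stand-in for the simulator, and a least-squares regressor stands in for the supervised MLP/PointNet++/MeteorNet models; none of this is the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_simulate(params):
    # Hypothetical stand-in for the simulator: maps (mass, stiffness)
    # to a flattened "point cloud" feature via a fixed linear response.
    A = np.array([[1.0, 0.2], [0.3, 1.5], [0.7, 0.4]])
    return A @ params

# Offline dataset generation: sample parameters, run the simulator.
params_train = rng.uniform(0.1, 2.0, size=(200, 2))
clouds_train = np.stack([toy_simulate(p) for p in params_train])

# "Training": a least-squares regressor from cloud features to parameters
# stands in for a network trained with ground-truth supervision.
W, *_ = np.linalg.lstsq(clouds_train, params_train, rcond=None)

# Inference on a held-out observation.
true_params = np.array([0.8, 1.2])
pred_params = toy_simulate(true_params) @ W
```

The key point is the workflow, not the model class: every new task requires regenerating the dataset and retraining before any inference can happen.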


Baselines Overview

Method            Simulator Access  Label-Free  Point Cloud Input  Optimization Strategy
MLP               Black-box         No          Single frame       Direct regression
PointNet++        Black-box         No          Single frame       Direct regression
MeteorNet         Black-box         No          Full sequence      Direct regression
DiffCloud (Ours)  Differentiable    Yes         Full sequence      Gradient-based


MLP, PointNet++, MeteorNet: Networks trained via offline supervised learning on synthetically generated data, which learn to directly regress the parameters of deformable objects from point cloud observations.
DiffCloud: Infers object parameters on-the-fly using differentiable simulation and differentiable rendering, without any offline data generation or training. It directly optimizes parameters to align the simulated object's visual appearance with observed point clouds.
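The gradient-based alternative can be sketched as follows. Instead of training a regressor, the simulation parameters are optimized directly against the observed point cloud by backpropagating through the simulator. The linear `toy_simulate`, the squared-distance loss, and the learning rate are all illustrative assumptions, not the paper's DiffSim setup or Chamfer objective.

```python
import numpy as np

def toy_simulate(params):
    # Hypothetical differentiable stand-in for the simulator: point
    # positions respond linearly to (mass, stiffness).
    A = np.array([[1.0, 0.2], [0.3, 1.5], [0.7, 0.4]])
    return A @ params

def loss_and_grad(params, target_cloud):
    # Squared-distance alignment loss (a stand-in for Chamfer loss on
    # ordered points) and its analytic gradient w.r.t. the parameters.
    A = np.array([[1.0, 0.2], [0.3, 1.5], [0.7, 0.4]])
    residual = A @ params - target_cloud
    return 0.5 * residual @ residual, A.T @ residual

target = toy_simulate(np.array([0.8, 1.2]))  # "observed" point cloud
params = np.array([1.0, 1.0])                # standard initial guess
for _ in range(500):
    loss, grad = loss_and_grad(params, target)
    params -= 0.1 * grad                     # gradient step through the sim
```

Because the loss is differentiated through the simulator itself, no labeled dataset or training phase is needed: optimization starts directly from the observation.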



Real-to-Sim Parameter Estimation

We evaluate DiffCloud on two real-world cloth manipulation scenarios using a Kinova Gen3 robot, treating them as sources of point cloud observations from which DiffCloud infers the underlying physical parameters of the cloth: Lift and Fold. The fold task is especially challenging due to self-occlusion and partial observability. Each scenario is performed on 5 different cloth types—ranging from highly deformable to stiff. For each cloth, we collect 3 real trajectories and run all methods to recover the cloth parameters given real point cloud observations.

Real-World Data Collection

We collect real point cloud observations of cloth manipulation scenarios using a Kinova Gen3 arm and dual RealSense cameras.

Video panels: Lift, Fold.

Differentiable Simulation Scenarios

We use differentiable simulation environments built with DiffSim to model cloth deformation during Lift and Fold tasks. Each scenario uses a 49-vertex triangular mesh and keyframed end-effector motion over 25 steps to enable gradient-based parameter optimization.
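The keyframed end-effector motion can be sketched as linear interpolation between gripper keyframes, producing one target pose per simulation step. `keyframed_motion` is a hypothetical helper for illustration, not DiffSim's API.

```python
import numpy as np

def keyframed_motion(keyframes, n_steps=25):
    """Linearly interpolate 3D gripper keyframes into a fixed-length
    trajectory with one target position per simulation step."""
    keyframes = np.asarray(keyframes, dtype=float)   # (k, 3) waypoints
    t = np.linspace(0.0, 1.0, n_steps)               # query times
    tk = np.linspace(0.0, 1.0, len(keyframes))       # keyframe times
    # Interpolate each coordinate independently, then stack to (n_steps, 3).
    return np.stack([np.interp(t, tk, keyframes[:, d]) for d in range(3)],
                    axis=1)

# e.g. a straight vertical lift from the table to 1m height
traj = keyframed_motion([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]], n_steps=25)
```

Keeping the trajectory fixed and short (25 steps) bounds the length of the backpropagation path through the simulator during optimization.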

Qualitative Results

DiffCloud recovers mass and stiffness parameters that visually explain the observed point clouds and cloth behavior.

Quantitative Results

DiffCloud achieves lower or comparable Chamfer loss than all baselines while recovering intuitive physical parameters (e.g., high mass for heavy cloths). It runs in ~10 minutes per trajectory, compared to hours of dataset generation and training for baselines.
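The Chamfer loss used for this comparison is, in its standard symmetric form, the mean nearest-neighbor squared distance between the two point sets in both directions; the paper's exact normalization may differ.

```python
import numpy as np

def chamfer_distance(P, Q):
    """Symmetric Chamfer distance between point sets P (n, 3) and Q (m, 3)."""
    # Pairwise squared distances, shape (n, m).
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    # Mean closest-point distance in each direction, summed.
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()
```

This brute-force formulation is fine for the cloud sizes here; large clouds would typically use a KD-tree or GPU nearest-neighbor search instead.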



Sim-to-Sim Parameter Estimation

We additionally evaluate DiffCloud in simulation using two contact-rich deformable tasks:


  • Band Stretch: Pulling an elastic band tight against a pole, which tests tension response.
  • Vest Hang: Hanging a cloth vest on a pole, which tests collapse and contact dynamics.

Across 10 target runs per scenario, DiffCloud consistently recovers simulation parameters that align with ground truth, achieving sub-threshold Chamfer loss in all cases. Baselines succeed on only 50–80% of runs despite using noise-free input.

Differentiable Simulation Scenarios

We use differentiable simulation to model complex cloth-object interactions in the Band Stretch and Vest Hang scenarios. These tasks feature contact-rich dynamics and are used to evaluate how well parameter optimization generalizes to varied deformation behaviors beyond simple lifting and folding.

Video panels: Band Stretch, Vest Hang.

Qualitative Results

In the Band Stretch and Vest Hang scenarios, DiffCloud starts from a standard initial guess (black) and optimizes simulation parameters to produce cloth deformations (blue) that closely match the target point cloud observations (green).

Quantitative Results

We evaluate DiffCloud and baseline inverse models on 10 held-out simulation runs for each scenario. A prediction is considered successful if the Chamfer distance between the simulated and target point clouds falls below a predefined threshold. DiffCloud achieves alignment in 100% of runs for both Band Stretch and Vest Hang, while baseline regressors achieve success in only 50–80% of runs across scenarios.
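The success criterion above amounts to thresholding the final Chamfer distance of each run. `THRESH` below is a placeholder value for illustration, not the threshold used in the paper.

```python
import numpy as np

THRESH = 0.05  # hypothetical success threshold on the Chamfer distance

def success_rate(chamfer_losses, thresh=THRESH):
    # A run succeeds when its final Chamfer distance falls below the
    # threshold; the rate is the fraction of successful runs.
    losses = np.asarray(chamfer_losses, dtype=float)
    return float((losses < thresh).mean())
```

Under this metric, 10/10 sub-threshold runs gives DiffCloud's reported 100%, while 5-8 sub-threshold runs out of 10 gives the baselines' 50-80%.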



Compute Efficiency

One of the key advantages of DiffCloud is its speed. While supervised baselines require hours of dataset generation and training, DiffCloud directly optimizes simulation parameters in minutes using differentiable simulation.


  • DiffCloud: ~10 minutes per real trajectory (Lift/Fold), ~5 minutes per simulated trajectory (Vest/Band).
  • Baselines: >2 hours for dataset generation + 40–120 minutes for training per task.

The chart below compares end-to-end runtime per method (data collection + training + inference for baselines vs. optimization for DiffCloud).

Left: Total compute time for parameter inference on real data.
Right: Final parameters inferred by DiffCloud across cloth types.


BibTeX


@inproceedings{sundaresan2022diffcloud,
  title={{DiffCloud}: Real-to-Sim from Point Clouds with Differentiable Simulation and Rendering of Deformable Objects},
  author={Sundaresan, Priya and Antonova, Rika and Bohg, Jeannette},
  booktitle={2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages={10828--10835},
  year={2022},
  organization={IEEE}
}