Exp-Force: Experience-Conditioned Pre-Grasp Force Selection with Vision-Language Models
Accurate pre-contact grasp force selection is critical for safe and reliable robotic manipulation. Adaptive controllers regulate force after contact but still require a reasonable initial estimate: starting with too little force demands reactive correction, while starting with too much risks damaging fragile objects. This trade-off is particularly challenging for compliant grippers, whose contact mechanics are difficult to model analytically. We propose Exp-Force, an experience-conditioned framework that predicts the minimum feasible grasping force from a single RGB image. The method retrieves a small set of relevant prior grasping experiences and conditions a vision–language model on these examples for in-context inference, without analytic contact models or manually designed heuristics. On 129 object instances, Exp-Force achieves a best-case MAE of 0.43 N, reducing error by 72% over zero-shot inference. In real-world tests on 30 unseen objects, it improves the appropriate force selection rate from 63% to 87%. These results demonstrate that Exp-Force enables reliable and generalizable pre-grasp force selection by leveraging prior interaction experience.
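To make the retrieve-then-prompt idea concrete, the sketch below shows one way experience retrieval and in-context prompt construction could look in Python. All names (Experience, retrieve, build_prompt), the embedding dimensionality, and the example forces are illustrative assumptions, not the released implementation.

```python
# A minimal sketch of the retrieve-then-prompt idea, under assumed names.
from dataclasses import dataclass

import numpy as np


@dataclass
class Experience:
    """One prior grasping episode: a visual embedding plus its outcome."""
    embedding: np.ndarray  # image feature of the grasped object (assumed encoder)
    description: str       # short text label, e.g. "empty soda can"
    min_force_n: float     # minimum feasible grasping force (newtons)


def retrieve(query: np.ndarray, bank: list[Experience], k: int = 4) -> list[Experience]:
    """Return the k experiences whose embeddings are most cosine-similar to the query."""
    sims = [
        float(query @ e.embedding / (np.linalg.norm(query) * np.linalg.norm(e.embedding)))
        for e in bank
    ]
    top = np.argsort(sims)[::-1][:k]
    return [bank[i] for i in top]


def build_prompt(examples: list[Experience]) -> str:
    """Format retrieved experiences as in-context examples for a VLM query."""
    lines = ["Predict the minimum feasible grasp force in newtons.", "Prior experiences:"]
    for e in examples:
        lines.append(f"- {e.description}: {e.min_force_n:.2f} N")
    lines.append("New object (see attached image): force =")
    return "\n".join(lines)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Made-up embeddings and forces, purely for demonstration.
    bank = [
        Experience(rng.standard_normal(8), "empty soda can", 1.2),
        Experience(rng.standard_normal(8), "glass jar", 4.5),
        Experience(rng.standard_normal(8), "paper cup", 0.8),
    ]
    query = rng.standard_normal(8)  # stands in for the query image embedding
    print(build_prompt(retrieve(query, bank, k=2)))
```

This sketch serializes the retrieved experiences as text for brevity; a full pipeline would presumably also attach the retrieved and query images when conditioning the vision–language model.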
Overview Video
Zero-shot vs Exp-Force
Cheez-It Box
Champagne Glass
Grasping Fragile and Light Objects
Grasping Fragile and Heavy Objects
Grasping Bottles and Odd Objects
Dataset
The dataset contains 129 object instances used to study pre-grasp force prediction. Each entry includes the object's RGB image, its measured mass in grams, and the minimum feasible grasping force in newtons. The full dataset can be browsed in the table below.
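For concreteness, here is a minimal loading sketch under an assumed schema: a CSV manifest named dataset.csv with columns image_path, mass_g, and min_force_n. Both the file name and the column names are assumptions for illustration, not the released format.

```python
# A sketch of loading the dataset from an assumed CSV manifest.
import csv
from dataclasses import dataclass


@dataclass
class ObjectEntry:
    image_path: str     # RGB image of the object
    mass_g: float       # measured mass in grams
    min_force_n: float  # minimum feasible grasping force in newtons


def load_dataset(manifest_path: str = "dataset.csv") -> list[ObjectEntry]:
    """Read the manifest and return one entry per object instance."""
    with open(manifest_path, newline="") as f:
        return [
            ObjectEntry(row["image_path"], float(row["mass_g"]), float(row["min_force_n"]))
            for row in csv.DictReader(f)
        ]
```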