Exp-Force: Experience-Conditioned Pre-Grasp
Force Selection with Vision-Language Models

Accurate pre-contact grasp force selection is critical for safe and reliable robotic manipulation. Adaptive controllers regulate force after contact but still require a reasonable initial estimate. Starting a grasp with too little force requires reactive adjustment, while starting with too much force risks damaging fragile objects. This trade-off is particularly challenging for compliant grippers, whose contact mechanics are difficult to model analytically.

We propose Exp-Force, an experience-conditioned framework that predicts the minimum feasible grasping force from a single RGB image. The method retrieves a small set of relevant prior grasping experiences and conditions a vision–language model on these examples for in-context inference, without analytic contact models or manually designed heuristics.
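The retrieve-then-prompt pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the experience bank, the embedding vectors (stand-ins for real image embeddings), and the prompt template are all assumptions made for the example.

```python
import numpy as np

# Hypothetical experience bank: each entry pairs an object name and an
# image embedding (stand-in vectors here) with the minimum feasible
# grasp force (N) recorded for that object.
EXPERIENCES = [
    ("sponge",        np.array([0.9, 0.1, 0.0]), 0.8),
    ("glass tumbler", np.array([0.1, 0.9, 0.1]), 1.2),
    ("soup can",      np.array([0.2, 0.3, 0.9]), 4.5),
]

def retrieve(query_emb, k=2):
    """Return the k experiences most similar to the query embedding
    (cosine similarity) -- the retrieval step of the pipeline."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(EXPERIENCES, key=lambda e: cos(query_emb, e[1]),
                  reverse=True)[:k]

def build_prompt(query_desc, examples):
    """Assemble an in-context prompt for the VLM. The exact template
    is an assumption; the paper's wording may differ."""
    lines = ["Select the minimum feasible pre-grasp force in newtons."]
    for name, _, force in examples:
        lines.append(f"Object: {name} -> force: {force} N")
    lines.append(f"Object: {query_desc} -> force:")
    return "\n".join(lines)

# Query embedding standing in for the encoded RGB image of a new object.
query = np.array([0.15, 0.85, 0.05])
prompt = build_prompt("wine glass", retrieve(query))
print(prompt)
```

In practice the embeddings would come from a vision encoder and the completed prompt would be sent to the VLM, whose numeric answer is the predicted force; this sketch only shows how retrieved experiences become in-context examples.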

On 129 object instances, Exp-Force achieves a best-case mean absolute error (MAE) of 0.426 N, reducing error by 72% relative to zero-shot inference. In real-world tests on 30 unseen objects, it improves the rate of appropriate force selection from 67% to 91%.

[Video] Zero-shot vs. Exp-Force: Cheez-It Box, Champagne Glass


[Videos] Grasping Fragile and Light Objects; Grasping Fragile and Heavy Objects; Grasping Bottle and Odd Objects