Exp-Force: Experience-Conditioned Pre-Grasp
Force Selection with Vision-Language Models
Accurate pre-contact grasp force selection is critical for safe and reliable robotic manipulation. Adaptive controllers regulate force after contact but still require a reasonable initial estimate. Starting a grasp with too little force requires reactive adjustment, while starting with too high a force risks damaging fragile objects. This trade-off is particularly challenging for compliant grippers, whose contact mechanics are difficult to model analytically. We propose Exp-Force, an experience-conditioned framework that predicts the minimum feasible grasping force from a single RGB image. The method retrieves a small set of relevant prior grasping experiences and conditions a vision–language model on these examples for in-context inference, without analytic contact models or manually designed heuristics. On 129 object instances, Exp-Force achieves a best-case MAE of 0.426 N, reducing error by 72% over zero-shot inference. In real-world tests on 30 unseen objects, it improves appropriate force selection from 67% to 91%.
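The retrieval-then-prompt pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding vectors, experience records, and prompt wording are all hypothetical placeholders, and in practice the query embedding would come from a vision encoder and the prompt would be sent to a VLM along with the query image.

```python
import numpy as np

def retrieve_experiences(query_emb, exp_embs, k=3):
    """Return indices of the k prior experiences whose (hypothetical)
    image embeddings are most cosine-similar to the query embedding."""
    sims = exp_embs @ query_emb / (
        np.linalg.norm(exp_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8
    )
    return np.argsort(-sims)[:k]

def build_incontext_prompt(experiences, indices):
    """Format retrieved (object, force) pairs as in-context examples
    for a VLM; the exact prompt template here is an assumption."""
    lines = ["Select the minimum feasible pre-grasp force for the object shown."]
    for i in indices:
        obj_name, force_newtons = experiences[i]
        lines.append(f"Example - object: {obj_name}, minimum force: {force_newtons:.2f} N")
    lines.append("Query - object: <attached RGB image>, minimum force:")
    return "\n".join(lines)

# Toy usage with made-up embeddings and experience records.
experiences = [("sponge", 0.8), ("glass cup", 1.1), ("cardboard box", 2.5)]
exp_embs = np.eye(3)                # one dummy embedding per experience
query_emb = np.array([1.0, 0.1, 0.0])
idx = retrieve_experiences(query_emb, exp_embs, k=2)
prompt = build_incontext_prompt(experiences, idx)
```

The retrieved examples act as in-context demonstrations, so the VLM's numeric prediction is grounded in forces that worked for visually similar objects rather than in an analytic contact model.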
Qualitative comparison: Zero-shot vs Exp-Force (Cheez-It Box, Champagne Glass).