Dawei Cai / David Cai | ECE Portfolio

2026

Evaluating LocateAnything for Language-Guided Pre-Grasp Object Localization in Humanoid Robot Perception

This mini-pilot evaluates LocateAnything as a language-guided 2D visual grounding module for humanoid robot pre-grasp perception. The study uses cluttered tabletop and desk scenes with natural-language prompts, manually annotated ground-truth boxes, and IoU-based evaluation. Across 39 valid image-prompt pairs, LocateAnything achieved a mean IoU of 0.646, a median center-point error of 17.2 pixels, and a 76.9% success rate, where success is defined as IoU >= 0.5.

39 valid tabletop image-prompt pairs
Mean IoU: 0.646
Median center-point error: 17.2 px
Overall success rate: 76.9% at IoU >= 0.5
Spatial-relation prompts: 93.8% success
Similar-object prompts: 100.0% success in the pilot set
Small-object and ambiguous prompts were the main failure sources
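The per-sample metrics named above (IoU, center-point error, and success at IoU >= 0.5) can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes axis-aligned boxes stored as (x_min, y_min, x_max, y_max) in pixels, and the function names iou, center_error, and is_success are hypothetical.

```python
from math import hypot

# Assumption: boxes are axis-aligned tuples (x_min, y_min, x_max, y_max) in pixels.

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

def center_error(box_a, box_b):
    """Euclidean distance in pixels between the two box centers."""
    acx, acy = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bcx, bcy = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    return hypot(acx - bcx, acy - bcy)

def is_success(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a success when IoU meets the threshold."""
    return iou(pred_box, gt_box) >= threshold
```

For a pre-grasp use case, the center-point error complements IoU: a box can overlap the target well yet still be off-center, which matters if a later stage aims at the box center.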

Poster Report


Category	Observation
Overall Performance	39 valid samples; mean IoU 0.646; median center-point error 17.2 px; success rate 76.9% at IoU >= 0.5.
Strong Prompt Types	Spatial-relation prompts reached 93.8% success; similar-object prompts reached 100.0% success in this small pilot.
Difficult Cases	Small-object prompts reached 58.3% success, and ambiguous prompts reached 33.3% success.
Scope	The report evaluates 2D perception-stage localization only; it does not test physical grasping or robot control.
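The per-prompt-type figures in the table can be reproduced from per-sample results with a small aggregation step. The sketch below is one possible way to do it, not the study's actual pipeline: the record fields prompt_type, iou, and center_error_px and the function name summarize are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, median

def summarize(records, threshold=0.5):
    """Group per-sample results by prompt type and compute summary metrics.

    Assumption: each record is a dict with the hypothetical keys
    'prompt_type' (str), 'iou' (float), and 'center_error_px' (float).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["prompt_type"]].append(rec)
        groups["overall"].append(rec)  # every sample also counts toward the total

    summary = {}
    for name, recs in groups.items():
        ious = [r["iou"] for r in recs]
        summary[name] = {
            "n": len(recs),
            "mean_iou": mean(ious),
            "median_center_error_px": median(r["center_error_px"] for r in recs),
            # Fraction of samples whose IoU meets the success threshold.
            "success_rate": sum(v >= threshold for v in ious) / len(recs),
        }
    return summary
```

With only 39 samples split across several prompt types, each group is small, so reporting the per-group count n alongside each success rate makes the percentages easier to interpret.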
