Research Experience

Research work connecting vision-language models, humanoid robot perception, and experimental evaluation.

2026

Evaluating LocateAnything for Language-Guided Pre-Grasp Object Localization in Humanoid Robot Perception

This mini-pilot evaluates LocateAnything as a language-guided 2D visual grounding module for humanoid robot pre-grasp perception. The study uses cluttered tabletop and desk scenes with natural-language prompts, manual ground-truth boxes, and IoU-based evaluation. Across 39 valid image-prompt pairs, LocateAnything achieved a mean IoU of 0.646, a median center-point error of 17.2 pixels, and a 76.9% success rate using IoU >= 0.5.

39 valid tabletop image-prompt pairsMean IoU: 0.646Median center-point error: 17.2 pxOverall success rate: 76.9% at IoU >= 0.5Spatial-relation prompts: 93.8% successSimilar-object prompts: 100.0% success in the pilot setSmall-object and ambiguous prompts were the main failure sources
Evaluating LocateAnything for Language-Guided Pre-Grasp Object Localization in Humanoid Robot Perception
MetricObservation
Overall Performance39 valid samples; mean IoU 0.646; median center-point error 17.2 px; success rate 76.9% at IoU >= 0.5.
Strong Prompt TypesSpatial-relation prompts reached 93.8% success; similar-object prompts reached 100.0% success in this small pilot.
Difficult CasesSmall-object prompts reached 58.3% success, and ambiguous prompts reached 33.3% success.
ScopeThe report evaluates 2D perception-stage localization only; it does not test physical grasping or robot control.