TongTest - General Embodied Interaction Testing Platform

Test Task Introduction

This platform evaluates multimodal large language models acting as embodied agents on eight composite daily household tasks, measuring capabilities such as object understanding, spatial intelligence, and social activities.

Counting Objects

Evaluate the model's ability to identify and count specific objects in a scene.

Preparing Baggage

Test the model's ability to select and organize appropriate items based on travel needs.

Building Blocks

Evaluate the model's spatial reasoning and its ability to understand and carry out block-building instructions.

Jigsaw Puzzle

Test the model's visual reasoning ability to recognize patterns and complete puzzle tasks.

Understanding Buttons

Evaluate the model's ability to identify what each button does and predict the result of operating it.

Setting Tables

Test the model's ability to arrange items reasonably based on categories and spatial relationships.

Tidying Up Rooms

Evaluate the model's ability to plan cleaning tasks and execute reasonable operation sequences.

Selecting Gifts

Test the model's ability to select appropriate gifts based on personal relationships and scenario requirements.

Model Performance Comparison

Model | Equal-Weighted Average | Counting Objects | Preparing Baggage | Building Blocks | Jigsaw Puzzle | Understanding Buttons | Setting Tables | Tidying Up Rooms | Selecting Gifts
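
The equal-weighted average column presumably gives each of the eight tasks the same weight. The sketch below shows one way such a figure could be computed, assuming every task score is on the same scale. The function name and the sample scores are hypothetical placeholders, not values or code from the platform.

```python
# Task names as they appear in the comparison table.
TASKS = [
    "Counting Objects",
    "Preparing Baggage",
    "Building Blocks",
    "Jigsaw Puzzle",
    "Understanding Buttons",
    "Setting Tables",
    "Tidying Up Rooms",
    "Selecting Gifts",
]


def equal_weighted_average(scores: dict[str, float]) -> float:
    """Return the unweighted mean of the eight task scores.

    Assumption: all scores share one scale, so a plain mean is meaningful.
    """
    missing = [task for task in TASKS if task not in scores]
    if missing:
        raise ValueError(f"missing task scores: {missing}")
    return sum(scores[task] for task in TASKS) / len(TASKS)


# Placeholder scores for illustration only (not real results).
example_scores = {task: 50.0 for task in TASKS}
example_scores["Jigsaw Puzzle"] = 90.0
print(equal_weighted_average(example_scores))
```

Under this definition a model that is strong on one task and weak on the rest is not rewarded disproportionately, since no task counts for more than one eighth of the total.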