Digitalization of the 4 Mountains Test (4MT) for Robot-Administered Spatial Memory Assessment

Background

The 4 Mountains Test (4MT) [1] is a standardized cognitive assessment tool used to evaluate spatial memory, with proven sensitivity in detecting early signs of cognitive decline and dementia. Traditionally, the test is conducted by a human examiner who gives instructions, guides the test-taker, and moderates their progress.

The 4MT consists of:

Topographical tasks: Computer-generated landscapes containing four mountains where the topography (geometry of the surface) can be varied
Non-spatial information tasks: Tasks where non-spatial visual features can be independently manipulated

In both task types, the non-tested attributes (spatial/non-spatial) remain the same across the four choices but differ from the sample. The changes in viewpoint and non-spatial properties between sample and target ensure that topographical tasks depend on matching allocentric topographical information rather than simple visual pattern matching. The 4MT is illustrated in Figures 1 and 2.

Figure 1. The Four Mountains Test (4MT). (A) 4MT stimuli are computer-generated heightfields with four mountains, illustrated by a contour map example. Images are rendered from one of seven virtual camera positions. (B) Participants study a sample image, then view four alternatives (one target showing the same place from a different viewpoint, and three foils showing different places) and identify the target. (C) Sample image example. (D) Corresponding target and foils (target at top-left). All images are shown at the same scale, with viewpoint and non-spatial features systematically varied between sample and test images.

Figure 2. Types of tests in 4MT. Top: Timing and layout of test items. In perceptual tests, participants performed a concurrent match-to-sample task, selecting one of four alternatives within 30 s. In memory tests, a 2 s delay (blank page) separated sample and test images.
Bottom: Examples of non-spatial and topographical items. In non-spatial tasks, participants matched scenes by features such as cloud cover, lighting, texture, and vegetation color (target bottom-left). In topographical tasks, matches were based solely on landscape layout, with viewpoint and non-spatial features varied (target top-left). Spatial, configural, and elemental foils occupy the top-right, bottom-left, and bottom-right positions, respectively.
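
The item structure described above (one sample, one target, three foils, and a blank-page delay in memory tests) could be represented roughly as follows. This is only a minimal sketch: the class names, field names, and image paths are hypothetical and not part of the published 4MT materials, and the 30 s window is taken from the Figure 2 caption, which states it for perceptual tests only.

```python
import random
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    TOPOGRAPHICAL = "topographical"
    NON_SPATIAL = "non_spatial"


class Condition(Enum):
    PERCEPTION = "perception"  # sample and alternatives shown concurrently
    MEMORY = "memory"          # blank page between sample and alternatives


# Timing values from the Figure 2 caption.
PERCEPTUAL_RESPONSE_WINDOW_S = 30.0
MEMORY_DELAY_S = 2.0


@dataclass(frozen=True)
class Item:
    """One 4MT item; all field names are illustrative assumptions."""
    item_id: str
    task_type: TaskType
    condition: Condition
    sample: str                  # hypothetical path to the sample image
    target: str                  # hypothetical path to the target image
    foils: tuple[str, str, str]  # hypothetical paths to the three foils

    def shuffled_alternatives(self, rng):
        """Return the four alternatives in random order and the target's index."""
        alternatives = [self.target, *self.foils]
        rng.shuffle(alternatives)
        return alternatives, alternatives.index(self.target)

    def delay_before_alternatives(self):
        """Seconds of blank page shown between sample and alternatives."""
        if self.condition is Condition.MEMORY:
            return MEMORY_DELAY_S
        return 0.0


def is_correct(chosen_index, target_index):
    """A response is correct when the chosen alternative is the target."""
    return chosen_index == target_index
```

Passing a seeded `random.Random` instance to `shuffled_alternatives` keeps the on-screen order reproducible across sessions, which may help when comparing participants.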

Objectives

The student will develop a prototype system with the following goals:

Level 0 (Core task):

Design a Digital 4MT: Create a functional prototype where participants can complete the 4MT on a digital medium without human supervision
Define the System Architecture: Clearly define the system's inputs (images, touch/voice responses, video monitoring) and outputs (performance metrics, logs, scores, user feedback) to ensure robust data collection
Prepare for Robotic Integration: Design the system with a clear interface for future integration into a larger framework where a robot or virtual agent administers the test autonomously
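
One possible way to make the inputs, outputs, and integration interface listed above concrete is sketched below. Everything in it is an assumption for illustration: the record fields, the metric names, and the `Administrator` interface are hypothetical and would be defined by the student as part of the thesis.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Protocol


@dataclass
class TrialRecord:
    """Per-trial output of the system; field names are illustrative."""
    item_id: str
    task_type: str                    # "topographical" or "non_spatial"
    chosen_index: Optional[int]       # None if no response was given in time
    target_index: int
    response_time_s: Optional[float]  # None if no response was given in time
    modality: str = "touch"           # e.g. "touch" or "voice"

    @property
    def correct(self) -> bool:
        return self.chosen_index == self.target_index


def summarize(records):
    """Aggregate per-trial records into simple performance metrics."""
    answered = [r for r in records if r.chosen_index is not None]
    times = [r.response_time_s for r in answered if r.response_time_s is not None]
    return {
        "n_trials": len(records),
        "n_correct": sum(r.correct for r in records),
        "n_timeouts": len(records) - len(answered),
        "mean_response_time_s": sum(times) / len(times) if times else None,
    }


def to_log_line(record):
    """Serialize one trial as a JSON line for the session log."""
    return json.dumps({**asdict(record), "correct": record.correct})


class Administrator(Protocol):
    """Hypothetical interface that a robot or virtual agent could implement."""

    def give_instructions(self, text: str) -> None: ...

    def on_trial_finished(self, record: TrialRecord) -> None: ...

    def on_session_finished(self, summary: dict) -> None: ...


class ConsoleAdministrator:
    """Stand-in administrator that prints instead of speaking or gesturing."""

    def give_instructions(self, text: str) -> None:
        print(f"[instruction] {text}")

    def on_trial_finished(self, record: TrialRecord) -> None:
        print(to_log_line(record))

    def on_session_finished(self, summary: dict) -> None:
        print(json.dumps(summary))
```

Keeping the test logic behind a small interface like this means the same session code could later be driven by a robot (for example through a ROS node) without changing how trials are recorded or scored.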

After completing Level 0, the student can pursue either Level 1 or Level 2:

Level 1 (Advanced Stimuli):

Implement Personalized Stimuli Generation: Enable the system to capture live scenes (e.g., in a user’s home) and automatically generate the necessary test images using multi-view image generation [2]

Level 2 (Advanced Interaction):

Integrate Multimodal Interaction: Use sensors for multimodal input (e.g., audio, visual, and touch) to record responses and monitor user behavior so that the system can react to it [3]

Level of Work

This thesis is suitable for a Bachelor's or Master's student with an interest in Human-Computer/Robot Interaction (HCI/HRI) and its applications in the medical/care field. It combines software implementation with applied research questions on usability in design, software application development, and system integration.
This work requires experience with the Python programming language. Familiarity with ROS, deep learning techniques for computer vision, and code versioning (Git/GitLab) is advantageous.
This work will give you the opportunity to (1) gain experience in Python and ROS programming; (2) learn to utilize machine learning models for graphical manipulation; and (3) practice scientific writing.

Starting Date

As soon as possible. Contact the supervisor of this thesis if you are interested.

Supervisor(s)

Victoria (Ya-Ting) Yang (victoria.yang∂kit.edu)

References

[1] T. Hartley, C. M. Bird, D. Chan, L. Cipolotti, M. Husain, F. Vargha-Khadem, and N. Burgess, “The hippocampus is required for short-term topographical memory in humans,” Hippocampus, vol. 17, pp. 34–48, 2007.

[2] X. Xie, C. Zou, M. G. Karumuri, J. E. Lenssen, and G. Pons-Moll, “MVGBench: Comprehensive benchmark for multi-view generation models,” arXiv preprint arXiv:2507.00006, 2025.

[3] C. Chirapornchai, F. Niyi-Odumosu, M. Giuliani, et al., “Design and evaluation of a robot telemedicine system for initial medical examination with UK and Thai doctors,” International Journal of Social Robotics, vol. 17, pp. 1769–1786, 2025, doi: 10.1007/s12369-024-01187-1.