Digitalization of the 4 Mountains Test (4MT) for Robot-Administered Spatial Memory Assessment

Background
The 4 Mountains Test (4MT) [1] is a standardized cognitive assessment tool used to evaluate spatial memory, with proven sensitivity in detecting early signs of cognitive decline and dementia. Traditionally, the test is conducted by a human examiner who provides instructions, guides the test-taker, and moderates their progress.

The 4MT consists of: 

  • Topographical tasks: Computer-generated landscapes containing four mountains where the topography (geometry of the surface) can be varied
  • Non-spatial information tasks: Tasks where non-spatial visual features can be independently manipulated

In both task types, the non-tested attributes (spatial/non-spatial) remain the same across the four choices but differ from the sample. The changes in viewpoint and non-spatial properties between sample and target ensure that topographical tasks depend on matching allocentric topographical information rather than simple visual pattern matching. The 4MT is shown in Figure 1 and Figure 2.
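
As a starting point for the digital version, one item of the test could be represented as a sample image plus four alternatives, one of which is the target. The following is a minimal sketch under that assumption; the class name `TrialItem`, its fields, and the `score` helper are hypothetical and not part of the original 4MT materials.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class TrialItem:
    """One 4MT item (hypothetical schema, not the official test format)."""
    item_id: str
    task_type: str                          # "topographical" or "non_spatial"
    sample_image: str                       # path to the sample image
    alternatives: Tuple[str, str, str, str]  # paths to the four choices
    target_index: int                       # position of the target, 0..3

    def __post_init__(self) -> None:
        # Reject malformed items early so scoring can rely on the schema.
        if self.task_type not in ("topographical", "non_spatial"):
            raise ValueError(f"unknown task type: {self.task_type}")
        if len(self.alternatives) != 4:
            raise ValueError("a 4MT item needs exactly four alternatives")
        if not 0 <= self.target_index < 4:
            raise ValueError("target_index must be between 0 and 3")


def score(items: Sequence[TrialItem],
          responses: Sequence[Optional[int]]) -> int:
    """Count correct responses; None stands for a missing response."""
    if len(items) != len(responses):
        raise ValueError("need exactly one response slot per item")
    return sum(
        1 for item, chosen in zip(items, responses)
        if chosen is not None and chosen == item.target_index
    )
```

Keeping the item immutable (`frozen=True`) means a loaded test set cannot be altered by accident during a session, which helps keep logged results reproducible.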

Figure 1. The Four Mountains Test (4MT). (A) 4MT stimuli are computer-generated heightfields with four mountains, illustrated by a contour map example. Images are rendered from one of seven virtual camera positions. (B) Participants study a sample image, then view four alternatives (one target showing the same place from a different viewpoint, and three foils showing different places) and identify the target. (C) Sample image example. (D) Corresponding target and foils (target at top-left). All images are shown at the same scale, with viewpoint and non-spatial features systematically varied between sample and test images.

Figure 2. Types of tests in 4MT. Top: Timing and layout of test items. In perceptual tests, participants performed a concurrent match-to-sample task, selecting one of four alternatives within 30 s. In memory tests, a 2 s delay (blank page) separated sample and test images.
Bottom: Examples of non-spatial and topographical items. In non-spatial tasks, participants matched scenes by features such as cloud cover, lighting, texture, and vegetation color (target bottom-left). In topographical tasks, matches were based solely on landscape layout, with viewpoint and non-spatial features varied (target top-left). Spatial, configural, and elemental foils occupy the top-right, bottom-left, and bottom-right positions, respectively.
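
The timing described in the Figure 2 caption could be encoded as a sequence of presentation phases. The sketch below is one possible encoding: the 30 s response window and the 2 s delay are taken from the caption, while the names `Phase` and `build_phases` are hypothetical. The caption does not state the study time or the response window for memory tests, so these are left unspecified (`None`) rather than guessed.

```python
from dataclasses import dataclass
from typing import List, Optional

RESPONSE_LIMIT_S = 30.0  # perceptual tests: choose within 30 s (Figure 2)
MEMORY_DELAY_S = 2.0     # memory tests: blank page between sample and test


@dataclass(frozen=True)
class Phase:
    """One screen state within a trial (hypothetical structure)."""
    name: str
    duration_s: Optional[float]  # None = not specified in this document
    show_sample: bool
    show_alternatives: bool


def build_phases(test_type: str,
                 study_time_s: Optional[float] = None) -> List[Phase]:
    """Return the ordered phases for a 'perceptual' or 'memory' trial."""
    if test_type == "perceptual":
        # Concurrent match-to-sample: sample and choices are shown together.
        return [Phase("match", RESPONSE_LIMIT_S, True, True)]
    if test_type == "memory":
        return [
            Phase("study", study_time_s, True, False),
            Phase("delay", MEMORY_DELAY_S, False, False),  # blank page
            Phase("choice", None, False, True),
        ]
    raise ValueError(f"unknown test type: {test_type}")
```

Separating the schedule from the rendering code would let the same trial definitions drive a tablet interface now and a robot-administered session later.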
Objectives
The student will develop a prototype system with the following goals:
Level 0 (Core task):
  • Design a Digital 4MT: Create a functional prototype where participants can complete the 4MT on a digital medium without human supervision

  • Define the System Architecture: Clearly define the system's inputs (images, touch/voice responses, video monitoring) and outputs (performance metrics, logs, scores, user feedback) to ensure robust data collection

  • Prepare for Robotic Integration: Design the system with a clear interface for future integration into a larger framework where a robot or virtual agent administers the test autonomously
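
The inputs, outputs, and integration interface listed above could take a shape like the following. This is only a sketch under stated assumptions: `ResponseRecord`, `Administrator`, `InMemoryAdministrator`, `summarize`, and `to_json_lines` are hypothetical names, and the actual architecture is for the student to define.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional, Protocol


@dataclass
class ResponseRecord:
    """One logged response (hypothetical log schema)."""
    item_id: str
    chosen_index: Optional[int]       # None if no choice was made
    correct: bool
    response_time_s: Optional[float]  # None if no choice was made
    modality: str                     # e.g. "touch" or "voice"


class Administrator(Protocol):
    """Interface a robot or virtual agent could implement later."""

    def give_instruction(self, text: str) -> None: ...

    def on_response(self, record: ResponseRecord) -> None: ...


class InMemoryAdministrator:
    """Stand-in administrator that only stores what it receives."""

    def __init__(self) -> None:
        self.instructions: List[str] = []
        self.records: List[ResponseRecord] = []

    def give_instruction(self, text: str) -> None:
        self.instructions.append(text)

    def on_response(self, record: ResponseRecord) -> None:
        self.records.append(record)


def summarize(records: List[ResponseRecord]) -> dict:
    """Compute simple performance metrics from the logged responses."""
    n_correct = sum(1 for r in records if r.correct)
    times = [r.response_time_s for r in records
             if r.response_time_s is not None]
    return {
        "n_items": len(records),
        "n_correct": n_correct,
        "accuracy": n_correct / len(records) if records else None,
        "mean_response_time_s": sum(times) / len(times) if times else None,
    }


def to_json_lines(records: List[ResponseRecord]) -> str:
    """Serialize the log as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r)) for r in records)
```

Because `Administrator` is a structural protocol, a future robot-side class only needs to provide the same two methods; it does not have to inherit from anything in the test software.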

After completing Level 0, the student can pursue either Level 1 or Level 2:
  • Level 1 (Advanced Stimuli). Implement Personalized Stimuli Generation: Enable the system to capture live scenes (e.g., in a user’s home) and automatically generate the necessary test images using multi-view image generation [2]

  • Level 2 (Advanced Interaction). Integrate Multimodal Interaction: Use sensors for multimodal input (e.g., audio, visual, and touch) to record responses and monitor user behavior so that the system can react to it [3]
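
For the multimodal option, responses from different input channels would need to be mapped onto the same four choices. The sketch below illustrates one way to do this, assuming the alternatives are laid out in a 2×2 grid (as in the figures) and that speech has already been transcribed to text by some external recognizer. The function names and the index convention (0 = top-left, 1 = top-right, 2 = bottom-left, 3 = bottom-right) are hypothetical.

```python
import re
from typing import Optional

# Assumed index convention for the 2x2 grid of alternatives.
POSITION_WORDS = {
    ("top", "left"): 0,
    ("top", "right"): 1,
    ("bottom", "left"): 2,
    ("bottom", "right"): 3,
}


def choice_from_transcript(transcript: str) -> Optional[int]:
    """Map a spoken phrase such as 'top left' to a choice index.

    Returns None when no position, or more than one position, is named.
    """
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    matches = [
        index for (vertical, horizontal), index in POSITION_WORDS.items()
        if vertical in words and horizontal in words
    ]
    return matches[0] if len(matches) == 1 else None


def choice_from_touch(x: float, y: float,
                      width: float, height: float) -> Optional[int]:
    """Map a touch point to a grid quadrant (origin at the top-left corner).

    Returns None when the point lies outside the display area.
    """
    if not (0 <= x < width and 0 <= y < height):
        return None
    column = 0 if x < width / 2 else 1
    row = 0 if y < height / 2 else 1
    return row * 2 + column
```

Returning `None` for ambiguous or out-of-range input leaves it to the administering agent to decide how to react, for example by asking the participant to repeat the answer.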

Level of Work
  • This thesis is suitable for a Bachelor's or Master's student with an interest in Human-Computer/Robot Interaction (HCI/HRI) and its applications in the medical/care field. It combines software implementation with applied research questions on usability in design, software application development, and system integration
  • This work requires experience with the Python programming language. Familiarity with ROS, deep learning techniques for computer vision, and code versioning (Git/GitLab) is advantageous
  • This work will give you the opportunity to (1) gain experience in Python and ROS programming; (2) learn to utilize machine learning models for graphical manipulation; (3) practice scientific writing
Starting Date
As soon as possible. Contact the supervisor of this thesis if you are interested.
Supervisor(s)
References
[1] T. Hartley, C. M. Bird, D. Chan, L. Cipolotti, M. Husain, F. Vargha-Khadem, and N. Burgess, “The hippocampus is required for short-term topographical memory in humans,” Hippocampus, vol. 17, pp. 34–48, 2007.
[2] X. Xie, C. Zou, M. G. Karumuri, J. E. Lenssen, and G. Pons-Moll, “MVGBench: Comprehensive benchmark for multi-view generation models,” arXiv preprint arXiv:2507.00006, 2025.
[3] C. Chirapornchai, F. Niyi-Odumosu, M. Giuliani, et al., “Design and evaluation of a robot telemedicine system for initial medical examination with UK and Thai doctors,” International Journal of Social Robotics, vol. 17, pp. 1769–1786, 2025, doi: 10.1007/s12369-024-01187-1.