Underloc: Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments

Queensland University of Technology

Underloc is a method for image-based relocalization and alignment for long-term monitoring of dynamic underwater environments.

Abstract

Effective monitoring of underwater ecosystems is crucial for tracking environmental changes, guiding conservation efforts, and ensuring long-term ecosystem health. However, automating underwater ecosystem management with robotic platforms remains challenging due to the complexities of underwater imagery, which pose significant difficulties for traditional visual localization methods. We propose an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. This method enables robust identification of revisited areas, estimation of rigid transformations, and downstream analysis of ecosystem changes. Furthermore, we introduce the SQUIDLE+ VPR Benchmark, the first large-scale underwater VPR benchmark designed to leverage an extensive collection of unstructured data from multiple robotic platforms, spanning time intervals from days to years. The dataset encompasses varied trajectories, arbitrary overlap, and diverse seafloor types captured under varying environmental conditions, including differences in depth, lighting, and turbidity.

Method

To enable reliable multi-year change detection, Underloc is an integrated pipeline that combines Visual Place Recognition (VPR), feature matching, and image segmentation on video-derived images. Using the state-of-the-art VPR method MegaLoc, our hierarchical method retrieves the top-K matched images per query and reranks these candidates using more computationally expensive local feature matching. We use LightGlue to establish keypoint correspondences between SuperPoint features and compute the homography matrix for warping and aligning query-database matches. Using the inlier count, we rerank matched images and filter out those with reprojection errors greater than 10 pixels. To simulate a potential change detection method, we automatically extract segmentation masks for each image using Segment Anything 2 (SAM2). We then use the homography matrix to warp the masks into a common image space, enabling pixel-level comparison using intersection over union (IoU) as a similarity proxy.

Underloc pipeline.
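The retrieve-then-rerank structure of the pipeline can be sketched as follows. This is a minimal illustration, not the released implementation: the global descriptors stand in for MegaLoc embeddings, and `inlier_fn` stands in for the SuperPoint+LightGlue matching and RANSAC homography step that produces an inlier count per candidate.

```python
import numpy as np

def retrieve_topk(query_desc, db_descs, k=5):
    """Rank database images by cosine similarity of global descriptors
    (stand-ins for MegaLoc embeddings) and return the top-k indices."""
    q = query_desc / np.linalg.norm(query_desc)
    db = db_descs / np.linalg.norm(db_descs, axis=1, keepdims=True)
    sims = db @ q
    return np.argsort(-sims)[:k]

def rerank_by_inliers(candidates, inlier_fn, min_inliers=1):
    """Re-order retrieval candidates by the geometric inlier count from
    `inlier_fn` (a stand-in for SuperPoint+LightGlue matching with a
    RANSAC-estimated homography), dropping unsupported candidates."""
    scored = [(inlier_fn(idx), idx) for idx in candidates]
    scored = [(n, idx) for n, idx in scored if n >= min_inliers]
    scored.sort(key=lambda t: -t[0])
    return [idx for _, idx in scored]

# Illustrative usage with synthetic descriptors and a dummy inlier count.
rng = np.random.default_rng(0)
db = rng.normal(size=(10, 8))
query = db[3] + 0.01 * rng.normal(size=8)  # near-duplicate of image 3
top = retrieve_topk(query, db, k=3)
ranked = rerank_by_inliers([int(i) for i in top], lambda i: i + 1)
```

In the real pipeline the retrieval stage is cheap per database image, so the expensive geometric verification only runs on the K shortlisted candidates.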

Results

Precision Recall Curves

Our hierarchical method, combining MegaLoc with SuperPoint, achieves performance comparable to brute-force SuperPoint (average precision of 16% for our hierarchical method vs 18% for brute-force).
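Average precision here summarizes the area under each precision-recall curve. As a reference for how such a number is obtained, below is a minimal numpy sketch of standard AP: rank query-database pairs by similarity score and average the precision at each true-positive rank. The scores and labels in the example are illustrative, not values from the benchmark.

```python
import numpy as np

def average_precision(scores, labels):
    """Standard AP: mean of precision values measured at the rank of
    each true positive, with pairs sorted by descending score."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                      # true positives up to each rank
    ranks = np.arange(1, len(labels) + 1)
    hits = labels == 1
    if not hits.any():
        return 0.0
    return float((tp[hits] / ranks[hits]).mean())

# Illustrative: two correct matches at ranks 1 and 3 give AP = (1 + 2/3) / 2.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```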

Okinawa (2017-2018)

Precision Recall Curve for Okinawa (2017-2018).

Tasman Fracture (2018/12/04-06)

Precision Recall Curve for Tasman Fracture (2018/12/04-06).

St Helens (2011-2013)

Precision Recall Curve for St Helens (2011-2013).

Qualitative Results

We evaluate the alignment between the two warped masks using an intersection over union (IoU) metric, where the intersection is defined as the number of shared pixels between the query and database masks, and the union represents the total number of pixels covered by both masks in the warped image.
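With both masks warped into a common image frame, the IoU computation reduces to counting shared and combined pixels. A minimal sketch, assuming boolean masks that have already been warped by the estimated homography (the small masks in the example are illustrative):

```python
import numpy as np

def mask_iou(mask_q, mask_db):
    """IoU between two boolean masks in a common warped frame:
    intersection = pixels set in both, union = pixels set in either."""
    inter = np.logical_and(mask_q, mask_db).sum()
    union = np.logical_or(mask_q, mask_db).sum()
    return float(inter / union) if union else 0.0

# Illustrative 4x4 masks: 4 pixels each, overlapping in exactly 1 pixel.
query_mask = np.zeros((4, 4), dtype=bool)
query_mask[:2, :2] = True
db_mask = np.zeros((4, 4), dtype=bool)
db_mask[1:3, 1:3] = True
iou = mask_iou(query_mask, db_mask)  # 1 shared pixel / 7 total pixels
```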

Qualitative Results.

SQUIDLE+ VPR Benchmark

Built-In Datasets

The SQUIDLE+ VPR Benchmark is the first large-scale underwater VPR benchmark, designed to leverage an extensive collection of unstructured data from multiple robotic platforms spanning time intervals from days to years.

Okinawa: 2017-2018


Leveraging Other SQUIDLE+ Sequences

Using our publicly available code, any sequence from SQUIDLE+ can be exported and processed by the pipeline to create a dataset encompassing varied trajectories, arbitrary overlap, and diverse seafloor types captured under varying environmental conditions.

BibTeX

@inproceedings{GorryIROS2025,
    title={Image-Based Relocalization and Alignment for Long-Term Monitoring of Dynamic Underwater Environments},
    author={Beverley Gorry and Tobias Fischer and Michael Milford and Alejandro Fontan},
    booktitle={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
    year={2025},
}

Acknowledgements

This research was partially supported by funding from ARC Laureate Fellowship FL210100156 to MM and ARC DECRA Fellowship DE240100149 to TF. The authors acknowledge continued support from the Queensland University of Technology (QUT) through the Centre for Robotics.

We would particularly like to acknowledge the authors of VPR-methods-evaluation, MegaLoc, LightGlue, SAM2, and VSLAM-LAB.