COG ImageMosaic from local storage to S3

Introduction

This tutorial explains how to update an existing ImageMosaic built on local granules into a COG ImageMosaic whose granules are stored in an S3 bucket. It is aimed at users who want to move the COG granules of an ImageMosaic to a remote bucket without re-harvesting the whole collection of granules.

Assumptions

  • An ImageMosaic store already exists, with its index stored in a database (e.g. PostGIS).

  • Local GeoTIFF granules are already valid COGs.

  • The user has experience uploading data to S3.

Verifying data is valid COG

You can verify that a sample GeoTIFF is a valid COG using the COG Validator service.

  1. Store a sample GeoTIFF in the target bucket (or in the server location) you will use for remote serving and copy its full URL, e.g. https://modis-vi-nasa.s3-us-west-2.amazonaws.com/MOD13A1.006/2018.01.01.tif.

  2. Go to the COG Validator.

  3. Paste the sample COG URL in the text box and hit the submit button.

  4. In case the sample file is a valid COG, you will get a message like this:

Cloud Optimized GeoTIFF Validator: result Validation succeeded ! https://sample.s3.eu-central-1.amazonaws.com/test/cog.tif is a valid Cloud Optimized GeoTIFF.

In case the file isn’t a valid COG, you can use GDAL 3.1 or above to convert your file to COG format. See the related GDAL documentation for further details.
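As an illustration of the conversion step, the following minimal sketch builds and runs a gdal_translate call that uses the COG output driver available from GDAL 3.1. The function names, the file names and the DEFLATE compression choice are assumptions made for this example, not part of the original procedure; check the GDAL documentation for the options that suit your data.

```python
import shutil
import subprocess


def cog_translate_command(src, dst, compress="DEFLATE"):
    """Build a gdal_translate command line that writes `dst` as a COG.

    The compression value is an illustrative assumption; any value
    supported by the GDAL COG driver could be passed instead.
    """
    return [
        "gdal_translate",
        "-of", "COG",                  # COG output driver (GDAL >= 3.1)
        "-co", f"COMPRESS={compress}",  # creation option of the COG driver
        src,
        dst,
    ]


def convert_to_cog(src, dst):
    """Run the conversion, failing early if GDAL is not installed."""
    if shutil.which("gdal_translate") is None:
        raise RuntimeError("gdal_translate not found on PATH")
    subprocess.run(cog_translate_command(src, dst), check=True)


# Hypothetical usage:
# convert_to_cog("2018.01.01.tif", "2018.01.01_cog.tif")
```

After converting, validate the output file again before uploading it.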

Once the data has been verified, upload all of your granules to the S3 bucket.

ImageMosaic update

The next step is to update both the ImageMosaic configuration and the index.

ImageMosaic configuration update

A few new properties need to be added to the ImageMosaic configuration to support COG.

Locate the .properties file containing the mosaic configuration. It usually has the same name as the parent folder. You can recognize it because it is normally autogenerated when the ImageMosaic is first configured and it contains this header:

#-Automagically created from GeoTools-.

For simplicity, the rest of this tutorial assumes the file is named mosaic.properties. Once you have located it, edit the file and add these new properties:

  • Cog=true

  • SuggestedSPI=it.geosolutions.imageioimpl.plugins.cog.CogImageReaderSpi

When storing your granules in a public bucket, you can keep the default RangeReader implementation, so no other flags are needed and you can skip to the ImageMosaic index update section.

If you are using a private bucket instead, you need to add further properties to the mosaic.properties file:

  • CogRangeReader=it.geosolutions.imageioimpl.plugins.cog.S3RangeReader

  • CogUser=S3AccessKeyID

  • CogPassword=S3SecretAccessKey

Here, S3AccessKeyID and S3SecretAccessKey stand for the actual credentials needed to access the bucket.
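If you prefer to script the edit, the sketch below appends the properties listed above to the text of a mosaic.properties file, skipping any key that is already present. The helper name add_cog_properties and its arguments are hypothetical; the property names and values are the ones given in this section. It works on plain key=value lines only and does not handle Java properties escaping.

```python
def add_cog_properties(text, access_key=None, secret_key=None):
    """Return the properties text with the COG settings appended if missing.

    Pass both access_key and secret_key only for a private bucket.
    """
    wanted = {
        "Cog": "true",
        "SuggestedSPI": "it.geosolutions.imageioimpl.plugins.cog.CogImageReaderSpi",
    }
    if access_key is not None and secret_key is not None:
        wanted.update({
            "CogRangeReader": "it.geosolutions.imageioimpl.plugins.cog.S3RangeReader",
            "CogUser": access_key,
            "CogPassword": secret_key,
        })

    # Collect the keys already defined, ignoring comments and blank lines.
    existing = set()
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#") and "=" in stripped:
            existing.add(stripped.split("=", 1)[0].strip())

    additions = [f"{key}={value}" for key, value in wanted.items()
                 if key not in existing]
    if not additions:
        return text
    separator = "" if (not text or text.endswith("\n")) else "\n"
    return text + separator + "\n".join(additions) + "\n"


# Hypothetical usage: read the file, update the text, write it back.
# with open("mosaic.properties") as f:
#     updated = add_cog_properties(f.read())
```

Because existing keys are left untouched, running the helper twice on the same file does not duplicate entries.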

ImageMosaic index update

The next step is updating the ImageMosaic index, which is a catalog of all the granules composing the mosaic. The location values must be updated to refer to remote URLs instead of local paths on disk. The location attribute initially contains the path of each granule on disk, which can be either relative or absolute. Relative paths are resolved against the ImageMosaic parent configuration folder, whilst absolute paths are full paths.

The mosaic.properties file contains a PathType property set to RELATIVE or ABSOLUTE. On old mosaics, that property might be missing, with an AbsolutePath property holding a boolean value (true/false) in its place. Either way, all the paths of a given mosaic are either relative or absolute.

For example, an ImageMosaic stored at /var/data/imageMosaic/mosaic with a granule at /var/data/imageMosaic/mosaic/2018.01.01.tif may have a record in the database with a location attribute equal to:
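The lookup described above can be sketched as a small helper that reads the properties text and reports which kind of path the mosaic uses. The function names are hypothetical, and the parser is a simplified assumption that handles only plain key=value lines.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping comments and blank lines."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def path_type(text):
    """Return "RELATIVE" or "ABSOLUTE", or None if neither property is set.

    PathType takes precedence; the boolean AbsolutePath property is the
    fallback for old mosaics.
    """
    props = parse_properties(text)
    if "PathType" in props:
        return props["PathType"].upper()
    if "AbsolutePath" in props:
        is_absolute = props["AbsolutePath"].lower() == "true"
        return "ABSOLUTE" if is_absolute else "RELATIVE"
    return None
```

For instance, a file containing only AbsolutePath=false is reported as RELATIVE.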

  • 2018.01.01.tif in case of relative path

  • /var/data/imageMosaic/mosaic/2018.01.01.tif in case of absolute path.

The type of path affects the query to be executed to update the index.

Note

Make sure to back up your table so you can recover quickly if the update query goes wrong.

For this example, we are going to use the same public datasets, served from S3 URLs, that are used in the previous ImageMosaic example with Modis COG datasets section.

For locations with relative paths, a simple update query could look like this (schema.table stands for your index table):

UPDATE schema.table SET location=CONCAT(
'https://modis-vi-nasa.s3-us-west-2.amazonaws.com/MOD13A1.006/', location);

This simply prepends the S3 bucket URL to the location value. With the examples above, location=2018.01.01.tif becomes location=https://modis-vi-nasa.s3-us-west-2.amazonaws.com/MOD13A1.006/2018.01.01.tif.

For locations with absolute paths, the update query may look like this (for our example):

UPDATE schema.table SET location=REPLACE(location,'/var/data/imageMosaic/mosaic/',
'https://modis-vi-nasa.s3-us-west-2.amazonaws.com/MOD13A1.006/');
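Before running either query, it can help to preview the expected result on a few sample values. The sketch below mirrors the CONCAT and REPLACE logic of the two queries in plain Python; the function name new_location and the constants are assumptions for this example, using the bucket URL and local folder from this tutorial. It does not touch the database.

```python
BUCKET_URL = "https://modis-vi-nasa.s3-us-west-2.amazonaws.com/MOD13A1.006/"
LOCAL_ROOT = "/var/data/imageMosaic/mosaic/"


def new_location(location, path_type,
                 local_root=LOCAL_ROOT, bucket_url=BUCKET_URL):
    """Return the location value the matching UPDATE query would produce."""
    if path_type == "RELATIVE":
        # Same effect as CONCAT(bucket_url, location).
        return bucket_url + location
    if path_type == "ABSOLUTE":
        # Same effect as REPLACE(location, local_root, bucket_url).
        return location.replace(local_root, bucket_url)
    raise ValueError(f"unsupported path type: {path_type!r}")


# Both sample records from this tutorial map to the same remote URL.
relative = new_location("2018.01.01.tif", "RELATIVE")
absolute = new_location("/var/data/imageMosaic/mosaic/2018.01.01.tif", "ABSOLUTE")
assert relative == absolute
```

Note that the relative-path query prepends the URL unconditionally, so running it twice would prefix the bucket URL twice; this is one more reason to keep the table backup at hand.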

GeoServer reload

Once everything is done, reload the GeoServer configuration.
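The reload can be triggered from the web administration interface or, as sketched below, through the GeoServer REST API reload endpoint. The base URL and the credentials shown are placeholders to adapt to your installation, and the helper name is hypothetical. The function only builds the request; sending it is left commented out.

```python
import base64
import urllib.request


def build_reload_request(base_url, user, password):
    """Build a POST request for the GeoServer REST configuration reload."""
    url = base_url.rstrip("/") + "/rest/reload"
    credentials = f"{user}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url, method="POST")
    request.add_header("Authorization", "Basic " + token)
    return request


# Hypothetical usage against a local instance (placeholder credentials):
# request = build_reload_request("http://localhost:8080/geoserver",
#                                "admin", "geoserver")
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```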