Configuring GeoParquet Data Stores

This page describes how to configure GeoParquet data stores in GeoServer, including local files, remote HTTP/HTTPS resources, and S3-hosted datasets.

Creating a GeoParquet Data Store

To create a new GeoParquet data store:

  1. Navigate to Stores ‣ Add new Store in the GeoServer web admin interface

  2. Select GeoParquet under “Vector Data Sources”

  3. Configure the connection parameters as described below

[Figure: GeoParquet data store configuration panel]

Note

TODO: Add screenshot of the GeoParquet data store configuration panel showing all parameters including the new AWS authentication options.

Connection Parameters

The following parameters are available when configuring a GeoParquet data store:

Parameter                           Required   Description
----------------------------------  ---------  -----------------------------------------------------------
Workspace                           Yes        The workspace in which to create the store
Data Source Name                    Yes        A unique name for this data store
Description                         No         A human-readable description of the data store
Enabled                             Yes        Whether the store should be enabled (checked by default)
GeoParquet URI (may include glob)   Yes        URI pointing to GeoParquet file(s). Supports local files,
                                               HTTP/HTTPS URLs, S3 URIs, and glob patterns
Hive-partitions max depth           No         Maximum depth of the Hive partition hierarchy to use
                                               (empty = all levels, 0 = none, N = first N levels)
Use AWS Credential Chain            No         Enable the AWS SDK credential chain for S3 authentication
                                               (recommended for S3 access)
AWS Region                          No         AWS region for S3 access (e.g. us-east-1, eu-west-1);
                                               overrides automatic region detection
AWS Profile                         No         AWS profile name to load credentials from
                                               ~/.aws/credentials
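Stores can also be created programmatically through GeoServer's REST API. The sketch below is illustrative only: the endpoint and XML envelope are GeoServer's standard data store REST interface, but the connection parameter keys (uri, max_hive_depth) are inferred from the UI labels above and from the max_hive_depth name used in Troubleshooting, so verify them against your GeoServer version before scripting against them:

curl -u admin:geoserver -X POST -H "Content-Type: text/xml" \
  "http://localhost:8080/geoserver/rest/workspaces/myws/datastores" \
  -d '<dataStore>
        <name>countries</name>
        <connectionParameters>
          <entry key="uri">s3://my-bucket/data/**/*.parquet</entry>
          <entry key="max_hive_depth">2</entry>
        </connectionParameters>
      </dataStore>'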

URI Examples

The GeoParquet URI parameter supports various formats:

Local Files

Point to a local GeoParquet file or directory:

file:///data/countries.parquet
/data/countries.parquet
file:///data/geoparquet-files/

With glob patterns:

/data/**/*.parquet
file:///data/year=2024/month=*/*.parquet

Remote HTTP/HTTPS

Access GeoParquet files over HTTP or HTTPS:

https://example.com/data/countries.parquet
https://storage.googleapis.com/bucket/data.parquet

Amazon S3

Access GeoParquet files stored in Amazon S3:

s3://my-bucket/data/countries.parquet
s3://my-bucket/data/**/*.parquet

Note

For S3 URIs, it is strongly recommended to use the AWS credential chain authentication method described below rather than embedding credentials in the URI.

S3 Authentication

GeoServer provides two methods for authenticating to Amazon S3. The credential chain method is strongly recommended for security and flexibility.
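AWS Credential Chain Authentication (Recommended)

To use this method, check Use AWS Credential Chain in the store configuration. Credentials are then resolved through the standard AWS lookup order: environment variables (AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY), the ~/.aws/credentials file, and, on EC2, the IAM role attached to the instance. A minimal ~/.aws/credentials file looks like this (the analytics profile name is purely illustrative):

[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY

[analytics]
aws_access_key_id = ANOTHER_ACCESS_KEY_ID
aws_secret_access_key = ANOTHER_SECRET_ACCESS_KEY

Set the AWS Profile parameter to analytics to use the second profile, or leave it empty to use default. Setting AWS Region explicitly is recommended whenever you know the bucket's region (see Troubleshooting below).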

Legacy URI-Based Authentication (Deprecated)

Warning

Security Risk: Embedding credentials directly in URIs is deprecated and strongly discouraged. Credentials in URIs can be exposed through:

  • GeoServer logs and error messages

  • Stack traces

  • Configuration backups

  • Browser history (if accessed through REST API)

This method is maintained only for backward compatibility. Always use the credential chain method for production environments.

If you must use URI-based authentication, you can include credentials as query parameters:

s3://my-bucket/data.parquet?s3_region=us-west-2&s3_access_key_id=AKIAIOSFODNN7EXAMPLE&s3_secret_access_key=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Hive Partitioning Support

GeoParquet data stores fully support Hive-partitioned datasets, where data is organized in directory hierarchies with key=value patterns:

/data/year=2023/month=01/day=01/file.parquet
/data/year=2023/month=01/day=02/file.parquet
/data/year=2023/month=02/day=01/file.parquet

The data store automatically detects these partitions and creates separate feature types for each unique partition path. You can control the partition depth using the Hive-partitions max depth parameter:

  • null (empty): Use all partition levels

  • 0: Ignore partitioning entirely

  • 1, 2, 3, etc.: Use only the specified number of partition levels

Example: With max depth = 2 on the data above, GeoServer would create feature types like year_2023_month_01 and year_2023_month_02, ignoring the day-level partitioning.

Publishing Layers

After creating and saving a GeoParquet data store:

  1. Navigate to Layers ‣ Add a new layer

  2. Select your GeoParquet data store from the dropdown

  3. Click Publish next to the layer you want to publish

  4. Configure the layer settings (bounding box, CRS, styles, etc.)

  5. Click Save

Your GeoParquet data is now available through GeoServer's standard services, such as WMS and WFS.
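The same step can be scripted through the REST API. A minimal sketch, assuming a workspace myws and a data store named countries; the feature type name must match one of the types the store exposes:

curl -u admin:geoserver -X POST -H "Content-Type: text/xml" \
  "http://localhost:8080/geoserver/rest/workspaces/myws/datastores/countries/featuretypes" \
  -d '<featureType><name>countries</name></featureType>'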

Working with Overture Maps Data

The GeoParquet extension works well with Overture Maps data, which is distributed as Hive-partitioned GeoParquet datasets. To use Overture Maps data:

  1. Download the desired theme from the Overture Maps Foundation releases

  2. Upload to S3 or use directly from Overture’s public S3 bucket

  3. Create a GeoParquet data store pointing to the Overture data location

  4. Use glob patterns and Hive partitioning to efficiently access specific themes and types

For detailed information about working with Overture Maps data, refer to the GeoParquet module documentation in GeoTools.
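As a concrete sketch, a store for the base theme of a public Overture release (the same bucket and release used in Troubleshooting below) could be configured as follows; adjust the release path to the version you need:

  GeoParquet URI:             s3://overturemaps-us-west-2/release/2025-11-19.0/theme=base/**/*
  Use AWS Credential Chain:   checked
  AWS Region:                 us-west-2
  Hive-partitions max depth:  leave empty to expose a feature type per partition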

Troubleshooting

Common Issues and Solutions

Store creation fails with “Invalid Configuration Error: Secret Validation Failure”

When using AWS credential chain authentication, you may encounter this error:

Failed to create AWS credential chain secret: Invalid Configuration Error:
Secret Validation Failure: during `create` using the following: Credential Chain: 'config'

Cause: This error occurs when Use AWS Credential Chain is enabled but no AWS credentials are available in the expected locations.

Solutions:

  • Set up AWS credentials: Create the ~/.aws/credentials file with valid credentials:

    [default]
    aws_access_key_id = YOUR_ACCESS_KEY_ID
    aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
    
  • Use environment variables: Set AWS credentials as environment variables before starting GeoServer:

    export AWS_ACCESS_KEY_ID=YOUR_ACCESS_KEY_ID
    export AWS_SECRET_ACCESS_KEY=YOUR_SECRET_ACCESS_KEY
    
  • Use IAM roles: If running on EC2, ensure an IAM role is attached to the instance with appropriate S3 permissions

  • Disable credential chain: If you don’t need credential chain authentication, uncheck Use AWS Credential Chain and use URI-based credentials (not recommended for production)
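To confirm what the credential chain will resolve, the AWS CLI (if installed) reads credentials from the same environment variables and ~/.aws files:

aws sts get-caller-identity
aws sts get-caller-identity --profile myprofile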

No feature types/layers appear after creating the store

This commonly happens when the AWS region is not correctly detected or configured.

Cause: S3 buckets are region-specific, and the datastore needs to know the correct region to access the data. While the credential chain can sometimes auto-detect the region, it may fail or use the wrong region.

Solution: Explicitly set the AWS Region parameter to match the bucket’s region.

Example: For Overture Maps data in the public bucket:

  • URI: s3://overturemaps-us-west-2/release/2025-11-19.0/theme=base/**/*

  • AWS Region: us-west-2 (must be set explicitly)

The bucket name often hints at the region (overturemaps-us-west-2), but you can also:

  • Check the bucket’s region in the AWS S3 Console

  • Look for region information in the data provider’s documentation

  • Try common regions: us-east-1, us-west-2, eu-west-1
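If the AWS CLI is available, you can also query a bucket's region directly:

# Requires s3:GetBucketLocation permission on the bucket; a null
# LocationConstraint means us-east-1:
aws s3api get-bucket-location --bucket overturemaps-us-west-2

# Works without credentials: S3 reports the region in a response header,
# even on Access Denied responses:
curl -sI https://overturemaps-us-west-2.s3.amazonaws.com | grep -i x-amz-bucket-region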

Tip

Always specify the AWS Region when accessing S3 data, even if you think it might be auto-detected. This is especially important for:

  • Public buckets (like Overture Maps)

  • Cross-region access

  • Buckets in non-default regions

Store creation fails with “Access Denied” for S3 URIs

  • Verify that AWS credentials are properly configured

  • Check IAM permissions for the S3 bucket

  • Ensure the bucket region matches the configured region

  • For public buckets, verify they allow anonymous access or provide appropriate credentials

GeoServer logs show parsing errors

  • Verify the URI points to valid GeoParquet files

  • Check that the GeoParquet files have valid geo metadata

  • Ensure the files are not corrupted

  • Verify the glob pattern matches the intended files

Performance issues with large remote datasets

  • Use glob patterns to limit the files being accessed (see the example after this list)

  • Leverage Hive partitioning to reduce data scanned

  • Consider adjusting the Hive-partitions max depth (max_hive_depth) parameter

  • For S3, ensure GeoServer is deployed in the same AWS region as the data

  • Consider caching frequently accessed data locally
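For example, scoping the URI to the partitions you actually serve avoids listing and scanning the rest of the dataset (paths are illustrative):

s3://my-bucket/data/**/*.parquet                   # scans every file
s3://my-bucket/data/year=2024/month=06/*.parquet   # scans one month only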

Performance Considerations

  • Columnar Storage: GeoParquet’s columnar format allows efficient access to specific attributes without reading entire records

  • Cloud Optimization: GeoParquet files are optimized for cloud storage with efficient range reads

  • Partitioning: Use Hive partitioning to organize large datasets and enable partition pruning

  • Local Caching: For frequently accessed remote data, consider caching locally or using GeoServer's tile caching; see the sketch after this list

  • S3 Colocation: Deploy GeoServer in the same AWS region as your S3 data for optimal performance
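A simple way to cache remote data locally, assuming the AWS CLI is installed, is to mirror the relevant prefix and point a store at the copy:

aws s3 sync s3://my-bucket/data/year=2024/ /data/cache/year=2024/

The store's URI would then be file:///data/cache/year=2024/**/*.parquet.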

Limitations

  • Read-Only: The GeoParquet extension is currently read-only; you cannot modify or write data through GeoServer

  • Large Files: Very large individual Parquet files may have slower initial access times

  • Complex Geometries: Extremely complex geometries may impact rendering performance

  • Community Module: As a community module, this extension has limited official support

Further Reading