Leveraging PostGIS to Write Your FlatGeobuf Files

Posted by spwoodcock on 12/7/2023

To GDAL or not to GDAL

GDAL is an incredible geospatial library and underpins so much of what we do, including our databases (PostGIS).

However, sometimes it might be a bit heavyweight for what we are trying to achieve.

Installing it as a base system dependency inevitably installs everything - there is no way to pick out only the components you need.


Install size is especially important when building container images, which we want to be as small as possible for distribution.

GDAL in PostGIS

PostGIS uses GDAL for much of its geospatial processing, including reading and writing various geospatial file formats.

FMTM is starting to use FlatGeobuf format for various purposes (OSM data extracts, storing submissions).

It also uses a PostGIS database as part of the software stack.

So today I thought: why not just use the geospatial processing built into PostGIS for reading and writing FlatGeobuf data?

The solution was surprisingly painless!
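The key building block is PostGIS's ST_AsFlatGeobuf function (added in PostGIS 3.2), which serialises a set of rows into FlatGeobuf bytes. A minimal example in plain SQL, where mytable is a placeholder table with a geometry column:

-- Serialise a set of rows (including their geometry column) to FlatGeobuf bytes.
-- Requires PostGIS 3.2+; 'mytable' is a placeholder table name.
SELECT ST_AsFlatGeobuf(q)
FROM (SELECT id, geom FROM mytable) AS q;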

Database Access

First we need a way to access the database.

FMTM is using FastAPI and SQLAlchemy, so ideally we want to pass through and reuse the database session created when an endpoint is accessed.

To make this standalone, I also added functionality to create a database engine from scratch.

(Code screenshot: see the 'Database code' block at the end of this post.)

The nitty-gritty SQL

(Code screenshot: see the 'SQL code' block at the end of this post.)

The function requires a GeoJSON FeatureCollection as input.
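For reference, the kind of input it expects can be built with the geojson Python package (the coordinates and properties below are arbitrary):

from geojson import Feature, FeatureCollection, Point

# A tiny single-feature collection, just to show the expected input shape
featcol = FeatureCollection([
    Feature(geometry=Point((102.0, 0.5)), properties={"name": "example"}),
])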

Now I'm sure there is a much more efficient way to write this by nesting SQL SELECTs, but I was too lazy to debug that, and I find this approach quite readable, albeit slightly less efficient.
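For the curious, here is an untested sketch of what a single-statement version could look like, chaining CTEs instead of a temp table (I have not verified this against PostGIS, so treat it as a starting point):

-- Untested sketch: build the feature set in CTEs and serialise it directly,
-- skipping the temp_features table. '<feature collection json>' stands in
-- for the interpolated GeoJSON string.
WITH data AS (
    SELECT '<feature collection json>'::json AS fc
), features AS (
    SELECT
        row_number() OVER () AS id,
        ST_GeomFromGeoJSON(feat->>'geometry') AS geom
    FROM (
        SELECT json_array_elements(fc->'features') AS feat
        FROM data
    ) AS f
)
SELECT ST_AsFlatGeobuf(features.*)
FROM features;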

Using the code

An example of using it in FastAPI:

(Code screenshot: see the 'Usage code' block at the end of this post.)

Limitations

There is one glaringly obvious limitation of this approach: if reading the FlatGeobuf is implemented in the same way, then we lose the benefit of its 'cloud native' encoding.

Reading would require downloading the entire file, passing it to PostGIS, and returning GeoJSON.

However, that was not the intended purpose of this workaround.

FlatGeobuf is primarily a format meant for browser consumption, with excellent support via its npm package.

So while the backend API can write FlatGeobuf data without requiring extra dependencies, the frontend can then read that data if it's hosted somewhere online (e.g. an S3 bucket).

Code

Apologies for the code screenshots: OSM Diaries does not support code syntax highlighting, nor spaces in code blocks.

Database code

import logging
from typing import Union

from sqlalchemy.engine import create_engine
from sqlalchemy.orm import Session

# Stand-in for the project's logger
log = logging.getLogger(__name__)


def get_engine(db: Union[str, Session]):
    """Get engine from existing Session, or connection string.

    If `db` is a connection string, a new engine is generated.
    """
    if isinstance(db, Session):
        # Reuse the engine bound to the existing session
        return db.get_bind()
    elif isinstance(db, str):
        # Create a new engine from the connection string
        return create_engine(db)
    else:
        msg = "The `db` variable is not a valid string or Session"
        log.error(msg)
        raise ValueError(msg)
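Either form works; for example (db_session and the connection string are placeholders):

# Reuse the engine from an existing request-scoped Session...
engine = get_engine(db_session)

# ...or create one standalone from a connection string
engine = get_engine("postgresql://fmtm:fmtm@localhost:5432/fmtm")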

SQL code

from geojson import FeatureCollection
from sqlalchemy import text
from sqlalchemy.orm import Session


def geojson_to_flatgeobuf(db: Session, geojson: FeatureCollection):
    """From a given FeatureCollection, return a memory flatgeobuf obj."""
    sql = f"""
        DROP TABLE IF EXISTS public.temp_features CASCADE;

        CREATE TABLE IF NOT EXISTS public.temp_features(
            id serial PRIMARY KEY,
            geom geometry
        );

        WITH data AS (SELECT '{geojson}'::json AS fc)
        INSERT INTO public.temp_features (geom)
        SELECT
            ST_AsText(ST_GeomFromGeoJSON(feat->>'geometry')) AS geom
        FROM (
            SELECT json_array_elements(fc->'features') AS feat
            FROM data
        ) AS f;

        WITH thegeom AS
        (SELECT * FROM public.temp_features)
        SELECT ST_AsFlatGeobuf(thegeom.*)
        FROM thegeom;
    """
    # Run the SQL
    result = db.execute(text(sql))
    # Get a memoryview object, then extract to Bytes
    flatgeobuf = result.fetchone()[0].tobytes()

    # Cleanup table
    db.execute(text("DROP TABLE IF EXISTS public.temp_features CASCADE;"))

    return flatgeobuf

Usage code

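A minimal sketch of wiring geojson_to_flatgeobuf into a FastAPI endpoint; the route path, get_db dependency, and module paths are placeholders for illustration, not the exact FMTM code:

from fastapi import Depends, FastAPI
from fastapi.responses import Response
from geojson import FeatureCollection
from sqlalchemy.orm import Session

# Placeholder imports: swap in the app's real session dependency and the
# module that holds geojson_to_flatgeobuf
from app.db.database import get_db
from app.db.postgis_utils import geojson_to_flatgeobuf

app = FastAPI()


@app.post("/features/flatgeobuf")
def convert_to_flatgeobuf(
    featcol_json: dict,
    db: Session = Depends(get_db),
):
    """Accept a GeoJSON FeatureCollection body, return FlatGeobuf bytes."""
    featcol = FeatureCollection(featcol_json.get("features", []))
    fgb_bytes = geojson_to_flatgeobuf(db, featcol)
    return Response(
        content=fgb_bytes,
        media_type="application/octet-stream",
        headers={"Content-Disposition": "attachment; filename=features.fgb"},
    )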