Minutely updated tile volume: Technical details

Posted by pnorman on 1/15/2024

I’ve been looking at how many tiles are changed when updating OSM data, in order to better guide resource estimates, and have completed some benchmarks. This is the technical post with the details; I’ll be doing a high-level post later.

Software like Tilemaker and Planetiler is great for generating a complete set of tiles updated about once a day, but it can’t handle minutely updates. Most users are fine with daily or slower updates, but OSM.org users are different: for them, minutely updates are critical. All the current ways to generate minutely-updated map tiles involve loading the changes and regenerating any tiles whose data may have changed. I used osm2pgsql, the standard way to load OSM data for rendering, but the results should be applicable to other toolchains, including ones with different schemas.

Using the Shortbread schema from osm2pgsql-themepark, I loaded the data with osm2pgsql and ran updates. osm2pgsql can output a list of changed tiles (“expired tiles”), and I did this for zooms 1 to 14 on each update. Because I was running this on real data, an update sometimes took longer than 60 seconds to process if it was particularly large; in that case the next run would combine multiple updates from OSM. Combining multiple updates reduces how much work the server has to do at the cost of less frequent updates. This has been well documented since 2012, but no one has looked at the impact combining updates has on tile expiry.
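For reference, each expire file is a plain-text list of tile IDs, one z/x/y per line; the values below are made up for illustration:

```
14/8514/5626
14/8514/5627
13/4257/2813
```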

To do this testing I used a Hetzner server with 2×1 TB NVMe drives in RAID 0, 64 GB of RAM, and an Intel i7-8700 @ 3.2 GHz. osm2pgsql 1.10 was used, the latest version at the time. The version of themepark was equivalent to the latest version.

The updates were run for a week from 2023-12-30T08:24:00Z to 2024-01-06T20:31:45Z. There were some interruptions in the updates, but I did an update without expiring tiles after the interruptions so they wouldn’t impact the results.

To run the updates I used a simple shell script:

```bash
#!/bin/bash
set -e
while :
do
    # Record the current replication sequence number, then apply one
    # update, writing the expired tiles for z1-z14 to a file named
    # after that sequence number.
    SEQUENCE=$(osm2pgsql-replication status -d shortbread --json | jq '.local.sequence')
    osm2pgsql-replication update -d shortbread --once -- --expire-tiles=1-14 -o "expire_files/$SEQUENCE.txt"
    sleep 60
done
```

Normally I’d set up a systemd service and timer as described in the manual, but this setup was an unusual test where I didn’t want it to restart automatically.
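For a production setup, that would look something like the following minimal sketch; the unit names and paths are my own illustration, not taken from the manual:

```ini
# replication.service (hypothetical name): apply one batch of updates
[Unit]
Description=Apply OSM replication updates

[Service]
Type=oneshot
ExecStart=/usr/bin/osm2pgsql-replication update -d shortbread --once
```

```ini
# replication.timer (hypothetical name): run the service every minute
[Unit]
Description=Run OSM replication updates every minute

[Timer]
OnCalendar=minutely

[Install]
WantedBy=timers.target
```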

I then used grep to count, for each expiry file, the number of tiles at each zoom, creating one list of counts per zoom:

```bash
# For each zoom, append one count per expiry file; $z.txt ends up with
# one line per update giving the number of expired tiles at that zoom.
for z in `seq 1 14`; do
    find "$@" -type f -exec grep -Ech "^$z/" {} + >> $z.txt
done
```

This let me use a crude script to get percentiles and the mean, and assemble them into a CSV:

```python
#!/usr/bin/env python3
# Reads a whitespace-separated list of per-update tile counts and
# prints the mean followed by selected percentiles as a CSV row.
import numpy
import sys

nums = numpy.fromfile(sys.argv[1], dtype=int, sep=' ')
mean = numpy.mean(nums)
percentiles = numpy.percentile(nums, [0, 1, 5, 25, 50, 75, 95, 99, 100])
numpy.set_printoptions(precision=2, suppress=True, floatmode='fixed')
print(str(mean) + ',' + ','.join([str(p) for p in percentiles]))
```
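Invoked once per zoom, something like this (the script name here is my own) assembles the rows into a CSV:

```bash
for z in `seq 1 14`; do
    ./percentiles.py $z.txt
done > stats.csv
```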

A look at the percentiles for zoom 14 immediately reveals some outliers, with a mean of 249 tiles, a median of 113, a p99 of 6854, and a p100 of 101824. I was curious what was making the maximum so large and found the p100 occurred at sequence number 5880335, which was also the largest diff. This diff was surrounded by normal-sized diffs, so it wasn’t simply a large volume of data. The data consumed would have been the diff 005/880/336.

A bit of shell magic got me a list of changesets that did something other than add a node:

```bash
osmium cat 005880336.osc.gz -f opl \
    | egrep -v '^n[[:digit:]]+ v1' \
    | cut -d ' ' -f 4 | sort | uniq \
    | sed 's/c\(.*\)/\1/'
```

Looking at the changesets with achavi, 145229319 stood out as taking some time to load. Two of the modified nodes were information boards that were part of the Belarus–Ukraine and Belarus–Russia borders. Thus, this changeset changed the Russia, Ukraine, and Belarus polygons. As these are large polygons, only the tiles along their edges were considered dirty, but that is still a lot of tiles!

After validating that the results make sense, I got the following means and percentiles, which may be useful to others.

Tiles per minute, with updates every minute

| zoom | mean | p0 | p1 | p5 | p25 | p50 | p75 | p95 | p99 | p100 |
|------|------|----|----|----|-----|-----|-----|-----|-----|------|
| z1 | 3.3 | 1 | 2 | 2 | 3 | 3 | 4 | 4 | 4 | 4 |
| z2 | 5.1 | 1 | 2.6 | 3 | 4 | 5 | 6 | 7 | 7 | 10 |
| z3 | 9.1 | 1 | 4 | 5 | 8 | 9 | 11 | 13 | 15 | 24 |
| z4 | 12.8 | 1 | 5 | 7 | 10 | 12 | 15 | 20 | 24 | 52 |
| z5 | 17.1 | 1 | 5 | 8 | 13 | 17 | 20 | 28 | 35 | 114 |
| z6 | 21.7 | 1 | 6 | 9 | 15 | 21 | 26 | 37 | 48 | 262 |
| z7 | 25.6 | 1 | 6 | 9 | 17 | 24 | 31 | 46 | 63 | 591 |
| z8 | 29.2 | 1 | 6 | 9 | 17 | 26 | 34 | 55 | 92 | 1299 |
| z9 | 34.5 | 1 | 6 | 10 | 18 | 28 | 37 | 64 | 173 | 2699 |
| z10 | 44.6 | 1 | 7 | 10 | 20 | 31 | 41 | 80 | 330 | 5588 |
| z11 | 65.6 | 1 | 7 | 12 | 23 | 35 | 49 | 125 | 668 | 11639 |
| z12 | 111 | 1 | 8 | 14 | 29 | 44 | 64 | 238 | 1409 | 24506 |
| z13 | 215 | 1 | 10 | 18 | 40 | 64 | 102 | 527 | 3150 | 52824 |
| z14 | 468 | 1 | 14 | 27 | 66 | 113 | 199 | 1224 | 7306 | 119801 |

Based on historical OpenStreetMap Carto data, the capacity of a rendering server is about 1 req/s per hardware thread; current performance is somewhat slower. The new OSMF general-purpose servers are mid-range servers with 80 threads, so they should be able to render about 80 tiles per second, or roughly 4800 tiles in the 60 seconds between updates. This means that approximately 95% of the time the server will be able to finish re-rendering the expired tiles within the 60 seconds between updates; a couple of times an hour it will take longer.
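As a quick back-of-the-envelope check, we can sum the p95 column of the table above. Summing per-zoom percentiles only approximates the true p95 of the total, since the worst minutes at each zoom need not coincide, but it gives a feel for a bad-but-not-pathological minute:

```python
# p95 expired tiles per minute for z1..z14, taken from the table above
p95 = [4, 7, 13, 20, 28, 37, 46, 55, 64, 80, 125, 238, 527, 1224]
print(sum(p95))  # 2468 tiles, comfortably under the ~4800/minute budget
```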

As mentioned earlier, when updates take over 60 seconds, multiple updates combine into one, reducing the amount of work to be done. I simulated this by merging every k expiry files together. Continuing the theme of patched-together scripts, I did this with a shell script based on one from StackExchange:

```bash
k=2
indir="expire_files_2/"
dir="expire_2_mod$k"

# Build a NUL-delimited, version-sorted list of the expiry files.
readarray -td $'\0' files < <(
    for f in ./"$indir"/*.txt; do
        if [[ -f "$f" ]]; then
            printf '%s\0' "$f"
        fi
    done | sort -zV
)

rm -f ./"$dir"/joined-files*.txt

# Merge every k consecutive files into one deduplicated file. sort -o
# is safe here because sort reads all input before opening the output,
# even when the output file is also one of the inputs.
for i in "${!files[@]}"; do
    n=$((i/k+1))
    touch ./"$dir"/joined-files$n.txt
    sort -u "${files[i]}" ./"$dir"/joined-files$n.txt -o ./"$dir"/joined-files$n.txt
done
```

Running the results through the same process for percentiles gives numbers in tiles per update, but updates now happen only every k minutes, so in terms of work done per unit of time all the numbers need to be divided by k. Here are the results for a few values of k.

k=2

| zoom | mean | p0 | p1 | p5 | p25 | p50 | p75 | p95 | p99 | p100 |
|------|------|----|----|----|-----|-----|-----|-----|-----|------|
| z1 | 1.7 | 0.5 | 1 | 1 | 1.5 | 1.5 | 2 | 2 | 2 | 2 |
| z2 | 2.5 | 0.5 | 1 | 1.5 | 2 | 2.5 | 3 | 3.5 | 3.5 | 5 |
| z3 | 4.5 | 0.5 | 2 | 2.5 | 4 | 4.5 | 5.5 | 6.5 | 7.5 | 12 |
| z4 | 6.4 | 0.5 | 2.5 | 3.5 | 5 | 6 | 7.5 | 10 | 12.5 | 26 |
| z5 | 8.6 | 0.5 | 2.5 | 4 | 6.5 | 8.5 | 10 | 14 | 17.5 | 51 |
| z6 | 10.9 | 0.5 | 2.9 | 4.5 | 7.5 | 10.5 | 13 | 18.5 | 24.5 | 107 |
| z7 | 13.0 | 0.5 | 3 | 4.5 | 8.5 | 12 | 15.5 | 23 | 32 | 239 |
| z8 | 14.9 | 0.5 | 3 | 4.5 | 9 | 13 | 17 | 27 | 50 | 535 |
| z9 | 17.8 | 0.5 | 3 | 5 | 9.5 | 14 | 18.5 | 32 | 97 | 1127 |
| z10 | 24 | 0.5 | 3 | 5 | 10 | 15.5 | 20.5 | 41 | 192 | 2347 |
| z11 | 36 | 0.5 | 3.5 | 6 | 11.5 | 17.5 | 24 | 65 | 395 | 4888 |
| z12 | 64 | 0.5 | 4 | 7 | 14.5 | 22 | 32 | 120 | 844 | 10338 |
| z13 | 120 | 0.5 | 5 | 9 | 20 | 32 | 50 | 265 | 1786 | 22379 |
| z14 | 263 | 0.5 | 7 | 14 | 33 | 56 | 99 | 617 | 3988 | 50912 |

k=5

| zoom | mean | p0 | p1 | p5 | p25 | p50 | p75 | p95 | p99 | p100 |
|------|------|----|----|----|-----|-----|-----|-----|-----|------|
| z1 | 0.66 | 0.20 | 0.40 | 0.40 | 0.60 | 0.60 | 0.80 | 0.80 | 0.80 | 0.80 |
| z2 | 1.01 | 0.20 | 0.40 | 0.60 | 0.80 | 1.00 | 1.20 | 1.40 | 1.40 | 2.00 |
| z3 | 1.82 | 0.20 | 0.80 | 1.00 | 1.60 | 1.80 | 2.20 | 2.60 | 3.00 | 4.60 |
| z4 | 2.54 | 0.20 | 1.00 | 1.40 | 2.00 | 2.40 | 3.00 | 4.00 | 4.80 | 8.00 |
| z5 | 3.40 | 0.20 | 1.00 | 1.60 | 2.60 | 3.40 | 4.00 | 5.40 | 7.00 | 18.80 |
| z6 | 4.31 | 0.20 | 1.02 | 1.80 | 3.20 | 4.20 | 5.20 | 7.40 | 9.80 | 42.60 |
| z7 | 5.08 | 0.20 | 1.20 | 1.80 | 3.40 | 4.80 | 6.20 | 9.20 | 12.60 | 93.60 |
| z8 | 5.78 | 0.20 | 1.20 | 1.80 | 3.40 | 5.20 | 6.80 | 11.00 | 18.93 | 206.20 |
| z9 | 6.78 | 0.20 | 1.20 | 2.00 | 3.60 | 5.60 | 7.40 | 13.00 | 35.40 | 430.40 |
| z10 | 8.73 | 0.20 | 1.40 | 2.00 | 4.00 | 6.20 | 8.20 | 16.40 | 67.48 | 895.20 |
| z11 | 12.76 | 0.20 | 1.40 | 2.40 | 4.60 | 7.00 | 9.60 | 25.16 | 150.32 | 1,865.40 |
| z12 | 21.60 | 0.40 | 1.60 | 2.80 | 5.80 | 8.80 | 12.80 | 47.00 | 328.89 | 3,932.40 |
| z13 | 41.88 | 0.40 | 2.00 | 3.60 | 8.00 | 12.80 | 20.60 | 102.08 | 712.36 | 8,486.80 |
| z14 | 91.76 | 0.40 | 2.80 | 5.40 | 13.00 | 22.80 | 40.40 | 239.88 | 1,597.66 | 19,274.40 |

Finally, we can reproduce the Geofabrik graph by looking at tiles per minute as a function of the update interval. The result is approximately work ∝ interval^−1.05, where interval is the number of minutes between updates. This means combining multiple updates is very effective at reducing load.
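As a minimal sketch of that kind of fit, here is a regression in log-log space using only the z14 means from the tables above as example data (the actual fit would use more points):

```python
#!/usr/bin/env python3
# Fit work ∝ interval^a by linear regression in log-log space.
import numpy

interval = numpy.array([1, 2, 5])      # minutes between updates (k)
work = numpy.array([468, 263, 91.76])  # z14 mean tiles/minute, from the tables

a, _ = numpy.polyfit(numpy.log(interval), numpy.log(work), 1)
print(f"work is proportional to interval^{a:.2f}")  # roughly -1
```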
