Website URLs are often not set in stone, so monitoring their status can be worthwhile. Here is how to do it in a semi-automatic fashion:
* download the POIs,
* ask each URL for a reply,
* store the OSM objects whose websites are unresponsive.
Let’s start by gathering a list of shops with a website tag. This Overpass example query yields a CSV with the essential data separated by commas. You can see the result in the Overpass data window.
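The query could look roughly like this (a sketch: the area name is only an example, while the out:csv line is what produces the comma-separated columns used below):

[out:csv(::id,::type,name,website;true;",")];
area["name"="Verona"]->.searchArea;
node["shop"]["website"](area.searchArea);
out;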
To automate the process in bash, we need the Overpass query string so we can pass it as an argument to the wget command. In “Export”, simply copy the link you find under “raw data directly from Overpass API”, then (remembering to enclose the link in double quotes):
$ wget -O mylist.csv "http://overpass-api.de/api/interpreter?data=%5Bout%3Acsvblablablabla"
At this point mylist.csv contains something like:
@id,@type,name,website
194581793,node,Sirene Blu,http://www.sireneblu.it/
228109189,node,Ecoscaligera,http://www.ecoscaligera.com/
[ETC, ETC]
Now we need to scan each line of mylist.csv and wait for an HTTP reply (e.g. 200 OK, 301 Moved Permanently, etc.). This is done by running the following code:
#!/bin/bash
# scan mylist.csv line by line, skipping the CSV header (tail -n +2)
while IFS="," read -r OSMid OSMtype OSMname url
do
    # ask only for the headers and keep the HTTP status line
    # (tr strips the carriage returns that end HTTP header lines)
    REPLY=$(curl --silent --head "$url" | tr -d '\r' | awk '/^HTTP/{print $0}')
    echo "/$OSMtype/$OSMid,$REPLY"
done < <(tail -n +2 mylist.csv)
Let’s save the above script as replies.sh and make it executable (chmod +x replies.sh). The output could be something like:
$ ./replies.sh
/node/287058106,HTTP/1.1 301 Moved Permanently
/node/424738144,HTTP/1.1 301 Moved Permanently
/node/534834927,HTTP/2 301
/node/766863973,HTTP/1.1 200 OK
[ETC, ETC]
Redirected to a file, this output can easily be filtered with grep in order to obtain a list of OSM objects whose website tag needs to be updated (or set to null):
$ ./replies.sh | grep " 403 " > shops-to-update
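A slightly more general sketch (the status codes to match are only an assumption, adapt them to your needs) could save the full log once and then split it into sites answering with an error and sites not answering at all:

# hypothetical variant: save the full log once, then filter it
./replies.sh > replies.log
# sites answering with a client or server error (4xx / 5xx)
grep -E " (4|5)[0-9]{2}( |$)" replies.log > shops-error
# sites giving no reply at all (nothing after the comma)
grep ",$" replies.log > shops-no-answer

The resulting /type/id pairs can then be appended to https://www.openstreetmap.org to open each object in the browser and fix its website tag.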
tags: linux, bash, URL