Michael Markieta | michael@spatialanalysis.ca
For many, free and open-source data and software open up geospatial workflows that would otherwise be out of reach in terms of cost and availability. Commercial data used in geographic information systems (GIS) is available through a relatively small number of merchants or vendors, which produce highly accurate, precise, and detailed information. That quality comes, however, at a cost that many small and large businesses, private consultants, and startups cannot afford. Open-source data, such as the volunteered geographic information on OpenStreetMap (OSM), represents a community effort to build one of the best web maps, and subsequently the best GIS database, available for free to the public. OpenStreetMap is a web-based map to which any registered user can submit data. These contributions, over time, have populated the now extensive OpenStreetMap web map. At the same time, the data behind OpenStreetMap can be downloaded and used inside a GIS for geospatial analysis, cartographic rendering, and other geo-related tasks.
There are various workflows for extracting and consuming the data made available by the OpenStreetMap project. One of these methods is outlined in this tutorial. It takes Mac OS X users through a typical setup of a local PostgreSQL database, downloading and parsing raw OpenStreetMap data, and querying the database to extract data for use in QGIS, an open-source GIS package. Upon completing this tutorial, users will have hit the ground running, with the ability to run spatial queries (such as locating all the coffee shops within 500 metres of a subway station) or to build cartographically pleasing reference map books with data that is of interest to the map reader.
The tutorial follows a guided, step-by-step format and assumes the role of a new user installing and processing all data from scratch. The information above and the links below provide background to the project that we are about to begin. Please follow all the instructions (don’t skip any steps unless you know what you are doing) and download all the required data when prompted.
We will be using Terminal in this guide, but readers are neither expected nor required to have any prior Terminal.app knowledge. Terminal allows users to interact with the computer through a command-line interface. Even if you have not used Terminal before, you may have come across a command-line interface elsewhere (perhaps on Windows machines, à la Command Prompt). I will do my best to explain what we are doing during the phase of the tutorial that uses Terminal.app.
If you would like more information about any of the items mentioned above, please feel free to visit the respective website/wiki as listed below (but remember to come back!):
OpenStreetMap http://www.openstreetmap.org/
Planet.osm http://wiki.openstreetmap.org/wiki/Planet.osm
PostgreSQL http://en.wikipedia.org/wiki/PostgreSQL
PostGIS http://en.wikipedia.org/wiki/PostGIS
Osm2pgsql http://wiki.openstreetmap.org/wiki/Osm2pgsql
QGIS http://www.qgis.org/wiki/Welcome_to_the_QGIS_Wiki
OpenStreetMap (OSM) is a collection of free geographic data that can be viewed within a browser (http://www.openstreetmap.org/). We can also access, download, and utilize the underlying OSM project data from various repositories. What makes this so attractive is the fact that it’s a community-driven project, which means that anyone can contribute to it. It is also free to use and distribute under the CC-BY-SA license (as long as we attribute OSM and the license itself). The standard package that OSM distributes is called “planet.osm”. It is a standard XML-formatted .osm file of global data which, at the time of writing (Aug 22, 2011), is over 220 GB (17 GB compressed). The planet.osm is updated weekly (every Thursday) and includes the latest revisions of nodes, ways, and relations (the OSM elements from which points, lines, and polygons are derived). I strongly encourage you NOT to download the entire planet.osm and work with it, due to its size and the computing power needed to work with such a database. We will be exploring an extract of the planet.osm, specifically the province of Ontario, Canada, which comes in a much smaller, CPU-friendly size (2 GB uncompressed/350 MB compressed).
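To make the XML layout of an .osm file a little more concrete, here is a minimal Python sketch that reads a tiny, hand-written sample in the OSM XML style and counts its nodes and ways. The sample data, the place name, and the function name `summarize_osm` are made up for illustration; a real extract is far too large to load this way, which is why we hand the job to osm2pgsql later on.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written sample in the OSM XML layout (illustrative, not real OSM data).
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="43.6452" lon="-79.3806">
    <tag k="amenity" v="cafe"/>
    <tag k="name" v="Example Cafe"/>
  </node>
  <node id="2" lat="43.6455" lon="-79.3800"/>
  <node id="3" lat="43.6460" lon="-79.3795"/>
  <way id="10">
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>
"""


def summarize_osm(xml_text):
    """Count nodes and ways, and collect node tags and way node references."""
    root = ET.fromstring(xml_text)
    nodes = root.findall("node")
    ways = root.findall("way")
    # Tags are key/value pairs stored as <tag k="..." v="..."/> children.
    tagged_nodes = {
        node.get("id"): {tag.get("k"): tag.get("v") for tag in node.findall("tag")}
        for node in nodes
        if node.find("tag") is not None
    }
    # A way is an ordered list of references to node ids.
    way_refs = {
        way.get("id"): [nd.get("ref") for nd in way.findall("nd")] for way in ways
    }
    return {
        "nodes": len(nodes),
        "ways": len(ways),
        "tagged_nodes": tagged_nodes,
        "way_refs": way_refs,
    }


summary = summarize_osm(SAMPLE_OSM)
print(summary["nodes"], "nodes and", summary["ways"], "way")
```

Notice that a way carries no coordinates of its own; it only points at nodes. Resolving those references into line and polygon geometries is part of what the import step does for us.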
http://wiki.openstreetmap.org/wiki/Planet.osm provides a list of mirrors from which the planet.osm file and its extracts can be downloaded. We will specifically be using the CloudMade directory located on http://downloads.cloudmade.com/
This tutorial will reflect what I am doing on my computer. This is done in an effort to address commonly made mistakes. Tailor the tutorial to suit your own needs or follow my instructions to a tee.
Let us make sure that we stay somewhat organized during this tutorial.
We will download all of our installation files to the “downloads” directory and we will download the OSM data to the “data” directory.
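If you would rather script the folder setup than click through Finder, the following Python sketch creates the two directories. I am assuming both live inside an “osm_tutorial” folder; the function name `make_tutorial_dirs` is my own. The demonstration at the bottom runs in a throwaway temporary directory so that nothing on your machine is touched.

```python
import tempfile
from pathlib import Path


def make_tutorial_dirs(base):
    """Create osm_tutorial/downloads and osm_tutorial/data under `base`."""
    root = Path(base) / "osm_tutorial"  # assumed parent folder name
    created = {}
    for name in ("downloads", "data"):
        folder = root / name
        # exist_ok=True makes the call safe to repeat.
        folder.mkdir(parents=True, exist_ok=True)
        created[name] = folder
    return created


if __name__ == "__main__":
    # Demonstrate in a temporary directory rather than on the real desktop.
    with tempfile.TemporaryDirectory() as tmp:
        for name, folder in make_tutorial_dirs(tmp).items():
            print(name, "->", folder)
```

To mirror a desktop-based layout, you could call `make_tutorial_dirs(Path.home() / "Desktop")` instead.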
Let’s download an extract of the planet.osm file (remember that the planet.osm file is too large to handle on its own). We will use the CloudMade repository, which updates its planet.osm and “extract”.osm files (where “extract” is the name of a location/place) weekly.
Go to http://downloads.cloudmade.com/
The repository is organized in a hierarchical structure (Region > SubRegion > Country > Province or State). Note that CloudMade has not split the entire planet.osm into smaller extracts; only the most popular or in-demand areas have been extracted for us to use. Some repositories will extract different regions than others. There are also some repositories that extract smaller areas (cities, towns, etc.).
Here we will use the Ontario extract, which is located in Americas > Northern America > Canada > Ontario. Feel free to use any of the other extracts. However, I recommend that you choose an extract from the lowest level in the hierarchy (e.g., provinces or cities), and stay away from the larger extracts such as regions or countries.
Do not extract this file. A benefit of using the tools mentioned in this tutorial is the ability to work with fully compressed OSM data. Why is the file named “ontario.osm.bz2”? BZIP2 (bz2) is an open-source compression tool that OSM utilizes to produce small production packages (e.g., a 17 GB “planet.osm.bz2” vs. a 220 GB uncompressed “planet.osm”).
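To illustrate what “working with compressed data” means, here is a small Python sketch that peeks at the first lines of a bz2-compressed file as a stream, without ever writing the uncompressed version to disk. The function name `first_lines_of_bz2` is my own, and the sample is a few bytes of OSM-style XML compressed in memory as a stand-in for the real ontario.osm.bz2.

```python
import bz2
import io
from itertools import islice


def first_lines_of_bz2(source, count=3):
    """Return up to `count` text lines from a .bz2 file path or file object."""
    # bz2.open decompresses on the fly; only the lines we read are decoded.
    with bz2.open(source, mode="rt", encoding="utf-8") as stream:
        return [line.rstrip("\n") for line in islice(stream, count)]


# Stand-in for ontario.osm.bz2: OSM-style XML compressed in memory.
sample = bz2.compress(
    b'<osm version="0.6">\n  <node id="1" lat="43.6" lon="-79.4"/>\n</osm>\n'
)
for line in first_lines_of_bz2(io.BytesIO(sample), count=2):
    print(line)
```

Passing a path such as "ontario.osm.bz2" instead of the in-memory sample should work the same way, assuming the file sits in the current working directory.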
The next step is to download our database server, PostgreSQL. PostgreSQL is an enterprise-level database that scales efficiently to the demands of a single user or multiple users. With spatial data storage enabled by the PostGIS extension, PostgreSQL can store a wide variety of geometric objects, such as points, lines, polygons, multipoints, multilines, multipolygons, and geometry collections. Spatial databases empower the user by providing spatial functions, such as counting the features within a specified radius of a point, or calculating the distance between two objects on the surface of the earth. PostGIS can also handle the reprojection of data as it is retrieved from the database, such that data can be stored in one common projection, but retrieved in a user-specified coordinate system.
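To give a feel for the kind of work a spatial function does on our behalf, here is a plain-Python sketch of the two examples just mentioned: the distance between two locations on the earth, and a count of features within a radius of a point. It uses the haversine formula on a spherical earth, which is a simplification and not how PostGIS itself computes distances; the coordinates are made-up points in downtown Toronto.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean earth radius; a spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude pairs."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlon = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlon / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within(center, points, radius_m):
    """Count the (lat, lon) points lying within radius_m metres of center."""
    return sum(
        1
        for lat, lon in points
        if haversine_m(center[0], center[1], lat, lon) <= radius_m
    )


# Hypothetical station location and three hypothetical coffee shops.
station = (43.6453, -79.3806)
cafes = [(43.6460, -79.3806), (43.6553, -79.3806), (43.6453, -79.3770)]
print(count_within(station, cafes, radius_m=500))
```

A spatial database does the same job with a single function call, and uses spatial indexes so that it stays fast across millions of features.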
The ESRI shapefile is a common format for GIS data storage; however, with the amount of data provided by OpenStreetMap, it is unlikely that the shapefile specification can be used effectively. For example, the linestrings (polylines in ESRI-speak) table in our OpenStreetMap database contains all of the roads, paths, footways, creeks, rivers and so on. To work effectively with this data as shapefiles, the features would need to be separated by thematic content (roadways, pedestrian-ways, natural). While the data is in our PostgreSQL database, however, we can easily create three separate queries to extract the same thematic content. This saves both space and time, as our data can be precisely extracted from one source database, based on the tagging system used in OpenStreetMap (http://wiki.openstreetmap.org/wiki/Map_Features), as opposed to producing three separate feature classes in the ESRI shapefile format. Lastly, it is important to note that queries on our database do not make changes to the data itself, and can therefore be easily modified if the query results are not as expected.
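As a rough sketch of what those three thematic queries could look like, the Python snippet below assembles one SQL statement per theme. It assumes the line table is named planet_osm_line (the default name that osm2pgsql gives it) and that it has “highway” and “waterway” columns; the tag values chosen for each theme are only my own selection from the Map Features page, so adjust them to taste.

```python
# Theme name -> (column, tag values). The groupings are illustrative only.
THEMES = {
    "roadways": (
        "highway",
        ["motorway", "primary", "secondary", "tertiary", "residential"],
    ),
    "pedestrian_ways": ("highway", ["footway", "path", "pedestrian"]),
    "waterways": ("waterway", ["river", "stream"]),
}


def where_clause(column, values):
    """Build a filter such as "waterway" IN ('river', 'stream')."""
    # Double any single quote so that a value cannot break out of its literal.
    quoted = ", ".join("'" + value.replace("'", "''") + "'" for value in values)
    return '"{}" IN ({})'.format(column, quoted)


def thematic_query(theme, table="planet_osm_line"):
    """Return a SELECT statement for one theme; the table name is an assumption."""
    column, values = THEMES[theme]
    return "SELECT * FROM {} WHERE {}".format(table, where_clause(column, values))


for theme in THEMES:
    print(thematic_query(theme))
```

All three statements read from the same table, so nothing is duplicated on disk; only the filter changes.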
There are many iterations of PostgreSQL, and it can get quite confusing for the beginner. Luckily, Dave Page at EnterpriseDB maintains an easy-to-use, all-in-one installer that is available for Mac OS X (and other platforms as well).
We will now install PostgreSQL. If this is your first attempt at installing PostgreSQL, you will be prompted about your computer’s “Shared Memory” configuration. Not to worry: PostgreSQL handles the changes that are necessary, and yes, it is safe to allow PostgreSQL to make these changes. I’ve included a snippet from the PostgreSQL “readme”:
PostgreSQL uses shared memory extensively for caching and inter-process communication. Unfortunately, the default configuration of Mac OS X does not allow suitable amounts of shared memory to be created to run the database server.
The installer will take a minute or so to complete, and will then ask if you would like to “Launch Stack Builder at exit?” This is required: we will need to install the PostGIS extension because PostgreSQL cannot handle the spatial information in our OSM data on its own. PostGIS adds the spatial types and functions that our PostgreSQL database needs to store and query that information.
Our next step is to download and install osm2pgsql, which will import the “ontario.osm.bz2” file into our PostgreSQL database.
We will use osm2pgsql to parse our OSM data into the PostgreSQL database. Here is where we will encounter the use of the Terminal.app. As I mentioned earlier, I will try to explain what exactly we are telling Terminal to do.
The first line in Terminal (if you haven’t opened Terminal recently) should look similar to mine, with your own computer name and username in place of mine. Terminal is letting us know whose computer we are on (michaelmarkietas-mac), and where we are performing the tasks (a directory; in this case, the home directory of the user michaelmarkieta).
We need to change the working directory from our user directory (michaelmarkieta) to the desktop. We use the “cd” command, which intuitively means “change directory.” We also pass in the location to which we would like to change our directory. In this case, we will change directory to the “desktop.” Type the following into Terminal (or copy and paste):
If done correctly, Terminal will switch the current working directory to the desktop and the prompt should read something like this:
Terminal should now be working from within the “data” folder, which is located inside the “osm_tutorial” folder, which is located on the desktop, which is in turn part of the michaelmarkieta user folder. The full path would look something like “/Users/michaelmarkieta/Desktop/osm_tutorial/data”.
Now to perform some osm2pgsql magic. The tool offers many configurable parameters. These can be seen by typing the following into Terminal:
We will concern ourselves with only a few of these parameters, but it’s useful to look over what is included with a tool when you install something new. We will be using -U (username) and -d (database name) to guide osm2pgsql in parsing the ontario.osm.bz2 file into our PostgreSQL database. The code will need to look like this:
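For readers who prefer to script this step, here is a hedged Python sketch that assembles an osm2pgsql call from the -U and -d parameters described above. The username “postgres” and the database name “osm” are placeholders of my own, not values taken from this tutorial; substitute the ones you chose during installation. The sketch only prints the command, so it is safe to try before the database exists.

```python
import shlex


def osm2pgsql_command(osm_file, username, database):
    """Return the osm2pgsql argument list for importing `osm_file`."""
    # -U is the PostgreSQL username, -d the target database name.
    return ["osm2pgsql", "-U", username, "-d", database, osm_file]


# "postgres" and "osm" are placeholder values; use your own.
cmd = osm2pgsql_command("ontario.osm.bz2", username="postgres", database="osm")
print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run the import from Python, you could pass the same list to `subprocess.run(cmd, check=True)` from inside the folder that holds the extract.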
We will need to let it run for a few minutes. If everything went smoothly, you should have received the following confirmation in Terminal:
QGIS is an open-source GIS package licensed under the GNU General Public License. This tutorial makes use of QGIS to query, extract, and visualize the OpenStreetMap data that currently lives inside of our PostgreSQL database. QGIS provides a vast array of functionality, which makes it a competitor to commercial solutions. Also, due to its open-source roots, users will find direct compatibility with many other open-source projects such as the PostgreSQL database. QGIS also provides powerful Python scripting capabilities, making it a viable option for automating spatial workflows. Lastly, the QGIS community has a large and growing plugin repository where users can find useful tools or scripts that enhance the usability and efficiency of QGIS.
The QGIS dependencies and the QGIS installer are located on the KyngChaos Wiki.
Let’s launch QGIS for the first time and query our PostgreSQL/PostGIS database and load some OpenStreetMap data.
If this works, you will be able to connect to your OpenStreetMap PostGIS layer and you will get the following screen:
I will show you how to add some data using the query builder. All that you will need to do is get a grasp of the query builder syntax and the OSM tables so you can build your own custom queries and add OSM data to your projects to your heart’s content.
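Once you are comfortable with simple filters, the same tables can answer spatial questions such as the coffee-shop example from the introduction. The Python sketch below only assembles the SQL text; it would be run through a SQL client rather than the query builder, which works on one layer at a time. It rests on several assumptions: the point table is named planet_osm_point with its geometry in a column called “way” (the osm2pgsql defaults), coffee shops are tagged amenity=cafe, and stations are tagged railway=station, which matches all rail stations rather than subway stations alone. The distance is measured in the units of the stored projection, which for a Mercator-style projection are only approximately metres at Ontario’s latitude.

```python
def cafes_near_stations_sql(radius=500, table="planet_osm_point"):
    """Build a query for cafes within `radius` projection units of a station."""
    radius = float(radius)  # ensures that only a number is placed in the SQL
    return (
        "SELECT DISTINCT cafe.osm_id, cafe.name\n"
        "FROM {t} AS cafe\n"
        "JOIN {t} AS station\n"
        "  ON ST_DWithin(cafe.way, station.way, {r})\n"
        "WHERE cafe.amenity = 'cafe'\n"
        "  AND station.railway = 'station'"
    ).format(t=table, r=radius)


sql = cafes_near_stations_sql(500)
print(sql)
```

ST_DWithin is the PostGIS function that tests whether two geometries lie within a given distance of each other, and DISTINCT keeps a cafe from being listed twice when it is near more than one station.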
The data will be queried from your PostgreSQL/PostGIS database and added to your map. If you have followed the tutorial using the Ontario extract you should have something that looks similar to the following image:
That is it! You are now ready to start using OSM data in your GIS. Note that you can export to shapefile in QGIS so that you can take the data and move it across computers and open it up in other GIS applications. Remember to save often in case QGIS crashes!