A Conversation for Geographical Information System

Enhancements to this entry

Post 1

Gordon, Ringer of Bells, Keeper of Postal Codes and Maps No One Can Re-fold Properly

I think this entry could be improved by drawing on the following.

Whenever I have to explain what a GIS is to a layman, I describe it as a spatial database that you can look at like a map, but with which you can do all sorts of things that are difficult to do with a map.

When talking about how geographic data are handled, it is more accurate to talk about raster and vector data. Cadastral data (not "cadastra") describe land ownership and similar information.

Vector data are stored as X,Y (and Z) coordinates rather than as cells in a matrix (i.e. raster or matrix data). Roads, rivers, elevation contours and most other features found on a topographic map are best stored as vector data (i.e. points, lines and polygons), because their locations and areas can be stored and represented very precisely. You can also zoom in or out without worrying as much about the aliasing ("jaggies") that you see in raster datasets. Finally, a vector dataset tends to be less bulky than a raster dataset covering the same land area.
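To make the "points, lines and polygons" idea concrete, here is a minimal sketch of how vector features might be held in memory. The class names (`Point`, `LineString`, `Polygon`) are my own illustrative choices, not any particular GIS package's API. Each feature is just a list of coordinate pairs, so a 100 m by 50 m parcel needs only four vertices no matter how large it is, and its area can be computed exactly with the shoelace formula.

```python
import math
from dataclasses import dataclass

# Hypothetical minimal vector types: features are lists of X,Y pairs,
# not cells in a grid.
Coord = tuple[float, float]


@dataclass
class Point:
    xy: Coord


@dataclass
class LineString:
    coords: list[Coord]

    def length(self) -> float:
        # Sum the straight-line distances between consecutive vertices.
        return sum(math.dist(a, b) for a, b in zip(self.coords, self.coords[1:]))


@dataclass
class Polygon:
    ring: list[Coord]  # exterior ring; the first vertex is not repeated at the end

    def area(self) -> float:
        # Shoelace formula: exact for a simple polygon, independent of zoom level.
        n = len(self.ring)
        s = 0.0
        for i in range(n):
            x1, y1 = self.ring[i]
            x2, y2 = self.ring[(i + 1) % n]
            s += x1 * y2 - x2 * y1
        return abs(s) / 2.0


parcel = Polygon([(0, 0), (100, 0), (100, 50), (0, 50)])
road = LineString([(0, 0), (3, 4)])
print(parcel.area())   # 5000.0
print(road.length())   # 5.0
```

Real systems add coordinate reference systems, holes in polygons and topology, but the storage principle is the same: a handful of coordinates instead of a grid of cells.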

Raster and matrix data are stored in a regular grid. Although both are stored in grids or arrays, raster data are "real" data (measured or observed), while matrix data are the output of models and similar processes. Raster data also tend to be stored in two-dimensional arrays, while matrix data can have two or more dimensions (you could have three or more if your dataset is a time series, for example). For the rest of this post, when I say raster, I mean both raster and matrix data. In a raster, every cell in the grid has the same fixed dimensions, which are sometimes referred to as the resolution of the dataset.
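Because every cell has the same fixed size, finding which cell a map coordinate falls in is simple arithmetic. The sketch below assumes a hypothetical `Raster` class whose origin is the upper-left corner of the grid with rows counted downward; that is a common convention, but not a universal one, so treat it as an assumption.

```python
import math


class Raster:
    """Hypothetical minimal raster: a 2D grid of equal-sized square cells."""

    def __init__(self, origin_x, origin_y, cell_size, n_rows, n_cols, fill=0):
        # (origin_x, origin_y) is assumed to be the upper-left corner of the grid.
        self.origin_x = origin_x
        self.origin_y = origin_y
        self.cell_size = cell_size  # the resolution: each cell is cell_size x cell_size
        self.n_rows = n_rows
        self.n_cols = n_cols
        # Every cell is stored, whether or not it holds anything interesting.
        self.cells = [[fill] * n_cols for _ in range(n_rows)]

    def cell_index(self, x, y):
        """Return (row, col) of the cell containing map coordinate (x, y)."""
        col = math.floor((x - self.origin_x) / self.cell_size)
        row = math.floor((self.origin_y - y) / self.cell_size)  # rows increase downward
        if not (0 <= row < self.n_rows and 0 <= col < self.n_cols):
            raise ValueError("coordinate lies outside the raster")
        return row, col


r = Raster(origin_x=1000.0, origin_y=2000.0, cell_size=30.0, n_rows=10, n_cols=10)
print(r.cell_index(1045.0, 1950.0))  # (1, 1)
```

Note that the grid always stores `n_rows * n_cols` values, which is why a raster covering a large area at fine resolution gets bulky compared with a vector dataset.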

Some types of data lend themselves very well to being represented as rasters. Remote sensing data, such as airphotos and satellite imagery, are well suited to raster storage because they are produced by sensors that record pixels: you store each pixel in its own cell, with the cell the same size as the pixel.

Certain operations, such as neighbour functions, are easier to perform on raster data than on vector data. In a raster, you only need the X and Y (and Z, in a 3D matrix) offsets to identify the eight adjoining cells (in 2D) quickly. Performing the same operation on a vector dataset is much more involved, because you have to walk through the topology of the data. While it is relatively easy to identify rook neighbours (those that share a common boundary), identifying bishop neighbours (those that share only a node, or "corner") is more problematic.
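Here is a small sketch of why neighbours are cheap in a raster: they are just fixed row/column offsets. Rook neighbours share an edge and bishop neighbours share only a corner; together they make up the eight adjoining cells (often called "queen" neighbours, by the same chess analogy). The function names and the `focal_mean` example are my own illustrations, not taken from any specific GIS.

```python
# (row, col) offsets for the two kinds of neighbour.
ROOK_OFFSETS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # share an edge
BISHOP_OFFSETS = [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # share only a corner


def neighbours(row, col, n_rows, n_cols, offsets):
    """Return the in-bounds neighbour cells of (row, col) for the given offsets."""
    result = []
    for dr, dc in offsets:
        r, c = row + dr, col + dc
        if 0 <= r < n_rows and 0 <= c < n_cols:  # skip cells off the edge of the grid
            result.append((r, c))
    return result


def queen_neighbours(row, col, n_rows, n_cols):
    """All eight adjoining cells: rook plus bishop neighbours."""
    return neighbours(row, col, n_rows, n_cols, ROOK_OFFSETS + BISHOP_OFFSETS)


def focal_mean(grid, row, col):
    """Mean of a cell and its adjoining cells, a typical neighbour function."""
    n_rows, n_cols = len(grid), len(grid[0])
    cells = [(row, col)] + queen_neighbours(row, col, n_rows, n_cols)
    return sum(grid[r][c] for r, c in cells) / len(cells)


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(focal_mean(grid, 1, 1))  # 5.0
```

No topology has to be walked: the same four or eight offsets work for every cell, and only cells on the edge of the grid need the bounds check.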

I mentioned aliasing earlier. If you represent a circle using vector data types, the circle will (or should) stay round however closely you zoom in. If you represent a circle in a raster, it may look round when you are zoomed out, but as you zoom in, its edge quickly becomes "jaggy", because the cells are square or rectangular and you cannot represent a circle precisely with squares or rectangles.
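The jagged edge is easy to reproduce. This sketch rasterizes a circle by marking a cell as inside when its centre falls within the circle; that "cell centre" rule is one simple choice among several, assumed here for illustration. Printing a coarse grid shows the staircase edge, and comparing the cell-counted area with the true area of a circle shows how the error shrinks only as the cells get smaller.

```python
import math


def rasterize_circle(radius, cell_size):
    """Grid of booleans: True where a cell's centre lies inside the circle.

    The circle is centred at (0, 0); the grid covers [-radius, radius] in X and Y.
    """
    n = math.ceil(2 * radius / cell_size)
    grid = []
    for row in range(n):
        y = radius - (row + 0.5) * cell_size  # Y of this row's cell centres
        grid.append([
            math.hypot(-radius + (col + 0.5) * cell_size, y) <= radius
            for col in range(n)
        ])
    return grid


def raster_area(grid, cell_size):
    """Area covered by 'inside' cells: the raster's approximation of the circle."""
    return sum(sum(row) for row in grid) * cell_size ** 2


# A coarse grid shows the staircase ("jaggy") edge.
for row in rasterize_circle(radius=5, cell_size=1):
    print("".join("#" if inside else "." for inside in row))

# The area error shrinks as the cells get smaller, but never disappears.
for size in (1.0, 0.5, 0.1):
    print(size, raster_area(rasterize_circle(1.0, size), size), math.pi)
```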

As for dataset sizes, unless you have an inordinate number of features, a vector dataset will tend to be smaller than a raster dataset covering the same area. However, some raster data structures use compression algorithms, such as run length encoding (RLE), which reduce the size of the dataset by exploiting the fact that a given cell tends to be like its neighbours. Run length encoding looks for repeated values and replaces each run with the value and a count of how many times it repeats. For example, the values 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 5 can be compressed with RLE to 1(5),2(3),3(4),4,5, which would probably take less space to store.
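The RLE example above can be sketched in a few lines of Python using the standard library's `itertools.groupby`, which groups consecutive equal values. The function names are illustrative, and this works on a flat sequence; a raster format would typically apply something like it row by row. The `rle_format` helper just mimics the notation used above, writing runs of one as the bare value.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse runs of equal consecutive values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


def rle_format(pairs):
    """Render pairs as value(count), omitting the count for runs of one."""
    return ",".join(f"{v}({n})" if n > 1 else str(v) for v, n in pairs)


values = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 5]
pairs = rle_encode(values)
print(rle_format(pairs))  # 1(5),2(3),3(4),4,5
assert rle_decode(pairs) == values  # the compression is lossless
```

This is also why the saving is only "probable": if neighbouring cells rarely repeat, storing a count alongside each value can make the data larger rather than smaller.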

Some GIS applications only handle raster data and some only handle vector data, while the more advanced GIS integrate the two.

In general, geographic information systems let you manipulate huge amounts of data in ways that were not possible until quite recently. Although GIS have been around for a long time, they have become one of the leading applications as the technology has matured. With the tools found in most GIS applications, you can identify relationships in your data that you might not have suspected were there.
