f(GIS)
concepts
- Data model
- Functional model
- Layer classification
- Implementation of data model
- Regions and cartographic projection
- Program design
- Layer as Tcl object
- Planchet - object for displaying maps
- Drawing modes for raster layer
- Low level objects
- GIS operation
- Utilities
- Data access library
Data model
A GIS is a software system for processing spatial data, so an adequate
model of spatial phenomena is the most important thing for a GIS.
The model should provide a way to represent spatial phenomena in computer
memory, allow the desired operations to be performed on this
representation, and let the user see the results in the form he is used
to. Ideally, a GIS should hide the complicated issues of internal data
storage from the user, just as a text processor hides questions of font
rendering and kerning, or an SQL database hides the actual file layout
and search techniques, providing simple but powerful relational
operations instead.
Many modern GIS systems, especially vector-based ones like ARC/Info, try
to represent a map of a spatial phenomenon rather than the phenomenon
itself. This overcomplicates the storage format and the processing
algorithms, and makes the user worry about technical matters such as
polygon topology, which are completely irrelevant to his problem (say,
geology or soil science), just as font rendering hints and kerning are
irrelevant to the contents of an article typeset in some particular
font. Maps are a widely used tool for analysing spatial data, but no
more than a tool. A GIS should deal with them, because it is necessary
to use existing data, which are represented on maps, and to present
results to the user in the understandable form of maps. But while
processing data we should take into account the properties of the actual
phenomena rather than the properties of a cartographic representation
such as polygons.
Functional model
In f(GIS) we use the term layer to denote the computer representation of
a spatial phenomenon. We define a layer as a function which maps
geographical coordinates to the value of some property. The closest
analogue of our layer is the spatial variable in geostatistics.
Layer values can be either real numbers or elements of some finite set.
If you want to study a more complicated spatial phenomenon, it is better
to describe it as a set of layers rather than as a single layer with
structured values. Obviously, you will not need the values of all the
attributes in question for every calculation, and separating them makes
your actions clearer.
Because layers are defined as functions, it is theoretically possible to
apply the well-developed mathematical apparatus of functional analysis
to them.
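As a rough illustration of this definition, a layer can be modelled as a
plain function of two coordinates. The sketch below is written in Python
rather than Tcl and is not the f(GIS) API; the names elevation, soil and
describe, and the values they return, are invented for the example.

```python
from typing import Callable, Dict, Union

# A layer value is a real number (numeric layer), a string (classification
# layer), or None where the layer is not defined.
Value = Union[float, str, None]
Layer = Callable[[float, float], Value]


def elevation(x: float, y: float) -> Value:
    """Hypothetical numeric layer: a tilted plane defined on a 10 x 10 square."""
    if not (0 <= x <= 10 and 0 <= y <= 10):
        return None  # outside the area where the layer is defined
    return 100.0 + 2.0 * x + 0.5 * y


def soil(x: float, y: float) -> Value:
    """Hypothetical classification layer with two made-up classes."""
    if not (0 <= x <= 10 and 0 <= y <= 10):
        return None
    return "loam" if x < 5 else "clay"


def describe(layers: Dict[str, Layer], x: float, y: float) -> Dict[str, Value]:
    """Evaluate a set of layers at one point, one value per layer."""
    return {name: layer(x, y) for name, layer in layers.items()}
```

A complicated phenomenon is then described by several such functions
evaluated at the same point, rather than by one function returning a
structured value.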
Layer classification
Layers can be classified by their area of definition and their set of
values. By area of definition we can distinguish between:
- Two-dimensional layers
- which are defined on some continuous area.
This is the type of layer most frequently used in physical geography.
Relief and soil type are perfect examples of such layers. The area of
definition of a two-dimensional layer is usually finite, limited by
the boundaries of the study area or by the availability of data. Areas
outside the area of definition are called offsite areas.
- One-dimensional layers
- are defined on a set of lines within the study area. Examples of such
layers are hydrography and railroad networks.
- Zero-dimensional layers
- are defined on a set of separate points. These layers can be used
to store information about sampling points or weather station
networks.
By the set of values, layers can be classified into:
- Numeric layers
- whose values belong to some continuous interval on the numeric axis,
for example relief layers, which can have any value between the lowest
and highest altitude in the study area.
- Classification layers
- which have a finite set of values. f(GIS) allows arbitrary strings to
be used as elements of such a set. A soil map which has names of soil
series as values is an example.
This simple classification covers all theoretically important types of
layers. When dealing with implementation we will have to classify layers
further, for example according to the source of thematic data. But for
data analysis it is not significant whether data are stored in a disk
file or come from some data acquisition system on the fly. It is only
important to know the type of values and whether they are defined for
every point of the study area or not.
Implementation of data model
Spatial phenomena can seldom be expressed by a mathematical equation.
Even if they can, finding this equation is usually the aim of the
analysis, not its starting point. So we need to store the values of
layers at every point where they are defined. A raster is the natural
way to store data for two-dimensional layers.
(A raster is just a big matrix of numeric values, stored
in a special format to reduce storage space. If a raster is used in GIS
processing, it must be known how to find row and column numbers given
real-world coordinates and vice versa.)
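The conversion between real-world coordinates and row/column numbers
mentioned above can be sketched as follows. This is a minimal Python
illustration assuming a north-up raster with square cells; the class
RasterGeometry and its fields are hypothetical and do not describe the
actual EPPL7 file header.

```python
import math
from dataclasses import dataclass


@dataclass
class RasterGeometry:
    """Georeferencing of a north-up raster with square cells (hypothetical)."""
    west: float       # x of the left edge of column 0
    north: float      # y of the top edge of row 0
    cell_size: float  # cell edge length in map units
    rows: int
    cols: int

    def to_cell(self, x, y):
        """Return (row, col) of the cell containing (x, y), or None if outside."""
        col = math.floor((x - self.west) / self.cell_size)
        row = math.floor((self.north - y) / self.cell_size)
        if 0 <= row < self.rows and 0 <= col < self.cols:
            return row, col
        return None

    def to_coords(self, row, col):
        """Return the real-world coordinates of the centre of cell (row, col)."""
        x = self.west + (col + 0.5) * self.cell_size
        y = self.north - (row + 0.5) * self.cell_size
        return x, y
```

Rows are counted downwards from the northern edge, which is why the y
coordinate is subtracted from north rather than the other way around.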
f(GIS) uses the raster data format developed for the EPPL7 GIS. This
format has several advantages: it is compressed yet allows random
access, and it is able to deal with very fine
resolution. For example, a landscape map of the former USSR with a
spatial resolution (raster cell size) of 500 m and more than 3000
distinct kinds of landscapes occupies about 9 MB of disk space. Thanks
to these properties of the data format, it is advisable to work with a
raster cell size significantly smaller than the known accuracy of the
data. The resolution of maps can be comparable with the resolution of
your scanner and printer. Modern processors are powerful enough to bear
it, so using a raster doesn't mean loss of precision.
This data format is able to hold values in the range 0..65535. While this
is always sufficient for classification layers, it may seem that for
numeric layers it would be better to use real numbers. But data always
have finite accuracy, which is usually coarser than 1/65535 of the total
range, and even if we can take measurements with greater precision, we
should take into account spatial variability within one raster cell.
For example, if we have a relief map of Russia with a 500 meter cell,
we need to represent the range from -28 (Caspian coast) to 5642 (Elbrus)
meters above sea level. Dividing this 5670 meter range into 65535 steps,
the smallest usable unit is a little under 10 cm.
The altitude of some points may be measured more accurately (for example,
triangulation points), but each raster cell represents a 500x500 meter
square, which will always have more than 10 cm of variability.
Even if the value of our layer should have more precision in some part
of its range, we could use a non-linear (for instance logarithmic)
mapping of raster cell values to layer values.
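The arithmetic above, and the two kinds of mapping between raster cell
values and layer values, can be sketched in Python as follows. The
function names are invented for this illustration and are not part of
f(GIS); the logarithmic variant is just one possible non-linear mapping
and assumes a strictly positive value range.

```python
MAX_CELL = 65535  # largest value a raster cell can hold

LO, HI = -28.0, 5642.0  # altitude range from the example, in meters


def linear_decode(cell, lo, hi):
    """Map a cell value 0..MAX_CELL linearly onto the interval lo..hi."""
    return lo + (hi - lo) * cell / MAX_CELL


def linear_encode(value, lo, hi):
    """Map a layer value in lo..hi to the nearest cell value."""
    return round((value - lo) / (hi - lo) * MAX_CELL)


def log_decode(cell, lo, hi):
    """Logarithmic mapping: finer steps near lo. Requires 0 < lo < hi."""
    return lo * (hi / lo) ** (cell / MAX_CELL)


# Size of one linear step for the relief example, in meters.
step = (HI - LO) / MAX_CELL
```

With the linear mapping every step has the same size (here a little under
9 cm); with the logarithmic mapping the steps grow with the value, so the
lower part of the range is stored more precisely than the upper part.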
But even with compression, raster files occupy significant storage
space, so we should avoid duplicating them where possible. For this we
introduce the concept of reclass tables. A reclass table maps raster
cell values to another set of integers in arbitrary order. Don't confuse
a reclass table with the mapping function which is used to convert raster
cell values to the real units of a numeric layer. For example, if we have
statistical data on population by county and want to present population
density as a map, we can use a reclass table over the county map. Several
counties with different names, which have distinct values in the county
map raster, can be mapped to the same class in the population density map
if their population density is the same.
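The county example can be sketched in Python as follows. The raster, the
density figures and the helper names build_reclass and read_through are
all made up for illustration; the point is only that the county raster is
stored once and the density map is obtained by a table lookup.

```python
# County map raster: each cell holds a county id (made-up data).
county_raster = [
    [1, 1, 2],
    [3, 2, 2],
    [3, 3, 4],
]

# Hypothetical statistics: population density per county id.
density = {1: 12.0, 2: 40.0, 3: 12.0, 4: 95.0}


def build_reclass(values_by_id):
    """Assign one class number per distinct value.

    Ids whose values are equal end up in the same class.
    """
    distinct = sorted(set(values_by_id.values()))
    class_of = {value: number for number, value in enumerate(distinct, start=1)}
    return {cell_id: class_of[value] for cell_id, value in values_by_id.items()}


def read_through(raster, reclass, row, col):
    """Read a cell of the derived map without copying the raster."""
    return reclass[raster[row][col]]


reclass = build_reclass(density)
```

Counties 1 and 3 have different ids in the raster but the same density,
so the table sends both to the same class.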
A point layer is just a list of triplets < X, Y, Value >. Typically
a point layer contains no more than a few thousand points, so there
is no need to optimize performance or storage space.
The natural storage form for a one-dimensional layer is a vector format.
This is the most questionable area in the current fGIS design. The EPPL7
vector format has many advantages (compactness, speed of processing),
but it has one drawback which outweighs them all: it can
associate only one value with a whole vector object (polyline). But
if we are talking about a function defined on a set of lines, we
should be prepared for this function (stream depth, for instance) to
vary from one end of a line to the other.
It is also an open question how intersections and joints of lines should
be stored and interpreted, because the most interesting network analysis
algorithms require the ability to cross joints and intersections.
Regions and cartographic projection
A study area usually has a hierarchical structure. For example, Russia
can be subdivided into administrative regions, which consist of
districts. The United States consists of states, which are divided into
counties. Often a study is concerned with only one such hierarchy
level, but there are counterexamples.
Each hierarchy level has its typical data accuracy (a rough
equivalent of map scale in the GIS world, because GIS maps can be
arbitrarily scaled, but only a certain scale range makes sense for
a particular data accuracy) and cartographic projection (especially
significant for large areas like a whole country or continent).
On thematic maps like soils or vegetation, different classifications
can be used at different scales.
So f(GIS) uses the concept of regions. A region is a set of layers
which cover almost the same territory and have exactly the same
projection and similar spatial resolution. Regions can be nested, i.e.
the region of Russia can have several subregions for administrative
regions, which in turn have subregions for districts, etc. In this case
there should be a base layer
which has subregion names as values. When copying data between regions,
f(GIS) automatically performs the necessary projection and resolution
conversion, using the base layer as a reference. Classification
conversion, if necessary, should be performed by the user, because it
requires knowledge of the problem area.
Program design
f(GIS) is designed as a set of extensions to the Tcl programming language
and a set of independent utilities, which perform the most time-consuming
raster and vector processing tasks. Thus long operations can be launched
in the background as separate processes while the user continues to view
and analyze data in the main program.
From the user's point of view, fGIS is a Tcl application which allows him
to operate on a set of layers from the GUI as well as from the Tcl command
line. It is an essential design constraint that there should be no
operation which can be performed from the GUI but not from a Tcl script.
There should be a way to automate everything. The other way around is
ensured by the very nature of Tcl: nothing prevents a user who has direct
access to the Tcl interpreter from creating a new button or menu item and
binding any Tcl command to it.
From the programmer's point of view, fGIS consists of several abstraction
levels, all available for extension and modification. And I think that
every fGIS user can eventually become a programmer, if he discovers a need
to implement some newly invented data analysis algorithm, or to customize
the graphical user interface to his needs. The relationship between fGIS
abstraction levels is shown in this figure.
Layer as Tcl object
Layers in fGIS behave like objects in an object-oriented programming
language. Once created with the layer command, they become Tcl
commands themselves (i.e. the name of a layer can be used as a Tcl
command), just like Tk widgets. Options of the layer command allow you to
manipulate properties of the layer and to store the layer definition in a
file. This file is just a Tcl script which creates the necessary
subobjects and invokes the appropriate command to create the layer.
A layer has the following properties:
- It can return a value by coordinates
- This is what the whole thing is about.
- It has one or more ways to draw itself
- A raster layer can be drawn in opaque colors, so that only the offsite
area is transparent, or using transparent monochrome patterns, thus
allowing one raster to be overlaid on another. In most existing raster
GIS, like Idrisi, only vector or point layers can be overlaid on a raster.
In f(GIS) any layer can be drawn as an overlay.
There are three drawing modes for a raster layer:
color, pattern and symbol.
- It has an underlying data source
- The data source for a layer typically consists of some object which can
return an integer value for a given coordinate (a raster file combined
with a reclass table, for example) and a legend table or map
function which maps values of the underlying raster object to
thematically meaningful values.
- It has visualization parameters
- Visualization of a layer is controlled by several parameters, such as
the color palette, the pattern set, and a flag indicating whether
boundaries between classes are drawn. All these parameters can be changed
interactively.
- It has metadata
- Metadata for a layer typically include the layer title, the units in
which its values are measured, spatial resolution and value precision.
Cartographic projection is a property of the region rather than of the
layer.
Besides the layer types described above, fGIS has an
object layer type. A layer of this type can consist of any objects
allowed in a Tk canvas (lines, arcs, polygons, images), with only one
thematic value for each object. This type is primarily for annotation
purposes, but can also be used as a substitute for vector layers while
the latter are not yet developed.
Planchet - object for displaying maps
Another type of object which is essential for the fGIS user is the
planchet. It is a Tk widget similar to canvas (and actually derived from
canvas) which has a cartographic projection and real-world coordinates.
It is used for displaying layers and picking points on them. Because
it has real-world coordinates and a physical size on the screen, it always
knows its scale. When the scale is changed (via a zoom or window resize
operation), all layers currently displayed on the planchet are redrawn
appropriately.
The planchet also has a look feature. If the right mouse button is pressed
on some point in the planchet, it displays the values of several layers at
this point in a pop-up window.
There can also be "friend widgets", like a status line which
displays the current coordinates when the mouse is over the planchet, or
zoom/unzoom buttons which change their state depending on the current
state of the planchet.
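How a widget with real-world coordinates and a physical size can derive
its own scale may be sketched as below. This is a Python illustration
under simplifying assumptions (map units are meters, the screen density
in pixels per millimeter is known, projection distortion is ignored); the
function scale_denominator is hypothetical and not part of the planchet
widget.

```python
def scale_denominator(world_width_m, widget_width_px, pixels_per_mm):
    """Return N for a map scale of 1:N.

    world_width_m   -- width of the displayed area in real-world meters
    widget_width_px -- width of the widget in screen pixels
    pixels_per_mm   -- screen density, assumed to be known here
    """
    physical_width_m = widget_width_px / pixels_per_mm / 1000.0
    return world_width_m / physical_width_m


# A 50 km wide view in an 800-pixel widget on a 4 px/mm screen.
before = scale_denominator(50_000, 800, 4)
# Zooming in by a factor of two halves the displayed real-world width.
after = scale_denominator(25_000, 800, 4)
```

Both a zoom (which changes the displayed real-world width) and a window
resize (which changes the widget width) alter the result, which is why
either operation requires redrawing the layers.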
Drawing modes for raster layer
f(GIS) supports three drawing modes for raster layers - color, pattern
and symbol mode.
- In color mode,
- each value (or range of values, if
values are real numbers) of the layer corresponds to a particular color
on screen or paper. This is the simplest drawing mode and it is supported
by all raster-oriented GIS.
- In pattern mode
- continuous areas of the same class are filled
with black and white patterns, which is suitable for black and white
printers. But this mode allows much more: patterns can have any color,
and the background of a pattern is transparent rather than white, so a
patterned layer can be overlaid on other raster layers. Boundaries between
areas of different classes (polygons) can be highlighted in this mode as
well as in color mode.
- Symbol mode
- looks much like pattern mode and uses the same pattern
sets. But it handles patterns differently. In pattern mode,
a pattern can be cut if a polygon boundary crosses the rectangle
representing a pattern element. In symbol mode a pattern element is
interpreted as an icon, which is either drawn entirely or not drawn at
all. So the visible area of the map is divided into a rectangular grid of
cells the size of a pattern element, and each cell of this grid is filled
with the pattern appropriate for the central point of that cell.
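The difference between pattern mode and symbol mode can be sketched on a
tiny character "screen". This is a Python illustration of the idea only,
not the f(GIS) drawing code; the functions, the class grid and the
two-by-two text patterns are invented, and the central point of a grid
cell is approximated by the pixel at offset size // 2.

```python
def pattern_mode(classes, patterns, size):
    """Per pixel: take the pattern of that pixel's own class.

    A pattern element is cut wherever a class boundary crosses it.
    """
    height, width = len(classes), len(classes[0])
    return [[patterns[classes[r][c]][r % size][c % size] for c in range(width)]
            for r in range(height)]


def symbol_mode(classes, patterns, size):
    """Per grid cell: draw the whole element chosen by the cell's central pixel."""
    height, width = len(classes), len(classes[0])
    out = [[" "] * width for _ in range(height)]
    for r0 in range(0, height, size):
        for c0 in range(0, width, size):
            centre_r = min(r0 + size // 2, height - 1)
            centre_c = min(c0 + size // 2, width - 1)
            icon = patterns[classes[centre_r][centre_c]]
            for r in range(r0, min(r0 + size, height)):
                for c in range(c0, min(c0 + size, width)):
                    out[r][c] = icon[r - r0][c - c0]
    return out


# Class 1 on the left, class 2 in the last column; the boundary cuts
# through the second 2 x 2 grid cell.
classes = [[1, 1, 1, 2],
           [1, 1, 1, 2]]
patterns = {1: ["ab", "cd"], 2: ["wx", "yz"]}

cut = ["".join(row) for row in pattern_mode(classes, patterns, 2)]
whole = ["".join(row) for row in symbol_mode(classes, patterns, 2)]
```

In pattern mode the second element is a mixture of both patterns; in
symbol mode it is drawn entirely with the pattern of class 2, because the
central pixel of that grid cell belongs to class 2.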
Differences between these three modes are shown in the following figure:
Low level objects
There are additional objects like rasters, palettes and pattern sets,
but the user seldom needs to operate on them directly. They are primarily
for developers of new layer types.
GIS operation
GIS operations like calculating buffer zones or computing a new layer
from several existing ones are performed by separate utilities running
in the background. For the user's convenience
there are Tcl procedures which take one or more layer names as arguments
and call the appropriate utility.
An example of such a procedure is the interregion copy command, which takes
a layer name and the name of a target region, determines the projections
and calls the projection conversion program.
In some cases such procedures need to perform substantial preprocessing
of user-supplied arguments.
Utilities
The GIS processing utilities are more general than fGIS. They use just
data files and user-supplied arguments, so they can be used separately
from fGIS, for example by users of the EPPL7 GIS. The utilities are designed
for a batch environment, so they use exit codes to report status and
stdin/stdout to receive and return values which do not fit on the command
line. An important concept of these utilities is that the user shouldn't
have to worry about raster cell size. All utilities which operate on
several raster files are able to deal with files with different cell
sizes, as long as there is a non-empty intersection in terms of real-world
coordinates.
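One way to combine rasters with different cell sizes is to work in
real-world coordinates over the intersection of their extents. The Python
sketch below illustrates that idea only; it is not how the fGIS utilities
are implemented. The dictionary layout of a raster, the choice of the
finer cell size for the output, and all function names are assumptions.

```python
def extent(r):
    """Bounding box (west, south, east, north) of a raster dict."""
    rows, cols = len(r["cells"]), len(r["cells"][0])
    return (r["west"], r["north"] - rows * r["size"],
            r["west"] + cols * r["size"], r["north"])


def intersection(a, b):
    """Common part of two bounding boxes, or None if they do not overlap."""
    w, s = max(a[0], b[0]), max(a[1], b[1])
    e, n = min(a[2], b[2]), min(a[3], b[3])
    return (w, s, e, n) if w < e and s < n else None


def value_at(r, x, y):
    """Value of the cell of raster r that contains the point (x, y)."""
    col = int((x - r["west"]) // r["size"])
    row = int((r["north"] - y) // r["size"])
    return r["cells"][row][col]


def overlay(a, b, combine):
    """Combine two rasters cell by cell on a grid with the finer cell size."""
    box = intersection(extent(a), extent(b))
    if box is None:
        raise ValueError("rasters do not overlap")
    w, s, e, n = box
    size = min(a["size"], b["size"])
    rows, cols = round((n - s) / size), round((e - w) / size)
    out = []
    for i in range(rows):
        y = n - (i + 0.5) * size  # centre of output row i
        xs = (w + (j + 0.5) * size for j in range(cols))
        out.append([combine(value_at(a, x, y), value_at(b, x, y)) for x in xs])
    return out


coarse = {"west": 0, "north": 4, "size": 2, "cells": [[1, 2], [3, 4]]}
fine = {"west": 2, "north": 4, "size": 1,
        "cells": [[10, 20, 30, 40], [50, 60, 70, 80]]}
total = overlay(coarse, fine, lambda p, q: p + q)
```

Each output cell is sampled at its centre, so one coarse cell contributes
the same value to several fine cells.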
Data access library
Both the low-level Tcl objects (rasters, vectors) and the utilities use a
common C library to access data files. This library provides an
appropriately high-level framework for those who want to implement their
own data analysis algorithms. For example, it includes iterator routines,
which take a user-written function and a raster file and apply
this function to every cell of the given file. While the library operates
primarily in terms of raster cells (which can be important for cellular
automata algorithms, which need to distinguish between ``this cell'' and
``neighbouring cell''), it provides ways to process files with different
cell sizes simultaneously.
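The iterator idea can be sketched in Python as follows. The real library
is written in C and its interface is not shown in this document, so the
functions for_each_cell, neighbours and map_cells are hypothetical
stand-ins operating on an in-memory list of rows rather than on a raster
file.

```python
def for_each_cell(raster, func):
    """Call func(row, col, value) for every cell of the raster."""
    for row, line in enumerate(raster):
        for col, value in enumerate(line):
            func(row, col, value)


def neighbours(raster, row, col):
    """Values of the (up to four) edge-adjacent neighbouring cells."""
    found = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < len(raster) and 0 <= c < len(raster[0]):
            found.append(raster[r][c])
    return found


def map_cells(raster, func):
    """Build a new raster from func(this_cell, neighbouring_cells)."""
    return [[func(value, neighbours(raster, r, c)) for c, value in enumerate(line)]
            for r, line in enumerate(raster)]


grid = [[0, 0, 0],
        [0, 1, 0],
        [0, 0, 0]]

# Count how many cells hold each value.
counts = {}


def tally(row, col, value):
    counts[value] = counts.get(value, 0) + 1


for_each_cell(grid, tally)

# One step of a simple cellular automaton: a cell becomes 1 if it is 1
# already or if any of its neighbours is 1.
spread = map_cells(grid, lambda this, near: 1 if this == 1 or 1 in near else 0)
```

The user supplies only the per-cell function; walking over the raster and
collecting neighbours is left to the iterator.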