f(GIS) concepts

Data model
Program design

Data model

A GIS is a software system for processing spatial data, so an adequate model of spatial phenomena is the most important thing for a GIS.

It should provide a way to represent spatial phenomena in computer memory, allow the desired operations to be performed on this representation, and let the user see the results in the form he is used to. Ideally, a GIS should hide the complicated issues of internal data storage from the user, just as a text processor hides questions of font rendering and kerning, or an SQL database hides the actual file layout and search technologies, providing simple but powerful relational operations instead.

Many modern GIS systems, especially vector-based ones like ARC/Info, try to represent a map of spatial phenomena rather than the phenomena themselves. This overcomplicates storage formats and processing algorithms, and makes the user worry about technical matters such as polygon topology, which are completely irrelevant to his problem (say, geology or soil science), just as font rendering hints and kerning are irrelevant to the contents of an article typeset in some particular font. Maps are a widely used tool for analysing spatial data, but no more than a tool. A GIS should deal with them, because it is necessary to use existing data, which are represented on maps, and to present results to the user in the understandable form of maps. But while processing data we should take into account the properties of the actual phenomena rather than the properties of a cartographic representation such as polygons.

Functional model

In f(GIS) we use the term layer to denote the computer representation of a spatial phenomenon. We define a layer as a function which maps geographical coordinates to the value of some property. The closest analogue of our layer is the spatial variable in geostatistics.

Layer values can be either real numbers or elements of some finite set. If you want to study a more complicated spatial phenomenon, it is better to describe it as a set of layers rather than as a single layer with structured values. Obviously you will not need the values of all the attributes in question for every calculation, and separating them makes your actions clearer.

Because layers are defined as functions, it is theoretically possible to apply the well-developed mathematical apparatus of functional analysis to them.
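The idea of a layer as a function can be illustrated with a minimal Python sketch. This is not f(GIS) code: the names `relief`, `soil` and `look` and the sample values are hypothetical, chosen only to show a numeric layer, a layer with a finite set of values, and the evaluation of several layers at one point.

```python
from typing import Callable, Optional, Union

# A layer maps geographic coordinates (x, y) to a value, or to None
# when the point lies outside the area where the layer is defined.
Value = Union[float, str]
Layer = Callable[[float, float], Optional[Value]]


def relief(x: float, y: float) -> Optional[float]:
    """Hypothetical numeric layer: a tilted plane on a 10 x 10 square."""
    if not (0.0 <= x <= 10.0 and 0.0 <= y <= 10.0):
        return None  # outside the area of definition
    return 100.0 + 2.0 * x + 0.5 * y


def soil(x: float, y: float) -> Optional[str]:
    """Hypothetical layer with a finite set of values (two classes)."""
    if not (0.0 <= x <= 10.0 and 0.0 <= y <= 10.0):
        return None
    return "loam" if x < 5.0 else "sand"


def look(layers: dict, x: float, y: float) -> dict:
    """Evaluate several layers at the same point."""
    return {name: layer(x, y) for name, layer in layers.items()}
```

Keeping each attribute in its own function, rather than returning one structured value, means a calculation only touches the layers it actually needs.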

Layer classification

Layers can be classified by their area of definition and their set of values. By area of definition we can distinguish between:

Two-dimensional layers: defined on some continuous area. This is the most frequently used type of layer in physical geography. Relief and soil type are perfect examples of such layers. The area of definition of a two-dimensional layer is usually finite, limited by the boundaries of the study area or by the availability of data. Areas which lie outside the area of definition are called offsite areas.
One-dimensional layers: defined on a set of lines within the study area. Examples of such layers are hydrography or a railroad network.
Zero-dimensional layers: defined on a set of separate points. These layers can be used to store information about sampling points or weather station networks.

By their set of values, layers can be classified into:

Numeric layers: their values belong to some continuous interval on the numeric axis. An example is a relief layer, which can take any value between the lowest and the highest altitude in the study area.
Classification layers: they have a finite set of values. f(GIS) allows arbitrary strings to be used as elements of such a set. A soil map which has names of soil series as values is an example.

This simple classification covers all theoretically important types of layers. When dealing with implementation we will have to classify layers further, for example according to the source of thematic data. But for data analysis it is not significant whether data are stored in a disk file or come from some data acquisition system on the fly. It is only important to know the type of the values and whether they are defined for every point of the study area or not.

Implementation of data model

Spatial phenomena can seldom be expressed by a mathematical equation. Even if they can, finding this equation is usually the aim of the analysis, not its starting point. So we need to store the values of a layer at every point where it is defined. A raster is the natural way to store data for two-dimensional layers.

(A raster is just a big matrix of numeric values, stored in a special format to reduce storage space. If a raster is used in GIS processing, it must be known how to find row and column numbers given real-world coordinates, and vice versa.)
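The conversion between real-world coordinates and row/column numbers can be sketched as follows. This is a minimal illustration in Python that assumes a north-up raster with square cells; the class `GridGeometry` and its fields are hypothetical and do not describe how the EPPL7 format stores its georeferencing.

```python
import math
from dataclasses import dataclass


@dataclass
class GridGeometry:
    """Hypothetical georeferencing: north-up raster with square cells."""
    x_min: float      # real-world X of the left edge of column 0
    y_max: float      # real-world Y of the top edge of row 0
    cell_size: float  # cell side in real-world units
    rows: int
    cols: int

    def to_cell(self, x: float, y: float):
        """Return (row, col) of the cell containing (x, y), or None if outside."""
        col = math.floor((x - self.x_min) / self.cell_size)
        row = math.floor((self.y_max - y) / self.cell_size)
        if 0 <= row < self.rows and 0 <= col < self.cols:
            return row, col
        return None

    def to_world(self, row: int, col: int):
        """Return real-world coordinates of the centre of cell (row, col)."""
        x = self.x_min + (col + 0.5) * self.cell_size
        y = self.y_max - (row + 0.5) * self.cell_size
        return x, y
```

Row numbers grow downwards while Y grows upwards, which is why the row is computed from `y_max - y`.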

f(GIS) uses the raster data format developed for the EPPL7 GIS. This format has several advantages: it is compressed and allows random access at the same time, and it can deal with very fine resolution. For example, a landscape map of the ex-USSR with a spatial resolution (raster cell size) of 500 m and more than 3000 distinct kinds of landscapes occupies about 9 MB of disk space. Thanks to these properties of the data format, it is advisable to work with a raster cell size significantly smaller than the known accuracy of the data. The resolution of maps can be comparable to the resolution of your scanner and printer - modern processors are powerful enough to bear it - so using a raster does not mean a loss of precision.

This data format can hold values in the range 0..65535. While this is always sufficient for classification layers, it may seem that real numbers would be better for numeric layers. But data always have finite accuracy, which is usually coarser than 1/65535 of the total range, and even if we can take measurements with greater precision, we should take into account spatial variability within one raster cell.

For example, if we have a relief map of Russia with a 500 meter cell, we need to represent the range from -28 (Caspian coast) to 5642 (Elbrus) meters above sea level. Thus the smallest usable unit is about 10 cm. The altitude of some points may be measured more accurately (for example, triangulation points), but each raster cell represents a 500x500 meter square, which would always have more than 10 cm of variability. Even if the value of our layer needs more precision in some part of its range, we can use a non-linear (for instance, logarithmic) mapping of raster cell values to layer values.
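The arithmetic behind this example, and one possible form of a logarithmic mapping, can be sketched in Python. The function names are hypothetical, and the logarithmic variant assumes a strictly positive range; nothing here claims to be the mapping f(GIS) actually uses.

```python
import math

MAX_CODE = 65535  # largest value a raster cell can hold


def linear_step(lo: float, hi: float) -> float:
    """Size of one raster unit when [lo, hi] is spread over 0..MAX_CODE."""
    return (hi - lo) / MAX_CODE


def encode_linear(value: float, lo: float, hi: float) -> int:
    """Map a layer value to the nearest raster cell value."""
    return round((value - lo) / (hi - lo) * MAX_CODE)


def decode_linear(code: int, lo: float, hi: float) -> float:
    """Map a raster cell value back to layer units."""
    return lo + code * (hi - lo) / MAX_CODE


def encode_log(value: float, lo: float, hi: float) -> int:
    """Logarithmic mapping for 0 < lo <= value <= hi: finer steps near lo."""
    return round(math.log(value / lo) / math.log(hi / lo) * MAX_CODE)


def decode_log(code: int, lo: float, hi: float) -> float:
    """Inverse of encode_log."""
    return lo * (hi / lo) ** (code / MAX_CODE)
```

For the range from -28 to 5642 meters, `linear_step(-28, 5642)` is 5670 / 65535, or roughly 0.087 m, which is the "about 10 cm" quoted above.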

But even with compression, raster files occupy significant storage space, so we should avoid duplicating them where possible. Thus we introduce the concept of reclass tables. A reclass table maps raster cell values to another set of integers in arbitrary order. Do not confuse a reclass table with the mapping function which is used to convert raster cell values to the real units of a numeric layer. For example, if we have statistical data on population by county and want to present them as a map, we can use a reclass table over the county map. Several counties with different names, which have distinct values in the county map raster, can be mapped to the same class in the population density map if their population density is the same.
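The county example can be sketched in a few lines of Python. The raster contents, the county codes and the density classes are invented for illustration; the point is only that the second map is obtained through a lookup table, without a second copy of the raster.

```python
# County raster: each cell holds a county code (hypothetical data).
county_raster = [
    [1, 1, 2],
    [3, 3, 2],
    [3, 4, 4],
]

# Reclass table: county code -> population density class.
# Counties 1 and 4 have the same density, so they share a class.
density_class = {1: 2, 2: 1, 3: 3, 4: 2}


def reclassed_value(raster, table, row, col, default=0):
    """Look up a cell through the reclass table without copying the raster."""
    return table.get(raster[row][col], default)
```

A different statistic for the same counties needs only another small table over the same raster.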

A point layer is just a list of triplets < X, Y, Value >. Typically a point layer does not contain more than a few thousand points, so there is no need to optimize performance or storage space.

The natural storage form for a one-dimensional layer is a vector format. This is the most questionable area in the current fGIS design. The EPPL7 vector format has a lot of advantages (compactness, speed of processing), but it has one drawback which outweighs them all: it can associate only one value with a whole vector object (polyline). But if we are talking about a function defined on a set of lines, we should be prepared for this function (stream depth, for instance) to vary from one end of a line to the other.
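To make the requirement concrete, here is one possible way a value could vary along a line: store a value at each vertex and interpolate linearly in between. This Python sketch is purely an assumption for illustration; it is neither the EPPL7 vector format nor a design decision made in fGIS, and the function name `value_along` is hypothetical.

```python
import math


def value_along(vertices, distance):
    """Interpolate a value at a given distance from the start of a polyline.

    vertices is a list of (x, y, value) tuples; the value is assumed to
    vary linearly between consecutive vertices. Returns None when the
    distance is negative or beyond the end of the line.
    """
    if distance < 0:
        return None
    travelled = 0.0
    for (x0, y0, v0), (x1, y1, v1) in zip(vertices, vertices[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        if length > 0 and distance <= travelled + length:
            t = (distance - travelled) / length
            return v0 + t * (v1 - v0)
        travelled += length
    return None
```

With a stream stored as `[(0, 0, 0.5), (3, 4, 1.5), (3, 10, 1.5)]`, the depth halfway along the first segment is the mean of the depths at its two ends.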

It is also an open question how intersections and joints of lines should be stored and interpreted, because the most interesting network analysis algorithms require the ability to cross joints and intersections.

Regions and cartographic projection

A study area usually has a hierarchical structure. For example, Russia can be subdivided into administrative regions, which consist of districts. The United States consists of states, which are divided into counties. Often a study is concerned with only one of these hierarchy levels, but there are examples to the contrary.

Each hierarchy level has its typical data accuracy (a rough equivalent of map scale in the GIS world, because GIS maps can be arbitrarily scaled, but only a certain range of scales makes sense for a particular data accuracy) and its cartographic projection (especially significant for large areas like a whole country or continent). On thematic maps like soils or vegetation, different classifications can be used at different scales.

So f(GIS) uses the concept of regions. A region is a set of layers which cover almost the same territory and have exactly the same projection and similar spatial resolution. Regions can be nested, i.e. the region of Russia can have subregions for administrative regions, which in turn have subregions for districts, etc. In this case there should be a base layer which has subregion names as values. When copying data between regions, f(GIS) automatically performs the necessary projection and resolution conversion, using the base layer as a reference. Classification conversion, if necessary, should be performed by the user, because it requires knowledge of the problem area.
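The resolution part of such a conversion can be sketched as nearest-neighbour resampling. This Python fragment is a simplified illustration under stated assumptions: both grids share one projection, cells are square, and the function `resample_nearest` and its parameters are hypothetical. It does not reproduce the f(GIS) conversion program, and it leaves out reprojection and the use of the base layer.

```python
def resample_nearest(src, src_x0, src_y0, src_cell,
                     rows, cols, dst_x0, dst_y0, dst_cell, offsite=0):
    """Fill a rows x cols target grid by sampling src at target cell centres.

    (x0, y0) is the real-world position of the top-left corner of each
    grid; Y decreases as the row number grows. Target cells that fall
    outside the source grid receive the offsite value.
    """
    out = []
    for r in range(rows):
        y = dst_y0 - (r + 0.5) * dst_cell
        src_r = int((src_y0 - y) // src_cell)
        line = []
        for c in range(cols):
            x = dst_x0 + (c + 0.5) * dst_cell
            src_c = int((x - src_x0) // src_cell)
            inside = 0 <= src_r < len(src) and 0 <= src_c < len(src[0])
            line.append(src[src_r][src_c] if inside else offsite)
        out.append(line)
    return out
```

Nearest-neighbour sampling is used here because it is valid for classification layers as well as numeric ones: it never invents a value that was not in the source.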

Program design

f(GIS) is designed as a set of extensions to the Tcl programming language plus a set of independent utilities which perform the most time-consuming raster and vector processing tasks. Thus long operations can be launched in the background as separate processes while the user continues to view and analyze data in the main program.

From the user's point of view, fGIS is a Tcl application which allows him to operate on a set of layers from the GUI as well as from the Tcl command line. It is an essential design constraint that there should be no operation which can be performed from the GUI but not from a Tcl script. There should be a way to automate everything. The other way around is ensured by the very nature of Tcl: nothing prevents a user who has direct access to the Tcl interpreter from creating a new button or menu item and binding any Tcl command to it.

From the programmer's point of view, fGIS consists of several abstraction levels, all available for extension and modification. And I think that every fGIS user can eventually become a programmer, if he discovers a need to implement some newly invented data analysis algorithm or to customize the graphical user interface to his needs. The relationship between fGIS abstraction levels is shown in this figure.

Layer as Tcl object

Layers in fGIS behave like objects in an object-oriented programming language. Once created with the layer command, they become Tcl commands themselves (i.e. the name of a layer can be used as a Tcl command), just like Tk widgets. Options of the layer command allow the properties of a layer to be manipulated and the layer definition to be stored in a file. This file is just a Tcl script which creates the necessary subobjects and invokes the appropriate command to create the layer.

A layer has the following properties:

It can return a value by coordinates: this is what the whole thing is about.
It has one or more ways to draw itself: a raster layer can be drawn in opaque colors, so that only the offsite area is transparent, or using transparent monochrome patterns, which allows one raster to be overlaid on another. In most existing raster GIS, like Idrisi, only vector or point layers can be overlaid on a raster. In f(GIS) any layer can be drawn as an overlay. There are three drawing modes for a raster layer: color, pattern and symbol.
It has an underlying data source: the data source for a layer typically consists of some object which can return an integer value for a given coordinate (a raster file combined with a reclass table, for example) and a legend table or map function which maps values of the underlying raster object to thematically meaningful values.
It has visualization parameters: visualization of a layer is controlled by several parameters such as the color palette, the pattern set, and a flag indicating whether boundaries between classes are drawn. All these parameters can be changed interactively.
It has metadata: metadata for a layer typically include the layer title, the units in which its values are measured, spatial resolution and value precision. Cartographic projection is a property of the region rather than of the layer.

Besides the layer types described above, fGIS has an object layer type. A layer of this type can consist of any objects allowed on a Tk canvas - lines, arcs, polygons, images - with only one thematic value for each object. This type is primarily for annotation purposes, but it can also be used as a substitute for vector layers until the latter are developed.

Planchet - object for displaying maps

Another type of object which is essential for the fGIS user is the planchet. It is a Tk widget similar to the canvas (and actually derived from it) which has a cartographic projection and real-world coordinates. It is used for displaying layers and picking points on them. Because it has real-world coordinates and a physical size on the screen, it always knows its scale. When the scale is changed (via a zoom or window resize operation), all layers currently displayed on the planchet are redrawn appropriately.
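How a scale follows from real-world coordinates and physical screen size can be shown with a short Python calculation. The function `scale_denominator` and its parameters are hypothetical; in particular, the screen resolution in pixels per inch is an assumption of this sketch, and how the planchet actually obtains its physical size is not described here.

```python
def scale_denominator(ground_width_m, width_px, dpi):
    """Return N in the scale 1:N for a map window.

    ground_width_m: real-world width shown in the window, in metres
    width_px:       window width in pixels
    dpi:            screen resolution in pixels per inch (assumed known)
    """
    screen_width_m = width_px / dpi * 0.0254  # 1 inch = 0.0254 m
    return ground_width_m / screen_width_m
```

A window 1000 pixels wide on a 100 dpi screen is 0.254 m across, so showing 25.4 km of terrain in it gives a scale of about 1:100000. Zooming in so that half the terrain fills the same window halves the denominator.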

The planchet also has a look feature. If the right mouse button is pressed on some point in the planchet, it displays the values of several layers at this point in a pop-up window.

There can also be "friend widgets", like a status line which displays the current coordinates when the mouse is over the planchet, or zoom/unzoom buttons which change their state depending on the current state of the planchet.

Drawing modes for raster layer

f(GIS) supports three drawing modes for raster layers - color, pattern and symbol mode.

Color mode: each value (or range of values, if the values are real numbers) of the layer corresponds to a particular color on screen or paper. This is the simplest drawing mode and it is supported by all raster-oriented GIS.
Pattern mode: continuous areas of the same class are filled with black and white patterns, which is suitable for black and white printers. But this mode allows much more: patterns can have any color, and the background of a pattern is transparent rather than white, so a patterned layer can be overlaid on other raster layers. Boundaries between areas of different classes (polygons) can be highlighted in this mode as well as in color mode.
Symbol mode: looks much like pattern mode and uses the same pattern sets, but handles patterns differently. In pattern mode, a pattern can be cut if a polygon boundary crosses the rectangle representing a pattern element. In symbol mode a pattern element is interpreted like an icon, which is either drawn entirely or not drawn at all. So the visible area of the map is divided into a rectangular grid with cells the size of a pattern element, and each cell of this grid is filled with the pattern appropriate for the central point of that cell.
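The difference between pattern mode and symbol mode comes down to where the class is sampled. The following Python sketch illustrates this under simplifying assumptions: patterns are square bitmaps of equal size, `class_at` is a hypothetical function returning the class at a pixel, and all names are invented for the example rather than taken from f(GIS).

```python
def pattern_pixel(class_at, patterns, px, py):
    """Pattern mode: the class is taken at the pixel itself, so a pattern
    element is cut wherever a class boundary crosses it."""
    tile = patterns[class_at(px, py)]
    return tile[py % len(tile)][px % len(tile[0])]


def symbol_pixel(class_at, patterns, px, py, size):
    """Symbol mode: the class is taken at the centre of the size x size
    grid cell containing the pixel, so each element is drawn whole or
    not drawn at all."""
    cx = (px // size) * size + size // 2
    cy = (py // size) * size + size // 2
    tile = patterns[class_at(cx, cy)]
    return tile[py % size][px % size]


# Hypothetical data: class "a" left of x = 3, class "b" elsewhere.
def sample_class(x, y):
    return "a" if x < 3 else "b"


sample_patterns = {
    "a": [[1] * 4 for _ in range(4)],  # solid 4 x 4 element
    "b": [[0] * 4 for _ in range(4)],  # empty 4 x 4 element
}
```

At pixel (3, 0) the class boundary has already been crossed, so pattern mode draws the empty element of class "b", while symbol mode still draws the solid element of class "a", because the centre of that grid cell lies in class "a".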

Differences between these three modes are shown in the following figure:

Low level objects

There are additional objects like rasters, palettes and pattern sets, but the user seldom needs to operate on them directly. They are primarily for developers of new layer types.

GIS operation

GIS operations, like calculating buffer zones or computing a new layer from several existing ones, are performed by separate utilities running in the background. For the user's convenience there are Tcl procedures which take one or more layer names as arguments and call the appropriate utility.

An example of such a procedure is the interregion copy command, which takes a layer name and the name of a target region, determines the projections and calls the projection conversion program.

In some cases such procedures need to perform substantial preprocessing of user-supplied arguments.

Utilities

The GIS processing utilities are more general than fGIS. They use just data files and user-supplied arguments, so they can be used separately from fGIS, for example by users of the EPPL7 GIS. The utilities are designed for a batch environment, so they use exit codes to report status and stdin/stdout to receive and return values which do not fit on the command line. An important concept of these utilities is that the user should not have to worry about raster cell size. All utilities which operate on several raster files are able to deal with files of different cell sizes, as long as there is a non-empty intersection in terms of real-world coordinates.
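The "non-empty intersection" condition can be expressed in a few lines. The Python sketch below is an illustration only: extents are assumed to be axis-aligned rectangles in a common coordinate system, and the functions `intersection` and `output_shape` are hypothetical, not part of the utilities.

```python
import math


def intersection(a, b):
    """Intersect two extents given as (x_min, y_min, x_max, y_max).

    Returns the common extent, or None when the rasters do not overlap.
    """
    x_min, y_min = max(a[0], b[0]), max(a[1], b[1])
    x_max, y_max = min(a[2], b[2]), min(a[3], b[3])
    if x_min >= x_max or y_min >= y_max:
        return None
    return (x_min, y_min, x_max, y_max)


def output_shape(extent, cell_size):
    """Rows and columns needed to cover extent at the given cell size."""
    x_min, y_min, x_max, y_max = extent
    rows = math.ceil((y_max - y_min) / cell_size)
    cols = math.ceil((x_max - x_min) / cell_size)
    return rows, cols
```

Once the common extent is known in real-world coordinates, each input file can be read at its own cell size and only the output grid needs a cell size of its own.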

Data access library

Both the low-level Tcl objects (rasters, vectors) and the utilities use a common C library to access data files. This library provides a suitably high-level framework for those who want to implement their own data analysis algorithms. For example, it includes iterator routines which receive a user-written function and an open raster file and perform this function on every cell of the given file. While the library operates primarily in terms of raster cells (which can be important for cellular automata algorithms, which need to distinguish between "this cell" and a "neighbouring cell"), it provides ways to process files with different cell sizes simultaneously.
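The iterator idea can be sketched as follows. The real library is written in C and its interface is not shown in this document, so this Python fragment only illustrates the pattern; `for_each_cell` and `neighbours` are hypothetical names, and an in-memory list of rows stands in for an open raster file.

```python
def for_each_cell(raster, func):
    """Call func(row, col, value) for every cell, row by row."""
    for r, line in enumerate(raster):
        for c, value in enumerate(line):
            func(r, c, value)


def neighbours(raster, row, col):
    """Values of the up to eight cells adjacent to (row, col)."""
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if (dr or dc) and 0 <= r < len(raster) and 0 <= c < len(raster[0]):
                found.append(raster[r][c])
    return found
```

A user-written function passed to `for_each_cell` might, for instance, count how many cells hold each value, while a cellular automaton rule would combine the value of "this cell" with the result of `neighbours`.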