Please consult the Glossary for explication of any specialized terms and abbreviations used in this document.
Sybase has an attractive contractual agreement with the UC system, which makes it possible for us to acquire Sybase software at only 20% of list price. The Sybase product is also technically more advanced than the Oracle engine (the Oracle RDBMS suffers from a few basic design/implementation flaws, such as its inability to use multiple indices properly in query plan optimization).
Lick already has three years' experience with Sybase servers used both for science and administration. The existing body of expertise, tools, etc. makes the ramp-up cost of development using Sybase lower than the cost of a different product.
For the purposes of this specification, therefore, I will assume that Sybase is a good choice; but we should bear in mind that a schema designed according to good relational principles can be implemented using any RDBMS engine, with greater or lesser difficulty depending on the degree to which that manufacturer's features happen to suit the application. We do have some freedom of choice here.
Given the extremely dynamic nature of many of the tables in our proposed schema, I would advise the use of an RDBMS with good error recovery and volume mirroring features. Although this application can't be equated to a true OLTP (online transaction processing) application, it is lively enough that the loss of a few log entries could reduce our ability to make effective use of the surviving data. Unless Postgres provides good transaction logging and other recovery features, I would have to advise against using it for critical dynamic data; it would remain suitable, in that case, for static "library" data.
Postgres-95 is also somewhat lacking in access control sophistication, which (given our emphasis on control over data access and publication dates) might be a disqualifying weakness.
See below for comments on the applicability of object-oriented databases/languages/tools to the DEIMOS project.
We might therefore envision the data collection process as a very simple-minded daemon which loops endlessly through a series of conversations with other processes, eliciting status information and then sending that information to the database server for storage. In some cases, however, this kind of polling may be inappropriate: some of the other software components will send out broadcast or multicast messages to their peers when a configuration change, status change, or alarm takes place. In these cases the collection process need not poll, but can simply "listen" for these events and log them.
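As a purely illustrative sketch, each iteration of such a daemon might reduce to a single timestamped insert sent to the SQL server. The status_log table and its columns here are hypothetical, invented for the example; getdate() is the standard Transact-SQL function returning the current date and time:

    /* one poll result, stamped on arrival at the server;
       table and column names are hypothetical */
    insert into status_log (subsystem, keyword, value, logged_at)
    values ('weather', 'OUTTEMP', '2.7', getdate())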
We are assuming that all information-gathering subsystems at Keck-II, such as weather stations and mirror control systems, respond to a keyword query interface or broadcast information as described above.
All collected data should be timestamped, and sufficient "sanity" checks should be implemented (by the Sybase "trigger" mechanism, for example) to ensure a very low probability that corrupted data will make its way into the permanent record. Since the database engine provides for data update and deletion, a restricted-access, highly privileged server account can be used when necessary to repair any damaged data.
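By way of illustration only, a sanity-check trigger on the hypothetical status_log table introduced above might look something like the following Transact-SQL; the particular check (refusing rows stamped in the future) is invented for the example:

    /* illustrative sanity check: reject any inserted row whose
       timestamp lies in the future, and abort the transaction */
    create trigger status_log_check
    on status_log
    for insert
    as
    if exists (select * from inserted
               where logged_at > getdate())
    begin
        rollback transaction
        raiserror 20001 "status_log: row rejected by sanity check"
    end

Real checks would presumably be per-keyword range tests, but the mechanism is the same: the trigger fires on every insert, inspects the pending rows, and can refuse them before they enter the permanent record.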
The observer interface should be less forms-oriented, although a couple of friendly "forms" should exist for the observer's logbook and for annotations. This interface should be more "query oriented," offering a quick menu of commonly requested information as well as a more expert mode in which the user creates and submits queries of arbitrary complexity. The observer should be able to direct retrieved data to several destinations: at least disk files, email messages, and plots. Some features of this kind are already implemented in the database GUI at Lick, and others could be added without great difficulty. A crude sketch of such an "observer information interface" appears in Figure 9.2.13.
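For instance (table and column names again hypothetical), a query of "arbitrary complexity" might correlate the night's exposures with the logged seeing measurements:

    /* correlate exposures with logged seeing values;
       all table and column names are invented for illustration */
    select e.exposure_id, e.object_name, e.start_time, w.value
    from exposures e, status_log w
    where w.keyword = 'SEEING'
      and w.logged_at between e.start_time and e.end_time
    order by e.start_time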
At this time the most flexible, popular, and successful interface to databases for public query is the WWW. We have had success using WWW query pages as a front end to various "databased" information, from the campus phone book to standard star catalogs. We feel that these tools and methods will evolve and will continue to be the correct approach when we come to offer public DEIMOS data to the Net. Once again, a working system of this kind is already in place at Lick (and many other sites). (A sample Web page from the star catalog interface appears as Figure 9.2.12.)
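Behind the scenes, a query page of this kind typically reduces to a parameterized query or stored procedure on the server. A hypothetical sketch for a rectangular star-catalog search (names and columns invented for the example) might be:

    /* hypothetical back end for a WWW star-catalog form:
       return stars within a rectangular patch of sky */
    create procedure star_box_search
        @ra_min float, @ra_max float,
        @dec_min float, @dec_max float
    as
    select name, ra, dec, mag
    from star_catalog
    where ra between @ra_min and @ra_max
      and dec between @dec_min and @dec_max
    order by mag

A CGI script behind the query page then needs only to collect the form fields, execute something like "exec star_box_search 210.0, 211.0, 54.0, 55.0", and format the resulting rows as HTML.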
The correct approach to WWW publishing of database information at DEIMOS commissioning time may be any combination of vendor-supplied database-to-WWW interfaces, generally accessible free software, and software written here. It's impossible to predict today what the best specific strategy will be two years from now; the continued existence of the World Wide Web, however, seems a very safe bet.
However, bad code can still be produced using object-oriented tools and languages; as always, good design, good coding practice, and good project management are the deciding factors in the quality of the software produced. We consider the decision to use or not to use object-oriented design methods for this project a procedural or management decision, affecting the entire project. The basic specifications and general design are not changed; only the specifics of implementation would be altered by such a decision.
We are looking into the advantages and costs of the object-oriented approach. It is likely that within a year, most major database vendors will have introduced a basic set of object-oriented extensions into their implementations of SQL. Gnu C++ provides a freely available compiler for the object-oriented C++ language, and object-oriented extensions are available for the Tcl/Tk language. If we determine that cost and time savings can be achieved by adopting the "OO" approach, there should be no difficulty acquiring the tools to do so.
We should bear in mind that the OO approach carries certain costs and risks which partially offset its benefits. The cost of retraining programming staff in a novel and (initially) confusing design methodology is nontrivial, and could delay the project schedule. We also perceive a lack of accepted standards in this area; a commitment to one particular toolset might prove unfortunate if a competing toolset became the standard. OO design also seems to require a heavier investment in documentation than traditional methods if the code is to be maintainable over the long term (because the code is more opaque and deeply "layered" than conventionally designed code); since documentation is often the last and most neglected task in major software projects with harsh deadlines, this factor should not be overlooked. There are also implications for the use of existing (non-OO) source code: we might not be able to incorporate it without significant rewriting, which would lessen the advantage of using it. Finally, there is a fairly large degree of uncertainty about the real benefits of following the new trend at this particular juncture.
In summary, while we can't afford to ignore this new technology, we can't afford to accept it unquestioningly as the panacea some industry sources claim it to be. In this design document I have followed a very conservative path, proposing nothing that has not already been understood, tested, used, and found adequate by me personally. I can therefore say with confidence that the specification goals can be achieved as described here; I cannot say that they might not be achieved faster, or in some way better, by more novel approaches. I suggest that we attempt to find case histories of similarly sized projects in which a decision was made at the outset to switch to OO software design, to determine whether the results were advantageous or disadvantageous for the project overall.
de@ucolick.org