Note: We are using the term "meta-data" in the context of the science data, not of the database internals. In this preliminary specification document, "meta-data" means all environmental, inventory, historical, and similar data relevant to the science and calibration images acquired with the instrument. It also includes the database constructs that database designers define more narrowly as "meta-data," i.e., the data dictionary, access-control, and similar tables.
Physical Components
Certain components of the instrument have innate or relatively static
characteristics which affect the performance of the instrument as
a whole. These static, physical components should be catalogued and
described in sufficient detail to explain or analyze their impact
on the quality of acquired data.
Operation of the Instrument
Logging of environmental and operating conditions throughout
each night is essential, whether we are engineers seeking reasons
for vagaries of instrument and telescope performance or astronomers
seeking to understand our acquired data. A consistent and complete log
of such information should be maintained indefinitely and
cross-referenced to trouble logs as well as to phenomena and
artifacts in observed data.
These operational data are more dynamic than catalogs of components and their characteristics. The operation log should record all rapidly changing status information during the night or run; this includes but may not be limited to:
Acquired Data
An historical record of data acquired at the telescope, together with
all meta-data which could affect the quality or interpretation of
the acquired images, can be of enormous value to the original observer
and to later researchers. The meta-data should be public immediately
and always, but the acquired data (and slit mask definitions for that
matter) may be considered confidential for some period of months or
years determined by the initial conditions of the construction grant.
Therefore some mechanism for control over publication date must be
provided. These data include but are not limited to:
To manage large sets of data when rapid access and complex ad hoc analytical queries are required, a relational database management system (RDBMS) is the best choice of software tool. The DEIMOS software suite should include a standard RDBMS running on a standard Unix platform (see Software and Hardware Requirements for more detail).
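As a sketch of how a relational schema could carry the publication-date control required under Acquired Data, the example below uses Python's built-in sqlite3 module purely as a stand-in for the commercial RDBMS; the table and column names are hypothetical. Meta-data stay visible at all times, while the file reference of a still-confidential image is withheld until its release date.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative, and sqlite3
# stands in for the commercial RDBMS (Sybase or Oracle) named in the text.
SCHEMA = """
CREATE TABLE exposure (
    exposure_id  INTEGER PRIMARY KEY,
    obs_date     TEXT NOT NULL,  -- UTC date of observation, YYYY-MM-DD
    object_name  TEXT,
    exptime_s    REAL,
    filename     TEXT NOT NULL,  -- location of the raw image
    release_date TEXT NOT NULL   -- date the image itself becomes public
);
"""


def make_archive():
    """Create an empty in-memory archive with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def public_view(conn, as_of):
    """Return every exposure's meta-data as of the given ISO date.

    The filename column is None while the exposure is still inside its
    proprietary period; all other meta-data are always returned.
    """
    return conn.execute(
        """SELECT exposure_id, obs_date, object_name, exptime_s,
                  CASE WHEN release_date <= ? THEN filename END
           FROM exposure ORDER BY exposure_id""",
        (as_of,),
    ).fetchall()
```

Because ISO 8601 dates sort correctly as plain text, a simple `<=` comparison is enough to decide whether the proprietary period has ended.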
As little proprietary and commercial software as possible should be used in the information management component of the DEIMOS software suite. There should be minimal dependence on vendor response, licensing mechanisms, etc.; as much source code as possible should be visible to users and maintainers of the DEIMOS instrument. Standard non-commercial languages and tools, such as C and C++ (compiled with gcc and g++) and Tcl, should be used to develop all modules of the information management component.
Because no sufficiently reliable and robust free RDBMS is available, the RDBMS is the only software element in the information management component that should be acquired from commercial sources. Either Sybase or Oracle would be adequate, and there are some reasons for preferring Sybase.
As little data as possible should be acquired by manual key entry on the part of observers or observatory staff; instead, as much data as possible should be acquired automatically through interprocess communication, telemetry, and similar channels. The Data Element List is a non-exhaustive list of useful information and its likely sources.
However, the observer may wish to annotate data or maintain an observing log during the night, so manual annotation and some manual data entry should be permitted, and user-friendly tools should be provided for these functions.
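One way the automatic feeds and the observer's annotations could share a single time-stamped operation log is sketched below, again with sqlite3 standing in for the production RDBMS. The class, table, and source names are hypothetical, and writing a status row only when its value changes is an assumed design choice, not part of this specification.

```python
import sqlite3
from datetime import datetime, timezone


class OperationLog:
    """Hypothetical time-stamped log of status values and observer notes.

    Automatic sources (e.g. 'dewar', 'telescope') call record(); a row is
    written only when a value differs from the last one stored for the same
    (source, keyword), so rapidly polled but unchanged values do not flood
    the table. Observer notes from annotate() are always stored.
    """

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            """CREATE TABLE IF NOT EXISTS operation_log (
                   logged_at TEXT NOT NULL,  -- UTC, ISO 8601
                   source    TEXT NOT NULL,
                   keyword   TEXT NOT NULL,
                   value     TEXT
               )"""
        )
        self._last = {}  # (source, keyword) -> last value written

    def _insert(self, when, source, keyword, value):
        when = when or datetime.now(timezone.utc)
        self.conn.execute(
            "INSERT INTO operation_log VALUES (?, ?, ?, ?)",
            (when.isoformat(), source, keyword,
             None if value is None else str(value)),
        )
        self.conn.commit()

    def record(self, source, keyword, value, when=None):
        """Store one automatic sample; return True if a row was written."""
        key = (source, keyword)
        if key in self._last and self._last[key] == value:
            return False  # unchanged since the last stored sample
        self._insert(when, source, keyword, value)
        self._last[key] = value
        return True

    def annotate(self, text, when=None):
        """Store a free-text note entered by the observer."""
        self._insert(when, "observer", "NOTE", text)
```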
The database server, whether during collection or retrieval, should be integrated seamlessly with the data taking and quick-look software, rather than appearing as a separate user interface, tool, or environment.
Some types of data define and document individual mechanical, optical, and electronic components of the instrument, such as mirrors, filters, lamps, gratings, slit masks, and CCD detectors. Physical instrument components should be fully documented by key entry at their point of manufacture or verification, and this information should be copied to the archive on the Mountain for ready reference during any observing run. Maintenance and adjustment of optical and mechanical components should also be logged to the database by key entry.
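A possible shape for the component catalog and its maintenance log is sketched below, with sqlite3 as a stand-in; the table names and the serial-number key are hypothetical. The foreign key ensures that every maintenance entry refers to a documented component, and the CHECK constraint limits component kinds to those named above.

```python
import sqlite3

# Hypothetical schema; the component kinds are those listed in the text.
COMPONENT_SCHEMA = """
CREATE TABLE component (
    component_id TEXT PRIMARY KEY,  -- e.g. manufacturer serial number
    kind         TEXT NOT NULL CHECK (kind IN
                     ('mirror', 'filter', 'lamp', 'grating', 'slitmask', 'ccd')),
    description  TEXT NOT NULL,
    verified_on  TEXT               -- date of acceptance testing
);
CREATE TABLE maintenance (
    component_id TEXT NOT NULL REFERENCES component (component_id),
    performed_on TEXT NOT NULL,
    performed_by TEXT NOT NULL,
    note         TEXT NOT NULL      -- what was adjusted, replaced or cleaned
);
"""


def make_component_db():
    """Create an in-memory catalog with foreign-key enforcement enabled."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces REFERENCES clauses only when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(COMPONENT_SCHEMA)
    return conn
```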
All data that observers find useful should be quickly and easily exportable to the standard FITS table format, so that observers can take these tables home with the acquired data.
The observer should also be able to extract and save the results of any arbitrary query against the public portion of the database and against his or her own data. Certain standard queries should be offered in the form of plots and charts, and a user-friendly tool for making ad hoc analytical forays into the data during the observing run should be provided.
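A minimal sketch of saving an observer's ad hoc query follows. It assumes an sqlite3 connection and uses sqlite3's authorizer hook to refuse anything other than reads; CSV output is used only because it needs no third-party library, whereas the specification calls for FITS tables. The function names are hypothetical.

```python
import csv
import io
import sqlite3

# Action codes that a read-only query needs (assumption: plain SELECT
# statements that may call SQL functions such as COUNT or AVG).
_READ_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_FUNCTION}


def _read_only(action, arg1, arg2, db_name, trigger_or_view):
    """sqlite3 authorizer callback: deny any action that is not a read."""
    return sqlite3.SQLITE_OK if action in _READ_ACTIONS else sqlite3.SQLITE_DENY


def export_query_csv(conn, sql, params=()):
    """Run an ad hoc query read-only and return the result as CSV text.

    The first CSV row holds the column names reported by the cursor.
    """
    conn.set_authorizer(_read_only)
    try:
        cur = conn.execute(sql, params)
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow([col[0] for col in cur.description])
        writer.writerows(cur)
        return buf.getvalue()
    finally:
        # Restore an allow-all authorizer (set_authorizer(None) needs Python 3.11+).
        conn.set_authorizer(lambda *args: sqlite3.SQLITE_OK)
```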
All non-confidential archived data should be made available to the public through a simple WWW interface, using a back-end server different from the one used on the Mountain to collect and initially store the meta-data. The WWW interface should offer basic analytical query support, including field selection, record selection using the standard Boolean operators, and statistical functions. The non-confidential data would include libraries of standard instrument calibration data.
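The query support described above could be translated from form input into SQL roughly as follows. This is a sketch under assumptions: the whitelists, the `exposure` table, and the function names are hypothetical. Field names, operators, and statistical functions are accepted only from fixed whitelists, and user-supplied values travel only as bound parameters, so a web form cannot inject SQL.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical whitelists; a real interface might derive them from the
# data dictionary. Only these identifiers can ever reach the SQL text.
TABLE = "exposure"
FIELDS = {"object_name", "obs_date", "exptime_s", "airmass"}
OPERATORS = {"=", "!=", "<", "<=", ">", ">="}
STATISTICS = {"COUNT", "MIN", "MAX", "AVG", "SUM"}


@dataclass
class Condition:
    field: str
    op: str
    value: object
    negate: bool = False  # wrap this comparison in NOT (...)


def _allowed(name, permitted):
    """Reject any identifier or operator that is not whitelisted."""
    if name not in permitted:
        raise ValueError(f"not permitted: {name!r}")
    return name


def build_query(fields, conditions=(), connector="AND", statistic=None):
    """Translate form input into (sql, params).

    fields:     columns to return (field selection)
    conditions: Condition objects (record selection)
    connector:  'AND' or 'OR', joining the conditions
    statistic:  optional (function, column), e.g. ('AVG', 'exptime_s');
                when given it replaces the field list
    """
    if statistic is not None:
        func, column = statistic
        select = f"{_allowed(func.upper(), STATISTICS)}({_allowed(column, FIELDS)})"
    else:
        if not fields:
            raise ValueError("no fields selected")
        select = ", ".join(_allowed(f, FIELDS) for f in fields)
    sql = f"SELECT {select} FROM {TABLE}"
    params = []
    if conditions:
        joiner = f" {_allowed(connector.upper(), {'AND', 'OR'})} "
        clauses = []
        for c in conditions:
            clause = f"{_allowed(c.field, FIELDS)} {_allowed(c.op, OPERATORS)} ?"
            clauses.append(f"NOT ({clause})" if c.negate else clause)
            params.append(c.value)  # values go only into bound parameters
        sql += " WHERE " + joiner.join(clauses)
    return sql, params


def run_query(conn: sqlite3.Connection, *args, **kwargs):
    """Build the query from form input and execute it."""
    sql, params = build_query(*args, **kwargs)
    return conn.execute(sql, params).fetchall()
```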
An alternate query interface using the lowest-common-denominator SMTP e-mail protocol should be supported, both as a fallback and as an access path for those without WWW clients. A query mailed to a "mail robot" would return an e-mail reply containing the requested data.
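The mail robot's core step, turning one incoming message into a reply, might look like the sketch below, which uses only Python's standard `email` package. The robot address, the function name, and the `run_query` callback are hypothetical; receiving and sending mail is left to the mail system.

```python
from email import message_from_string
from email.message import EmailMessage

ROBOT_ADDRESS = "deimos-query@example.org"  # hypothetical address


def handle_query_mail(raw_message, run_query):
    """Turn one incoming query e-mail into a reply message.

    raw_message: full message text as received by the mail system
    run_query:   callable taking the query text and returning result text
    """
    incoming = message_from_string(raw_message)

    # Use the first text/plain part as the query (single-part mail counts).
    body = ""
    for part in incoming.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True)
            body = payload.decode(part.get_content_charset() or "us-ascii",
                                  errors="replace")
            break

    reply = EmailMessage()
    reply["From"] = ROBOT_ADDRESS
    reply["To"] = incoming.get("Reply-To", incoming["From"])
    reply["Subject"] = "Re: " + (incoming["Subject"] or "query")
    if incoming["Message-ID"]:
        reply["In-Reply-To"] = incoming["Message-ID"]
    try:
        reply.set_content(run_query(body.strip()))
    except Exception as exc:  # tell the sender why, rather than dropping the mail
        reply.set_content(f"Query failed: {exc}")
    return reply
```

The returned message could then be handed to `smtplib.SMTP.send_message` for delivery.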
The private and public servers should both offer a data dictionary defining every FITS keyword used in tables produced by DEIMOS software.
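A data dictionary of this kind could also validate keywords before they are written. The sketch below assumes a hypothetical `DataDictionary` class; it enforces the FITS rule that keyword names have at most eight characters drawn from A-Z, 0-9, hyphen, and underscore, and it formats fixed-format 80-character header cards. Putting the unit in square brackets at the start of the comment follows a common FITS convention.

```python
import re
from dataclasses import dataclass

# FITS keyword names: 1 to 8 characters from A-Z, 0-9, '-' and '_'.
KEYWORD_RE = re.compile(r"[A-Z0-9_-]{1,8}")


@dataclass(frozen=True)
class KeywordDef:
    keyword: str
    value_type: type   # str, int, float or bool
    unit: str          # '' when dimensionless
    description: str


class DataDictionary:
    """Hypothetical registry of every keyword DEIMOS software may write."""

    def __init__(self):
        self._defs = {}

    def define(self, kw):
        if not KEYWORD_RE.fullmatch(kw.keyword):
            raise ValueError(f"invalid FITS keyword name: {kw.keyword!r}")
        self._defs[kw.keyword] = kw

    def describe(self, keyword):
        return self._defs[keyword]  # KeyError for undocumented keywords

    def card(self, keyword, value):
        """Return an 80-character fixed-format header card."""
        kw = self.describe(keyword)
        if type(value) is not kw.value_type:
            raise TypeError(f"{keyword} expects {kw.value_type.__name__}")
        if isinstance(value, bool):
            text = ("T" if value else "F").rjust(20)  # logical ends in column 30
        elif isinstance(value, str):
            quoted = "'" + value.replace("'", "''").ljust(8) + "'"
            text = quoted.ljust(20)  # closing quote in column 20 or later
        else:
            text = str(value).upper().rjust(20)  # number ends in column 30
        prefix = f"{keyword:<8}= {text}"
        if len(prefix) > 80:
            raise ValueError("value too long for a single card")
        comment = f"[{kw.unit}] {kw.description}" if kw.unit else kw.description
        return f"{prefix} / {comment}"[:80].ljust(80)
```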
The RDBMS approach described above can work well, in combination with large-capacity jukebox technology, to offer an archive of acquired data. The DEIMOS software should provide the option of permanently archiving acquired data, or selected images from the night, to a public or semi-public library. The observer should have control over which images are offered publicly and on what date they become public. This may be easiest to accomplish if the standard medium for taking data away from the Mountain is a high-density CD-ROM (in which case the archive copy is simply another CD-ROM of the same format). If possible, the archiving process should be integrated with the routine Observatory backup strategy for acquired data.
The database information should be usable by an automated data reduction pipeline to produce 'first-cut reduced' images to be archived along with the raw data. First-cut extracted spectra could also be archived with the raw data, or alternatively produced on request by the same pipeline running on the archive server; which approach to take is still an open question.
The database server used to capture meta-data during the night (and to serve less dynamic meta-data during the night) should reside on the Mountain, on the same local network as the computers used to control the telescope and instrument. Failure of the network connection to this server could result in loss of the historical record and could also hurt the observing process, making it less automated and therefore slower.
The secondary server used to offer public copies of data from the Mountain should be located at a remote site and backed up frequently to stable media.
Data transfer between the two servers should be fully automated, with notification to responsible parties whenever transfer fails.
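A minimal sketch of the transfer-and-notify rule follows, assuming the actual copy and the alert mechanism (e-mail, pager, etc.) are supplied as hypothetical callables. The transfer is retried a few times before anyone is notified, so that a brief network outage does not alert staff; the retry count and delay are assumptions.

```python
import time


def transfer_with_notification(transfer, notify, attempts=3, delay_s=60.0,
                               sleep=time.sleep):
    """Run one primary-to-secondary transfer, alerting people only on failure.

    transfer: callable that performs the copy and raises on failure
    notify:   callable(subject, body) that alerts the responsible parties
    sleep:    injectable delay function (handy for testing)
    Returns True on success, False after notifying of a final failure.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            transfer()
            return True
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay_s)  # wait before the next attempt
    notify("DEIMOS archive transfer failed",
           f"Transfer failed after {attempts} attempts: {last_error!r}")
    return False
```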
No outside connections should be permitted to the primary database server. The Mountain network should be protected by a firewall to preserve the integrity of the telescope and instrument control systems as well as the archive of operational and acquired data.
de@ucolick.org