The only major commercial component of our specification is one RDBMS server license. RDBMS licenses are priced according to a "number of concurrent users" rubric, and in our case I would say this number can be kept small. A license on the order of 8-16 users would probably meet our foreseeable needs. A Sybase license of this size costs UC sites about $6K at this time.
Obviously, whichever engine is chosen must run on one of the two standard Unix platforms chosen for the DEIMOS project: a SPARCstation running Solaris or a DEC Alpha running OSF/1. Both Oracle and Sybase support these operating systems. (See below for recommended hardware configuration.)
I am not including any cost for commercial software development tools. Although there are many (expensive) GUI tools available for schema design and implementation, to date I have found them counterproductive rather than helpful for the experienced RDBMS hacker. I don't feel that any additional commercial software is required to complete the project. While there are certain costs associated with avoiding commercial software, my experience over the last decade has indicated that in many cases, the costs associated with adopting commercial software are far higher.
I therefore assume about $6000 as the cost of acquiring additional commercial software for the information management component. This assumes that we need to build a Sybase server for the Keck-II computer room, to support DEIMOS, and that the "offsite server" which offers the public archive, etc. is one of the existing UCO/Lick servers. If we assume that the "archival" server described in the text is not at UCO/Lick, then an additional server license is required for that machine, raising the total software acquisition cost to $12000.
The host which supports the database server should be located in the Keck-II computer room with the rest of the machines which directly support the observing process. The database server will be used to store (log) operational data during the night, as well as provide information for the observer or for the rest of the observing software. It should be considered an integral part of the observing software/hardware, and co-located with its peer machines which perform instrument, telescope, and dome control.
64MB of main memory is a good median configuration for a Sybase server. Data space is reconfigurable after server installation and startup, but we would want to start with a reasonable disk configuration (enough space for at least a year's worth of operation). If we wish to do volume mirroring, then each partition we choose to mirror must be duplicated on another spindle. For example, one of my Sybase servers has a 500MB data partition on a 2GB drive; this partition is mirrored to a 500MB drive on the same machine. The rest of the data on the 2GB drive are either non-Sybase or non-mirrored. If we choose not to mirror any partitions, then only one spindle is really required (though performance improvements could be realized by using multiple smaller spindles).
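As an illustration of the layout arithmetic (the device names and partition sizes below are hypothetical, not a recommendation), here is a short Python sketch that tallies the space required on each spindle for a mirrored configuration like the one just described:

    # Hypothetical mirrored-disk layout for the database host.
    # Each mirrored partition must be duplicated on a *different* spindle.
    partitions = [
        # (name, size_MB, spindle, mirror_spindle or None)
        ("sybase_data",  500, "sd1", "sd2"),   # data partition, mirrored
        ("sybase_log",   100, "sd1", "sd2"),   # transaction log, mirrored
        ("scratch",     1400, "sd1", None),    # non-Sybase / non-mirrored space
    ]

    usage = {}
    for name, size_mb, spindle, mirror in partitions:
        usage[spindle] = usage.get(spindle, 0) + size_mb
        if mirror is not None:
            assert mirror != spindle, "%s: a mirror on the same spindle gives no protection" % name
            usage[mirror] = usage.get(mirror, 0) + size_mb

    for spindle in sorted(usage):
        print("%s: %d MB allocated" % (spindle, usage[spindle]))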
A 2-server model also ensures a working backup of the data, and the 2nd server should preferably not be at the same site. For example, a Sybase server at UCO/Lick might offer the public portion of the data archive via WWW pages, getting fresh data daily from the private server on Mauna Kea.
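A minimal sketch of how the nightly refresh might be automated, assuming Sybase's bcp bulk-copy utility is available on both hosts; the server names, table list, and accounts are hypothetical placeholders, and a production version would load into staging tables rather than appending directly:

    # Nightly refresh sketch: copy the public subset of tables from the
    # private (summit) server to the public (offsite) server with bcp.
    # Server, database, table, and account names are hypothetical.
    import subprocess

    PRIVATE_SERVER = "DEIMOS_MK"      # summit Sybase server
    PUBLIC_SERVER  = "LICK_ARCHIVE"   # offsite public server
    PUBLIC_TABLES  = ["image_headers", "slitmask_library", "night_log"]

    def bcp(table, direction, server, user, passwd):
        """Bulk-copy one table in or out, in character mode."""
        datafile = "/tmp/%s.bcp" % table
        subprocess.run(["bcp", "deimos..%s" % table, direction, datafile,
                        "-c", "-S", server, "-U", user, "-P", passwd],
                       check=True)

    for table in PUBLIC_TABLES:
        bcp(table, "out", PRIVATE_SERVER, "archive_reader", "XXXXXX")
        bcp(table, "in",  PUBLIC_SERVER,  "archive_loader", "XXXXXX")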
The slitmask library and operational (logged) data represent only a modest problem of volume and accumulation. The image header data likewise do not represent a real challenge in terms of storage space. It is the images themselves, which are not stored in the database (as discussed in Appendix D (9.10.D)), that pose the real problem of storage space and access time. The data actually stored in the RDBMS represent only a few hundred megabytes per year. Some maintenance and re-indexing may be needed to ensure rapid access to the data as the accumulated record grows, but these tasks can be at least partially automated.
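A back-of-the-envelope check of that volume estimate; the nightly row counts and row widths below are hypothetical round numbers chosen only to show the scale of the arithmetic:

    # Rough storage-growth estimate for the RDBMS tables (images excluded).
    # Row counts and widths are hypothetical round numbers.
    tables = {
        # name: (rows_per_night, bytes_per_row)
        "image_headers":   ( 200, 1000),   # one row per exposure
        "operations_log":  (2880,  150),   # one engineering sample every 30 s
        "slitmask_events": (  50,  500),
    }

    nights_per_year = 365
    raw_bytes = sum(rows * width for rows, width in tables.values()) * nights_per_year

    # Allow roughly 50% overhead for indexes and transaction log space.
    total_mb = 1.5 * raw_bytes / 1e6
    print("roughly %d MB of database growth per year" % total_mb)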
Given that these tables are unlikely to exceed a mega-record in a couple of years, I don't see a call for sophisticated multiprocessor architecture or other expensive high-performance CPU power. SCSI-II disk speed would help to improve response time, but otherwise no state-of-the-art or specialized hardware is required for this fairly basic application.
If we were to "make do and mend" by using the existing Sybase license on Mauna Kea, building a server out of miscellaneous used parts, etc., we could probably reduce this cost to no more than the price of the disk drive; however, we'd have to examine what other functions were required of the existing Sybase server and whether those requirements conflicted with the restrictions recommended above. It might not be practical to relocate the "Remedy" server, or other considerations might prohibit this "penny pinching" strategy.
An economical suggestion:
It is possible that functions could be combined so that the database
server host was also the designated host for some other low-level
integral DEIMOS function. This function would have to be non-login,
and involve no unpredictable and/or sudden load changes or interruptions
of uptime. It should also not consume so much memory as to compete
heavily with the database engine and drive the host into swapping.
If such functions can be identified, sharing the hardware would be
practical and even desirable.
A more luxurious suggestion:
It has been suggested (D. Koo) that two hosts be constructed: one primarily a database server platform which can, in an emergency, take over some other basic machine-control function; the other primarily a machine-control or other low-level service host which can stand in as a database server. This would provide a rapid recovery path should either host suffer a hardware failure; however, it involves doubled hardware costs and some maintenance overhead.
If we assume that the "public" data server is not at UCO/Lick, then we might have to build an additional server for the public archive. In that case it might be wise to construct a twin of the DEIMOS database engine, for an additional $14000. However, given the existence and present underutilization of the Lick science database server, it seems reasonable to assume that for some fairly lengthy initial period the public archive could be managed and served from that machine. The additional cost then would be $2000 or so for additional disk space for the Lick server, rather than $14000 for an entire system.
If we assume that a sizable archive of acquired images is to be offered to the public, there are hardware costs associated with the jukebox system needed to manage the extensive CDROM library (see Appendix D (9.10.D)). The approximate cost today of a 500-disc jukebox is on the order of $15000. However, the image library will not achieve this size immediately, and smaller jukeboxes in the $6000 range could probably be used initially.
The local (primary) database server is integral to the observing process; observing can proceed in its absence, but quick-look reduction might be less automated or slower than with the database functioning properly. Loss of engineering information for one or more nights could impede our efforts to diagnose or even describe instrument problems.
The maintenance and management of this host and the database software are therefore critical to the perceived success of the instrument. The obvious question that arises is, "Who (which institution, Lick/CARA/Keck) will provide the qualified staff to handle this?"
The secondary DB server and WWW server, though having no impact at all on the observing process, need competent management and maintenance to preserve continuous access to the published body of data. Unavailability or corrupted data will make a bad impression, impairing the public image of Lick and Keck. If astronomers come to rely on the availability of these data for their research, then research efforts might also be obstructed or delayed if these machines and software engines are not kept in good shape. Some institution must commit to doing so.
The current design calls for many meta-data points which are not accessible via telemetry. We must discover whether any of these are logged to (e.g.) Keck's existing "Remedy" system, from which they could be automatically incorporated into the DEIMOS logs. If not, we must consider the staff time and degree of cooperation needed if Keck personnel are to enter reliable, accurate data. Typical of this class of meta-data are mirror re-aluminizing dates/times, optical alignment dates/times, etc.
If Keck personnel are assigned to support the logging system by key entry of critical maintenance and repair events, then Keck (the institution) may well feel that this system should serve interests more general than "just DEIMOS". In that case, the logging/monitoring component of the DEIMOS software might be running every night, not just when DEIMOS was in use, and other instrument systems might want to retrieve/store data using the server. Issues of access control, security, and authority immediately arise: whose software, under what circumstances, is to have which kinds of access to which tables when?
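One way to keep those questions explicit and auditable would be to record the policy as a simple access matrix and generate the corresponding permission statements from it. A sketch follows, with hypothetical group and table names (the real policy would be negotiated among the institutions):

    # Sketch: derive SQL permission statements from an explicit access matrix.
    # Group and table names are hypothetical placeholders for the real policy.
    access_matrix = {
        # (database group, table): permitted operations
        ("deimos_logger",     "night_log"):       {"insert", "select"},
        ("deimos_observer",   "night_log"):       {"select"},
        ("other_instruments", "operations_log"):  {"select"},   # read-only access
        ("keck_ops",          "maintenance_log"): {"insert", "select", "update"},
    }

    for (group, table), ops in sorted(access_matrix.items()):
        for op in sorted(ops):
            print("grant %s on %s to %s" % (op, table, group))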
In general, information services flourish best when a single individual is personally responsible for the server. (A deputy is of course required during that person's absences). Joint management is rarely successful, unless the joint managers work exceptionally well together and/or share an office (or are otherwise conveniently accessible to one another). Joint management usually results, over time, in inconsistent interpretations of policy and other forms of "tripping over each other" which can have serious impacts on the service provided.
Close cooperation is also required between the systems (OS) management personnel of these 3 machines and the info services managers/maintainers.
The implications extend beyond designing software that offers the observer appropriate choices for publication of data: someone, probably the manager of the information service, must be responsible for maintaining and periodically testing access control. Observers should feel confident that unauthorized access is prevented, and that shareable central data archives are not a threat to their research programs. Security and access control require on-going responsible management.
In summary of these points: One identifiable, well-qualified individual should be responsible for each of the information service "nexi" posited in the project requirements. One person could conceivably manage all three (2 DBs and a WWW). However, three people should not manage one nexus. There should be very close cooperation between these managers and the system management of each machine, if indeed they are not the same person. Disorganization or incompetence in this matter could be expensive and/or very visible.
An inherent contradiction lurks between this support plan and the implementation schedule for the science data archive. Most data taken with DEIMOS will be considered private by the observers until some period, probably about 2 years, has elapsed. Not until then will the publishing of DEIMOS science data begin; in other words, the moment when the data archive finally starts to become valuable and visible is about the same moment when Lick support for the instrument has faded away and the info management system is at most risk of being discontinued.
If there is any commitment to the archiving and public offering of data taken with DEIMOS, then that commitment must include some staff time allocated to handling, labelling and transport of CDROM media, and librarian/operator functions for the large jukebox system, as well as the system/database management functions described above. Some agreement should be worked out between the institutions involved, as to who will meet this minor but on-going cost.
de@ucolick.org