Observers will want to take DEIMOS images home with them for reduction. The size of these images poses a serious challenge to current removable storage media technology. We are currently leaning towards SCA removable hot-swap SCSI drives as the best compromise between cost, capacity, speed, convenience, and robustness. A detailed comparison of several currently available storage technologies follows.
DEIMOS images will run about 140MB for spectra, 70MB for direct frames. Using lossless fcompress, we may get a factor of 2 to 3 compression; it may be overoptimistic to assume anything much better than 2.
These are the largest images being produced by any Keck instrument, as well as the most complicated FITS files we have yet written.
Ever since the project's inception we have been worried about the "media problem." On what media are we going to write these images for backup/archival storage? And how is the observer going to bring the data back to the home institution for reduction?
At both reviews (PDR and CDR) we expressed an optimism that has not been substantiated in reality. We thought that new developments in optical or magneto-optical storage, using CDROM-sized media, would "catch up with" our needs by the time commissioning drew near. While there are exciting things going on in this sector, they are not happening as fast as we would have liked. We're left, realistically, with a set of choices, each of which is unsatisfactory.
These are some trade-offs you may want to keep in mind when considering current storage technologies. It is impossible (today) to "win" on all these counts. Most media will only look good on two or three points out of this list.
Media Cost | (and are those media re-usable?) |
Reader Cost | (cost to observer of drive to read media at home institution) |
Writer Cost | (cost to project or to CARA to write media in Waimea) |
Write Speed | (how long to copy a run's data to media?) |
Read Speed/Access | (random access for disc type media, linear for tape; how long to access an image?) |
Media Density | (number of media needed to hold 1 run's worth of FITS image files) |
Media Longevity | (how long will they last physically? how long will the technology last?) |
Media Convenience | (size/weight/fragility) |
Media Openness | (how proprietary is the medium/format?) |
There are two problems which we may want to address separately. One is the archiving or backup of acquired data at Keck, the process that is currently achieved using Exabyte tapes and STB software. DEIMOS images pose a challenge to this system, but this document is not primarily concerned with this side of the problem. What we are considering here is the secure transport of observed data to the observer's home institution. How does the astronomer take these very large DEIMOS datasets back home?
For conservative estimating, we'll say that an image is 140 MB, and that fcompress could reduce this to 70 MB. We could fit 100 images in 7GB of storage. If we assume a night is about 150 images -- counting all calibration exposures as well as science data -- then we could say a night is about 10GB.
The average run is probably 2 or 3 nights. We guess the astronomer will want to take home something like 20 to 30 GB of data from the run. Again to be conservative, we might call "one run" 30 GB.
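The volume estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the image size, compression factor, and images-per-night figures are the ones quoted in the text, while the function names and the use of decimal units (1 GB = 1000 MB) are our own assumptions for illustration.

```python
# Back-of-envelope data volume, using the figures quoted in the text.
RAW_IMAGE_MB = 140        # size of one spectral image
COMPRESSION_FACTOR = 2    # conservative fcompress estimate
IMAGES_PER_NIGHT = 150    # science plus calibration exposures
MB_PER_GB = 1000          # assumption: decimal units


def night_volume_gb(images=IMAGES_PER_NIGHT, raw_mb=RAW_IMAGE_MB,
                    factor=COMPRESSION_FACTOR):
    """Compressed data volume for one night, in GB."""
    return images * raw_mb / factor / MB_PER_GB


def run_volume_gb(nights):
    """Compressed data volume for a run of the given number of nights."""
    return nights * night_volume_gb()


if __name__ == "__main__":
    print(f"one night: {night_volume_gb():.1f} GB")
    for nights in (2, 3):
        print(f"{nights}-night run: {run_volume_gb(nights):.1f} GB")
```

With these inputs a night comes to 10.5 GB and a three-night run to 31.5 GB, which the text rounds to 10 GB and 30 GB.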
Thirty gigabytes is not going to be such a depressingly large number for very much longer; there are a couple of very high-density storage technologies in development which may be of great interest to us. One (C3D) will not get to beta test for another six months or more. It is well behind the original schedule mentioned in press releases 2 years ago. Another effort (electron beam writing) seems further along in the prototyping process, but will not be commercially available until 2002. Both of these technologies promise over 100GB on a CDROM-sized plastic medium. C3D promises to be low-cost and 140GB per medium. The electron beam technology will probably be expensive (it's aimed at large governmental and corporate customers, not the consumer market) but it claims 200 GB per medium and a very stable, multi-decade, archival end product.
For our immediate needs, both these technologies are out of the question. Their only relevance is the ominous likelihood that whatever medium we select now will become obsolete and undesirable sometime in the next 2 years, because one of these technologies or a new competitor will be offering a large density improvement. In other words, we are likely to want to throw away whatever we are doing now, after only a couple of years of use. The impermanence of this situation colours our choice of technology.
We are looking, therefore, for a bandaid -- a bandaid of moderate cost, yet reasonable efficacy. If it's too expensive, we'll regret it later when the new technologies mature. But if it's too feeble, we'll waste thousands of dollars compensating for its inadequacy while we wait for something better to come along.
One potential transport mechanism that we haven't been considering very seriously is "Internet2". We might ask whether it is possible to upload the observed data via TCP/IP to the observer's home site.
The theoretical max bandwidth we could ever get, as of today, is 35 Mb/s (the size of the pipe between Oahu and Hawai'i). But all the astronomy traffic from Big Island is on that link, so we are not going to get it all to ourselves! Let's take an optimistic scenario. Suppose we have 10 GB to move each day, and we get something close to ethernet spec: 10Mb/s (roughly 1MB/sec). At that rate, it would take 10,000 sec to transport the images, or about 167 minutes. (We'd be hogging a greedy share of the Oahu link for more than 2 hours, but for the moment let's ignore any potential political issues.)
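The optimistic scenario is easy to check numerically. The sketch below uses only the figures in the text (10 GB per night, roughly 1 MB/sec, a 35 Mb/s link); the function names and the decimal-unit convention are assumptions made for illustration.

```python
LINK_MBIT_PER_S = 35      # Oahu to Hawai'i link capacity quoted in the text
RATE_MBIT_PER_S = 10      # optimistic "ethernet spec" rate
RATE_MBYTE_PER_S = 1.0    # the text's round figure for 10 Mb/s
NIGHT_GB = 10             # one night's data
MB_PER_GB = 1000          # assumption: decimal units


def transfer_seconds(data_gb, rate_mbyte_per_s):
    """Time to move data_gb at a sustained rate, in seconds."""
    return data_gb * MB_PER_GB / rate_mbyte_per_s


def link_share(rate_mbit_per_s, link_mbit_per_s=LINK_MBIT_PER_S):
    """Fraction of the inter-island link consumed by the transfer."""
    return rate_mbit_per_s / link_mbit_per_s


if __name__ == "__main__":
    secs = transfer_seconds(NIGHT_GB, RATE_MBYTE_PER_S)
    print(f"{secs:.0f} s = {secs / 60:.0f} min = {secs / 3600:.1f} h")
    print(f"share of link: {link_share(RATE_MBIT_PER_S):.0%}")
```

Under these assumptions the transfer takes 10,000 seconds (a little under 3 hours) and would occupy more than a quarter of the link for that whole time.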
That is not very competitive with most removable media storage alternatives. But there's worse news: we have every reason to doubt that in practice we could get that kind of transfer rate. Why?
Without special TCP/IP driver tuning (i.e. on any stock workstation) we will never even get close to the ethernet spec of 10Mb/sec over such a long round trip. The propagation delay is about 100 ms from anywhere in the contiguous 48 states to the Big Island. This delay combines with the default 8K window size for TCP/IP to limit the actual transfer rate (to any unmodified workstation) to about 10 windows per second, or 80K/sec. To move 1 night's worth (10GB) of data at this rate would take over 1.5 days.
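The window-size limit described above is just "one window per round trip". The following sketch works it through, treating the 100 ms figure as the round-trip time (as the "10 windows per second" estimate implies) and counting 10 GB in binary units; both of these, and the function names, are assumptions for illustration.

```python
WINDOW_BYTES = 8 * 1024    # default TCP window quoted in the text (8K)
ROUND_TRIP_S = 0.100       # about 100 ms, treated here as round-trip time
NIGHT_BYTES = 10 * 2**30   # one night's data; binary GB is an assumption
SECONDS_PER_DAY = 86400


def window_limited_rate(window_bytes, round_trip_s):
    """Upper bound on throughput (bytes/s) with one window per round trip."""
    return window_bytes / round_trip_s


def window_needed(rate_bytes_per_s, round_trip_s):
    """Bandwidth-delay product: window (bytes) needed to sustain a rate."""
    return rate_bytes_per_s * round_trip_s


if __name__ == "__main__":
    rate = window_limited_rate(WINDOW_BYTES, ROUND_TRIP_S)
    days = NIGHT_BYTES / rate / SECONDS_PER_DAY
    print(f"window-limited rate: {rate / 1024:.0f} KB/s")
    print(f"time for one night:  {days:.2f} days")
    needed = window_needed(10e6 / 8, ROUND_TRIP_S)
    print(f"window to fill 10 Mb/s: {needed / 1024:.0f} KB")
```

The second function shows why tuning is unavoidable: to keep a 10 Mb/s path full over a 100 ms round trip, the window would have to be on the order of 120 KB, far larger than the 8K default.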
Obviously, if it takes 1.5 days (under ideal conditions) to move 1 night of observing data, we will never catch up. We could try to establish specially tuned workstations at various home institutions; but the special IP tuning is an expert process involving kernel parameters, root privilege, etc. It would be lost at each OS upgrade. It seems unreasonable to expect that specially hacked "data upload" workstations will be reliably maintained at observers' home institutions.
There are other drawbacks to this method: the observer's home institution would have to make 10GB per night of disc space available for the data, or the transfer will fail; the transfer is vulnerable to routing failures, power outages, and other network mishaps along the way; the transfer must be initiated from inside Keck, which means the observer will not be able to retry it after having returned home and detected a problem.
For all these reasons we feel we can go on ignoring "network upload" as a practical method of bringing DEIMOS data home.
More usual and customary today than network transfer is for the observer to carry the data home from the mountain. In the old days this was done using 8-inch floppy discs, DECtapes, 9-track tapes; more recently, we've used Exabytes and DAT tapes.
There is a psychological appeal to this method: the observer has a physical copy of the data in a tangible, securable form. Drawbacks are obvious: it's not always possible to verify the medium perfectly; any error in format or content is very unpleasant to discover after getting home again; media have to be transported (on airplanes), raising questions of weight, fragility, packaging, immunity to dust and dirt, etc. A major cost, in some cases, occurs when N observers at N institutions all have to buy their own expensive media readers for some new media standard imposed by the instrument.
These are some currently available media options.
Exabyte | The new Exabyte Mammoth-2 claims a capacity of 60 GB per cartridge. |
Drive Price | $4000 |
Write Speed | 12 MB/sec; 60GB uncompressed in 1.5 hrs; 2 runs' worth of data, no operator intervention |
Bus | SCSI |
Medium | Exatape AME; $92 for one full-length cartridge; $1.50/GB |

Ecrix VXA | the Ecrix VXA-1 drive claims a capacity of 33 GB per cartridge |
Drive Price | $1000 |
Write Speed | 3MB/sec (est); 30 GB uncompressed in 1.5 hrs; 1 run, no operator intervention |
Bus | SCSI |
Medium | Ecrix VXA cartridge; $80 each; $2.66/GB |

DLT | current DLT drives claim a capacity of 40 GB per cartridge |
Drive Price | $2000-$4000 |
Write Speed | 1.5MB/sec; 40 GB in 2 hours; 1 run, no operator intervention |
Bus | SCSI |
Medium | DLT helical-scan TK-style monohub cartridge; $65 each; $1.62/GB |

Sony 8mm | Sony's latest high-density 8mm drive claims a capacity of 25 GB per cartridge |
Drive Price | $2200 |
Write Speed | 3MB/sec; over an hour to write 25GB; might not quite fit one run; operator intervention |
Bus | SCSI |
Medium | 170m AME 8mm; not sure if Exatapes would work; specs are hard to get; Sony's link is broken |
All the cartridge tape solutions have some characteristics in common:
Big Optical | old-style platters claim 15-30GB |
Drive Price | Still waiting for quotes; expensive; $5000-$10,000 guesstimate |
Write Speed | 2.7MB/sec; an hour or so to write 15GB; 2 hrs to write a run's worth; operator intervention if 15GB media |
Bus | SCSI |
Medium | 12 inch optical platter; $50 each; $3.33/GB if 15GB media |

DVD | current media claim 4.7GB/side; only one-sided drives/media seem to be available; format plagued with contending standards |
Drive Price | writer $4500-$5000; reader $250 |
Write Speed | 11.08Mb/sec (similar to CDROM x1); about 1.1 MB/sec; 1 hour to write one 4.7GB disc; 5 or 6 discs for 1 run; operator intervention |
Bus | writer SCSI; reader various: SCSI, IDE, firewire, USB, etc |
Medium | 5.25in plastic DVD; $40 each; $8.50/GB |

SCA SCSI hotswap | these are standard SCSI hard drives packaged in a carrier with a handle; they can be "hot swapped" in and out of a SCSI bay (single or multiple); a reasonable capacity today would be 36GB per disc |
Drive Price | base unit $350-$400 |
Write Speed | sales claim: 40MB/sec; actual limit about 25MB/sec; 30GB in 1200sec, or 20 min; sub-10ms random access times |
Bus | SCSI (fast wide, Ultra, etc) |
Medium | 5.25in magnetic multiplatter; $400-$500 with carrier; $11/GB |

PCMCIA mini | these are standard 2.5in IDE half-height disc assemblies packaged in compact portable units with attached PCMCIA card; they can be read on any laptop having a PCMCIA port and configured with a recent Linux or Win/NT release; today's capacities stop around 18GB |
Drive Price | laptop with PCMCIA slot; wide price range |
Write Speed | 16.6MB/sec; 12ms random access times |
Bus | PCMCIA |
Medium | 2.5in IDE "wrapped" in PCMCIA adaptor; handmade: 18GB drive $460, adapter $100, total $560; prepackaged with flip-card: $700; $31/GB to $38/GB |
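
To compare the options on an equal footing, one can ask how many media, how much media cost, and how much writing time each would need for the conservative 30 GB run. The sketch below takes capacities, media prices, and write rates from the tables above; the variable and function names are our own, and where a table gives a range we have picked one value (the lower media price, the smaller capacity, the "actual" rather than claimed rate). The Sony media price is not quoted, so it is left blank.

```python
import math

RUN_GB = 30        # conservative "one run" figure from the text
MB_PER_GB = 1000   # assumption: decimal units

# (name, capacity in GB, price per medium in USD or None, write rate in MB/s)
MEDIA = [
    ("Exabyte Mammoth-2", 60, 92, 12),
    ("Ecrix VXA-1", 33, 80, 3),
    ("DLT", 40, 65, 1.5),
    ("Sony 8mm", 25, None, 3),
    ("Big optical", 15, 50, 2.7),
    ("DVD", 4.7, 40, 1.1),
    ("SCA SCSI hotswap", 36, 400, 25),
    ("PCMCIA mini", 18, 560, 16.6),
]


def media_needed(capacity_gb, run_gb=RUN_GB):
    """Whole number of media required to hold one run."""
    return math.ceil(run_gb / capacity_gb)


def write_hours(rate_mb_per_s, run_gb=RUN_GB):
    """Hours to write one run at the quoted sustained rate."""
    return run_gb * MB_PER_GB / rate_mb_per_s / 3600


def summarize(media=MEDIA):
    """Return (name, media count, media cost or None, write hours) rows."""
    rows = []
    for name, capacity, price, rate in media:
        count = media_needed(capacity)
        cost = None if price is None else count * price
        rows.append((name, count, cost, write_hours(rate)))
    return rows


if __name__ == "__main__":
    for name, count, cost, hours in summarize():
        cost_text = "n/a" if cost is None else f"${cost}"
        print(f"{name:18s} {count} media  {cost_text:>6s}  {hours:4.1f} h")
```

These figures are computed for the full 30 GB from the quoted capacities and transfer rates alone, so they can come out somewhat higher than the rounded counts and times given in the tables.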
The big optical laserdisc style platters score very badly on portability and drive cost; there is also the fear that they are already obsolete or very close to obsolescence. They seem to be very proprietary technology; each vendor has a format, media standard, etc. of its own.
DVD-R is unique in that the investment required for *reading* is very low, so the cost to N observers' home sites to read DVDs is very small compared to most of our other solutions. However, it's not dense enough, and it is incredibly slow to write (slower than most tape drives). Worse, the standards wars are still raging. On the positive side, DVD media are small, light, and tough; they resist impacts and fingerprints quite well.
With either DVD or large MO discs, the question of standards and format longevity is a vexed one. The DVD format wars may be won soon by some format other than DVD-R. Or another technology, as mentioned above, may overtake both media and cast them into the pit of obsolescence almost overnight. We would then feel we had wasted any investment made in them.
SCSI discs, though they may be considered "small" in capacity after a year or two, can continue to be useful. Most sites have at least one or two unix workstations with SCSI capability, so an external SCA "cage" with one or more bays is a small incremental investment -- not quite as cheap as a DVD reader, but far cheaper than most tape drives. The removable drives themselves may be seen as overpriced or undersized after a couple of years, but they will remain compatible with large numbers of workstations.
The SCA discs are in the borderland of "convenience" -- they are a little too heavy, fragile, and large to be really "portable technology" by today's standards; however, they can be fitted into one corner of a suitcase, or a backpack.
The winner for pure portability would be the mini IDE discs, which literally fit in a shirt pocket (and are packaged specially for this kind of transport). Even the 5-6 DVDs needed to hold a run's-worth of data can be transported fairly easily if stored in tyvek sleeves rather than jewelboxes.
The SCA plug/play discs are heavier and larger than any other single medium, but their capacity is very large and they win hands-down on read and write speed and ease of access for data reduction. Also, they are the least proprietary or "exotic" of the technologies listed here.
At present we are inclined to favour the SCA SCSI removable disc drives. The cost to acquire multiple writers is small. Indeed the "bay" or "cage" for these devices is the cheapest *writer* of the collection. The cost per gigabyte is high, but not much higher than DVD, and because a single medium will hold a run's-worth of data, there is no labour involved in babysitting the copying of data, switching media, etc. The medium can be reused -- hopefully *after* archival copying is done to some slower (probably tape) medium at the home site.
The SCA discs likewise do not require an expensive *reader*. And if we were to abandon this format, the discs can be taken out of their carriers and mounted internally in PCs or workstations for ordinary use; very little investment is lost if we change media, because these discs are more or less generic and can be recycled into other applications.
The winner for sheer convenience would be the PCMCIA-based mini-IDE discs, if it were not for the very high cost and not quite enough capacity to count on getting a whole run onto one disc. Being able to take one's data home in a shirt pocket is very attractive, but possibly needing to buy two of a $500-700 medium is nasty.
There are some concerns about transporting the SCA style discs in a suitcase, through airports, etc. One possible solution would be to FedEx them from Waimea to the home institution; another would be the adoption of a standard ATA approved carrier.
Now, if we return briefly to the "other problem" of backup/archival storage, we have to admit that disc-based solutions do nothing to address CARA's need to make nightly backups of observed data. Obviously CARA cannot go on stockpiling $400 discs forever.
The archive/backup problem is not really DEIMOS-specific and can't be addressed by any one instrument project. However, the size of our images does exacerbate the problem considerably, so we should make some kind of suggestion or recommendation for dealing with it.
Our recommendation today would probably have to be the Exabyte Mammoth, due to its high density, decent speed (for a tape drive) and excellent reviews in the trade press. Should DEIMOS present CARA with a Mammoth drive, to aid in the backup of large DEIMOS images? We should discuss this along with the other issues raised above.