More worrying are the user reports that ds9 is "very slow". Here I should note that "slow" means "compared to figdisp", of course -- we are seeing response times from ds9 at least an order of magnitude slower than figdisp's. That's quite a lot. A factor of two, people might not fuss over. A factor of 10 is scary.
Interactive responsiveness in the RTD is not merely a luxury; it's a very practical concern. Observing time at Keck is costed per minute (I forget the exact number, but we used to use a buck a second as the rule of thumb, so that would be about $60/minute). Needless to say, both the Keck administration and the scheduled observer want to waste zero minutes in the course of a night, if at all possible.
So if the observers have to wait several seconds just to zoom in to part of the image to see whether the focus is any good, that delay will get to seem interminable after a few tens of images; and by the end of a 3-night run it will have added up to a lot of wasted observing minutes -- time that they won't get again until maybe next year or the year after. One of the design imperatives for our instruments and their support software is always to waste as little observing time as possible.
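To make "adds up" concrete, here is the arithmetic as a small Python sketch. The $1/second rule of thumb and the 3-night run come from the text above, and the per-operation delays are the click-to-pan timings from the benchmarks (ds9 4 s, figdisp 0.25 s). The number of pan/zoom operations per image (5) and images per night (40) are assumptions chosen only for illustration, and the helper name is hypothetical.

```python
# Rough cost of time spent waiting on the RTD over one observing run.
# From the text: about $1 per second of Keck time, and a 3-night run.
# ASSUMED for illustration: 5 pan/zoom operations per image, 40 images per night.

DOLLARS_PER_SECOND = 1.0   # "a buck a second" rule of thumb
NIGHTS = 3                 # a 3-night run

def wasted_time(delay_s: float, ops_per_image: int, images_per_night: int,
                nights: int = NIGHTS) -> tuple:
    """Return (seconds spent waiting, approximate cost in dollars) for a run."""
    seconds = delay_s * ops_per_image * images_per_night * nights
    return seconds, seconds * DOLLARS_PER_SECOND

# Per-operation delays from the click-to-pan benchmark (ds9 4 s, figdisp 0.25 s).
for name, delay in (("ds9", 4.0), ("figdisp", 0.25)):
    secs, dollars = wasted_time(delay, ops_per_image=5, images_per_night=40)
    print(f"{name}: {secs / 60:.1f} min of waiting, about ${dollars:,.0f}")
```

With these assumed numbers, ds9 works out to 40 minutes of waiting per run against 2.5 minutes for figdisp. The exact figures depend entirely on the assumptions; the ratio is the point.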
Figdisp, though primitive (it's really a "dumb frame buffer" sort of tool), was extensively optimized during the ESI project and offers quite responsive performance during observing. For ESI or HIRES (LRIS is still stuck with an older, slower version), image rows begin to paint on the screen within a second or less of the start of readout. The user can also pan and zoom during readout, and abort an image right away if the first few tens of rows show it is unacceptable -- bad focus, clouds, whatever.
Readout times are fairly long with our larger detectors. Being able to abort an unsatisfactory image, rather than spending a minute or so capturing it, is quite important.
Using ds9, we first of all experience the mysterious "header-getting" delay, which means that one third of the readout has already gone by before any screen refresh. For that first third of the total readout time, the observer sits there seeing nothing at all. Then chunks of image start to paint, and ds9 has no difficulty keeping up with the readout after that. Unfortunately, ds9 is then so busy updating the frame that user mouse interactions -- pan and zoom commands, for example -- are ignored or acted on belatedly. It is almost impossible to interact with the image until the readout is complete. Effectively, this means the readout must complete (another 30 seconds or so) before the user can even decide whether the image is acceptable. This is no worse than LRIS, but LRIS is the oldest and least capable of our instrument control systems, and its limitations are hardly what we would want for a flagship instrument like DEIMOS.
Once the image is acquired, the observers want to pan and zoom around in it to answer some questions before committing to the next exposure, or to help plan their observing strategy for the next hour or even the next few minutes. If each pan or zoom operation takes several seconds, precious observing time is lost just waiting for the RTD. There is usually a bit of basic interactive image quality assessment and quick-look reduction between images, and this activity needs to be very swift and efficient.
Figdisp's pan and zoom times on a 140 MB image, whether during readout or not, and even on an encrypted X display, are sub-second -- so fast that we can't accurately time them with our stopwatches, because our fingers aren't that fast. By unhappy contrast, ds9's pan and zoom times on a 140 MB image (on an unencrypted display) are a factor of 10 or more slower: typically 3, 4, even 5 seconds. Some actual benchmark results will be found below.
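We don't know how ds9 schedules its painting internally, but the symptom (clicks ignored until the readout ends) is what you get when a single-threaded display loop paints for long stretches without returning to its event queue. As a minimal sketch, assuming one display loop and a queue of pending user events, the usual remedy is to paint in fixed-size chunks and drain the queue between chunks, so a pan, zoom or abort waits at most one chunk. The function name, the event strings and the 64-row chunk size are all hypothetical; this is not ds9's or figdisp's code.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Tuple

def paint_during_readout(
    rows: Iterable[list],
    events: Deque[str],
    handle_event: Callable[[str], None],
    chunk_rows: int = 64,
) -> Tuple[int, bool]:
    """Paint rows as they arrive, servicing queued user events after every chunk.

    Returns (number of rows painted, whether the user aborted the image).
    """
    painted = 0
    pending: List[list] = []

    def service_events() -> bool:
        # Handle every queued event; report whether one of them was an abort.
        while events:
            event = events.popleft()
            if event == "abort":
                return True
            handle_event(event)  # e.g. pan or zoom over the rows painted so far
        return False

    for row in rows:
        pending.append(row)
        if len(pending) == chunk_rows:
            painted += len(pending)  # stand-in for drawing this chunk to the screen
            pending.clear()
            if service_events():
                return painted, True
    painted += len(pending)  # draw the final partial chunk
    return painted, service_events()

# Example: a zoom and then an abort are queued while the first chunk is on screen.
queued = deque(["zoom", "abort"])
handled: List[str] = []
result = paint_during_readout(([0] * 8 for _ in range(1000)), queued, handled.append)
print(result, handled)
```

The chunk size trades painting throughput against input latency: smaller chunks mean more frequent event checks and a snappier response to the mouse.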
In figdisp, the old image (provided the new one was the same size) remained in the screen buffer while a new readout was starting, i.e. the new rows started overpainting the old rows. This meant that you could start the next exposure "hopefully," while still looking at the last one analytically. Then if you found something bad in the last image, you could abort the current exposure-in-progress.
In ds9, the start of a new image seems to require re-initializing the frame buffer, which destroys the displayed image (it reverts to a white screen), so one cannot review the last image while taking a new exposure. Again this is a lost efficiency: activities which previously could be overlapped are now serialized. If ds9's response times were within a factor of 2 of figdisp's, I think no one would care terribly about this "screen being wiped" difference, but these little things add up ... Now we are not only waiting 25x longer to see the first pixels of the new image, but we can't even look at the old image while we wait :-(
Lately we have been reluctantly discussing a partial rollback to figdisp. No one is very keen on this, but there's a growing fear that ds9 will not be satisfactory for actual observing because of these inefficiencies (i.e. despite our efforts so far, it is not a "quick-look" tool). This would be a disappointing outcome, to say the least, since we had hoped to walk away from figdisp forever with this instrument, and we don't want to support both RTDs! We can only regard a rollback as a band-aid solution, and we retain a longer-term faith that ds9's interactive performance can be improved.
Yesterday in the lab, being pressed for time, the engineers reverted to non-mosaic images and figdisp, because it would have been impossible to acquire and review the necessary images in the time available using ds9. We particularly wish to keep our standard image format for DEIMOS, multi-HDU mosaic files, so reverting to figdisp would mean some last-minute hackery to write multi-HDU files to disk yet display a single-HDU image (all that figdisp can handle).
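Keeping the old image on screen is mostly a matter of not reallocating. Here is a minimal sketch, assuming an in-memory buffer of rows. The class and method names are hypothetical, not figdisp's or ds9's code. The buffer is reused whenever the new image has the same geometry, and each arriving row simply overwrites the old one, which is the figdisp behaviour described above. Only a change of size forces a fresh (blank) buffer.

```python
from typing import List, Optional

class FrameBuffer:
    """Hypothetical display buffer that keeps the previous image when it can."""

    def __init__(self) -> None:
        self.pixels: Optional[List[List[int]]] = None

    def start_image(self, nrows: int, ncols: int) -> bool:
        """Prepare for a new readout; return True if the old image stays visible."""
        same_size = (self.pixels is not None
                     and len(self.pixels) == nrows
                     and all(len(row) == ncols for row in self.pixels))
        if not same_size:
            # Different geometry: reallocate, which blanks the display.
            self.pixels = [[0] * ncols for _ in range(nrows)]
        return same_size

    def put_row(self, y: int, row: List[int]) -> None:
        """Overpaint row y of whatever is on screen with newly read-out data."""
        assert self.pixels is not None, "call start_image() first"
        self.pixels[y] = list(row)

fb = FrameBuffer()
fb.start_image(4, 3)
for y in range(4):
    fb.put_row(y, [7, 7, 7])        # the previous exposure, fully painted
kept = fb.start_image(4, 3)         # next readout, same size: old pixels kept
fb.put_row(0, [9, 9, 9])            # first new row overpaints the old one
```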
Here are the benchmarks.
ds9 RUNNING ON:
  Enterprise 450 (Sparc, Solaris 5.8)
  4 CPUs, I believe
  3.25 GB RAM at the moment
  3 GB swap
  images on LOCAL RAID DISK

DISPLAY ON:
  Sparc Ultra 10, Solaris 5.7
  openwin X server
  256 MB RAM
  512 MB swap
  8-bit display
  unencrypted display (xauth direct to this display, not a virtual ssh-encrypted display)
First we read FROM DISK a 16-amp full-frame DEIMOS mosaic image. All of the following timings are from this 140 MB, 16-HDU image.
read file in from disk: 22.7 sec
resize ds9 window a bit larger: 5 sec (!)
scale buttons:
  zscale: 3.4 sec
  histeq: 57 sec (!!!!!)
  minmax: 4 sec
  98%: 57 sec (!!!!!)
panning (at 1/16, whole image in view):
  click in image in one quadrant: 4 sec (figdisp .25 sec)
  click in panner box: 4 sec
zoom to x1: 4.5 sec
  drag cyan pan box to SW quadrant: 2 sec
  drag to NE quad: 5 sec
  drag to SW again: 1.2 sec
  to SE: 3.2 sec
  to SW: 1.7 sec
  to NW: 3.5 sec (!)
zoom back to 1/16: 4 sec
  to x16: 3.4 sec
  to 1/16 again: 4 sec
middle click to re-centre image: 3.5 sec
resize window even bigger, to take up about half of screen: 10 sec (!)
resize window to take up all of screen: 11.5 sec (!)
with this big window:
  zoom to x1: 4.5 sec
  back to 1/16: 11 sec
  to x1: 5.8 sec
  to 1/16: 10.8 sec
  click in panner to pan to NE quad: 14 sec (!)
  to SW: 4 sec
  to NW: 5.5 sec
  to SE: 7.8 sec
and more scale (first return to linear etc.):
  zscale: 7 sec
  histeq: 7.8 sec
  minmax: 7 sec
  98%: 7 sec
shrink ds9 window: 5 sec
We then did a live exposure. It's worth noting that on our first attempt to capture a live image we did not succeed: ds9 hung and had to be restarted. The restarted copy accepted the incoming image OK, with the following timing:
IMAGE READOUT STARTS
  new shmem seg opened
  frame initialized
  headers scanned
  header data grabbed
PAINTING STARTS: 25 seconds after readout started; about 1/3 of the image has already read out.
IMAGE READOUT COMPLETE
DS9 FINISHED PAINTING: about 2 sec after image readout complete.
That final 2 sec wait is not a big deal. But the 25 seconds that go by before we see any updates on the screen are an issue: we are "blind" for those 25 seconds. If you review our prior correspondence on this delay, you'll recall that we found it scaled more or less linearly with the number of HDUs, i.e. it had something to do with the way ds9 was scanning for HDUs in the shmem FITS file. This was never resolved.
I should note that for all pan and zoom operations, figdisp is sub-half-second, i.e. there is no perceptible delay at all. The contrast is astonishing. Figdisp screen painting starts about 1 second after image readout starts. I don't think we expect ds9 to be exactly as fast as a dumb frame buffer, but I also think we don't expect it to be 25 times slower :-) A factor of 2 slower, or maybe even 4, would not be worrying. Factors of 10 to 25 are scary.
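We don't know what ds9 does internally when it scans the shmem file. For comparison, here is a minimal Python sketch, assuming the whole FITS file is already available as a bytes buffer, of locating every HDU by decoding only the 2880-byte header blocks. Each data unit is skipped using the size the FITS standard derives from BITPIX, NAXISn, PCOUNT and GCOUNT. Its cost grows with the number of HDUs, as the delay we measured does, but not with the amount of pixel data, since the pixels are never read. The function names are hypothetical, and random-groups files are not handled.

```python
BLOCK = 2880  # FITS header and data units are multiples of 2880 bytes
CARD = 80     # each header card (keyword record) is 80 ASCII characters

def index_hdus(buf: bytes) -> list:
    """Return (header_offset, data_offset, data_bytes) for each HDU in buf.

    Only header blocks are decoded; each data unit is skipped using the size
    the FITS standard derives from BITPIX, NAXISn, PCOUNT and GCOUNT.
    Random-groups files are not handled.
    """
    hdus = []
    pos = 0
    while pos + BLOCK <= len(buf):
        start, keys, done = pos, {}, False
        while not done:
            if pos + BLOCK > len(buf):
                raise ValueError("header at offset %d has no END card" % start)
            block = buf[pos:pos + BLOCK].decode("ascii")
            pos += BLOCK
            for i in range(0, BLOCK, CARD):
                card = block[i:i + CARD]
                key = card[:8].strip()
                if key == "END":
                    done = True
                    break
                if card[8:10] == "= ":  # value indicator in columns 9-10
                    keys[key] = card[10:].split("/", 1)[0].strip()
        naxis = int(keys.get("NAXIS", "0"))
        nelem = 1
        for n in range(1, naxis + 1):
            nelem *= int(keys["NAXIS%d" % n])
        nbytes = 0
        if naxis > 0:
            bitpix = abs(int(keys["BITPIX"]))
            gcount = int(keys.get("GCOUNT", "1"))
            pcount = int(keys.get("PCOUNT", "0"))
            nbytes = bitpix // 8 * gcount * (pcount + nelem)
        hdus.append((start, pos, nbytes))
        pos += -(-nbytes // BLOCK) * BLOCK  # skip data, rounded up to whole blocks
    return hdus

# Demo: a primary HDU with no data plus two 100x100 16-bit image extensions.
def _card(key, value):
    return f"{key:<8}= {value:>20}"

def _header(cards):
    text = "".join(c.ljust(CARD) for c in cards + ["END"])
    return text.ljust(-(-len(text) // BLOCK) * BLOCK).encode("ascii")

primary = _header([_card("SIMPLE", "T"), _card("BITPIX", 8),
                   _card("NAXIS", 0), _card("EXTEND", "T")])
ext = _header([_card("XTENSION", "'IMAGE   '"), _card("BITPIX", 16),
               _card("NAXIS", 2), _card("NAXIS1", 100), _card("NAXIS2", 100),
               _card("PCOUNT", 0), _card("GCOUNT", 1)])
data = bytes(100 * 100 * 2)
data += bytes(-len(data) % BLOCK)  # pad data unit to a whole number of blocks
demo = index_hdus(primary + (ext + data) * 2)
print(demo)
```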
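To put numbers on "factors of 10 to 25", here is the arithmetic from two of the timings quoted in this message: painting start (ds9 25 s against figdisp's roughly 1 s) and click-to-pan (ds9 4 s against figdisp's 0.25 s).

```python
# Slowdown factors computed from timings quoted in this message (seconds).
timings = {
    # operation: (ds9, figdisp)
    "first rows painted after readout starts": (25.0, 1.0),
    "click in image to pan": (4.0, 0.25),
}

ratios = {op: ds9 / fig for op, (ds9, fig) in timings.items()}
for op, r in ratios.items():
    print(f"{op}: ds9 is about {r:.0f}x slower than figdisp")
```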
> 1. are we in 8 bit or 24 bit mode?
Always 8 bit. 24-bit is infeasible because one cannot scroll colormaps in real time (something we are always needing to do) on any 24-bit video hardware we have today, so at present we have no interest in 24-bit mode. I am running a 24-bit X server at present just out of curiosity, but will be switching back to 8 bit soon.
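For anyone wondering why 8-bit matters so much here: on an 8-bit PseudoColor display, each pixel stores an index into a 256-entry colormap. Scrolling the colormap therefore rewrites 256 table entries, and the whole image changes colour at once. On a typical 24-bit TrueColor display the colormap is not writable, so every pixel has to be recomputed and redrawn for each step of the scroll. The plain-Python sketch below models the two cases; the function names are hypothetical and it makes no actual X11 calls.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]

def scroll_colormap(lut: List[RGB], shift: int) -> List[RGB]:
    """8-bit PseudoColor model: rotate the 256-entry colormap.

    Pixel values in the frame buffer are untouched; the display maps them
    through the new table, so the work is 256 entries whatever the image size.
    """
    shift %= len(lut)
    return lut[shift:] + lut[:shift]

def recolor_truecolor(indices: List[int], lut: List[RGB]) -> List[RGB]:
    """24-bit TrueColor model: there is no writable table, so every pixel's
    RGB value must be recomputed (and the image redrawn) on each scroll step."""
    return [lut[i] for i in indices]

grey = [(v, v, v) for v in range(256)]
image = [0, 128, 255, 128]                   # a tiny image of 8-bit pixel values
rotated = scroll_colormap(grey, 10)          # cost: 256 table entries
redrawn = recolor_truecolor(image, rotated)  # cost: one entry per image pixel
```

For a 140 MB mosaic the second cost runs to tens of millions of pixels per scroll step, which is why real-time colormap scrolling is only practical in 8-bit mode on our hardware.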
> 2. is the data local (ie on a local disk), or on an nfs mounted disk?
Local. The X display is remote.
> 3. if yes, to #2, are you using nfs+ with cache enabled, and how fast is your network? 10mbit, 100mbit, 1Gbit?
100 Mbit. But if severe network load were affecting the X display, it should have affected the figdisp X display equally: same machine, same net, same X server, same glass.
> and what kind of load? if you copy the mosaic file to a local disk, how long does that take?
In this case we are not concerned with files so much as with the data in shmem, so file transport shouldn't be an issue.
> 4. concerning the ultra 10 box, how much memory? how much swap? are you doing anything else at the same time? (for example, running netscape while displaying an animated gif will suck a cpu dry!)
The Ultra 10 has 256 MB RAM and 512 MB swap (as listed with the benchmarks above). We had no other X clients running, except a few xterms.
> I'll be glad to download the 140mb mosaic, just to verify that all is well and normal.
We have lots of those :-)
Try:
http://www.ucolick.org/~de/deimos/backup.fits
(a 16 amp dark, taken yesterday, used for the benchmarks above)
BTW, while I remember it, please also note a funny panner bug reported by users last week: notice how the colour map in the panner does not match the actual image at the upper left? That is confusing for the user. It's almost as if two amps were swapped... but not really, because the bad column is where it should be.