TclX List Commands

Lists are an important Tcl feature. The core Tcl list functions alone are quite powerful, giving Tcl somewhat LISP-like strengths. User or data file input often looks like a valid Tcl list, so Tcl's list-processing features add to its utility as a parsing language. TclX expands Tcl list functionality considerably, adding ten new commands.

In core Tcl, as a beginner, I often wrote code along these lines:

tcl>set dlist [list This was input from some source or other]
tcl>set var1 [lindex $dlist 0]
tcl>set var2 [lindex $dlist 1]

Later I became a little smarter and wrote code more like this:

tcl>set dlist [list This was input from some source or other]
tcl>set vlist [list var1 var2 var3 var4 var5 var6 var7 var8]
tcl>set dl [llength $dlist]
tcl>for {set i 0} {$i < $dl} {incr i} {
=>set [lindex $vlist $i] [lindex $dlist $i]
=>}
tcl>echo $var1
This

and so on. Using TclX, the equivalent code is:

tcl>set dlist [list This was input from some source or other]
tcl>lassign $dlist var1 var2 var3 var4 var5 var6 var7 var8
tcl>echo $var1
This

or, even more concisely, reusing the vlist of variable names defined earlier:

tcl>eval lassign \$dlist $vlist

The list assignment command,

	lassign list var [var...]

assigns each element of list, in order, to the variable names that follow the list argument. If fewer variables are provided than there are list elements, lassign returns a list of the unassigned values:

tcl>lassign $dlist var1 var2 var3
from some source or other

If too many variables are provided, those for which there are no list values are set to the empty string (but they do exist):

tcl>lassign $dlist var1 var2 var3 var4 var5 var6 var7 var8 var9 var10
tcl>echo $var9

tcl>info exists var9
1

lassign is another TclX feature that any programmer might well write as a proc for her own private library, but is so basic and useful that it belongs in the language.
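
To show how little is involved, here is a minimal sketch of such a private-library proc in plain Tcl. The name my_lassign is made up for this illustration (it is not how TclX implements the command), and the sketch only tries to mimic the behaviour described above.

```tcl
# my_lassign: hypothetical pure-Tcl stand-in for lassign (illustration only).
proc my_lassign {items args} {
    set i 0
    foreach varName $args {
        # Set the named variable in the caller's scope. lindex yields an
        # empty string once we run past the end of the list.
        uplevel 1 [list set $varName [lindex $items $i]]
        incr i
    }
    # Hand back whatever elements were not assigned to a variable.
    return [lrange $items $i end]
}

set dlist [list This was input from some source or other]
set rest [my_lassign $dlist var1 var2 var3]
puts "var1=$var1 var3=$var3 rest=$rest"
```

The uplevel call is what lets the proc create variables in its caller; everything else is ordinary list indexing.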

Sometimes all you need to know about a list is whether it is empty (has no members). The command

	lempty list

is slightly more concise than comparing the list to a null string, or checking whether [llength list] is 0.

tcl>set dlist ""
tcl>lempty $dlist
1
tcl>set dlist "a b cd e"
tcl>lempty $dlist
0
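
For comparison, the two core Tcl tests mentioned above might be written as follows. This is plain Tcl (the eq operator assumes Tcl 8.4 or later), and the variable names are mine. It also shows a case where the two core tests disagree: a string of blanks is an empty list but not an empty string.

```tcl
set dlist ""
set byLength [expr {[llength $dlist] == 0}]
set byString [expr {$dlist eq ""}]

# A whitespace-only string has no list elements, yet it is not "".
set blanks "   "
set blanksByLength [expr {[llength $blanks] == 0}]
set blanksByString [expr {$blanks eq ""}]

puts "$byLength $byString $blanksByLength $blanksByString"
```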

Now we come to one of my favourite commands,

	lrmdups list

Because I write a lot of list-processing code, I find this command very useful; it simply sorts list and suppresses all duplicate entries (the equivalent of a sort -u):


tcl>set dlist [list the quick brown fox jumped over the lazy dog]
tcl>lrmdups $dlist
brown dog fox jumped lazy over quick the
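
Where TclX is not installed, a reasonably recent core Tcl can produce the same result with the -unique option of lsort. The snippet below is only a sketch of that equivalence, not a statement about how lrmdups is implemented.

```tcl
set dlist [list the quick brown fox jumped over the lazy dog]

# lsort -unique sorts the list and keeps one copy of each repeated element.
set uniq [lsort -unique $dlist]
puts $uniq
```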

Core Tcl uses the lsearch command to determine whether a string is found in a list and in what position. TclX takes this concept in a different direction, returning the matching elements themselves rather than a position, with

	lmatch ?-mode? list pattern

The three possible modes are -exact, -glob, and -regexp; if no mode is given, -glob is used. When -exact is chosen, as you would expect, the supplied match value must be found, intact, as an element of the list. The -glob option causes lmatch to behave like core Tcl string match, and -regexp causes it to work more like the core Tcl regexp command.

tcl>set dlist [list The quick brown fox jumped over the lazy dog]
tcl>lmatch -exact $dlist do
tcl>lmatch -exact $dlist dog
dog
tcl>lmatch $dlist *o*
brown fox over dog
tcl>lmatch -regexp $dlist "e+"
The jumped over the
tcl>lmatch -regexp $dlist "(o+)|(h+)"
The brown fox over the dog
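
Later versions of core Tcl can return matching elements too. Assuming Tcl 8.4 or later, where lsearch accepts the -all and -inline options, the following sketch reproduces three of the lmatch calls above; the variable names are mine.

```tcl
set dlist [list The quick brown fox jumped over the lazy dog]

# -all returns every match, -inline returns the elements rather than indices.
set globHits  [lsearch -all -inline -glob   $dlist *o*]
set reHits    [lsearch -all -inline -regexp $dlist {e+}]
set exactHits [lsearch -all -inline -exact  $dlist do]

puts $globHits
puts $reHits
puts [llength $exactHits]
```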

Core Tcl provides the list and lappend commands for creating lists. The TclX commands lvarcat, lvarpop, and lvarpush help you to construct and deconstruct lists by adding and removing list elements.

	lvarcat varName string ?string...?

creates a single list out of all its string arguments. If any string is a list, it is deconstructed into individual elements which are appended to the output list. The output list is stored in varName and also returned as the command result. The variable varName need not pre-exist; lvarcat creates it if necessary.

Here I'll illustrate the difference between lappend and lvarcat.

tcl>set dlist1 {The quick}
tcl>set dlist2 {brown fox jumped}
tcl>set dlist3 {over the lazy dog}
tcl>lappend blist $dlist1 $dlist2 $dlist3
{The quick} {brown fox jumped} {over the lazy dog}
tcl>lvarcat xlist $dlist1 $dlist2 $dlist3
The quick brown fox jumped over the lazy dog

As you can see, lappend makes a list of lists, with embedded braces delimiting the original lists. lvarcat makes a simple list with no internal structure.
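
The same contrast can be seen in core Tcl between list and concat. This is only a rough analogy (unlike lvarcat, concat does not store its result in a variable for you), but counting the elements makes the structural difference plain.

```tcl
set dlist1 {The quick}
set dlist2 {brown fox jumped}
set dlist3 {over the lazy dog}

# list keeps each argument as one element; concat splices them together.
set nested [list $dlist1 $dlist2 $dlist3]
set flat   [concat $dlist1 $dlist2 $dlist3]

puts "[llength $nested] elements: $nested"
puts "[llength $flat] elements: $flat"
```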

Somewhat similar to ctoken is

	lvarpop varName ?listIndex? ?newString?

which deletes (and returns) the list element indexed by listIndex. If the index is not supplied, it defaults to 0 and the command lifts off the first list element as if popping a stack. If a newString argument is supplied, then the original deleted item is replaced by newString. The return value is the deleted item, so this is another handy way to strip items from space-separated lists such as command line arguments.

tcl>set dlist "The quick brown fox jumped over the lazy dog"
tcl>lvarpop dlist 
The
tcl>echo $dlist
quick brown fox jumped over the lazy dog
tcl>lvarpop dlist 3 flew
jumped
tcl>echo $dlist
quick brown fox flew over the lazy dog
tcl>lvarpop dlist end cow
dog
tcl>echo $dlist
quick brown fox flew over the lazy cow
tcl>lvarpop dlist end-1 industrious
lazy
tcl>echo $dlist
quick brown fox flew over the industrious cow

This is somewhat faster and easier than using lappend, lreplace, and linsert to the same effect. One could easily use lvarpop to parse command line arguments and flags:

tcl>
#
# sample args -Eerrfile -Ooutfile -Mmyname@host infile.dat
#	in no particular order
#
while {![lempty $argv]} {
	set word [lvarpop argv]
	if {[cindex $word 0] == "-"} {
		switch -- [cindex $word 1] {
			E { set errfile [crange $word 2 end] }
			O { set outfile [crange $word 2 end] }
			M { set mailto [crange $word 2 end] }
			default {
				puts stderr "Bad flag $word"
				echo $syntax_reminder
				exit 1
			}
		}
	} else {
		set infile $word
	}
}
tcl>echo $errfile $outfile $mailto $infile
errfile outfile myname@host infile.dat
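
For comparison with the lreplace approach, here is a rough sketch of what lvarpop saves you from writing. The proc name my_lvarpop is hypothetical and the code is plain Tcl, not the TclX implementation; it assumes the variable already holds a valid list.

```tcl
# my_lvarpop: hypothetical pure-Tcl approximation of lvarpop.
proc my_lvarpop {varName {index 0} args} {
    upvar 1 $varName lst
    set popped [lindex $lst $index]
    if {[llength $args] > 0} {
        # A replacement string was supplied: swap it in for the old element.
        set lst [lreplace $lst $index $index [lindex $args 0]]
    } else {
        # No replacement: simply delete the element.
        set lst [lreplace $lst $index $index]
    }
    return $popped
}

set dlist "The quick brown fox jumped over the lazy dog"
set first [my_lvarpop dlist]
set verb  [my_lvarpop dlist 3 flew]
set adj   [my_lvarpop dlist end-1 industrious]
puts "$first / $verb / $adj / $dlist"
```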

The corresponding "push" command is

	lvarpush varName newString ?listIndex?

which inserts a new item newString into the list just before position listIndex, and listIndex again defaults to 0 if unspecified. We had lopped off the first word of "The quick brown fox..." (in dlist): let's use lvarpush to stick it back on:

tcl>lvarpush dlist The
tcl>echo $dlist
The quick brown fox flew over the industrious cow
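
In core Tcl terms this is linsert applied to a variable in place. A minimal sketch, using a made-up proc name my_lvarpush and assuming the variable already exists:

```tcl
# my_lvarpush: hypothetical pure-Tcl approximation of lvarpush.
proc my_lvarpush {varName newString {index 0}} {
    upvar 1 $varName lst
    # linsert puts the new element just before the given position.
    set lst [linsert $lst $index $newString]
    return
}

set dlist "quick brown fox flew over the industrious cow"
my_lvarpush dlist The
my_lvarpush dlist very 6
puts $dlist
```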

List processing is often set processing; i.e. the object is to determine or assign membership of entities or attributes (elements) in sets (lists). TclX's toolkit for this kind of problem starts with

	union listA listB

which simply merges the two lists and eliminates dups; in other words, it's just lvarcat plus lrmdups.

More useful to most programmers than union is

	intersect listA listB

which returns the intersection of two lists (the list of elements found in both lists):

tcl>set listA {bb xx aa gg yy pp}
tcl>set listB {zz nn bb oo tt yy mm}
tcl>union $listA $listB
aa bb gg mm nn oo pp tt xx yy zz
tcl>intersect $listA $listB
bb yy

Last and most useful is

	intersect3 listA listB

which returns a list of three lists: first, all the elements of listA not found in listB; second, the normal intersect of the two lists (elements they have in common); third, the elements of listB not found in listA. This function has been of repeated use to me in database applications where lists of entities and attributes must be compared, merged, and diffed.

tcl>intersect3 $listA $listB
{aa gg pp xx} {bb yy} {mm nn oo tt zz}
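
To make the three-way split concrete, here is a sketch of the same computation in plain Tcl. The proc name my_intersect3 is hypothetical and this is not the TclX code; it simply sorts both lists, drops duplicates, and sorts each element into one of three bins.

```tcl
# my_intersect3: hypothetical pure-Tcl approximation of intersect3.
proc my_intersect3 {listA listB} {
    set a [lsort -unique $listA]
    set b [lsort -unique $listB]
    set onlyA {}
    set both  {}
    set onlyB {}
    foreach e $a {
        if {[lsearch -exact $b $e] >= 0} {
            lappend both $e      ;# found in both lists
        } else {
            lappend onlyA $e     ;# found in listA only
        }
    }
    foreach e $b {
        if {[lsearch -exact $a $e] < 0} {
            lappend onlyB $e     ;# found in listB only
        }
    }
    return [list $onlyA $both $onlyB]
}

set listA {bb xx aa gg yy pp}
set listB {zz nn bb oo tt yy mm}
set parts [my_intersect3 $listA $listB]
puts $parts
```

The linear lsearch inside the loop is fine for short lists; for large ones an array used as a lookup table would be the more usual choice.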

I've used TclX list functions in a network traffic mapping problem, for which I had gathered extensive summary data describing packet traffic by source and destination address. From these data I could determine for each host a list of "buddies" with whom it chatted more than with other (non-buddy) hosts. The question was whether these sets of buddies would group into larger "clubs" all of whose members tended to chat far more within the club than outside the club. The answer to this question would tell us whether our overloaded single Ethernet backbone could be partitioned rationally (using routers and bridges) to optimize bandwidth usage.

This question was a set theory problem: can each host be assigned to a "club" and what degree of crossmembership is there between clubs? TclX list manipulation commands seemed like the right tool, and this is an excerpt from the application:

# players is the list of hosts who are "playing" this game
# the proc "buddies" returns a list of hosts who chat with me
# more than the threshold amount (parameter)
foreach p $players {
	# p is an IP address, which we convert to a name via an array
	# set at the beginning of the "game" using nslookup (scotty
	# tcl extension) and local database
	set hn $name($p)
	set myset [buddies $hn]
	set myclub -1
	set besto 0
	# clubs is an array of lists of hosts where each list of hosts
	# is an association or club whose members chat with each other.
	# with which existing club does my social set (buddies) have
	# the largest overlap?
	foreach c [array names clubs] {
		set cl $clubs($c)
		lassign [intersect3 $myset $cl] mynew overlap others
		set ol [llength $overlap]
		if {$ol > $besto} {
			set besto $ol
			set myclub $c
		}
	}
	# if there was no overlap at all then we are a new club of our own
	if {$myclub < 0} {
		set newc 0
		catch {eval set newc \[max [array names clubs]\]}
		set myclub [expr $newc + 1]
		set clubs($myclub) ""
	}
	# whether new or old, we append all my buddies to the selected
	# club membership list
	foreach h $myset {
		lappend clubs($myclub) $h
		set membership($h) $myclub
	}
	# then we remove dups from this club
	set clubs($myclub) [lrmdups $clubs($myclub)]
}

The TclX basic list commands are enhancements that any programmer could provide with a private library of procs, like the math functions and lassign command; but it's far more pleasant to have them "off the shelf" as part of the installed interpreter.

You can do even more with lists in TclX by using its "keyed list" concept. A keyed list is a list with fixed internal structure, that is, a list of lists. The commands keylset and keylget store and retrieve data from keyed lists. An example will be far more useful than a lengthy explanation here:

tcl>keylset person LAST Flintstone FIRST Fred PHONE 333-4444 \
	OCCUP Toon SALARY 0 NOTES "What a swell guy"
tcl>echo $person
{LAST Flintstone} {FIRST Fred} {PHONE 333-4444} {OCCUP Toon} {SALARY 0} {NOTES {What a swell guy}}
tcl>keylget person OCCUP
Toon

What we just did was to establish a set of keyword/value pairs and to give that set a name (person). We can now retrieve specific values by keyword (what is the "occupation" of "person"?). Database programmers will immediately recognize this construct as a tuple expressing attributes; other programmers may see a strong resemblance to a C struct. The syntax is

	keylset listName Keyword Value ?Keyword Value ...?
	keylget listName Keyword

A lot of my Tcl code is basically state-enginesque; that is, if it were written in C, all the structs would be global (and very large). So I just use large global arrays for 90 percent of my data storage. I have thus been sheltered from the one significant limitation of arrays in Tcl: arrays are not first-class objects. You can't pass or return an array by value, only by name.

When you want to pass structured packages of data between procs by value, you may not want to refer to a lot of upvar levels and pass everything by name. Unless your code is already a state engine, you may not want to make most of your variable space global either. At this point you generally use a list:
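
proc foo mylist {
	global ofp
	lassign $mylist last first phone
	# write to the open file handle ofp (echo only writes to stdout)
	# header labels are strings, so they need %s rather than %d
	puts $ofp [format "%-15s %-15s %7s" Last First Phone]
	# %07d assumes phone is a plain integer such as 3334444
	puts $ofp [format "%-15s %-15s %07d" $last $first $phone]
	...
}
lappend person $last_name
lappend person $first_name
lappend person $phone_num
set res [foo $person]
...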

The only clue to the meaning of the list elements is their order; that's not so painful, but this kind of code can present maintenance problems later, especially if different programmers work on different modules. If you choose good variable names, as above, it's almost self-documenting; but if you get sloppy and start referring to your list elements by raw index numbers

	if {[lindex $mylist 0] == $test_name} { do_something }

the code starts to lose legibility. Who can tell, without poring over (perhaps a lot of) other source, whether the name being compared is a first or a last name?

Keyed lists were introduced into TclX to provide a first-class object (one that could be passed and returned by value) that nonetheless had internal structure, was self-documenting, and was conducive to good coding. You could say that keyed lists are a stylistic, rather than a purely functional, enhancement. They offer you a syntactic construct that encourages good programming style.

As you saw above, a keyed list is just a list of lists. There's no radical new Tcl variable type here, just some procedures that allow you to create and manipulate a list of lists easily and concisely. Naturally the story does not end with storage and retrieval. You can also delete a keyword and its associated value out of the list:

tcl>keyldel person SALARY
tcl>echo $person
{LAST Flintstone} {FIRST Fred} {PHONE 333-4444} {OCCUP Toon} {NOTES {What a swell guy}}

or add a new one:

tcl>keylset person HOBBIES "Lithography Zoetropes Oenology"
tcl>echo $person
{LAST Flintstone} {FIRST Fred} {PHONE 333-4444} {OCCUP Toon} {NOTES {What a swell guy}} {HOBBIES {Lithography Zoetropes Oenology}}

And, as with the array names command, you can retrieve the list of valid keys set for this keyed list:

tcl>keylkeys person
LAST FIRST PHONE OCCUP NOTES HOBBIES

Now you have a data storage convention with the benefits both of an array (named storage locations whose names are retrievable from the storage object itself) and of a list (can be passed by value). The benefits of passing by value become more obvious when we consider the challenges of a distributed (client/server) application. If I want to pass a structured package of information from the client to the server or vice versa, neither has any knowledge of the other's variable space; I have to pass by value. It would be nice if I could examine the return value I had just received to see what I was given, rather than relying on hard-coded index positions and arcane rules ("if the first word of the list is X, then position 3 is the phone number").

Let's say the client has the social security number of a person, and it wants to know what the server knows about that person. The client might contain some code like this:

# I sent the server a request for info, containing some lookup
# code like a social security number.
# The server process returned me a keyed list which I call "answer".
set flds [keylkeys answer]
set patient "FIRST LAST FLOOR WARD BED CHARTNUM MEDS DIET PPHYS"
set doctor "FIRST LAST DIVIS SPECIAL CASELOAD HOURS PAGER HOME"
# If there are fields in the answer that are not patient fields
# then it has to be a doctor; there are only two species in our taxonomy
if {![lempty [lindex [intersect3 $flds $patient] 0]]} {
	set type "Doctor"
} else {
	set type "Patient"
}
echo "$type [keylget answer FIRST] [keylget answer LAST]:"
keyldel answer FIRST
keyldel answer LAST
foreach k [keylkeys answer] {
	echo "$k : [keylget answer $k]"
}

We'll cover the commands which actually implement servers and clients in TclX later in this chapter; this example is just to demonstrate that I can pass a structured message by value and retrieve not only its contents but embedded information about the meaning of its contents. It also demonstrates the use of a list intersect command to compare lists of attributes, in order to determine the type of an entity (for those who like that kind of thing).

As to maintainability, you can add fields to keyed lists without disturbing procs that already use those lists, because the procs can easily be written to ignore the length of the list and order of elements. There is no temptation to indulge in raw integer indexing, so the readability of code written with keyed lists depends only on an intelligent choice of keywords.

A keyed list can itself be a value element in a keyed list (just as structs can contain structs in C). This makes them more powerful than mere keyword/value pairs.

tcl>keylset n LAST Smith FIRST Sybilla
tcl>echo $n
{LAST Smith} {FIRST Sybilla}
tcl>keylset person NAME $n ADDR "1 Haresfoot Crescent"
tcl>echo $person
{NAME {{LAST Smith} {FIRST Sybilla}}} {ADDR {1 Haresfoot Crescent}}
tcl>keylget person NAME.LAST
Smith
tcl>keylget person NAME
{LAST Smith} {FIRST Sybilla}

Here we made n a keyed list which became one element of the keyed list person. Note that we can now retrieve the subelements of NAME without resorting to [keylget [keylget...] ...] by means of the dot-syntax NAME.LAST -- this is in fact necessary, because the keyed list commands refer to their target lists by name and not value! We could also have set the nested list values using the same dot-syntax:

tcl>keylset person NAME.LAST Smith
tcl>keylset person NAME.FIRST Shadrach
tcl>echo $person
{NAME {{LAST Smith} {FIRST Shadrach}}} {ADDR {1 Haresfoot Crescent}}
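
Since a keyed list is only a list of lists, the dotted lookup is easy to picture in plain Tcl. The sketch below uses a hypothetical proc, my_keylget, which (unlike the real keylget) takes the list by value; it is meant to illustrate the data layout, not to reproduce the TclX implementation.

```tcl
# my_keylget: hypothetical by-value lookup in a list of {key value} pairs.
proc my_keylget {klist key} {
    # Walk one key component at a time: NAME.LAST means NAME, then LAST.
    foreach part [split $key .] {
        set found 0
        foreach pair $klist {
            if {[lindex $pair 0] eq $part} {
                # Descend into the value, which may itself be a keyed list.
                set klist [lindex $pair 1]
                set found 1
                break
            }
        }
        if {!$found} {
            error "key \"$key\" not found"
        }
    }
    return $klist
}

set person {{NAME {{LAST Smith} {FIRST Shadrach}}} {ADDR {1 Haresfoot Crescent}}}
puts [my_keylget $person NAME.FIRST]
puts [my_keylget $person ADDR]
puts [my_keylget $person NAME]
```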

You could model fairly complex hierarchical data structures using keyed lists.

Now for the catch: beware of performance problems with keyed lists! If you have too many keywords and values, or if your keywords are themselves variables, you can run into serious performance problems. (I once cut execution time by two thirds when I replaced global keyed lists with global arrays in a major Tk application; as TclX author Mark Diekhans said, I was "abusing" keyed lists by turning them into large-scale dynamic data storage.) Keyed lists are simply not practical when they get too big or when you try to construct them dynamically. The keyed list functions, being procs, involve more overhead than compiled-in commands, so if you plan very intensive, repeated hits on your data storage, keyed lists would not be a good choice.

If you want the keyed list features but you need more storage, remember that a keyed list can also be stored in an array; you can build large arrays in which each array element is a keyed list.
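
As a rough picture of that arrangement, the plain Tcl sketch below keeps one small keyed list per array element. The array name people, the sample records, and the helper proc field are all invented for the illustration; in TclX you would use the keyed list commands on each element instead of the hand-written loop.

```tcl
# Each array element holds one small keyed list (a list of {key value} pairs).
set people(fred)   {{LAST Flintstone} {FIRST Fred}   {OCCUP Toon}}
set people(barney) {{LAST Rubble}     {FIRST Barney} {OCCUP Toon}}

# field: hypothetical helper that fetches one value from one record.
proc field {arrName id key} {
    upvar 1 $arrName arr
    foreach pair $arr($id) {
        if {[lindex $pair 0] eq $key} {
            return [lindex $pair 1]
        }
    }
    return ""
}

# The array gives fast access by record; each keyed list stays short.
foreach id [lsort [array names people]] {
    puts "$id: [field people $id FIRST] [field people $id LAST]"
}
```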