TclX String Functions

TclX adds several character and string functions, permitting you to perform quite easily many operations that would be inconvenient in core Tcl.

The function cequal compares two strings:

	cequal strA strB 
It returns 1 if the two strings are identical, and 0 if they are not. This is shorter than string compare, and the result is more intuitive: string compare is modelled on the C strcmp and returns 0 when the strings match, which many programmers find confusing at first.
tcl> cequal "This" "That"
0
tcl> cequal "This" "This"
1
tcl> if {[cequal $strA $strB]} {
. . .
cequal also gets you around a well-known "gotcha" in Tcl expressions: if a string happens to conform to the Tcl syntax for a numeric quantity, the equality operator (==) interprets it as a number, and finds it equal to any other string that represents the same number. Thus:
tcl>set str1 "0x7"
tcl>set str2 "007"
tcl>if {$str1 == $str2} {echo they are the same}  
they are the same
To overcome this in standard Tcl you have to resort to string compare.
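The difference between the numeric and the string comparison can be seen in plain Tcl, without TclX. What follows is only a minimal sketch: the variable names are illustrative, and string equal is assumed to be available, as it is in newer core Tcl releases.

```tcl
set str1 "0x7"
set str2 "007"

# == sees two valid numbers and compares them numerically
set numericEqual [expr {$str1 == $str2}]

# string compare returns 0 when the strings match, so test against 0
set sameByCompare [expr {[string compare $str1 $str2] == 0}]

# string equal returns a boolean directly, much as cequal does
set sameByEqual [string equal $str1 $str2]

puts "numeric: $numericEqual  compare: $sameByCompare  equal: $sameByEqual"
```

Only the numeric test reports a match; both string tests treat the two values as the different strings they are.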

The cindex function does character-wise indexing into strings. Thus,

	cindex string indexExpr
returns the character indexed by the indexExpr. For example,

tcl>cindex Hello 1 
e
Note that indexing in Tcl generally starts with 0.

clength gets the length in characters of a string:

tcl>clength Hello
5

If you need to extract substrings by character indices,

	crange string ind1Expr ind2Expr 
returns the range of characters from index ind1Expr through index ind2Expr:

tcl>crange "Hello World" 2 7 
llo Wo 
csubstr does almost the same thing as crange, but takes a start position and a length, so

	csubstr string indExpr lenExpr
returns a range of characters starting at index indExpr and lenExpr characters long. If these arguments would take csubstr beyond the end of the string, it simply returns as many characters as remain.

tcl>csubstr "Hello World" 4 5 
o Wor 
tcl>csubstr "Hello World" 9 10 
ld 
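The relationship between the two calls is simple index arithmetic: a start index and a length select the same characters as the range from start through start + length - 1. Here is a minimal sketch in core Tcl, with string range standing in for crange so that it runs without TclX. The proc name substrByLength is made up for the illustration.

```tcl
# Hypothetical helper: select by start index and length, like csubstr,
# by converting the length into an inclusive end index for string range.
proc substrByLength {str start len} {
    set last [expr {$start + $len - 1}]
    # string range clips an end index that runs past the end of the string
    return [string range $str $start $last]
}

puts [substrByLength "Hello World" 4 5]   ;# same characters as indices 4 through 8
puts [substrByLength "Hello World" 9 10]  ;# request overruns the string
```

The second call asks for ten characters but only two remain, which mirrors the "returns what it can get" behaviour shown in the csubstr transcript.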

One common task that can be tedious and repetitive to code is the parsing of ASCII input. With their list and string functions, Tcl and TclX are remarkably useful for parsing. TclX includes the ctoken function specifically for this purpose:

	ctoken strVar sepString
This parses a token out of a character string. The string to parse is contained in the variable strVar, and the string sepString contains all the valid separator characters for tokens in the target string. The first token is returned and the contents of strVar are modified to contain only the remainder of the input string following the extracted token:

tcl>set sepString ~_ 
tcl>set parse "_~This~is_a~string__to_parse~for~tokens"  
tcl>ctoken parse $sepString  
This 
tcl>echo $parse 
~is_a~string__to_parse~for~tokens
tcl>ctoken parse $sepString  
is 
tcl>ctoken parse $sepString  
a 

(and so on). ctoken ignores any leading separators. ctoken is basically a more intelligent split, with the addition of the "eat token and shorten string" step that one would otherwise have to code by hand. ctoken is a close analogue of the C library routine strtok.
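To see what the "eat token and shorten string" step amounts to when coded by hand, here is a rough sketch in core Tcl. It only approximates the behaviour described above and is not TclX's implementation; the proc name nextToken is made up for the illustration, and the loop assumes that an empty result means the input is exhausted.

```tcl
# Hypothetical core-Tcl stand-in for ctoken: skip leading separators,
# return the next token, and leave the remainder in the caller's variable.
proc nextToken {strVar sepString} {
    upvar 1 $strVar str
    # drop any leading separator characters
    set str [string trimleft $str $sepString]
    # scan forward to the first separator character, if there is one
    set len [string length $str]
    for {set i 0} {$i < $len} {incr i} {
        if {[string first [string index $str $i] $sepString] >= 0} break
    }
    set token [string range $str 0 [expr {$i - 1}]]
    # keep everything from the separator onward for the next call
    set str [string range $str $i end]
    return $token
}

set parse "_~This~is_a~string__to_parse~for~tokens"
set tokens {}
while {[string length [set tok [nextToken parse "~_"]]] > 0} {
    lappend tokens $tok
}
puts $tokens
```

The doubled separator between "string" and "to" produces no empty token, because leading separators are skipped on every call.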

Parsing problems often involve the validation or "typing" of input tokens. TclX provides ctype to address this:

	ctype [-failindex var] charClass string
returns 1 if every character in the string is of the specified charClass, and 0 if any character is not. It also returns 0 if the string is of zero length. If the -failindex option and a variable name are provided, the index of the first character that fails the test for membership in charClass is stored in that variable.

tcl>set str 87654h890 
tcl>ctype digit $str 
0 
tcl>ctype -failindex where digit $str 
0 
tcl>echo $where 
5 
tcl>echo [cindex $str $where]
h
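A typical use of the fail index is to tell the user where the input went wrong. The sketch below does this in core Tcl with string is, which accepts a -failindex option of the same shape; -strict is added so that an empty string is rejected, as ctype rejects it. The proc name checkDigits and the message wording are made up for the illustration.

```tcl
# Hypothetical validator: returns an empty string for valid input,
# or a message naming the first offending character.
proc checkDigits {str} {
    # -strict makes the empty string fail, as ctype does
    if {[string is digit -strict -failindex where $str]} {
        return ""
    }
    # handle the empty string separately, without relying on the fail index
    if {[string length $str] == 0} {
        return "empty input"
    }
    return "bad character '[string index $str $where]' at index $where"
}

puts [checkDigits 87654h890]
```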

Other character classes include alnum, alpha, ascii, cntrl, lower, upper, space, etc. ctype does more than just test strings for type; it can also be used to convert decimal ASCII values to characters, and vice versa:

tcl>ctype ord e 
101 
tcl>ctype char 101 
e 
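For comparison, core Tcl can perform the same conversion with scan and format, at the cost of remembering the %c conversion. A minimal sketch, with illustrative variable names:

```tcl
# character -> decimal code, the counterpart of: ctype ord e
scan e %c code

# decimal code -> character, the counterpart of: ctype char 101
set ch [format %c $code]

puts "$code $ch"
```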

Eventually every programmer needs this conversion. It's one more wheel that the TclX user doesn't have to re-invent.

TclX has yet more "fun with strings" in its bag of tricks: replicate and translit.

	replicate string times
simply returns a string made of times copies of string:

tcl>replicate a 10 
aaaaaaaaaa 
tcl>replicate ab 10 
abababababababababab 

The Unix tr command is mirrored in

	translit inrange outrange string
which translates characters in string, changing any character in the range inrange to its corresponding character in outrange. You could use this as an alternative version of string toupper:

tcl>set str "Hello World" 
tcl>translit a-z A-Z $str 
HELLO WORLD 

or you could do some simple-minded data obfuscation:

tcl>translit a-z b-za abc
bcd 
tcl>translit a-z m-zA-L abcpqr 
mnoBCD 
tcl>translit m-zA-L a-z mnoBCD 
abcpqr 
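The round trip works because the two ranges pair up character for character, so swapping them reverses the mapping. The same idea can be sketched in core Tcl with string map, spelling the ranges out in full since string map does not expand a-z notation. The proc name charMap and the variable names are made up for the illustration.

```tcl
# Hypothetical helper: pair up the characters of two equal-length strings
# into the {from to from to ...} list that string map expects.
proc charMap {from to} {
    set map {}
    foreach f [split $from ""] t [split $to ""] {
        lappend map $f $t
    }
    return $map
}

set plain  abcdefghijklmnopqrstuvwxyz
set cipher mnopqrstuvwxyzABCDEFGHIJKL

# obscure the text, then swap the two alphabets to recover it
set hidden [string map [charMap $plain $cipher] abcpqr]
set back   [string map [charMap $cipher $plain] $hidden]
puts "$hidden $back"
```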

The string expand function,

	cexpand string
expands all backslash sequences in string to their actual character values.

tcl>set str "This is a square bracket \\\[ in a string" 
tcl> echo $str
This is a square bracket \[ in a string 
tcl>cexpand $str 
This is a square bracket [ in a string 
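For comparison, core Tcl's subst command can be restricted to backslash substitution alone, which gives a similar result without TclX. A minimal sketch; the braces keep the backslash in the stored string, just as the doubled backslash does in the transcript above.

```tcl
# braces prevent any substitution here, so the backslash is stored as-is
set str {This is a square bracket \[ in a string}

# perform backslash substitution only; leave $ and [] sequences alone
set expanded [subst -nocommands -novariables $str]
puts $expanded
```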

Of these functions, I have found clength, crange, and ctype the most essential; when parsing user input they are invaluable. Tcl is essentially a string-processing language, since its variables are typeless; the more powerful the string parsing and manipulation commands in your toolbox, the better you can exploit Tcl's "everything's a string" philosophy.