
11. Strings

Strings are a type of array which represents a sequence of characters. The printed representation of a string is its characters enclosed in quotation marks, for example "foo bar". Strings are constants, that is, evaluating a string returns that string. Strings are the right data type to use for text-processing.

Strings are arrays of type art-string, where each element holds an eight-bit unsigned fixnum. This is because characters are represented as fixnums, and for fundamental characters only eight bits are used. A string can also be an array of type art-fat-string, where each element holds a sixteen-bit unsigned fixnum; the extra bits allow for multiple fonts or an expanded character set.

The way characters work, including multiple fonts and the extra bits from the keyboard, is explained in (character-set). Note that you can type in the fixnums that represent characters using "#/" and "#\"; for example, #/f reads in as the fixnum that represents the character "f", and #\return reads in as the fixnum that represents the special "return" character. See (sharp-slash) for details of this syntax.

The functions described in this section provide a variety of useful operations on strings. In place of a string, most of these functions will accept a symbol or a fixnum as an argument, and will coerce it into a string. Given a symbol, its print name, which is a string, will be used. Given a fixnum, a one-character string containing the character designated by that fixnum will be used. Several of the functions actually work on any type of one-dimensional array and may be useful for other than string processing; these are the functions such as substring and string-length which do not depend on the elements of the string being characters.

Since strings are arrays, the usual array-referencing function aref is used to extract the characters of the string as fixnums. For example,
 
(aref "frob" 1) => 162  ;lower-case r
Note that the character at the beginning of the string is element zero of the array (rather than one); as usual in Zetalisp, everything is zero-based.

It is also legal to store into strings (using aset). As with rplaca on lists, this changes the actual object; one must be careful to understand where side-effects will propagate to. When you are making strings that you intend to change later, you probably want to create an array with a fill-pointer (see (fill-pointer)) so that you can change the length of the string as well as the contents. The length of a string is always computed using array-active-length, so that if a string has a fill-pointer, its value will be used as the length.
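
As a small illustration of storing into a string, the following sketch copies a string before altering it, so that the constant appearing in the program is left alone. The variable name s is only illustrative.

(setq s (string-append "frob"))   ;a fresh copy of the constant
(aset #/g s 3)                    ;store the fixnum for "g" into element 3
s => "frog"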



11.1 Characters

Function: character x
character coerces x to a single character, represented as a fixnum. If x is a number, it is returned. If x is a string or an array, its first element is returned. If x is a symbol, the first character of its pname is returned. Otherwise, an error occurs. The way characters are represented as fixnums is explained in (character-set).
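
A few illustrative cases follow, with results printed in octal as in the aref example earlier in this chapter. The last case assumes that the reader has converted the symbol's name to upper case, as it normally does.

(character #/f) => 146      ;a number is returned unchanged
(character "frob") => 146   ;first element of the string
(character 'frob) => 106    ;first character of the pname "FROB"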

Function: char-equal ch1 ch2
This is the primitive for comparing characters for equality; many of the string functions call it. ch1 and ch2 must be fixnums. The result is t if the characters are equal ignoring case and font, otherwise nil. %%ch-char is the byte-specifier for the portion of a character which excludes the font information.

Function: char-lessp ch1 ch2
This is the primitive for comparing characters for order; many of the string functions call it. ch1 and ch2 must be fixnums. The result is t if ch1 comes before ch2 ignoring case and font, otherwise nil. Details of the ordering of characters are in (character-set).
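
For instance, assuming alphabetic-case-affects-string-comparison has its normal value of nil, the two primitives should behave as follows:

(char-equal #/a #/A) => t
(char-equal #/a #/b) => nil
(char-lessp #/a #/B) => t      ;case is ignored, so this compares as A before B
(char-lessp #/B #/a) => nil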



11.2 Upper and Lower Case Letters

Variable: alphabetic-case-affects-string-comparison
This variable is normally nil. If it is t, char-equal, char-lessp, and the string searching and comparison functions will distinguish between upper-case and lower-case letters. If it is nil, lower-case characters behave as if they were the same character but in upper-case. It is all right to bind this to t around a string operation, but changing its global value to t will break many system functions and user interfaces and so is not recommended.
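
A sketch of binding the variable around a single comparison, leaving the global value alone, might look like this:

(string-equal "Foo" "foo") => t          ;normal behavior, case ignored
(let ((alphabetic-case-affects-string-comparison t))
  (string-equal "Foo" "foo")) => nil     ;case is significant inside the let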

Function: char-upcase ch
If ch, which must be a fixnum, is a lower-case alphabetic character, its upper-case form is returned; otherwise, ch itself is returned. If font information is present, it is preserved.

Function: char-downcase ch
If ch, which must be a fixnum, is an upper-case alphabetic character, its lower-case form is returned; otherwise, ch itself is returned. If font information is present, it is preserved.

Function: string-upcase string
Returns a copy of string, with all lower-case alphabetic characters replaced by the corresponding upper-case characters.

Function: string-downcase string
Returns a copy of string, with all upper-case alphabetic characters replaced by the corresponding lower-case characters.
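
The case-conversion functions of this section can be illustrated as follows. Fixnum results are shown in octal, and the strings were chosen only for illustration.

(char-upcase #/a) => 101       ;the fixnum for A
(char-downcase #/A) => 141     ;the fixnum for a
(char-upcase #/1) => 61        ;not alphabetic, so returned unchanged
(string-upcase "Dr. No") => "DR. NO"
(string-downcase "Dr. No") => "dr. no"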



11.3 Basic String Operations

Function: string x
string coerces x into a string. Most of the string functions apply this to their string arguments. If x is a string (or any array), it is returned. If x is a symbol, its pname is returned. If x is a non-negative fixnum less than 400 octal, a one-character-long string containing it is created and returned. If x is a pathname (see (pathname)), the "string for printing" is returned. Otherwise, an error is signalled.

If you want to get the printed representation of an object into the form of a string, this function is not what you should use. You can use format, passing a first argument of nil (see (format-fun)). You might also want to use with-output-to-string (see (with-output-to-string-fun)).
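
The difference between coercion and printed representation can be sketched like this. The second line assumes the reader has upcased the symbol's name, and the last line assumes the ~S directive of format prints its argument as prin1 would.

(string "foo") => "foo"               ;a string is returned as is
(string 'foo) => "FOO"                ;the pname of the symbol
(string #/a) => "a"                   ;one-character string from a fixnum
(format nil "~S" '(a b)) => "(A B)"   ;printed representation, via format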

Function: string-length string
string-length returns the number of characters in string. This is 1 if string is a number, the array-active-length (see (array-active-length-fun)) if string is an array, or the array-active-length of the pname if string is a symbol.
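
The three cases described above, in a minimal illustration:

(string-length "frob") => 4    ;an array: its active length
(string-length 'frob) => 4     ;a symbol: the length of its pname
(string-length #/f) => 1       ;a number: always 1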

Function: string-equal string1 string2 &optional (idx1 0) (idx2 0) lim1 lim2
string-equal compares two strings, returning t if they are equal and nil if they are not. The comparison ignores the extra "font" bits in 16-bit strings and ignores alphabetic case. equal calls string-equal if applied to two strings. The optional arguments idx1 and idx2 are the starting indices into the strings. The optional arguments lim1 and lim2 are the final indices; the comparison stops just before the final index. lim1 and lim2 default to the lengths of the strings. These arguments are provided so that you can efficiently compare substrings.
 
Examples:
(string-equal "Foo" "foo") => t
(string-equal "foo" "bar") => nil
(string-equal "element" "select" 0 1 3 4) => t

Function: %string-equal string1 idx1 string2 idx2 count
%string-equal is the microcode primitive which string-equal calls. It returns t if the count characters of string1 starting at idx1 are char-equal to the count characters of string2 starting at idx2, or nil if the characters are not equal or if count runs off the length of either array.

Instead of a fixnum, count may also be nil. In this case, %string-equal compares the substring from idx1 to (string-length string1) against the substring from idx2 to (string-length string2). If the lengths of these substrings differ, then they are not equal and nil is returned.

Note that string1 and string2 must really be strings; the usual coercion of symbols and fixnums to strings is not performed. This function is documented because certain programs which require high efficiency and are willing to pay the price of less generality may want to use %string-equal in place of string-equal.

 
Examples:
To compare the two strings foo and bar:
(%string-equal foo 0 bar 0 nil)
To see if the string foo starts with the characters "bar":
(%string-equal foo 0 "bar" 0 3)

Function: string-lessp string1 string2
string-lessp compares two strings using dictionary order (as defined by char-lessp). The result is t if string1 is the lesser, or nil if they are equal or string2 is the lesser.

Function: string-compare string1 string2 &optional (idx1 0) (idx2 0) lim1 lim2
string-compare compares two strings using dictionary order (as defined by char-lessp). The arguments are interpreted as in string-equal. The result is 0 if the strings are equal, a negative number if string1 is less than string2, or a positive number if string1 is greater than string2. If the strings are not equal, the absolute value of the number returned is one greater than the index (in string1) where the first difference occurred.
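
The following values are worked out from the rules stated above, assuming alphabetic-case-affects-string-comparison is nil; they are a sketch of the expected behavior and not a transcript.

(string-lessp "apple" "banana") => t
(string-lessp "banana" "apple") => nil
(string-lessp "foo" "FOO") => nil       ;equal when case is ignored
(string-compare "foo" "FOO") => 0
(string-compare "abc" "abd") => -3      ;first difference at index 2
(string-compare "abd" "abc") => 3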

Function: substring string start &optional end area
This extracts a substring of string, starting at the character specified by start and going up to but not including the character specified by end. start and end are 0-origin indices. The length of the returned string is end minus start. If end is not specified it defaults to the length of string. The area in which the result is to be consed may be optionally specified.
 
Example:
(substring "Nebuchadnezzar" 4 8) => "chad"

Function: nsubstring string start &optional end area
nsubstring is the same as substring except that the substring is not copied; instead an indirect array (see (indirect-array)) is created which shares part of the argument string. Modifying one string will modify the other.

Note that nsubstring does not necessarily use less storage than substring; an nsubstring of any length uses at least as much storage as a substring 12 characters long. So you shouldn't use this just "for efficiency"; it is intended for uses in which it is important to have a substring which, if modified, will cause the original string to be modified too.
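
The sharing can be seen in a sketch like the following, where s and sub are illustrative variable names. The original is copied first so that no program constant is altered.

(setq s (string-append "Nebuchadnezzar"))
(setq sub (nsubstring s 4 8))    ;sub => "chad", sharing storage with s
(aset #/C sub 0)                 ;store into the indirect array
sub => "Chad"
s => "NebuChadnezzar"            ;the original string has changed too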

Function: string-append &rest strings
Any number of strings are copied and concatenated into a single string. With a single argument, string-append simply copies it. If the first argument is an array, the result will be an array of the same type. Thus string-append can be used to copy and concatenate any type of 1-dimensional array.
 
Example:
(string-append #/! "foo" #/!) => "!foo!"

Function: string-nconc modified-string &rest strings
string-nconc is like string-append except that instead of making a new string containing the concatenation of its arguments, string-nconc modifies its first argument. modified-string must have a fill-pointer so that additional characters can be tacked onto it. Compare this with array-push-extend ((array-push-extend-fun)). The value of string-nconc is modified-string or a new, longer copy of it; in the latter case the original copy is forwarded to the new copy (see adjust-array-size, (adjust-array-size-fun)). Unlike nconc, string-nconc with more than two arguments modifies only its first argument, not every argument but the last.
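
A possible use is sketched below. The make-array call is an assumption: it supposes the keyword-style argument syntax and that a leader list of (0) supplies a fill-pointer initialized to zero; see (fill-pointer) for the proper way to create such an array on your system. The name buf is illustrative.

(setq buf (make-array 100 ':type 'art-string ':leader-list '(0)))
(string-nconc buf "foo" "bar")   ;appends to buf, advancing its fill-pointer
buf => "foobar"
(string-length buf) => 6         ;the fill-pointer, not the allocated size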

Function: string-trim char-set string
This returns a substring of string, with all characters in char-set stripped off of the beginning and end. char-set is a set of characters, which can be represented as a list of characters or a string of characters.
 
Examples:
(string-trim '(#\sp) "  Dr. No  ") => "Dr. No"
(string-trim "ab" "abbafooabb") => "foo"

Function: string-left-trim char-set string
This returns a substring of string, with all characters in char-set stripped off of the beginning. char-set is a set of characters, which can be represented as a list of characters or a string of characters.

Function: string-right-trim char-set string
This returns a substring of string, with all characters in char-set stripped off of the end. char-set is a set of characters, which can be represented as a list of characters or a string of characters.
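
For comparison with the string-trim example, the one-sided versions applied to the same arguments would give:

(string-left-trim "ab" "abbafooabb") => "fooabb"
(string-right-trim "ab" "abbafooabb") => "abbafoo"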

Function: string-reverse string
Returns a copy of string with the order of characters reversed. This will reverse a 1-dimensional array of any type.

Function: string-nreverse string
Returns string with the order of characters reversed, smashing the original string, rather than creating a new one. If string is a number, it is simply returned without consing up a string. This will reverse a 1-dimensional array of any type.
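
A short illustration of the difference between the copying and the destructive version, using an illustrative variable s that holds a fresh copy of a string:

(string-reverse "frob") => "borf"    ;a new string; the argument is untouched
(setq s (string-append "frob"))
(string-nreverse s) => "borf"
s => "borf"                          ;the original has been smashed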

Function: string-pluralize string
string-pluralize returns a string containing the plural of the word in the argument string. Any added characters go in the same case as the last character of string.
 
Examples:
(string-pluralize "event") => "events"
(string-pluralize "Man") => "Men"
(string-pluralize "Can") => "Cans"
(string-pluralize "key") => "keys"
(string-pluralize "TRY") => "TRIES"
For words with multiple plural forms depending on the meaning, string-pluralize cannot always do the right thing.



11.4 String Searching

Function: string-search-char char string &optional (from 0) to
string-search-char searches through string starting at the index from, which defaults to the beginning, and returns the index of the first character which is char-equal to char, or nil if none is found. If the to argument is supplied, it is used in place of (string-length string) to limit the extent of the search.
 
Example:
(string-search-char #/a "banana") => 1

Function: %string-search-char char string from to
%string-search-char is the microcode primitive which string-search-char and other functions call. string must be an array and char, from, and to must be fixnums. Except for this lack of type-coercion, and the fact that none of the arguments is optional, %string-search-char is the same as string-search-char. This function is documented for the benefit of those who require the maximum possible efficiency in string searching.

Function: string-search-not-char char string &optional (from 0) to
string-search-not-char searches through string starting at the index from, which defaults to the beginning, and returns the index of the first character which is not char-equal to char, or nil if none is found. If the to argument is supplied, it is used in place of (string-length string) to limit the extent of the search.
 
Example:
(string-search-not-char #/b "banana") => 1

Function: string-search key string &optional (from 0) to
string-search searches for the string key in the string string. The search begins at from, which defaults to the beginning of string. The value returned is the index of the first character of the first instance of key, or nil if none is found. If the to argument is supplied, it is used in place of (string-length string) to limit the extent of the search.
 
Examples:
(string-search "an" "banana") => 1
(string-search "an" "banana" 2) => 3

Function: string-search-set char-set string &optional (from 0) to
string-search-set searches through string looking for a character which is in char-set. The search begins at the index from, which defaults to the beginning. It returns the index of the first character which is char-equal to some element of char-set, or nil if none is found. If the to argument is supplied, it is used in place of (string-length string) to limit the extent of the search. char-set is a set of characters, which can be represented as a list of characters or a string of characters.
 
Examples:
(string-search-set '(#/n #/o) "banana") => 2
(string-search-set "no" "banana") => 2

Function: string-search-not-set char-set string &optional (from 0) to
string-search-not-set searches through string looking for a character which is not in char-set. The search begins at the index from, which defaults to the beginning. It returns the index of the first character which is not char-equal to any element of char-set, or nil if none is found. If the to argument is supplied, it is used in place of (string-length string) to limit the extent of the search. char-set is a set of characters, which can be represented as a list of characters or a string of characters.
 
Example:
(string-search-not-set '(#/a #/b) "banana") => 2

Function: string-reverse-search-char char string &optional from (to 0)
string-reverse-search-char searches through string in reverse order, starting from the index one less than from, which defaults to the length of string, and returns the index of the first character which is char-equal to char, or nil if none is found. Note that the index returned is from the beginning of the string, although the search starts from the end. If the to argument is supplied, it limits the extent of the search.
 
Example:
(string-reverse-search-char #/n "banana") => 4

Function: string-reverse-search-not-char char string &optional from (to 0)
string-reverse-search-not-char searches through string in reverse order, starting from the index one less than from, which defaults to the length of string, and returns the index of the first character which is not char-equal to char, or nil if none is found. Note that the index returned is from the beginning of the string, although the search starts from the end. If the to argument is supplied, it limits the extent of the search.
 
Example:
(string-reverse-search-not-char #/a "banana") => 4

Function: string-reverse-search key string &optional from (to 0)
string-reverse-search searches for the string key in the string string. The search proceeds in reverse order, starting from the index one less than from, which defaults to the length of string, and returns the index of the first (leftmost) character of the first instance found, or nil if none is found. Note that the index returned is from the beginning of the string, although the search starts from the end. The from condition, restated, is that the instance of key found is the rightmost one whose rightmost character is before the from'th character of string. If the to argument is supplied, it limits the extent of the search.
 
Example:
(string-reverse-search "na" "banana") => 4

Function: string-reverse-search-set char-set string &optional from (to 0)
string-reverse-search-set searches through string in reverse order, starting from the index one less than from, which defaults to the length of string, and returns the index of the first character which is char-equal to some element of char-set, or nil if none is found. Note that the index returned is from the beginning of the string, although the search starts from the end. If the to argument is supplied, it limits the extent of the search. char-set is a set of characters, which can be represented as a list of characters or a string of characters.
 
Example:
(string-reverse-search-set "ab" "banana") => 5

Function: string-reverse-search-not-set char-set string &optional from (to 0)
string-reverse-search-not-set searches through string in reverse order, starting from the index one less than from, which defaults to the length of string, and returns the index of the first character which is not char-equal to any element of char-set, or nil if none is found. Note that the index returned is from the beginning of the string, although the search starts from the end. If the to argument is supplied, it limits the extent of the search. char-set is a set of characters, which can be represented as a list of characters or a string of characters.
 
Example:
(string-reverse-search-not-set '(#/a #/n) "banana") => 0

See also intern ((intern-fun)), which given a string will return "the" symbol with that print name.
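
The searching functions combine naturally with substring. The following sketch defines a hypothetical helper, first-word (not part of the system), which returns the part of a string before its first space.

(defun first-word (line)
  (let ((idx (string-search-char #\sp line)))   ;index of the first space, or nil
    (if idx
        (substring line 0 idx)                  ;the characters before the space
        line)))                                 ;no space: return the argument

(first-word "Dr. No") => "Dr."
(first-word "banana") => "banana"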



11.5 I/O to Strings

The special forms in this section allow you to create I/O streams which input from or output to a string rather than a real I/O device. See (streams) for documentation of I/O streams.

Special Form: with-input-from-string (var string [index] [limit]) body...
The form
 
(with-input-from-string (var string)
    body)
evaluates the forms in body with the variable var bound to a stream which reads characters from the string which is the value of the form string. The value of the special form is the value of the last form in its body.

The stream is a function that only works inside the with-input-from-string special form, so be careful what you do with it. You cannot use it after control leaves the body, and you cannot nest two with-input-from-string special forms and use both streams since the special-variable bindings associated with the streams will conflict. It is done this way to avoid any allocation of memory.

After string you may optionally specify two additional "arguments". The first is index:
 
(with-input-from-string (var string index)
    body)
uses index as the starting index into the string, and sets index to the index of the first character not read when with-input-from-string returns. If the whole string is read, it will be set to the length of the string. Since index is updated it may not be a general expression; it must be a variable or a setf-able reference. The index is not updated in the event of an abnormal exit from the body, such as a *throw. The value of index is not updated until with-input-from-string returns, so you can't use its value within the body to see how far the reading has gotten.

Currently, use of the index feature prevents multiple values from being returned from the body.

 
(with-input-from-string (var string index limit)
    body)
uses the value of the form limit, if the value is not nil, in place of the length of the string. If you want to specify a limit but not an index, write nil for index.
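
As a minimal sketch of the simplest form, the following reads two characters from a string with tyi (assumed here to take the stream as its first argument, as described in (streams)). The variable name in is illustrative, and the results are shown in octal.

(with-input-from-string (in "frob")
    (list (tyi in) (tyi in))) => (146 162)   ;the fixnums for f and r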

Special Form: with-output-to-string (var [string] [index]) body...
This special form provides a variety of ways to send output to a string through an I/O stream.

 
(with-output-to-string (var)
  body)
evaluates the forms in body with var bound to a stream which saves the characters output to it in a string. The value of the special form is the string.

 
(with-output-to-string (var string)
  body)
will append its output to the string which is the value of the form string. (This is like the string-nconc function; see (string-nconc-fun).) The value returned is the value of the last form in the body, rather than the string. Multiple values are not returned. string must have an array-leader; element 0 of the array-leader will be used as the fill-pointer. If string is too small to contain all the output, adjust-array-size will be used to make it bigger.

 
(with-output-to-string (var string index)
  body)
is similar to the above except that index is a variable or setf-able reference which contains the index of the next character to be stored into. It must be initialized outside the with-output-to-string and will be updated upon normal exit. The value of index is not updated until with-output-to-string returns, so you can't use its value within the body to see how far the writing has gotten. The presence of index means that string is not required to have a fill-pointer; if it does have one it will be updated.

The stream is a "downward closure" simulated with special variables, so be careful what you do with it. You cannot use it after control leaves the body, and you cannot nest two with-output-to-string special forms and use both streams since the special-variable bindings associated with the streams will conflict. It is done this way to avoid any allocation of memory.

It is OK to nest a with-input-from-string and a with-output-to-string within one another, so long as there is only one of each.

Another way of doing output to a string is to use the format facility (see (format-fun)).
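
The simplest form of with-output-to-string and the format alternative can be compared in a sketch like this one. It assumes princ and tyo accept the stream as their second argument; the variable name out is illustrative.

(with-output-to-string (out)
  (princ "foo" out)
  (tyo #/- out)
  (princ "bar" out)) => "foo-bar"
(format nil "~A-~A" "foo" "bar") => "foo-bar"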



11.6 Maclisp-Compatible Functions

The following functions are provided primarily for Maclisp compatibility.

Function: alphalessp string1 string2
(alphalessp string1 string2) is equivalent to (string-lessp string1 string2).

Function: getchar string index
Returns the index'th character of string as a symbol. Note that 1-origin indexing is used. This function is mainly for Maclisp compatibility; aref should be used to index into strings (however, aref will not coerce symbols or numbers into strings).

Function: getcharn string index
Returns the index'th character of string as a fixnum. Note that 1-origin indexing is used. This function is mainly for Maclisp compatibility; aref should be used to index into strings (however, aref will not coerce symbols or numbers into strings).
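
The difference in index origin can be illustrated as follows. Results are in octal, and the last line assumes the reader has upcased the symbol's name.

(getcharn "frob" 1) => 146    ;1-origin: the first character, f
(aref "frob" 0) => 146        ;0-origin: the same character
(getcharn 'frob 2) => 122     ;symbol coerced to its pname "FROB"; R is 122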

Function: ascii x
ascii is like character, but returns a symbol whose printname is the character instead of returning a fixnum.
 
Examples:
(ascii 101) => A
(ascii 56) => /.
The symbol returned is interned in the current package (see (package)).

Function: maknam char-list
maknam returns an uninterned symbol whose print-name is a string made up of the characters in char-list.
 
Example:
(maknam '(a b #/0 d)) => ab0d

Function: implode char-list
implode is like maknam except that the returned symbol is interned in the current package.

The samepnamep function is also provided; see (samepnamep-fun).



This document was generated by Brad Parker on June, 13 2006 using texi2html