[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This chapter discusses functions that manipulate conses, and higher-level structures made up of conses such as lists and trees. It also discusses hash tables and resources, which are related facilities.
A cons is a primitive Lisp data object that is extremely simple: it knows about two other objects, called its car and its cdr.
A list is recursively defined to be the symbol nil, or a cons whose cdr is a list. A typical list is a chain of conses: the cdr of each is the next cons in the chain, and the cdr of the last one is the symbol nil. The cars of each of these conses are called the elements of the list. A list has one element for each cons; the empty list, nil, has no elements at all. Here are the printed representations of some typical lists:
(foo bar) ;This list has two elements. (a (b c d) e) ;This list has three elements. |
A "dotted list" is like a list except that the cdr of the last cons does not have to be nil. This name comes from the printed representation, which includes a "dot" character. Here is an example:
(a b . c) |
A tree is any data structure made up of conses whose cars and cdrs are other conses. The following are all printed representations of trees:
(foo . bar) ((a . b) (c . d)) ((a . b) (c d e f (g . 5) s) (7 . 4)) |
These definitions are not mutually exclusive. Consider a cons whose car is a and whose cdr is (b (c d) e). Its printed representation is
(a b (c d) e) |
A circular list is like a list except that the cdr of the last cons, instead of being nil, is the first cons of the list. This means that the conses are all hooked together in a ring, with the cdr of each cons being the next cons in the ring. While these are perfectly good Lisp objects, and there are functions to deal with them, many other functions will have trouble with them. Functions that expect lists as their arguments often iterate down the chain of conses waiting to see a nil, and when handed a circular list this can cause them to compute forever. The printer (see (print-fun)) is one of these functions; if you try to print a circular list the printer will never stop producing text. You have to be careful what you do with circular lists.
The Lisp Machine internally uses a storage scheme called "cdr coding" to represent conses. This scheme is intended to reduce the amount of storage used in lists. The use of cdr-coding is invisible to programs except in terms of storage efficiency; programs will work the same way whether or not lists are cdr-coded or not. Several of the functions below mention how they deal with cdr-coding. You can completely ignore all this if you want. However, if you are writing a program that allocates a lot of conses and you are concerned with storage efficiency, you may want to learn about the cdr-coded representation and how to control it. The cdr-coding scheme is discussed in (cdr-code).
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Example: (car '(a b c)) => a |
Example: (cdr '(a b c)) => (b c) |
Officially car and cdr are only applicable to conses and locatives. However, as a matter of convenience, car and cdr of nil return nil.
Example: (cddadr x) is the same as (cdr (cdr (car (cdr x)))) |
Examples: (cons 'a 'b) => (a . b) (cons 'a (cons 'b (cons 'c nil))) => (a b c) (cons 'a '(b c d)) => (a b c d) |
Example: (xcons 'a 'b) => (b . a) |
Example: (cons-in-area 'a 'b my-area) => (a . b) |
The backquote reader macro facility is also generally useful for creating list structure, especially mostly-constant list structure, or forms constructed by plugging variables into a template. It is documented in the chapter on macros; see (macro).
Note: there is no cdr-location function; it is difficult because of the cdr-coding scheme (see (cdr-code)).
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Examples: (length nil) => 0 (length '(a b c d)) => 4 (length '(a (b c) d)) => 3 |
(defun length (x) (cond ((atom x) 0) ((1+ (length (cdr x)))) )) |
(defun length (x) (do ((n 0 (1+ n)) (y x (cdr y))) ((atom y) n) )) |
Examples: (nth 1 '(foo bar gack)) => bar (nth 3 '(foo bar gack)) => nil |
Note: this is not the same as the InterLisp function called nth, which is similar to but not exactly the same as the Lisp Machine function nthcdr. Also, some people have used macros and functions called nth of their own in their Maclisp programs, which may not work the same way; be careful.
nth could have been defined by: (defun nth (n list) (do ((i n (1- i)) (l list (cdr l))) ((zerop i) (car l)))) |
Examples: (nthcdr 0 '(a b c)) => (a b c) (nthcdr 2 '(a b c)) => (c) |
This is similar to InterLisp's function nth, except that the InterLisp function is one-based instead of zero-based; see the InterLisp manual for details. nthcdr could have been defined by:
(defun nthcdr (n list) (do ((i 0 (1+ i)) (list list (cdr list))) ((= i n) list))) |
Example: (setq x '(a b c d)) (last x) => (d) (rplacd (last x) '(e f)) x => '(a b c d e f) |
(defun last (x) (cond ((atom x) x) ((atom (cdr x)) x) ((last (cdr x))) )) |
Example: (list 3 4 'a (car '(b . c)) (+ 6 -2)) => (3 4 a b 4) |
list could have been defined by: (defun list (&rest args) (let ((list (make-list (length args)))) (do ((l list (cdr l)) (a args (cdr a))) ((null a) list) (rplaca l (car a))))) |
Example: (list* 'a 'b 'c 'd) => (a b c . d) |
(cons 'a (cons 'b (cons 'c 'd))) |
More examples: (list* 'a 'b) => (a . b) (list* 'a) => a |
make-list always creates a cdr-coded list (see (cdr-code)).
Examples: (make-list 3) => (nil nil nil) (make-list 4 ':initial-value 7) => (7 7 7 7) |
When make-list was originally implemented, it took exactly two arguments: the area and the length. This obsolete form is still supported so that old programs will continue to work, but the new keyword-argument form is preferred.
(mapcar (function +) foo (circular-list 5)) |
circular-list could have been defined by: (defun circular-list (&rest elements) (setq elements (copylist* elements)) (rplacd (last elements) elements) elements) |
Example: (reverse '(a b (c d) e)) => (e (c d) b a) |
(defun reverse (x) (do ((l x (cdr l)) ; scan down argument, (r nil ; putting each element (cons (car l) r))) ; into list, until ((null l) r))) ; no more elements. |
Example: (nreverse '(a b c)) => (c b a) |
(defun nreverse (x) (cond ((null x) nil) ((nreverse1 x nil)))) (defun nreverse1 (x y) ;auxiliary function (cond ((null (cdr x)) (rplacd x y)) ((nreverse1 (cdr x) (rplacd x y))))) ;; this last call depends on order of argument evaluation. |
Currently, nreverse does something inefficient with cdr-coded (see (cdr-code)) lists, because it just uses rplacd in the straightforward way. This may be fixed someday. In the meantime reverse might be preferable in some cases.
Example: (append '(a b c) '(d e f) nil '(g)) => (a b c d e f g) |
A version of append which only accepts two arguments could have been defined by:
(defun append2 (x y) (cond ((null x) y) ((cons (car x) (append2 (cdr x) y)) ))) |
The generalization to any number of arguments could then be made (relying on car of nil being nil):
(defun append (&rest args) (if (< (length args) 2) (car args) (append2 (car args) (apply (function append) (cdr args))))) |
These definitions do not express the full functionality of append; the real definition minimizes storage utilization by cdr-coding (see (cdr-code)) the list it produces, using cdr-next except at the end where a full node is used to link to the last argument, unless the last argument is nil in which case cdr-nil is used.
To copy a list, use copylist (see (copylist-fun)); the old practice of using append to copy lists is unclear and obsolete.
Example: (setq x '(a b c)) (setq y '(d e f)) (nconc x y) => (a b c d e f) x => (a b c d e f) |
nconc could have been defined by:
(defun nconc (x y) ;for simplicity, this definition (cond ((null x) y) ;only works for 2 arguments. (t (rplacd (last x) y) ;hook y onto x x))) ;and return the modified x. |
nreconc could have been defined by:
(defun nreconc (x y) (cond ((null x) y) ((nreverse1 x y)) )) |
Examples: (butlast '(a b c d)) => (a b c) (butlast '((a b) (c d))) => ((a b)) (butlast '(a)) => nil (butlast nil) => nil |
Examples: (setq foo '(a b c d)) (nbutlast foo) => (a b c) foo => (a b c) (nbutlast '(a)) => nil |
Example: (firstn 2 '(a b c d)) => (a b) (firstn 0 '(a b c d)) => nil (firstn 6 '(a b c d)) => (a b c d nil nil) |
(nleft n list tail) takes cdr of list enough times that taking n more cdrs would yield tail, and returns that. You can see that when tail is nil this is the same as the two-argument case. If tail is not eq to any tail of list, nleft will return nil.
Examples: (setq x '(a b c d e)) (setq y (cdddr x)) => (d e) (ldiff x y) => (a b c) but (ldiff '(a b c d) '(c d)) => (a b c d) since the sublist was not eq to any part of the list. |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The functions rplaca and rplacd are used to make alterations in already-existing list structure; that is, to change the cars and cdrs of existing conses.
The structure is not copied but is physically altered; hence caution should be exercised when using these functions, as strange side-effects can occur if portions of list structure become shared unbeknownst to the programmer. The nconc, nreverse, nreconc, and nbutlast functions already described, and the delq family described later, have the same property.
Example: (setq g '(a b c)) (rplaca (cdr g) 'd) => (d c) Now g => (a d c) |
Example: (setq x '(a b c)) (rplacd x 'd) => (a . d) Now x => (a . d) |
Example: (subst 'Tempest 'Hurricane '(Shakespeare wrote (The Hurricane))) => (Shakespeare wrote (The Tempest)) |
subst could have been defined by:
(defun subst (new old tree) (cond ((equal tree old) new) ;if item equal to old, replace. ((atom tree) tree) ;if no substructure, return arg. ((cons (subst new old (car tree)) ;otherwise recurse. (subst new old (cdr tree)))))) |
To copy a tree, use copytree (see (copytree-fun)); the old practice of using subst to copy trees is unclear and obsolete.
Note: certain details of subst may be changed in the future. It may possibly be changed to use eq rather than equal for the comparison, and possibly may substitute only in cars, not in cdrs. This is still being discussed.
(defun nsubst (new old tree) (cond ((eq tree old) new) ;If item eq to old, replace. ((atom tree) tree) ;If no substructure, return arg. (t ;Otherwise, recurse. (rplaca tree (nsubst new old (car tree))) (rplacd tree (nsubst new old (cdr tree))) tree))) |
Example: (sublis '((x . 100) (z . zprime)) '(plus x (minus g z x p) 4)) => (plus 100 (minus g zprime 100 p) 4) |
sublis could have been defined by: (defun sublis (alist sexp) (cond ((atom sexp) (let ((tem (assq sexp alist))) (if tem (cdr tem) sexp))) ((let ((car (sublis alist (car sexp))) (cdr (sublis alist (cdr sexp)))) (if (and (eq (car sexp) car) (eq (cdr sexp) cdr)) sexp (cons car cdr)))))) |
nsublis could have been defined by: (defun nsublis (alist tree) (cond ((atom tree) (let ((tem (assq tree alist))) (if tem (cdr tem) tree))) (t (rplaca tree (nsublis alist (car tree))) (rplacd tree (nsublis alist (cdr tree))) tree))) |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This section explains the internal data format used to store conses inside the Lisp Machine. Casual users don't have to worry about this; you can skip this section if you want. It is only important to read this section if you require extra storage efficiency in your program.
The usual and obvious internal representation of conses in any implementation of Lisp is as a pair of pointers, contiguous in memory. If we call the amount of storage that it takes to store a Lisp pointer a "word", then conses normally occupy two words. One word (say it's the first) holds the car, and the other word (say it's the second) holds the cdr. To get the car or cdr of a list, you just reference this memory location, and to change the car or cdr, you just store into this memory location.
Very often, conses are used to store lists. If the above representation is used, a list of n elements requires two times n words of memory: n to hold the pointers to the elements of the list, and n to point to the next cons or to nil. To optimize this particular case of using conses, the Lisp Machine uses a storage representation called "cdr coding" to store lists. The basic goal is to allow a list of n elements to be stored in only n locations, while allowing conses that are not parts of lists to be stored in the usual way.
The way it works is that there is an extra two-bit field in every word of memory, called the "cdr-code" field. There are three meaningful values that this field can have, which are called cdr-normal, cdr-next, and cdr-nil. The regular, non-compact way to store a cons is by two contiguous words, the first of which holds the car and the second of which holds the cdr. In this case, the cdr code of the first word is cdr-normal. (The cdr code of the second word doesn't matter; as we will see, it is never looked at.) The cons is represented by a pointer to the first of the two words. When a list of n elements is stored in the most compact way, pointers to the n elements occupy n contiguous memory locations. The cdr codes of all these locations are cdr-next, except the last location whose cdr code is cdr-nil. The list is represented as a pointer to the first of the n words.
Now, how are the basic operations on conses defined to work based on this data structure? Finding the car is easy: you just read the contents of the location addressed by the pointer. Finding the cdr is more complex. First you must read the contents of the location addressed by the pointer, and inspect the cdr-code you find there. If the code is cdr-normal, then you add one to the pointer, read the location it addresses, and return the contents of that location; that is, you read the second of the two words. If the code is cdr-next, you add one to the pointer, and simply return that pointer without doing any more reading; that is, you return a pointer to the next word in the n-word block. If the code is cdr-nil, you simply return nil.
If you examine these rules, you will find that they work fine even if you mix the two kinds of storage representation within the same list. There's no problem with doing that.
How about changing the structure? Like car, rplaca is very easy; you just store into the location addressed by the pointer. To do an rplacd you must read the location addressed by the pointer and examine the cdr code. If the code is cdr-normal, you just store into the location one greater than that addressed by the pointer; that is, you store into the second word of the two words. But if the cdr-code is cdr-next or cdr-nil, there is a problem: there is no memory cell that is storing the cdr of the cons. That is the cell that has been optimized out; it just doesn't exist.
This problem is dealt with by the use of "invisible pointers". An invisible pointer is a special kind of pointer, recognized by its data type (Lisp Machine pointers include a data type field as well as an address field). The way they work is that when the Lisp Machine reads a word from memory, if that word is an invisible pointer then it proceeds to read the word pointed to by the invisible pointer and use that word instead of the invisible pointer itself. Similarly, when it writes to a location, it first reads the location, and if it contains an invisible pointer then it writes to the location addressed by the invisible pointer instead. (This is a somewhat simplified explanation; actually there are several kinds of invisible pointer that are interpreted in different ways at different times, used for things other than the cdr coding scheme.)
Here's how to do an rplacd when the cdr code is cdr-next or cdr-nil. Call the location addressed by the first argument to rplacd l. First, you allocate two contiguous words (in the same area that l points to). Then you store the old contents of l (the car of the cons) and the second argument to rplacd (the new cdr of the cons) into these two words. You set the cdr-code of the first of the two words to cdr-normal. Then you write an invisible pointer, pointing at the first of the two words, into location l. (It doesn't matter what the cdr-code of this word is, since the invisible pointer data type is checked first, as we will see.)
Now, whenever any operation is done to the cons (car, cdr, rplaca, or rplacd), the initial reading of the word pointed to by the Lisp pointer that represents the cons will find an invisible pointer in the addressed cell. When the invisible pointer is seen, the address it contains is used in place of the original address. So the newly-allocated two-word cons will be used for any operation done on the original object.
Why is any of this important to users? In fact, it is all invisible to you; everything works the same way whether or not compact representation is used, from the point of view of the semantics of the language. That is, the only difference that any of this makes is a difference in efficiency. The compact representation is more efficient in most cases. However, if the conses are going to get rplacd'ed, then invisible pointers will be created, extra memory will be allocated, and the compact representation will be seen to degrade storage efficiency rather than improve it. Also, accesses that go through invisible pointers are somewhat slower, since more memory references are needed. So if you care a lot about storage efficiency, you should be careful about which lists get stored in which representations.
You should try to use the normal representation for those data structures that will be subject to rplacding operations, including nconc and nreverse, and the compact representation for other structures. The functions cons, xcons, ncons, and their area variants make conses in the normal representation. The functions list, list*, list-in-area, make-list, and append use the compact representation. The other list-creating functions, including read, currently make normal lists, although this might get changed. Some functions, such as sort, take special care to operate efficiently on compact lists (sort effectively treats them as arrays). nreverse is rather slow on compact lists, currently, since it simple-mindedly uses rplacd, but this will be changed.
(copylist x) is a suitable way to copy a list, converting it into compact form (see (copylist-fun)).
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Zetalisp includes functions which simplify the maintenance of tabular data structures of several varieties. The simplest is a plain list of items, which models (approximately) the concept of a set. There are functions to add (cons), remove (delete, delq, del, del-if, del-if-not, remove, remq, rem, rem-if, rem-if-not), and search for (member, memq, mem) items in a list. Set union, intersection, and difference functions can be easily written using these.
.setq assoc-lists-section section-page Association lists are very commonly used. An association list is a list of conses. The car of each cons is a "key" and the cdr is a "datum", or a list of associated data. The functions assoc, assq, ass, memass, and rassoc may be used to retrieve the data, given the key. For example,
((tweety . bird) (sylvester . cat)) |
Structured records can be stored as association lists or as stereotyped cons-structures where each element of the structure has a certain car-cdr path associated with it. However, these are better implemented using structure macros (see (defstruct)).
Simple list-structure is very convenient, but may not be efficient enough for large data bases because it takes a long time to search a long list. Zetalisp includes hash table facilities for more efficient but more complex tables (see (hash-table)), and a hashing function (sxhash) to aid users in constructing their own facilities.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Examples: (memq 'a '(1 2 3 4)) => nil (memq 'a '(g (x a y) c a d e a f)) => (a d e a f) |
Example: (let ((sublist (memq x z))) ;Search for x in the list z. (if (not (null sublist)) ;If it is found, (rplaca sublist y))) ;Replace it with y. |
memq could have been defined by: (defun memq (item list) (cond ((null list) nil) ((eq item (car list)) list) (t (memq item (cdr list))) )) |
memq is hand-coded in microcode and therefore especially fast.
member could have been defined by:
(defun member (item list) (cond ((null list) nil) ((equal item (car list)) list) (t (member item (cdr list))) )) |
(mem #'< 4 list) |
Examples: (find-position-in-list 'a '(a b c)) => 0 (find-position-in-list 'c '(a b c)) => 2 (find-position-in-list 'e '(a b c)) => nil |
(defun tailp (sublist list) (do list list (cdr list) (null list) (if (eq sublist list) (return t)))) |
(setq a (delq 'b a)) |
(delq 'b a) |
(delq item list n) is like (delq item list) except only the first n instances of item are deleted. n is allowed to be zero. If n is greater than or equal to the number of occurrences of item in the list, all occurrences of item in the list will be deleted.
Example: (delq 'a '(b a c (a b) d a e)) => (b c (a b) d e) |
delq could have been defined by:
(defun delq (item list &optional (n -1)) (cond ((or (atom list) (zerop n)) list) ((eq item (car list)) (delq item (cdr list) (1- n))) (t (rplacd list (delq item (cdr list) n))))) |
Examples: (setq x '(a b c d e f)) (remq 'b x) => (a c d e f) x => (a b c d e f) (remq 'b '(a b c b a b) 2) => (a c a b) |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Examples: (assq 'r '((a . b) (c . d) (r . x) (s . y) (r . z))) => (r . x) (assq 'fooo '((foo . bar) (zoo . goo))) => nil (assq 'b '((a b c) (b c d) (x y z))) => (b c d) |
It is okay to rplacd the result of assq as long as it is not nil, if your intention is to "update" the "table" that was assq's second argument.
Example: (setq values '((x . 100) (y . 200) (z . 50))) (assq 'y values) => (y . 200) (rplacd (assq 'y values) 201) (assq 'y values) => (y . 201) now |
A typical trick is to say (cdr (assq x y)). Since the cdr of nil is guaranteed to be nil, this yields nil if no pair is found (or if a pair is found whose cdr is nil.)
assq could have been defined by:
(defun assq (item list) (cond ((null list) nil) ((eq item (caar list)) (car list)) ((assq item (cdr list))) )) |
Example: (assoc '(a b) '((x . y) ((a b) . 7) ((c . d) .e))) => ((a b) . 7) |
(defun assoc (item list) (cond ((null list) nil) ((equal item (caar list)) (car list)) ((assoc item (cdr list))) )) |
(defun rassq (item in-list) (do l in-list (cdr l) (null l) (and (eq item (cdar l)) (return (car l))))) |
(defun sassq (item alist fcn) (or (assq item alist) (apply fcn nil))) |
sassq and sassoc (see below) are of limited use. These are primarily leftovers from Lisp 1.5.
(defun sassoc (item alist fcn) (or (assoc item alist) (apply fcn nil))) |
Example: (pairlis '(beef clams kitty) '(roast fried yu-shiang)) => ((beef . roast) (clams . fried) (kitty . yu-shiang)) |
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
From time immemorial, Lisp has had a kind of tabular data structure called a property list (plist for short). A property list contains zero or more entries; each entry associates from a keyword symbol (called the indicator) to a Lisp object (called the value or, sometimes, the property). There are no duplications among the indicators; a property-list can only have one property at a time with a given name.
This is very similar to an association list. The difference is that a property list is an object with a unique identity; the operations for adding and removing property-list entries are side-effecting operations which alter the property-list rather than making a new one. An association list with no entries would be the empty list (), i.e. the symbol nil. There is only one empty list, so all empty association lists are the same object. Each empty property-list is a separate and distinct object.
The implementation of a property list is a memory cell containing a list with an even number (possibly zero) of elements. Each pair of elements constitutes a property; the first of the pair is the indicator and the second is the value. The memory cell is there to give the property list a unique identity and to provide for side-effecting operations.
The term "property list" is sometimes incorrectly used to refer to the list of entries inside the property list, rather than the property list itself. This is regrettable and confusing.
How do we deal with "memory cells" in Lisp; i.e. what kind of Lisp object is a property list? Rather than being a distinct primitive data type, a property list can exist in one of three forms:
1. A property list can be a cons whose cdr is the list of entries and whose car is not used and available to the user to store something.
2. The system associates a property list with every symbol (see (symbol-plist-section)). A symbol can be used where a property list is expected; the property-list primitives will automatically find the symbol's property list and use it.
3. A property list can be a memory cell in the middle of some data structure, such as a list, an array, an instance, or a defstruct. An arbitrary memory cell of this kind is named by a locative (see (locative)). Such locatives are typically created with the locf special form (see (locf-fun)).
.setq disembodied-property-list page Property lists of the first kind are called "disembodied" property lists because they are not associated with a symbol or other data structure. The way to create a disembodied property list is (ncons nil), or (ncons data) to store data in the car of the property list.
Here is an example of the list of entries inside the property list of a symbol named b1 which is being used by a program which deals with blocks:
(color blue on b6 associated-with (b2 b3 b4)) |
(get 'foo 'baz) => 3 (get 'foo 'zoo) => nil |
(bar (1 2 3) baz (3 2 1) color blue height six-two) |
(getl 'foo '(baz height)) => (baz (3 2 1) color blue height six-two) |
When more than one of the indicators in indicator-list is present in plist, which one getl returns depends on the order of the properties. This is the only thing that depends on that order. The order maintained by putprop and defprop is not defined (their behavior with respect to order is not guaranteed and may be changed without notice).
Example: (putprop 'Nixon 'not 'crook) |
Example: (defprop foo bar next-to) |
(putprop 'foo 'bar 'next-to) |
(color blue height six-three near-to bar) |
(remprop 'foo 'height) => (six-three near-to bar) |
(color blue near-to bar) |
There is a mixin flavor, called si:property-list-mixin, that provides messages that do things analogous to what the above functions do. [Currently, the above functions do not work on flavor instances, but this will be fixed.]
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
A hash table is a Lisp object that works something like a property list. Each hash table has a set of entries, each of which associates a particular key with a particular value (or sequence of values). The basic functions that deal with hash tables can create entries, delete entries, and find the value that is associated with a given key. Finding the value is very fast even if there are many entries, because hashing is used; this is an important advantage of hash tables over property lists. Hashing is explained in (hash-section).
A given hash table stores a fixed number of values for each key; by default, there is only one value. Each time you specify a new value or sequence of values, the old one(s) are lost.
Hash tables come in two kinds, the difference being whether the keys are compared using eq or using equal. In other words, there are hash tables which hash on Lisp objects (using eq) and there are hash tables which hash on trees (using equal). The following discussion refers to the eq kind of hash table; the other kind is described later, and works analogously.
Hash tables of the first kind are created with the function make-hash-table, which takes various options. New entries are added to hash tables with the puthash function. To look up a key and find the associated value(s), the gethash function is used. To remove an entry, use remhash. Here is a simple example.
(setq a (make-hash-table)) (puthash 'color 'brown a) (puthash 'name 'fred a) (gethash 'color a) => brown (gethash 'name a) => fred |
In this example, the symbols color and name are being used as keys, and the symbols brown and fred are being used as the associated values. The hash table remembers one value for each key, since we did not specify otherwise, and has two items in it, one of which associates from color to brown, and the other of which associates from name to fred.
Keys do not have to be symbols; they can be any Lisp object. Likewise values can be any Lisp object. The Lisp function eq is used to compare keys, rather than equal. This means that keys are really objects, but it means that it is not reasonable to use numbers other than fixnums as keys.
When a hash table is first created, it has a size, which is the maximum number of entries it can hold. Usually the actual capacity of the table is somewhat less, since the hashing is not perfectly collision-free. With the maximum possible bad luck, the capacity could be very much less, but this rarely happens. If so many entries are added that the capacity is exceeded, the hash table will automatically grow, and the entries will be rehashed (new hash values will be recomputed, and everything will be rearranged so that the fast hash lookup still works). This is transparent to the caller; it all happens automatically.
The describe function (see (describe-fun)) prints a variety of useful information when applied to a hash table.
This hash table facility is similar to the hasharray facility of Interlisp, and some of the function names are the same. However, it is not compatible. The exact details and the order of arguments are designed to be consistent with the rest of Zetalisp rather than with Interlisp. For instance, the order of arguments to maphash is different, we do not have the Interlisp "system hash table", and we do not have the Interlisp restriction that keys and values may not be nil. Note, however, that the order of arguments to gethash, puthash, and remhash is not consistent with the Zetalisp's get, putprop, and remprop, either. This is an unfortunate result of the haphazard historical development of Lisp.
If the calling program is using multiprocessing, it must be careful to make sure that there are never two processes both referencing the hash table at the same time. There is no locking built into hash tables; if you have two processes that both want to reference the same hash table, you must arrange mutual exclusion yourself by using a lock or some other means. Even two processes just doing gethash on the same hash table must synchronize themselves, because gethash may be forced by garbage collection to rehash the table. Don't worry about this if you don't use multiprocessing; but if you do use multiprocessing, you will have a lot of trouble if you don't understand this.
Hash tables are implemented with a special kind of array. arrayp of a hash table will return t. However, it is not recommended to use ordinary array operations on a hash table. Hash tables should be manipulated only with the functions described below.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This section documents the functions for eq hash tables, which use objects as keys and associate other objects with them.
Returns also a third value, a list which overlays the hash table entry. Its car is the key; the remaining elements are the values in the entry. This is how you can access values other than the first, if the hash table contains more than one value per entry.
If the hash table associates more than one value with each key, the remaining values in the entry are taken from extra-values.
If the hash table has more than one value per key, all the values, in order, are supplied as arguments, with the corresponding key.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This section documents the functions for equal hash tables, which use trees as keys and associate objects with them. The function to make one is slightly different from make-hash-table because the implementations of the two kinds of hash table differ, but analogous operations are provided.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
The eq type hash tables actually hash on the address of the representation of the object. When the copying garbage collector changes the addresses of object, it lets the hash facility know so that gethash will rehash the table based on the new object addresses.
There will eventually be an option to make-hash-table which tells it to make a "non-GC-protecting" hash table. This is a special kind of hash table with the property that if one of its keys becomes "garbage", i.e. is an object not known about by anything other than the hash table, then the entry for that key will be silently removed from the table. When these exist they will be documented in this section.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Hashing is a technique used in algorithms to provide fast retrieval of data in large tables. A function, known as a "hash function", is created, which takes an object that might be used as a key, and produces a number associated with that key. This number, or some function of it, can be used to specify where in a table to look for the datum associated with the key. It is always possible for two different objects to "hash to the same value"; that is, for the hash function to return the same number for two distinct objects. Good hash functions are designed to minimize this by evenly distributing their results over the range of possible numbers. However, hash table algorithms must still deal with this problem by providing a secondary search, sometimes known as a rehash. For more information, consult a textbook on computer algorithms.
Here is an example of how to use sxhash in maintaining hash tables of trees:
(defun knownp (x &aux i bkt) ;look up x in the table (setq i (abs (remainder (sxhash x) 176))) ;The remainder should be reasonably randomized. (setq bkt (aref table i)) ;bkt is thus a list of all those expressions that ;hash into the same number as does x. (memq x bkt)) |
To write an "intern" for trees, one could
(defun sintern (x &aux bkt i tem) (setq i (abs (remainder (sxhash x) 2n-1))) ;2n-1 stands for a power of 2 minus one. ;This is a good choice to randomize the ;result of the remainder operation. (setq bkt (aref table i)) (cond ((setq tem (memq x bkt)) (car tem)) (t (aset (cons x bkt) table i) x))) |
sxhash provides what is called "hashing on equal"; that is, two objects that are equal are considered to be "the same" by sxhash. In particular, if two strings differ only in alphabetic case, sxhash will return the same thing for both of them because they are equal. The value returned by sxhash does not depend on the value of alphabetic-case-affects-string-comparison (see (alphabetic-case-affects-string-comparison-var)).
Therefore, sxhash is useful for retrieving data when two keys that are not the same object but are equal are considered the same. If you consider two such keys to be different, then you need "hashing on eq", where two different objects are always considered different. In some Lisp implementations, there is an easy way to create a hash function that hashes on eq, namely, by returning the virtual address of the storage associated with the object. But in other implementations, of which Zetalisp is one, this doesn't work, because the address associated with an object can be changed by the relocating garbage collector. The hash tables created by make-hash-table deal with this problem by using the appropriate subprimitives so that they interface correctly with the garbage collector. If you need a hash table that hashes on eq, it is already provided; if you need an eq hash function for some other reason, you must build it yourself, either using the provided eq hash table facility or carefully using subprimitives.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Several functions are provided for sorting arrays and lists. These functions use algorithms which always terminate no matter what sorting predicate is used, provided only that the predicate always terminates. The main sorting functions are not stable; that is, equal items may not stay in their original order. If you want a stable sort, use the stable versions. But if you don't care about stability, don't use them since stable algorithms are significantly slower.
After sorting, the argument (be it list or array) has been rearranged internally so as to be completely ordered. In the case of an array argument, this is accomplished by permuting the elements of the array, while in the list case, the list is reordered by rplacd's in the same manner as nreverse. Thus if the argument should not be clobbered, the user must sort a copy of the argument, obtainable by fillarray or copylist, as appropriate. Furthermore, sort of a list is like delq in that it should not be used for effect; the result is conceptually the same as the argument but in fact is a different Lisp object.
Should the comparison predicate cause an error, such as a wrong type argument error, the state of the list or array being sorted is undefined. However, if the error is corrected the sort will, of course, proceed correctly.
The sorting package is smart about compact lists; it sorts compact sublists as if they were arrays. See (cdr-code) for an explanation of compact lists, and A. I. Memo 587 by Guy L. Steele Jr. for an explanation of the sorting algorithm.
The sort function proceeds to sort the contents of the array or list under the ordering imposed by the predicate, and returns the array or list modified into sorted order. Note that since sorting requires many comparisons, and thus many calls to the predicate, sorting will be much faster if the predicate is a compiled function rather than interpreted.
Example: (defun mostcar (x) (cond ((symbolp x) x) ((mostcar (car x))))) (sort 'fooarray (function (lambda (x y) (alphalessp (mostcar x) (mostcar y))))) |
(Tokens (The lion sleeps tonight)) (Carpenters (Close to you)) ((Rolling Stones) (Brown sugar)) ((Beach Boys) (I get around)) (Beatles (I want to hold your hand)) |
((Beach Boys) (I get around)) (Beatles (I want to hold your hand)) (Carpenters (Close to you)) ((Rolling Stones) (Brown sugar)) (Tokens (The lion sleeps tonight)) |
When sort is given a list, it may change the order of the conses of the list (using rplacd), and so it cannot be used merely for side-effect; only the returned value of sort will be the sorted list. This will mess up the original list; if you need both the original list and the sorted list, you must copy the original and sort the copy (see copylist, (copylist-fun)).
Sorting an array just moves the elements of the array into different places, and so sorting an array for side-effect only is all right.
If the argument to sort is an array with a fill pointer, note that, like most functions, sort considers the active length of the array to be the length, and so only the active part of the array will be sorted (see array-active-length, (array-active-length-fun)).
(sortcar '((3 . dog) (1 . cat) (2 . bird)) #'<) => ((1 . cat) (2 . bird) (3 . dog)) |
Remember that sortcar, when given a list, may change the order of the conses of the list (using rplacd), and so it cannot be used merely for side-effect; only the returned value of sortcar will be the sorted list.
[ < ] | [ > ] | [ << ] | [ Up ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
Storage allocation is handled differently by different computer systems. In many languages, the programmer must spend a lot of time thinking about when variables and storage units are allocated and deallocated. In Lisp, freeing of allocated storage is normally done automatically by the Lisp system; when an object is no longer accessible to the Lisp environment, it is garbage collected. This relieves the programmer of a great burden, and makes writing programs much easier.
However, automatic freeing of storage incurs an expense: more computer resources must be devoted to the garbage collector. If a program is designed to allocate temporary storage, which is then left as garbage, more of the computer must be devoted to the collection of garbage; this expense can be high. In some cases, the programmer may decide that it is worth putting up with the inconvenience of having to free storage under program control, rather than letting the system do it automatically, in order to prevent a great deal of overhead from the garbage collector.
It usually is not worth worrying about freeing of storage when the units of storage are very small things such as conses or small arrays. Numbers are not a problem, either; fixnums and small flonums do not occupy storage, and the system has a special way of garbage-collecting the other kinds of numbers with low overhead. But when a program allocates and then gives up very large objects at a high rate (or large objects at a very high rate), it can be very worthwhile to keep track of that one kind of object manually. Within the Lisp Machine system, there are several programs that are in this position. The Chaosnet software allocates and frees "packets", which are moderately large, at a very high rate. The window system allocates and frees certain kinds of windows, which are very large, moderately often. Both of these programs manage their objects manually, keeping track of when they are no longer used.
When we say that a program "manually frees" storage, it does not really mean that the storage is freed in the same sense that the garbage collector frees storage. Instead, a list of unused objects is kept. When a new object is desired, the program first looks on the list to see if there is one around already, and if there is it uses it. Only if the list is empty does it actually allocate a new one. When the program is finished with the object, it returns it to this list.
The functions and special forms in this section perform the above function. The set of objects forming each such list is called a "resource"; for example, there might be a Chaosnet packet resource. defresource defines a new resource; allocate-resource allocates one of the objects; deallocate-resource frees one of the objects (putting it back on the list); and using-resource temporarily allocates an object and then frees it.
(defresource name parameters keyword value keyword value ...) |
name should be a symbol; it is the name of the resource and gets a defresource property of the internal data structure representing the resource.
parameters is a lambda-list giving names and default values (if &optional is used) of parameters to an object of this type. For example, if one had a resource of two-dimensional arrays to be used as temporary storage in a calculation, the resource would typically have two parameters, the number of rows and the number of columns. In the simplest case parameters is ().
The keyword options control how the objects of the resource are made and kept track of. The following keywords are allowed:
If these options are used with forms (rather than functions), the forms get compiled into functions as part of the expansion of defresource. These functions are given names like (:property resource-name si:resource-constructor); these names are not guaranteed not to change in the future.
Most of the options are not used in typical cases. Here is an example:
(defresource two-dimensional-array (rows columns) :constructor (make-array (list rows columns))) |
Suppose the array was usually going to be 100 by 100, and you wanted to preallocate one during loading of the program so that the first time you needed an array you wouldn't have to spend the time to create one. You might simply put
(using-resource (foo two-dimensional-array 100 100) ) |
(defresource two-dimensional-array (&optional (rows 100) (columns 100)) :constructor (make-array (list rows columns)) :initial-copies 1) |
Here is an example of how you might use the :matcher option. Suppose you wanted to have a resource of two-dimensional arrays, as above, except that when you allocate one you don't care about the exact size, as long as it is big enough. Furthermore you realize that you are going to have a lot of different sizes and if you always allocated one of exactly the right size, you would allocate a lot of different arrays and would not reuse a pre-existing array very often. So you might:
(defresource sloppy-two-dimensional-array (rows columns) :constructor (make-array (list rows columns)) :matcher (and ( (array-dimension-n 1 object) rows) ( (array-dimension-n 2 object) columns))) |
Note that the using-resource special form is usually what you want to use, rather than allocate-resource itself; see below.
using-resource is often more convenient than calling allocate-resource and deallocate-resource. Furthermore it is careful to free the object when the body is exited, whether it returns normally or via *throw. This is done by using unwind-protect; see (unwind-protect-fun).
Here is an example of the use of resources: (defresource huge-16b-array (&optional (size 1000)) :constructor (make-array size ':type 'art-16b)) (defun do-complex-computation (x y) (using-resource (temp-array huge-16b-array) ... ;Within the body, the array can be used. (aset 5 temp-array i) ...)) ;The array is returned at the end. |
[ << ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |