Software Design Notes Internal Storage Formats SECTION 7 Internal Storage Formats 7.1 INTRODUCTION This section details the Explorer virtual machine internal storage formats. Symbolic byte specifiers and constants for most of the formats described can be found in the system parameters file SYS:UCODE;LROY-QCOM. The convention is that byte specifiers begin with a double percent (%%) while symbolic constants representing offsets or other numeric constants begin with a single percent (%). All such symbols are in the SYSTEM package, as are the majority of functions described here. 7.2 THE TAGGED ARCHITECTURE Lisp follows a "single sized data" convention, which states: Any object can be represented in a fixed size storage cell. ____________ On the Explorer, the size of this fixed-size cell is one memory word (32 bits), also called a Q for quantum. A Q is divided into _ three fields: a 25-bit pointer field, a 5-bit data type field, _______ _________ and a 2-bit cdr code field. The format of a Q is shown in Figure ________ 7-1. 31 30 29 25 24 0 +--------------------------------------------------+ | CC | DTP | POINTER | +--------------------------------------------------+ POINTER <0:24> = Pointer field. Contains immediate data or pointer to actual storage. DTP <25:29> = Data Type field. CC <30:31> = Cdr Code field. Figure 7-1 Q Format Lisp can adhere to the single size data convention because some Lisp data items are not objects themselves but rather object ______ references, which can be thought of as pointers to the actual __________ 7-1 Internal Storage Formats Software Design Notes storage used by the object. The kind of object reference being made can always be determined by the data type field of the Q. If it is an Immediate Type, such as a small integer, the actual ______________ data will be stored in the pointer field. No further reference need be made in this case. If the object is a Pointer Type ____________ requiring extended storage, such as an array or a list, the object's pointer field contains the first address of the extended storage. There is an obvious analogy between object references and indirect addressing on conventional machines. However, rather than fully partitioning memory into different spaces where the different types of data objects are stored, the Explorer system uses the tagged data method of object representation. The way an ___________ object is manipulated is always dictated by the its data type tag. In turn the interpretation and use of the object's pointer field depends on teh data type. There is considerable hardware- level support for this architecture in the Explorer processors themselves, allowing for extremely flexible data structuring combined with efficient access. 7.3 BOXED VERSUS UNBOXED MEMORY Boxed or Typed cells are memory words whose data-type fields are _____ _____ valid; that is, they are meant to be interpreted as data types rather than as random bit patterns. Only if the data type field is valid may the pointer field and cdr code field be interpreted as pointers and cdr codes. Words whose contents are all data are termed Unboxed or Untyped. A segment of memory where all cells _______ _______ are boxed would be called fully boxed whereas a segment where ___________ some words are boxed and others are not is called partially _________ boxed. _____ In the Explorer virtual machine, not all memory is fully boxed, although some well-defined segments are. Usually, determining if a given word is boxed or unboxed requires examining the word's context. The first context item to consider is storage type. All virtual memory segments have a storage representation type of List or Structure. List space holds only CONS cells and cdr-coded lists, and si fully boxed. The start and end of list segments is determined by the cdr coding of the words. Structure space holds __________ all other extended storage data structures and is not fully boxed. In structure space, the start of an object is marked by a structure header. Objects that are implemented as structures ________________ with headers are: symbols, instances, arrays, compiled functions (FEFs), and extended numbers. Of these, only symbols and instances are fully boxed. The others contain some boxed words and some unboxed words. 7-2 Software Design Notes Internal Storage Formats If a word is in structure space and if it is in a structure that is only partially boxed, further information must be sought to determine exactly which words are boxed. It is not even the case that a given structure type is always partially boxed or always fully boxed. For example, some arrays are completely typed while others have typed cells only up to the start of their actual data elements. Some extended numbers are made up of other extended numbers; hence are fully typed. Others contain bit-encodings following their headers. The one unbreakable rule about partially boxed structures is that all boxed words must come before any unboxed storage. For code that needs to know details about the size, boxed size, and unboxed size of data objects there are a number of functions that may be used. %STRUCTURE-TOTAL-SIZE, given a pointer to any boxed word in a structure, will return the total number of words of storage the structure uses. %STRUCTURE-BOXED-SIZE will likewise return the number of boxed words. The number of unboxed words in the structure can then be computed. The largest drawback to these primitives is that they cannot be used on any arbitrary address. They will cause a crash if given an invalid address; they will be unpredictable (crash, trap) if given the address of an unboxed data word. For efficiency, they assume they have a good, boxed word to start with and go from there. Code that needs to find the (always boxed) start of the structure containing a given address should either begin parsing memory at the memory segment (region) beginning, or use a "safe" primitive such as %FIND-STRUCTURE-HEADER-SAFE (FSH-SAFE for short). This latter can be slow, since it parses memory in the forward direction, but makes every attempt to be safe. It will return NIL, for example, if given an invalid address. 7.4 DATA TYPE SUMMARY The 5-bit data type field supports 32 data types. Some of these represent actual Lisp objects; that is, they can be passed as arguments to functions, returned as values, and have standard operations defined on them. These Lisp object data types compries the primitive, architectural support out of which all more complex types are built. The Lisp object types and some selected attributes are listed in Table 7-1. 7-3 Internal Storage Formats Software Design Notes Table 7-1 Lisp Object Data Type Attributes -------------------------------------------------------- Type Pointer Field Generic Code Data Type Name Contents Type -------------------------------------------------------- 5 DTP-Fix Immediate data Number 6 DTP-Character " Character 8 DTP-Short-Float " Number 1 DTP-List Address List 2 DTP-Stack-List " List 13 DTP-Closure " Function 14 DTP-Lexical-Closure " Function 3 DTP-Symbol " Symbol 9 DTP-Instance " Instance 4 DTP-Array " Array 16 DTP-Stack-Group " Stack Group 12 DTP-Function " Function 7 DTP-Single-Float " Number 10 DTP-Extended-Number " Number 11 DTP-Locative " Pointer 15 DTP-U-Entry Index Function -------------------------------------------------------- The remaining data types are termed housekeeping types because __________________ they are used in the virtual machine implementation rather than to represent computational objects. Examples of housekeeping uses include marking the start of extended structure storage or acting as invisible forwarding pointers. The pointer field of a housekeeping type may contain an address or an index as with Lisp object types. If it is not an address it is usually divided up into various flag fields that describe the storage structure it is used in. The housekeeping data types and some selected attributes are listed in Table 7-2. 7-4 Software Design Notes Internal Storage Formats Table 7-2 Housekeeping Data Types ---------------------------------------------------------- Type Pointer Field Special Code Data Type Name Contents Use ---------------------------------------------------------- 22 DTP-Symbol-Header Address Structure hdr 25 DTP-Instance-Header " " 24 DTP-Array-Header Flags " 26 DTP-Fef-Header Flags " 19 DTP-One-Q-Forward Address Forwarding 18 DTP-External-Value- Cell-Pointer " " 17 DTP-GC-Forward " " 28 DTP-GC-Young-Pointer " " 21 DTP-Body-Forward " " 20 DTP-Header-Forward " Forwarding, Structure hdr 27 DTP-Self-Ref-Pointer Index, Flags " 30 DTP-Null Address Unbound mark 0 DTP-Trap Unused Trap 31 DTP-Ones-Trap Unused Trap 29 DTP-Free Unused Illegal ---------------------------------------------------------- The paragraphs below cover all the data types. First the Lisp object types are covered, divided into the immediate types and the pointer types. The pointer types are subdivided into List, Structure, and Miscellaneous types. The housekeeping data types used as structure headers are covered in the discussion of the extended storage object they implement. Finally, the remainder of the housekeeping types (the forwarding and mescellaneous housekeeping types) are detailed. 7.5 IMMEDIATE DATA TYPES Some Lisp objects can be represented completely in one storage cell. These are called Immediate Types or INUM types, since the _______________ pointer field of such a cell is an actual value rather than a pointer to the actual value. INUM types are typically implemented for efficiency reasons. Each INUM type is discussed in the following paragraphs. 7-5 Internal Storage Formats Software Design Notes 7.5.1 DTP-Fix. This is a small integer, usually called a FIXNUM for fixed point ______ number. The pointer field contains the actual value of the number in twos complement notation. Unlike extended numbers, FIXNUMs with the same value will always be EQ. Integers in the range [- 2**24 ... 2**24 - 1] will be represented as FIXNUMs. Outside of this range an integer extended number (BIGNUM) will be created. 7.5.2 DTP-Character. This is a Common Lisp character object. For Zetalisp compatability it can be used in arithmetic like a FIXNUM. Its format is shown in Figure 7-2. The byte descriptors can be found in Q-FIELDS. 31 29 25 24 23 22 21 20 19 15 5 7 0 +--+-------+--+--+--+--+--+--+---+----------+----------+ |CC| DTP | K|MO| H| S| M| C| | FONT | CHAR | +------------------------------------------------------+ CHAR <0:7> = Character code or mouse click code FONT <8:15> = Font bits C <19> = Control Bit M <20> = Meta Bit S <21> = Super Bit H <22> = Hyper Bit MO <23> = Mouse Bit KP <24> = Keypad Bit Figure 7-2 Character Format 7.5.3 DTP-Short-Float. A short precision floating point number. The pointer field contains a 25-bit floating point number, subdivided into a 1-bit sign field, an 8-bit exponent field, and a 16-bit fraction field. The exponent field contains the excess 127 exponent, which gives a range of 10**-38 to 10**+38 approximately. The fraction field contains a 16-bit fraction, excluding the hidden bit (which is only used if the exponent field contains a non-zero value). The fraction can contain approximately 5 decimal digits. The format used is equivalent to the IEEE Std 754-1985 single precision format, with the exception that the fraction field has been 7-6 Software Design Notes Internal Storage Formats reduced from 23 bits to 16 bits. The format of a short float word is shown later along with the other numeric formats. 7.6 POINTER OBJECT TYPES The pointer type objects have an address (or an index) in their pointer field; the address generally points to the first word of the object's actual extended storage. Each pointer object points to a certain kind of storage. For structure-type pointers, the storage is always delineated by a structure header. For list- type pointers, the storage is a CONS or a list whose extent is defined by the local cdr coding. The pointer object types and what they point to are listed by category in Table 7-3 and are discussed individually below. Table 7-3 Pointer Types Structure Pointer Object: Points to: ----------------------------------------------------------- DTP-Symbol DTP-Symbol-Header DTP-Instance DTP-Instance-Header DTP-Array DTP-Array-Header or DTP-Header-Forward DTP-Stack-Group DTP-Array-Header DTP-Function DTP-Fef-Header DTP-Extended-Number DTP-Header DTP-Single-Float DTP-Header List Pointer Object: Points to: ----------------------------------------------------------- DTP-List CONS cell, Cdr-Coded list, or DTP-Header-Forward DTP-Closure Cdr-Coded list DTP-Lexical-Closure CONS cell DTP-Stack-List Cdr-Coded list on a regular PDL Other Pointer Object: Points to: ----------------------------------------------------------- DTP-Locative Any boxed word DTP-Self-Ref-Pointer Indexes an instance cell 7-7 Internal Storage Formats Software Design Notes 7.7 LIST POINTER OBJECTS 7.7.1 DTP-List. The pointer field points to a CONS or a cdr-coded list segment. Whether the storage is a CONS or a list segment, and if a list segment how long a segment all depends on the cdr code field. ________ The encoding of the 2-bit cdr code field is shown in Table 7-4. Table 7-4 CDR Codes 0 - CDR NORMAL 1 - CDR ERROR 2 - CDR NIL 3 - CDR NEXT 7.7.2 CONS Cells. A CONS is a 2-word structure whose first word is the CAR and second word is the CDR. This is the two-pointer form of CONS used in most versions of Lisp. The CAR cell always has a cdr code of CDR-NORMAL, and the CDR cell is marked by CDR-ERROR. The CDR-NORMAL means that the Q following this one contains the CDR. CDR ERROR means that it is an error to take thr CDR of this location, since this is the second half of a full (CDR-NORMAL) node. 7.7.3 Cdr-Coded Lists. A scheme called Cdr-Coding allows a special, high-density storage __________ scheme for regular lists. In such a storage scheme a list of N elements can be stored in N consecutive words using a series of CDR-NEXTs followed by a final CDR-NORMAL (if this list continues on with one or more CONS cells) or a CDR-NIL (if this is the end of the list). CDR-NIL means simply that the CDR of this node is the symbol NIL. In contrast, a list of N elements constructed from normal CONS cells requires 2N words of storage. CDR-NEXT designates a list element (a CAR). It indicates that the CDR of this CAR is a (not physically present) pointer to the next word. When the CDR of a normal list is requested, a copy of the contents of the CDR cell is returned, when a CDR of a cdr- coded list is requested, a pointer to the location one word past the CAR is constructed and returned. 7-8 Software Design Notes Internal Storage Formats The functions APPEND, LIST, and COPY-LIST always form these compact cdr-coded lists while CONS and related operations always create full nodes (CDR-NORMAL, CDR-ERROR). 7.7.4 Destructive Operations: RPLACA, RPLACD. RPLACA and RPLACD are easily defined on CONS cells. RPLACA overwrites the first cell of the CONS and RPLACD overwrites the second. Since all CARs in a cdr-coded list have cells allocated to them, to RPLACA an element of a CDR-NEXT list, you simply overwrite the specified CAR. But since the CDR node is not actually present, but rather is implied by the cdr-coding, RPLACDing is more difficult. RPLACD first finds the CAR whose not-yet-existent cdr needs to be written. It then allocates a completely new CONS node (CDR- NORMAL, CDR-ERROR) and copies the CAR to the first word. It then writes the new CDR in the second word. Finally, the original CAR with the CDR-NEXT field is overwritten with a word with data type DTP-HEADER-FORWARD and cdr code CDR-ERROR, which points to the newly allocated CONS node. This process is called RPLACD ______ forwarding. __________ 7.5.5 DTP-Stack-List. This is a cdr-coded list which has been created on a runtime stack (regular PDL), usually as a function's REST arg. The pointer field must always point to the portion of the stacj that is active (i.e., before the top-of-stack pointer). Since runtime stacks are represented by special PDL arrays, a DTP-STACK-LIST paradoxically always points to structure space. The elements of the STACK-LIST are always cdr-coded except that it may end with either a CDR-NIL or a CDR-NORMAL which points to a CONS in List space. RPLACA works normally on a STACK-LIST. It is an error to RPLACD one. If an attempt is made to store a STACK-LIST into memory not in the active portion of the PDL, the list elements will be copied out to regular List space. All STACK-LIST pointers will then be replaced with normal LIST pointers, and the old elements on the stack will be replaced with words with data type DTP-EXTERNAL- VALUE-CELL-POINTER and pointer fields which address the corresponding element in the copied-out list. 7.7.6 Closures. A dynamic closure is a word of type DTP-CLOSURE which points to a block of cdr-coded storage in list space. This block is 2*N+1 words long, where N is the number of cells closed over. 7-9 Internal Storage Formats Software Design Notes A lexical closure is a word of type DTP-LEXICAL-CLOSURE which points to a CONS cell. The CAR of the CONS is the closure's function, and the CDR is the a word of type DTP-LOCATIVE pointing to the closure's lexical environment. The type DTP-LEXICAL-CLOSURE is only created by compiled code. The interpreter implements lexical closures as a DTP-CLOSURE over the special variables that hold the interpreter's environment. Both these closure types are described in detail in a later section on Closures. 7.8 STRUCTURE POINTER OBJECTS The are only a handful of actual structure types. These are: symbols, instances, arrays (including stack groups), functions, and extended numbers. Each of these is discussed here. 7.8.1 Symbols. A symbol is represented by a Q of datatype DTP-SYMBOL whose ______ pointer points to a five-word symbol block. The five words are listed in Table 7-5. Table 7-5 Qs of Symbol Offset Cell Name ------ --------- 0 Print Name cell 1 Value Cell 2 Function Cell 3 Property Cell 4 Package Cell The Print Name Cell holds a word of DTP-SYMBOL-HEADER pointing to _______________ a STRING array which is the print name for the symbol. The SYMBOL-HEADER acts just like an array pointer in many contexts. The Value Cell holds the value of the symbol, and so can be of __________ almost any data type. If the value has not been initialized, a symbol's value cell may be empty or unbound. If so, the cell _______ contains a DTP-NULL whose pointer field points back at the symbol header. The Function Cell holds the functional property of the symbol. _____________ If the symbol is called as a function, the contents of this cell will be analyzed to determine what function to perform. A 7-10 Software Design Notes Internal Storage Formats symbol's function cell may also be unbound, in which case it contains a DTP-NULL which points back at the symbol header. The Property Cell contains the property list. The use of _____________ properties is not required by the basic system at all, so this might be NIL. On the other hand, many subsystems and features make heavy use of the property list, so it is likely to contain something. The Package Cell is used to point to the package to which the ____________ symbol belongs for interned symbols; for uninterned symbols, the package cell contains NIL. The only architectural support for packages is the package cell of symbols. When a symbol is initially created, the value and function cells contain DTP-NULL. The property cell initially contains NIL; however, the loader and other parts of the system that create symbols may place properties on them. The function VALUE-CELL-LOCATION can be used to obtain a DTP- LOCATIVE pointer to this location and the contents can be obtained by (CAR ) or more generally (CONTENTS ) on the locative so generated. 7.8.2 Instances. A flavor instance instance is a word of DTP-INSTANCE which points to a DTP-INSTANCE-HEADER. Instances are variable length, but all their elements (which are instance variable slots) are boxed. The INSTANCE-HEADER pointer field points to this instance's parent flavor. The flavor data structure is implemented as an array (so that the INSTANCE-HEADER acts as an array pointer in most contextx) and is described in greater detail in the section on Flavor Internals. 7.8.3 Arrays. An array object is a word of DTP-ARRAY which points to a DTP- ARRAY-HEADER. The ARRAY-HEADER may optionally be preceded by a number of boxed leader words topped by a DTP-HEADER of header type ARRAY-LEADER. The ARRAY-HEADER may be followed by by some housekeeping words (if it is a long, multidimensional, displaced, or physical array) and then generally by the actual array data storage words. The ARRAY-HEADER pointer field does not contain an address, but rather several fields of data describing the array. The section on arrays discusses arrat formats in greater detail. Below we briefly introduce some special arrays: Stack Groups, Regular PDL Arrays, and Binding PDL Arrays. 7.8.3.1 Stack Groups. 7-11 Internal Storage Formats Software Design Notes The state of a computation on the Explorer is maintained in a data structure called a Stack Group when the computation is not ___________ active. A stack group is represented by a word of DTP-STACK- GROUP pointing to an ARRAY-HEADER. The stack group array is a Q array of array type ART-STACK-GROUP-HEAD. It has no data elements; all the computation state is stored in array leader words. The stack group data structure itself is described fully in a later section. In addition to saved register values and other state information, there are two arrays associated with each stack group, the Regular PDL and the Special PDL (SPDL). These are also described more fully in the sections on function calling and stack groups, but are introduced here to touch upon their particular storage management conventions. 7.8.3.2 Regular PDL Arrays. A computation's run-time stack is kept in the stack group's Regular PDL array. A Regular PDL array has array type ART-REG- PDL. It is a Q array in that its valid elements can contain any Lisp object; but not all its elements are valid. Those which are not valid are considered as unboxed memory. A Regular PDL array always has a leader with one leader element. The leader element (element 0) contains the stack group which owns this PDL. The top of the currently-active stack group's runtime stack is kept in the PDL Buffer of the processor. The hardware PDL Buffer __________ acts as a cache of up to 1024 words which greatly speeds up almost all references to the stack. The PDL Buffer cache is maintained by microcode invisibly to macrocode and all higher levels. The Regular PDL is not allowed to contain most housekeeping types. In addition, it always contains valid boxed Qs up to the current top-of-stack pointer. However, Regular PDL array elements beyond the current top-of-stack pointer are not guaranteed to be valid Lisp objects so must not be accessed by any Lisp code or be scavenged by the garbage collector. They are considered as unboxed elements in the array. The top-of-stack pointer for a Regular PDL is computed is one of two ways. If the Regular PDL does not belong to the current computation (i.e., is not the Regular PDL of %CURRENT-STACK- GROUP), the valid portion of the PDL array is described by the saved regular PDL pointer of its Stack Group (SG-REGULAR-PDL- POINTER ). Array elements numbering up to this value may be freely accessed. If, however, this is the current stack-group's Regular PDL, the highest valid element number must be computed using the hardware PDL-BUFFER-POINTER register. In either case, the %STRUCTURE-BOXED-SIZE primitive will always return the correct number of valid, boxed words when given a 7-12 Software Design Notes Internal Storage Formats Regular PDL array. This number is the top-of-stack element index plus 2, since it counts the PDL array header word and the array long length word as valid, boxed words. If a Regular PDL needs more elements than its initial allocation size the microcode traps out to the Error Handler. The Error Handler will allocate a larger array and structure-forward the old PDL to the new one. 7.8.3.3 Special PDL Arrays. A computation's dynamic (special variable) binding stack is represented using a Special PDL array (SPDL) which has array type ART-SPECIAL-PDL. Like the Regular PDL array, the SPDL is a Q array in that its valid elements can contain any Lisp object; but not all its elements are valid. Those which are not valid are considered as unboxed memory. The SPDL also has one leader element which contains the stack group owning this SPDL. The valid top-of-stack for the binding stack is stored in the stack group (SG-SPECIAL-PDL-POINTER ) if this is not the current stack group's SPDL. Otherwise, as with Regular PDLs, the valid portion can only be computed using %STRUCTURE-BOXED-SIZE. If a SPDL needs more elements than its initial allocation size the microcode traps out to the Error Handler which will allocate a larger array and structure-forward the old SPDL to the new one. 7.8.4 Compiled Functions (FEFs). When a function is macro-compiled, the macrocompiler produces a compiled code object called a Function Entry Frame (FEF). This is represented by a word of DTP-FUNCTION pointing to storage starting with a word of type DTP-FEF-HEADER. The storage occupied by the FEF can be divided into two parts. The first part, which is at least eight words long, is called the overhead section and consists only of boxed words. This section contains encoded information about the function and pointers to variables and other functions called by the function. The second section is completely unboxed and contains the function's macrocode instructions packed two to a 32-bit memory word. See the section on Function Calling for more details on FEFs. 7.8.5 DTP-Header. This word is the beginning of a block of storage which is either an array leader, a sginel float, or and extended number. The pointer field does not contain an address; instead it has a HEADER-TYPE field which explains what purpose the header is serving. The HEADER-TYPEs are summarized in Table 7-6. Array 7-13 Internal Storage Formats Software Design Notes leaders are discussed in the arrays section. The remaining extended number structures are described below. Table 7-6 Header Types Code Header Type Name Pointed To By ---- ---------------- ------------- 0 %HEADER-TYPE-ERROR Unused 1 %HEADER-TYPE-UNUSED-1 Unused 2 %HEADER-TYPE-ARRAY-LEADER DTP-LOCATIVE 3 %HEADER-TYPE-UNUSED-3 Unused 4 %HEADER-TYPE-SINGLE-FLOAT DTP-SINGLE-FLOAT 5 %HEADER-TYPE-COMPLEX DTP-EXTENDED-NUMBER 6 %HEADER-TYPE-BIGNUM DTP-EXTENDED-NUMBER 7 %HEADER-TYPE-RATIONAL DTP-EXTENDED-NUMBER 8 %HEADER-TYPE-DOUBLE-FLOAT DTP-EXTENDED-NUMBER 7.8.6 DTP-Extended-Number An word of type DTP-EXTENDED-NUMBER pointing to a DTP-HEADER word signifies one of the following, depending on the HEADER-TYPE of the HEADER word: an extended integer (BIGNUM); a ratio between two integers (RATIONAL); a double-precision floating point number (DOUBLE-FLOAT); or a complex number (COMPLEX). All except DOUBLE-FLOAT are described here. The floating point formats are details in the next subsection. 7.8.6.1 BIGNUM. A BIGNUM is an extended precision integer represented by an object of type DTP-EXTENDED-NUMBER pointing to a DTP-HEADER. The storage length of a BIGNUM is determined by the size of the integer it represents. It is composed by a HEADER word and a number of unboxed data words as shown in Figure 7-3. After the BIGNUM header, the integer is stored in successive words with the least significant word first. The format of the BIGNUM header is shown in Figure 7-4. The LENGTH field gives the length of the BIGNUM in words; this is the length of the BIGNUM structure minus one. Each of the BIGNUM data wrods has the format shown in Figure 7-5. The high order bit is always 0. The remaining bits are a section of the bits of the positive integer that is represented. 7-14 Software Design Notes Internal Storage Formats +---------------------------------+ | Bignum Header | +---------------------------------+ | Least Significant Word | . . +---------------------------------+ | Most Significant Word | +---------------------------------+ Figure 7-3 BIGNUM Structure 31 29 25 22 19 18 17 0 +---------------------------------------------------+ | | | | | | | +---------------------------------------------------+ | | | | | | CDR Code | | | | | | | | | | DTP-Header-- | | | | | | | | Spare Bits ------ | | | | | | Header-Type-BIGNUM ---- | | | | Sign Bit--------------------- | | Length-------------------------------------- Figure 7-4 BIGNUM Header Format 31 30 0 +---------------------------------------------------+ | | | +---------------------------------------------------+ | | 0 | | Section of integer----- Figure 7-5 BIGNUM Data Format 7-15 Internal Storage Formats Software Design Notes 7.8.6.2 RATIONAL. A RATIONAL number is an object of type DTP-EXTENDED number pointing to a 3-word fully boxed structure. The first word is of type DTP-HEADER and the next two words are the numerator and the denominator, respectively, of the ratio. Each of these may either be a FIXNUM or a BIGNUM. 7.8.6.3 COMPLEX. A COMPLEX number is an object of type DTP-EXTENDED number pointing to a 3-word fully boxed structure. The first word is of type DTP-HEADER and the next two words are the real part and the imatinary part, respectively, of the COMPLEX number. Each of these may be any numeric type. 7.8.7 Floating Point Numbers. There are several floating point formats supported on the Explorer. Short precision floating point numbers are an INUM type; that is, the pointer field of the Q contains the value of the object rather than a pointer to it value. Single precision and double precision floating point numbers are represented as a pointer pointing to a DTP-HEADER with header type %HEADER-TYPE- SINGLE-FLOAT and %HEADER-TYPE-DOUBLE-FLOAT respectively. Single precision floating point numbers have a pointer type DTP-SINGLE- FLOAT, double precision floating point numbers have a pointer type DTP-EXTENDED-NUMBER. 7-16 Software Design Notes Internal Storage Formats 7.8.7.1 Short Float. The format of a short precision floating point number is shown in Figure 7-6. This format is equivalent to the single precision floating point format, with the fraction field reduced to 16 bits. 31 29 25 24 23 16 15 0 +---------------------------------------------------+ | | | | | | +---------------------------------------------------+ | | | | | CDR Code | | | | | | | | DTP-Short-Float | | | | | | Sign bit --------- | | | | Exponent ----------------- | | Fraction ----------------------------------- Figure 7-6 Short Float Format 7-17 Internal Storage Formats Software Design Notes 7.8.7.2 Single Float. A single precision floating point number is represented as an object of DTP-SINGLE-FLOAT, pointing to a two-word structure as shown in Figure 7-7. The second word is unboxed. Unused ---------------------------------- | Header-Type-Single-Float- | | | Spare Bits ------- | | | | | DTP-Header - | | | | | | | CDR Code | | | | | | | | | 31 29 25 22 19 18 0 +---------------------------------------------------+ Word 1 | | | | | | +---------------------------------------------------+ Word 2 |s| e | f | +---------------------------------------------------+ 31 30 23 22 0 s = sign bit e = exponent field f = fraction field Figure 7-7 Single Precision Float Structure Format The format used is the IEEE Std 754-1985 single format. The 8- bit exponent field uses an excess-127 (decimal) format. The single precision floating point format uses a 23-bit fraction field. The range for single precision floating point numbers is approximately 10**-38 to 10**+38, with a precision of approximately 7 decimal digits. 7-18 Software Design Notes Internal Storage Formats 7.8.7.3 Double Float. A double precision floating point number is represented as an object of DTP-EXTENDED-NUMBER, pointing to a three-word structure as shown in Figure 7-8. The two words following the HEADER are unboxed. Unused ---------------------------------- | Header-Type-Double-Float - | | | Spare Bits ------- | | | | | DTP-Header - | | | | | | | CDR Code | | | | | | | | | 31 29 25 22 19 18 0 +---------------------------------------------------+ Word 1 | | | | | | +---------------------------------------------------+ Word 2 | f-low | +---------------------------------------------------+ Word 3 |s| e | f-high | +---------------------------------------------------+ 31 30 20 19 0 s = sign bit e = exponent field f = fraction field Figure 7-8 Double Precision Float Structure Format The format used is the IEEE Std 754-1985 double format. The 11- bit exponent field uses an excess-1023 (decimal) format. The double precision floating point format uses a 52-bit fraction field, with the 20 most significant bits concatenated with the sign and exponent fields. The range for double precision floating point numbers is approximately 10**-308 to 10**+308, with a precision of approximately 16 decimal digits. 7.8.8 The Cdr Code Field in Strucutres. The cdr code field not usually significant in structure space. There are, however some exceptions. The cdr code field is used to denote actual cdr coding in structures that are treated as lists (a STACK-LIST or ART-Q-LIST array). In addition, the cdr code has a specialized meaning relating to binding blocks inside 7-19 Internal Storage Formats Software Design Notes of a Special (Binding) PDL array (see the section on Stack Groups for more details). Finally, hash table entries are cdr coded so the key and value(s) may easily be returned as a list. 7.9 OTHER POINTER OBJECTS Two miscellaneous pointer objects are implemented with the data types DTP-LOCATIVE and DTP-U-Entry. The LOCATIVE is generally used to point to a single cell. A U-ENTRY is a functional object with an index in its pointer field. 7.9.1 DTP-Locative. A locative is a general-purpose pointer to a single boxed cell of memory. It is not an invisible pointer type. It can be used for many things. For example, it is often used to point to an array leader's header word, or to point to bound cells on the binding PDL. Both CAR and CDR of a locative return the same thing, namely the contents of the cell pointed at. 7.9.2 DTP-U-Entry. This data type represents a microcoded function that takes a variable number of arguments or has a REST arg. It is a legitimate functional object which can be funcalled, passed as an argument, and so forth. AREF and LIST are examples of U-ENTRYs. The term U-ENTRY is short for Ucode, or microcode, entry. The pointer field is actually an index into the MICRO-CODE-ENTRY- AREA; this contains either a FIXNUM or a function. If it is a FIXNUM, that number is an index into the MICRO-CODE-LINK-AREA. The FIXNUM then found in the MICRO-CODE-LINK-AREA consists of the 16-bit control store address of the microcode to run for this function; 6 bits describing the required arguments pattern; and one bit indicating a REST arg or not. If the entry in the MICRO-CODE-ENTRY-AREA is not a FIXNUM, then the current definition of this function is not microcoded and the symbol names the function to run. The DEBUG-INFO structure for a U-ENTRY function can be found by using the pointer field as an index into the MICRO-CODE-ENTRY- DEBUG-INFO-AREA. 7-20 Software Design Notes Internal Storage Formats 7.10 HOUSEKEEPING TYPES The discussion of housekeeping types in the following paragraphs begins with the forwarding pointer types. Then the remainder of the data types are described: the DTP-SELF-REF-POINTER type; the Trap Types DTP-NULL, DTP-TRAP; and the type DTP-FREE. 7.10.1 Forwarding Pointer Types. Forwarding pointer types are "invisible" data types which provide data indirection. They are termed invisible because their presence is completely transparent to most Lisp code. Whenever a forwarding pointer is read, the read is indirected along the invisible pointer. This is sometimes called the INVIZ process, _____ or just INVZ. INVZ is similar to indirect addressing in other ____ computers, except that instead of being specified by the reading instruction, the indirection is specified by the data read. For example, if you ask for the symbol value of a sumbol with a DTP- ONE-Q-FORWARD Q in its value cell, what you will actually get back is the contents of the word at the address specified in the pointer field of the DTP-ONE-Q-FORWARD word. There are six forwarding data types. DTP-ONE-Q-FORWARD and DTP- EXTERNAL-VALUE-CELL-POINTER are one-word forwards used for various housekeeping functions. DTP-GC-FORWARD and DTP-GC-YOUNG- POINTER are used for garbage collection support. DTP-HEADER- FORWARD and DTP-BODY-FORWARD are used to forward a whole structure that has grown or been moved. All are discussed below. 7.10.1.1 DTP-One-Q-Forward. This is a simple kind of invisible pointer used to hide a single cell of memory. It forwards only the word that it is in, not the whole containing structure. Its most common use is to alias a symbol's value to that of another symbol (see DEFF and FORWARD- VALUE-CELL). 7.10.1.2 DTP-External-Value-Cell-Pointer. This is a kind of one-word forwarding invisible pointer used for several different purposes. They are most commonly found in the overhead words of compiled functions (FEFs). There they are used to point to value cells of symbols referenced by the code and to function cells of other functions called. EVCPs are also placed in symbol value cells by the dynamic closure mechanism (see the section on Closures), and are used to replace STACK-LIST elements on the PDL after the list is copied out to normal memory (see the paragraph on Stack Lists). The 7-21 Internal Storage Formats Software Design Notes Lisp interpreter also uses EVCPs to implement dynamic binding. 7.10.1.3 GC Forwarding Pointers. Data type DTP-GC-Forward is the forwarding pointer left behind by the garbage collector in Oldspace; in fact, it is illegal anywhere else but Oldspace. The pointer field of a DTP-GC- FORWARD must point to Copyspace, and contains the address where this cell is forwarded. In this sence the GC-FORWARD is a one- word forward. Normally, whenever a pointer-type Q is read from emmory a check is done to see if the address in the pointer field falls in Oldspace. If so, the object pointed to is copied to Copyspace and the object's old location is filled completely with GC- FORWARDS. This is called transporting. Furthermore, the ____________ original memory location read is updated so that it now has the Copyspace address. When other referenced Qs are found to point to this GC-FORWARD they are also updated in memory to have the forwarded, Copyspace address. This process is called snapping ________ out, and preserves the identity of objects copied be the garbage ___ collector. The type DTP-GC-Young-Pointer (GCYP) is a single-cell forwarding pointer which is used in implementing the Temporal Garbage Collection (TGC) algorithm. It acts as a special marker indicating that the object it replaces is younger than its containing structure. The GCYP pointer field always addresses an indirection cell in ________________ the special INDIRECTION-CELL-AREA. The indirection cell contains the actual object the GCYP replaces, including the cdr code field. 7.10.2 Structure Forwarding. When an array needs to grow beyond its original allocation due to VECTOR-PUSH-EXTEND or ADJUST-ARRAY-SIZE operations, a process known as structure forwarding takes place. This is done by the ____________________ STRUCTURE-FORWARD function. New, larger contiguous storage is allocated, all the array's leader and data element words are copied into the new storage, and the old structure is filled with forwarding markers. The old array header word is replaced by a word of type DTP- HEADER-FORWARD whose pointer field has the address of the new array header word. All other words in the old structure, both leader words and data element words, are replaced with Qs containing data type DTP-BODY-FORWARD and pointing to the HEADER- 7-22 Software Design Notes Internal Storage Formats FORWARD word. The body forwards are needed to forward array elements that have pointers to them (created, for example, by LOCF), but are used even in unboxed arrays. To follow a BODY-FORWARD, the pointer field is used to find the HEADER-FORWARD word. The offset (either positive or negative) from the HEADER-FORWARD word is calculated. That offset is then applied to the address of the real header, found after following all intermediate HEADER-FORWARDs have been followed. It is possible for symbols to be structure forwarded also, but this is no longer done by any system code. No other structures in structure space can be structure forwarded. DTP-HEADER- FORWARD does have a further, different use in List space to support the RPLACD operation. This is described above in the paragraphs on Destructive List Operations. 7.10.3 DTP-Self-Ref-Pointer. A SELF-REF-POINTER (SRP) is used for mapped and unmapped instance variable references and may be used to implement monitor variables in the future. The pointer field of an SRP contains flag bits nad in index field. The format of a DTP-SELF-REF- POINTER (SRP) is shown in Table 7-7. SRPs are created and manipulated within the code for flavors, mostly in the mapping- table sections. They are currently found only in the typed overhead words of FEFs. Table 7-7 SELF-REF-POINTER Format Bit 19: SELF-REF-RELOCATE-FLAG Bit 18: SELF-REF-MAP-LEADER-FLAG Bit 17: SELF-REF-MONITOR-FLAG Bits 12-0: SELF-REF-INDEX Bits 12-1: SELF-REF-WORD-INDEX If RELOCATE-FLAG is clear, an SRP is an unmapped instance variable reference. The instance variable address is derived by indexing SELF-REF-INDEX words into the instance data structure currently associated with SELF. Unmapped instance variables are created by the :ORDERED-INSTANCE-VARIABLES option to DEFFLAVOR, and when a flavor is a base flavor with no mixins, required flavors, and so forth. If RELOCATE-FLAG is set, the SRP is a mapped instance variable reference and a SELF-MAPPING-TABLE array must be used. This is the most common case (about 75% of SRPs). When the MAP-LEADER- FLAG is clear, the INDEX field is used as an element index into the current SELF-MAPPING-TABLE array (an ART-16b array). That 7-23 Internal Storage Formats Software Design Notes location should contain a number which is the real index into SELF of the mapped instance variable. The MAP-LEADER-FLAG, when set, means to read the contents of a slot in the array leader of the SELF-MAPPING-TABLE. This flag is used only when fetching another mapping table during the execution of a :COMBINED method built on compowed flavors (about 5% of SRPs). The SELF-MAPPING-TABLE is obtained from Local Slot 1 on the stack. The INDEX field is then an index into the mapping table array leader, which should then contain a locative to a cell containing the instance variable value. The MONITOR-FLAG bit is intended for use in implementing "monitor variables" which cause a trap when written into; however monitor vairables are not currently supported hence the Monitor-Flag bit should always be a 0 in an SRP. If the MONITOR-FLAG bit were set, the SRP would be a reference to the next memory location on read and would cause a trap on write. The MAP-LEADER-FLAG would be ignored. No monitor pointers could appear within methods. 7.10.4 Trap Types. Reading a word with a trap data type will normally cause the error handler to be invoked (via the trans trap mechanism __________ described in the section on Error Handling). 7.10.4.1 DTP-Null. This datatype is used for various things to mean "nothing." Its most common use is to act as an unbound cell marker. For example, an unbound sumbol has DTP-NULL in its value cell. The pointer field points back at the symbol header so that the error handler can describe the symbol that is unbound. DTP-NULL can also act as an unbound marker in other structures, but the pointer field must always point to some structure's header in order support the error handler. DTP-NULL is also used in hash tables to mark unused entries. Here, it a special case of the pointer field restriction, the pointer field contains 0, which is the address if NIL's header. 7.10.4.2 DTP-Trap and DTP-Ones-Trap. These data types are present mainly for error checking. A newly created virtual pave is filled with words of type DTP-TRAP and pointer fields of 0 by the page-fault microcode (in fact, all 32 bits are 0). The DTP-ONES-TRAP data type is used as a means of easily detecting sign extension during arithmetic operations. 7.10.4.3 Unused Type DTP-Free. 7-24 Software Design Notes Internal Storage Formats Some virtual memory that is allocated by the system build program (GENASYS) but is not yet occupied by objects has data type DTP- FREE. It is illegal in any context (will cause crash processing to occur). It should be considered reserved for future use. 7-25