Kawa: Data structures

Data structures

Sequences

A sequence is a generalized list, consisting of zero or more values. You can choose between a number of different kinds of sequence implementations. Scheme traditionally has lists and vectors. Any Java class that implements java.util.List is a sequence type. Raw Java arrays can also be viewerd as a sequence, and strings can be viewed a sequence (or vector) of characters. Kawa also provides uniform vectors.

Sequence types differ in their API, but given a sequence type stype you can construct instances of that types using the syntax:

(stype v0 v1 .... vn)

For example:

(bytevector 9 8 7 6)  ⇒ #u8(9 8 7 6)

For a raw Java class name jname you may need to use the empty keyword ||: to separate constructor parameters (if any) from sequence elements, as in:

(gnu.lists.U8Vector ||: 9 8 7 6)  ⇒ #u8(9 8 7 6)

This syntax works with any type with a default constructor and a 1-argument add method; see the section called “Allocating objects” for details. You can use the same syntax for allocating arrays, though array creation supports more options.

To extract an element from Scheme sequence of type stype there is usually a function stype-ref. For example:

(define vec1 (vector 5 6 7 8))
(vector-ref vec1 2) ⇒ 7

More concisely, you can use (Kawa-specific) function call syntax:

(vec1 3) ⇒ 8

The same function call syntax also works for raw Java arrays:

(define arr1 (long[] 4 5 6 7))
(arr1 3) ⇒ 7

To assign to (replace) an element from a sequence of Scheme type stype there is usually a function stype-set!:

(vector-set! vec1 1 9)
vec1 ⇒ #(5 9 7 8)

Again, you can use the function call syntax:

(set! (vec1 2) 'x)
vec1 ⇒ #(5 9 x 8)

Lists

A pair (sometimes called a dotted pair) is a record structure with two fields called the car and cdr fields (for historical reasons). Pairs are created by the procedure cons. The car and cdr fields are accessed by the procedures car and cdr. The car and cdr fields are assigned by the procedures set-car! and set-cdr!.

Pairs are used primarily to represent lists. A list can be defined recursively as either the empty list or a pair whose cdr is a list. More precisely, the set of lists is defined as the smallest set X such that:

The empty list is in X.
If list is in X, then any pair whose cdr field contains list is also in X.

The objects in the car fields of successive pairs of a list are the elements of the list. For example, a two-element list is a pair whose car is the first element and whose cdr is a pair whose car is the second element and whose cdr is the empty list. The length of a list is the number of elements, which is the same as the number of pairs.

The empty list is a special object of its own type. It is not a pair, it has no elements, and its length is zero.

Note: The above definitions imply that all lists have finite length and are terminated by the empty list.

The most general notation (external representation) for Scheme pairs is the “dotted” notation (c1 . c2 ) where c1 is the value of the car field and c2 is the value of the cdr field. For example (4 . 5) is a pair whose car is 4 and whose cdr is 5. Note that (4 . 5) is the external representation of a pair, not an expression that evaluates to a pair.

A more streamlined notation can be used for lists: the elements of the list are simply enclosed in parentheses and separated by spaces. The empty list is written (). For example,

(a b c d e)

and

(a . (b . (c . (d . (e . ())))))

are equivalent notations for a list of symbols.

A chain of pairs not ending in the empty list is called an improper list. Note that an improper list is not a list. The list and dotted notations can be combined to represent improper lists:

(a b c . d)

is equivalent to

(a . (b . (c . d)))

Needs to finish merging from R7RS!

make-list k [fill]

Returns a newly allocated list of k elements. If a second argument is given, the each element is initialized to fill. Otherwise the initial contents of each element is unspecified.
(make-list 2 3)   ⇒ (3 3)

SRFI-1 list library

The SRFI-1 List Library is available, though not enabled by default. To use its functions you must (require 'list-lib) or (require 'srfi-1).

(require 'list-lib)
(iota 5 0 -0.5) ⇒ (0.0 -0.5 -1.0 -1.5 -2.0)

reverse! list

The result is a list consisting of the elements of list in reverse order. No new pairs are allocated, instead the pairs of list are re-used, with cdr cells of list reversed in place. Note that if list was pair, it becomes the last pair of the reversed result.

Vectors

Vectors are heterogeneous structures whose elements are indexed by integers. A vector typically occupies less space than a list of the same length, and the average time needed to access a randomly chosen element is typically less for the vector than for the list.

The length of a vector is the number of elements that it contains. This number is a non–negative integer that is fixed when the vector is created. The valid indices of a vector are the exact non–negative integer objects less than the length of the vector. The first element in a vector is indexed by zero, and the last element is indexed by one less than the length of the vector.

Vectors are written using the notation #(obj ...). For example, a vector of length 3 3 containing the number zero in element 0, the list (2 2 2 2) in element 1, and the string "Anna" in element 2 can be written as following:

#(0 (2 2 2 2) "Anna")

Note that this is the external representation of a vector. In Kawa, a vector datum is self-evaluating, but for style (and compatibility with R7RS) is is suggested you quote a vector constant:

’#(0 (2 2 2 2) "Anna")  ⇒ #(0 (2 2 2 2) "Anna")

Compare these different ways of creating a vector:

(vector a b c): In this case a, b, and c are expressions evaluated at run-time and the results used to initialize a newly-allocated 3-element vector.
[a b c]: Same as using vector, but more concise, and results in an immutable (non-modifiable) vector.
#(a b c): This is reader syntax and creates a vector literal, at read-time, early in compile-time. The symbols a, b, and c are not evaluated but instead used literally.
`#(,a ,b ,c): This is reader-syntax, using quasi-quotation, so a, b, and c are expressions evaluated at run-time. This is equivalent to [a b c] in that it results in an immutable vector.

vector

The type of vector objects.

vector obj …

Return a newly allocated vector whose elements contain the given arguments. Analogous to list.
(vector 'a 'b 'c)               ⇒  #(a b c)
Alternatively, you can use square-bracket syntax, which results in an immutable vector:
['a 'b 'c]               ⇒  #(a b c)

make-vector k

make-vector k fill

Return a newly allocated vector of k elements. If a second argument is given, then each element is initialized to fill. Otherwise the initial contents of each element is #!null.

vector? obj

Return #t if obj is a vector, #f otherwise.

vector-length vector

Return the number of elements in vector as an exact integer.

vector-ref vector k

It is an error if k is not a valid index of vector. The vector-ref procedure returns the contents of element k of vector.
(vector-ref '#(1 1 2 3 5 8 13 21) 5)     ⇒  8
(vector-ref '#(1 1 2 3 5 8 13 21)
  (inexact->exact (round (* 2 (acos -1)))))
⇒ 13

vector-set! vector k obj

It is an error if k is not a valid index of vector. The vector-set! procedure stores obj in element k of vector, and returns no values.
(let ((vec (vector 0 '(2 2 2 2) "Anna")))
  (vector-set! vec 1 '("Sue" "Sue"))
  vec)
  ⇒  #(0 ("Sue" "Sue") "Anna")

(vector-set! '#(0 1 2) 1 "doe")
  ⇒  error    ;; constant vector

A concise alternative to vector-ref and vector-set! is to use function call syntax. For example:

(let ((vec (vector 0 '(2 2 2 2) "Anna")))
  (set! (vec 1) '("Sue" "Sue"))
  (list (vec 2) (vec 1)))
  ⇒  ("Anna" ("Sue" "Sue"))

vector->list vector [start [end]]

The vector->list procedure returns a newly allocated list of the objects contained in the elements of vector between start and end.
(vector->list '#(dah dah didah))        ⇒  (dah dah didah)
(vector->list '#(dah dah didah) 1 2)    ⇒  (dah)

list->vector list

The list->vector procedure returns a newly created vector initialized to the elements of the list list, in order.
(list->vector '(dididit dah))           ⇒  #(dididit dah)

vector->string vector [start [end]]

The vector->string procedure returns a newly allocated string of the objects contained in the elements of vector between start and end. It is an error if any element of vector between start and end is not a character, or is a character forbidden in strings.
(vector->string #(#\1 #\2 #\3))             ⇒ "123"
(vector->string #(#\1 #\2 #\3 #\4 #\5) 2 4) ⇒ "34"

string->vector string [start [end]]

The string->vector procedure returns a newly created vector initialized to the elements of the string string between start and end.
(string->vector "ABC")       ⇒ #(#\A #\B #\C)
(string->vector "ABCDE" 1 3) ⇒ #(#\B #\C)

vector-copy vector [start [end]]

Returns a newly allocated copy of the elements of the given vector between start and end . The elements of the new vector are the same (in the sense of eqv?) as the elements of the old.
(define a #(1 8 2 8)) ; a may be immutable
(define b (vector-copy a))
(vector-set! b 0 3)   ; b is mutable
b                     ⇒      #(3 8 2 8)
(define c (vector-copy b 1 3))
c                     ⇒ #(8 2)

vector-copy! to at from [start [end]]

Copies the elements of vector from between start and end to vector to, starting at at. The order in which elements are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary vector and then into the destination. This can be achieved without allocating storage by making sure to copy in the correct direction in such circumstances.

It is an error if at is less than zero or greater than the length of to. It is also an error if (- (vector-length to) at) is less than (- end start).
(define a (vector 1 2 3 4 5))
(define b (vector 10 20 30 40 50))
(vector-copy! b 1 a 0 2)
b    ⇒ #(10 1 2 40 50)

vector-append arg...

Creates a newly allocated vector whose elements are the concatenation of the elements of the given arguments. Each arg may be a vector or a list.
(vector-append #(a b c) #(d e f))
    ⇒ #(a b c d e f)

vector-fill! vector fill [start [end]]

Stores fill in in the elements of vector between start and end.
(define a (vector 1 2 3 4 5))
(vector-fill! a 'smash 2 4)
a  ⇒ #(1 2 smash smash 5)

vector-map proc vector_1 vector_2 …

The vectors must all have the same length. proc should accept as many arguments as there are vectors and return a single value.

The vector-map procedure applies proc element–wise to the elements of the vectors and returns a vector of the results, in order. proc is always called in the same dynamic environment as vector-map itself. The order in which proc is applied to the elements of the vectors is unspecified. If multiple returns occur from vector-map, the return values returned by earlier returns are not mutated.

Analogous to map.

vector-for-each proc vector_1 vector_2 …

The vectors must all have the same length. proc should accept as many arguments as there are vectors. The vector-for-each procedure applies proc element–wise to the elements of the vectors for its side effects, in order from the first elements to the last. proc is always called in the same dynamic environment as vector-for-each itself. The return values of vector-for-each are unspecified.

Analogous to for-each.

Uniform vectors

Uniform vectors are vectors whose elements are of the same numeric type. The are defined by SRFI-4. The type names (such as s8vector) are a Kawa extension.

s8vector

The type of uniform vectors where each element can contain a signed 8-bit integer. Represented using an array of byte.

u8vector

The type of uniform vectors where each element can contain an unsigned 8-bit integer. Represented using an array of <byte>, but each element is treated as if unsigned.

This type is a synonym for bytevector, which has extra functions.

s16vector

The type of uniform vectors where each element can contain a signed 16-bit integer. Represented using an array of short.

u16vector

The type of uniform vectors where each element can contain an unsigned 16-bit integer. Represented using an array of short, but each element is treated as if unsigned.

s32vector

The type of uniform vectors where each element can contain a signed 32-bit integer. Represented using an array of int.

u32vector

The type of uniform vectors where each element can contain an unsigned 32-bit integer. Represented using an array of int, but each element is treated as if unsigned.

s64vector

The type of uniform vectors where each element can contain a signed 64-bit integer. Represented using an array of long.

u64vector

The type of uniform vectors where each element can contain an unsigned 64-bit integer. Represented using an array of long, but each element is treated as if unsigned.

f32vector

The type of uniform vectors where each element can contain a 32-bit floating-point real. Represented using an array of float.

f64vector

The type of uniform vectors where each element can contain a 64-bit floating-point real. Represented using an array of double.

s8vector? value

u8vector? value

s16vector? value

u16vector? value

s32vector? value

u32vector? value

s64vector? value

u64vector? value

f32vector? value

f64vector? value

Return true iff value is a uniform vector of the specified type.

make-s8vector n [value]

make-u8vector n [value]

make-s16vector n [value]

make-u16vector n [value]

make-s32vector n [value]

make-u32vector n [value]

make-s64vector n [value]

make-u64vector n [value]

make-f32vector n [value]

make-f64vector n [value]

Create a new uniform vector of the specified type, having room for n elements. Initialize each element to value if it is specified; zero otherwise.

s8vector value ...

u8vector value ...

s16vector value ..

u16vector value ...

s32vector value ...

u32vector value ...

s64vector value ...

u64vector value ...

f32vector value ...

f64vector value ...

Create a new uniform vector of the specified type, whose length is the number of values specified, and initialize it using those values.

s8vector-length v

u8vector-length v

s16vector-length v

u16vector-length v

s32vector-length v

u32vector-length v

s64vector-length v

u64vector-length v

f32vector-length v

f64vector-length v

Return the length (in number of elements) of the uniform vector v.

s8vector-ref v i

u8vector-ref v i

s16vector-ref v i

u16vector-ref v i

s32vector-ref v i

u32vector-ref v i

s64vector-ref v i

u64vector-ref v i

f32vector-ref v i

f64vector-ref v i

Return the element at index i of the uniform vector v.

s8vector-set! v i x

u8vector-set! v i x

s16vector-set! v i x

u16vector-set! v i x

s32vector-set! v i x

u32vector-set! v i x

s64vector-set! v i x

u64vector-set! v i x

f32vector-set! v i x

f64vector-set! v i x

Set the element at index i of uniform vector v to the value x, which must be a number coercible to the appropriate type.

s8vector->list v

u8vector->list v

s16vector->list v

u16vector->list v

s32vector->list v

u32vector->list v

s64vector->list v

u64vector->list v

f32vector->list v

f64vector->list v

Convert the uniform vetor v to a list containing the elments of v.

list->s8vector l

list->u8vector l

list->s16vector l

list->u16vector l

list->s32vector l

list->u32vector l

list->s64vector l

list->u64vector l

list->f32vector l

list->f64vector l

Create a uniform vector of the appropriate type, initializing it with the elements of the list l. The elements of l must be numbers coercible the new vector's element type.

Relationship with Java arrays

Each uniform array type is implemented as an underlying Java array, and a length field. The underlying type is byte[] for u8vector or s8vector; short[] for u16vector or u16vector; int[] for u32vector or s32vector; long[] for u64vector or s64vector; float[] for f32vector; and double[] for f32vector. The length field allows a uniform array to only use the initial part of the underlying array. (This can be used to support Common Lisp's fill pointer feature.) This also allows resizing a uniform vector. There is no Scheme function for this, but you can use the setSize method:

(invoke some-vector 'setSize 200)

If you have a Java array, you can create a uniform vector sharing with the Java array:

(define arr :: byte[] ((primitive-array-new byte) 10))
(define vec :: u8vector (make u8vector arr))

At this point vec uses arr for its underlying storage, so changes to one affect the other. It vec is re-sized so it needs a larger underlying array, then it will no longer use arr.

Bytevectors

Bytevectors represent blocks of binary data. They are fixed-length sequences of bytes, where a byte is an exact integer in the range [0, 255]. A bytevector is typically more space-efficient than a vector containing the same values.

The length of a bytevector is the number of elements that it contains. This number is a non-negative integer that is fixed when the bytevector is created. The valid indexes of a bytevector are the exact non-negative integers less than the length of the bytevector, starting at index zero as with vectors.

The bytevector type is equivalent to the u8vector uniform vector type, but is specified by the R7RS standard.

Bytevectors are written using the notation #u8(byte . . . ). For example, a bytevector of length 3 containing the byte 0 in element 0, the byte 10 in element 1, and the byte 5 in element 2 can be written as following:

#u8(0 10 5)

Bytevector constants are self-evaluating, so they do not need to be quoted in programs.

bytevector

The type of bytevector objects.

bytevector byte …

Return a newly allocated bytevector whose elements contain the given arguments. Analogous to vector.
(bytevector 1 3 5 1 3 5)  ⇒  #u8(1 3 5 1 3 5)
(bytevector)  ⇒  #u8()

bytevector? obj

Return #t if obj is a bytevector, #f otherwise.

make-bytevector k

make-bytevector k byte

The make-bytevector procedure returns a newly allocated bytevector of length k. If byte is given, then all elements of the bytevector are initialized to byte, otherwise the contents of each element are unspecified.
(make-bytevector 2 12) ⇒ #u8(12 12)

bytevector-length bytevector

Returns the length of bytevector in bytes as an exact integer.

bytevector-u8-ref bytevector k

It is an error if k is not a valid index of bytevector. Returns the kth byte of bytevector.
(bytevector-u8-ref ’#u8(1 1 2 3 5 8 13 21) 5)
  ⇒ 8

bytevector-u8-set! bytevector k byte

It is an error if k is not a valid index of bytevector. Stores byte as the kth byte of bytevector.
(let ((bv (bytevector 1 2 3 4)
  (bytevector-u8-set! bv 1 3)
  bv)
  ⇒ #u8(1 3 3 4)

bytevector-copy bytevector [start [end]]

Returns a newly allocated bytevector containing the bytes in bytevector between start and end.
(define a #u8(1 2 3 4 5))
(bytevector-copy a 2 4))
    ⇒ #u8(3 4)

bytevector-copy! to at from [start [end]]

Copies the bytes of bytevectorfrom between start and end to bytevector to, starting at at. The order in which bytes are copied is unspecified, except that if the source and destination overlap, copying takes place as if the source is first copied into a temporary bytevector and then into the destination. This is achieved without allocating storage by making sure to copy in the correct direction in such circumstances.

It is an error if at is less than zero or greater than the length of to. It is also an error if (- (bytevector-length to) at) is less than (- end start).
(define a (bytevector 1 2 3 4 5))
(define b (bytevector 10 20 30 40 50))
(bytevector-copy! b 1 a 0 2)
b        ⇒ #u8(10 1 2 40 50)

bytevector-append bytevector...

Returns a newly allocated bytevector whose elements are the concatenation of the elements in the given bytevectors.
(bytevector-append #u8(0 1 2) #u8(3 4 5))
        ⇒  #u8(0 1 2 3 4 5)

utf8->string bytevector [start [end]]

This procedure decodes the bytes of a bytevector between start and end, interpreting as a UTF-8-encoded string, and returns the corresponding string. It is an error for bytevector to contain invalid UTF-8 byte sequences.
(utf8->string #u8(#x41))  ⇒ "A"

string->utf8 string [start [end]]

This procedure encodes the characters of a string between start and end and returns the corresponding bytevector, in UTF-8 encoding.
(string->utf8 "λ")     ⇒ " #u8(#xCE #xBB)

Streams - lazy lists

Streams, sometimes called lazy lists, are a sequential data structure containing elements computed only on demand. A stream is either null or is a pair with a stream in its cdr. Since elements of a stream are computed only when accessed, streams can be infinite. Once computed, the value of a stream element is cached in case it is needed again.

Note: These are not the same as Java 8 streams.

(require 'srfi-41)
(define fibs
  (stream-cons 1
    (stream-cons 1
      (stream-map +
        fibs
        (stream-cdr fibs)))))
(stream->list 8 fibs) ⇒ (1 1 2 3 5 8 13 21)

See the SRFI 41 specification for details.

The Kawa implementations builds on promises. The stream-null value is a promise that evaluates to the empty list. The result of stream-cons is an eager immutable pair whose car and cdr properties return promises.

Multi-dimensional Arrays

Arrays are heterogeneous data structures whose elements are indexed by integer sequences of fixed length. The length of a valid index sequence is the rank or the number of dimensions of an array. The shape of an array consists of bounds for each index.

The lower bound b and the upper bound e of a dimension are exact integers with (<= b e). A valid index along the dimension is an exact integer k that satisfies both (<= b k) and (< k e). The length of the array along the dimension is the difference (- e b). The size of an array is the product of the lengths of its dimensions.

A shape is specified as an even number of exact integers. These are alternately the lower and upper bounds for the dimensions of an array.

array? obj

Returns #t if obj is an array, otherwise returns #f.

shape bound ...

Returns a shape. The sequence bound ... must consist of an even number of exact integers that are pairwise not decreasing. Each pair gives the lower and upper bound of a dimension. If the shape is used to specify the dimensions of an array and bound ... is the sequence b0 e0 ... bk ek ... of n pairs of bounds, then a valid index to the array is any sequence j0 ... jk ... of n exact integers where each jk satisfies (<= bk jk) and (< jk ek).

The shape of a d-dimensional array is a d * 2 array where the element at k 0 contains the lower bound for an index along dimension k and the element at k 1 contains the corresponding upper bound, where k satisfies (<= 0 k) and (< k d).

make-array shape

make-array shape obj

Returns a newly allocated array whose shape is given by shape. If obj is provided, then each element is initialized to it. Otherwise the initial contents of each element is unspecified. The array does not retain a reference to shape.

array shape obj ...

Returns a new array whose shape is given by shape and the initial contents of the elements are obj ... in row major order. The array does not retain a reference to shape.

array-rank array

Returns the number of dimensions of array.
(array-rank
  (make-array (shape 1 2 3 4)))
Returns 2.

array-start array k

Returns the lower bound for the index along dimension k.

array-end array k

Returns the upper bound for the index along dimension k.

array-ref array k ...

array-ref array index

Returns the contents of the element of array at index k .... The sequence k ... must be a valid index to array. In the second form, index must be either a vector or a 0-based 1-dimensional array containing k ....
(array-ref (array (shape 0 2 0 3)
              'uno 'dos 'tres
              'cuatro 'cinco 'seis)
   1 0)
Returns cuatro.
(let ((a (array (shape 4 7 1 2) 3 1 4)))
   (list (array-ref a 4 1)
         (array-ref a (vector 5 1))
         (array-ref a (array (shape 0 2)
                         6 1))))
Returns (3 1 4).

array-set! array k ... obj

array-set! array index obj

Stores obj in the element of array at index k .... Returns the void value. The sequence k ... must be a valid index to array. In the second form, index must be either a vector or a 0-based 1-dimensional array containing k ....
(let ((a (make-array
            (shape 4 5 4 5 4 5))))
   (array-set! a 4 4 4 "huuhkaja")
   (array-ref a 4 4 4))
Returns "huuhkaja".

share-array array shape proc

Returns a new array of shape shape that shares elements of array through proc. The procedure proc must implement an affine function that returns indices of array when given indices of the array returned by share-array. The array does not retain a reference to shape.
(define i_4
   (let* ((i (make-array
                (shape 0 4 0 4)
                0))
          (d (share-array i
                (shape 0 4)
                (lambda (k)
                   (values k k)))))
      (do ((k 0 (+ k 1)))
          ((= k 4))
         (array-set! d k 1))
      i))
Note: the affinity requirement for proc means that each value must be a sum of multiples of the arguments passed to proc, plus a constant.

Implementation note: arrays have to maintain an internal index mapping from indices k1 ... kd to a single index into a backing vector; the composition of this mapping and proc can be recognised as (+ n0 (* n1 k1) ... (* nd kd)) by setting each index in turn to 1 and others to 0, and all to 0 for the constant term; the composition can then be compiled away, together with any complexity that the user introduced in their procedure.

Multi-dimensional arrays are specified by SRFI-25. In Kawa, a one-dimensional array whose lower bound is 0 is also a sequence. Furthermore, if such an array is simple (not created share-array) it will be implemented using a <vector>. Uniform vectors and strings are also arrays in Kawa. For example:

(share-array
 (f64vector 1.0 2.0 3.0 4.0 5.0 6.0)
 (shape 0 2 0 3)
 (lambda (i j) (+ (* 2 i) j)))

evaluates to a two-dimensionsal array of <double>:

#2a((1.0 2.0 3.0) (3.0 4.0 5.0))

Hash tables

A hashtable is a data structure that associates keys with values. The hashtable has no intrinsic order for the (key, value) associations it contains, and supports in-place modification as the primary means of setting the contents of a hash table. Any object can be used as a key, provided a hash function and a suitable equivalence function is available. A hash function is a procedure that maps keys to exact integer objects.

The hashtable provides key lookup and destructive update in amortised constant time, provided that a good hash function is used. A hash function h is acceptable for an equivalence predicate e iff (e obj1 obj2) implies (= (h obj1) (h obj2)). A hash function h is good for a equivalence predicate e if it distributes the resulting hash values for non-equal objects (by e) as uniformly as possible over the range of hash values, especially in the case when some (non-equal) objects resemble each other by e.g. having common subsequences. This definition is vague but should be enough to assert that e.g. a constant function is not a good hash function.

Kawa provides two complete sets of functions for hashtables:

The functions specified by R6RS have names starting with hashtable-
The functions specified by the older SRFI-69 specifiation have names starting with hash-table-

Both interfaces use the same underlying datatype, so it is possible to mix and match from both sets. That datatype implements java.util.Map. Freshly-written code should probably use the R6RS functions.

R6RS hash tables

To use these hash table functions in your Kawa program you must first:

(import (rnrs hashtables))

This section uses the hashtable parameter name for arguments that must be hashtables, and the key parameter name for arguments that must be hashtable keys.

make-eq-hashtable

make-eq-hashtable k

Return a newly allocated mutable hashtable that accepts arbitrary objects as keys, and compares those keys with eq?. If an argument is given, the initial capacity of the hashtable is set to approximately k elements.

make-eqv-hashtable

make-eqv-hashtable k

Return a newly allocated mutable hashtable that accepts arbitrary objects as keys, and compares those keys with eqv?. If an argument is given, the initial capacity of the hashtable is set to approximately k elements.

make-hashtable hash-function equiv

make-hashtable hash-function equiv k

hash-function and equiv must be procedures. hash-function should accept a key as an argument and should return a non–negative exact integer object. equiv should accept two keys as arguments and return a single value. Neither procedure should mutate the hashtable returned by make-hashtable.

The make-hashtable procedure returns a newly allocated mutable hashtable using hash-function as the hash function and equiv as the equivalence function used to compare keys. If a third argument is given, the initial capacity of the hashtable is set to approximately k elements.

Both hash-function and equiv should behave like pure functions on the domain of keys. For example, the string-hash and string=? procedures are permissible only if all keys are strings and the contents of those strings are never changed so long as any of them continues to serve as a key in the hashtable. Furthermore, any pair of keys for which equiv returns true should be hashed to the same exact integer objects by hash-function.

Note: Hashtables are allowed to cache the results of calling the hash function and equivalence function, so programs cannot rely on the hash function being called for every lookup or update. Furthermore any hashtable operation may call the hash function more than once.

Procedures

hashtable? obj

Return #t if obj is a hashtable, #f otherwise.

hashtable-size hashtable

Return the number of keys contained in hashtable as an exact integer object.

hashtable-ref hashtable key default

Return the value in hashtable associated with key. If hashtable does not contain an association for key, default is returned.

hashtable-set! hashtable key obj

Change hashtable to associate key with obj, adding a new association or replacing any existing association for key, and returns unspecified values.

hashtable-delete! hashtable key

Remove any association for key within hashtable and returns unspecified values.

hashtable-contains? hashtable key

Return #t if hashtable contains an association for key, #f otherwise.

hashtable-update! hashtable key proc default

proc should accept one argument, should return a single value, and should not mutate hashtable.

The hashtable-update! procedure applies proc to the value in hashtable associated with key, or to default if hashtable does not contain an association for key. The hashtable is then changed to associate key with the value returned by proc.

The behavior of hashtable-update! is equivalent to the following code, but is may be (and is in Kawa) implemented more efficiently in cases where the implementation can avoid multiple lookups of the same key:
(hashtable-set!
  hashtable key
  (proc (hashtable-ref
         hashtable key default)))

hashtable-copy hashtable

hashtable-copy hashtable mutable

Return a copy of hashtable. If the mutable argument is provided and is true, the returned hashtable is mutable; otherwise it is immutable.

hashtable-clear! hashtable

hashtable-clear! hashtable k

Remove all associations from hashtable and returns unspecified values.

If a second argument is given, the current capacity of the hashtable is reset to approximately k elements.

hashtable-keys hashtable

Return a vector of all keys in hashtable. The order of the vector is unspecified.

hashtable-entries hashtable

Return two values, a vector of the keys in hashtable, and a vector of the corresponding values.

Example:
(let ((h (make-eqv-hashtable)))
  (hashtable-set! h 1 'one)
  (hashtable-set! h 2 'two)
  (hashtable-set! h 3 'three)
  (hashtable-entries h))
⇒ #(1 2 3) #(one two three) ; two return values
the order of the entries in the result vectors is not known.

Inspection

hashtable-equivalence-function hashtable

Return the equivalence function used by hashtable to compare keys. For hashtables created with make-eq-hashtable and make-eqv-hashtable, returns eq? and eqv? respectively.

hashtable-hash-function hashtable

Return the hash function used by hashtable. For hashtables created by make-eq-hashtable or make-eqv-hashtable, #f is returned.

hashtable-mutable? hashtable

Return #t if hashtable is mutable, otherwise #f.

Hash functions

The equal-hash, string-hash, and string-ci-hash procedures of this section are acceptable as the hash functions of a hashtable only if the keys on which they are called are not mutated while they remain in use as keys in the hashtable.

equal-hash obj

Return an integer hash value for obj, based on its structure and current contents. This hash function is suitable for use with equal? as an equivalence function.

Note: Like equal?, the equal-hash procedure must always terminate, even if its arguments contain cycles.

string-hash string

Return an integer hash value for string, based on its current contents. This hash function is suitable for use with string=? as an equivalence function.

string-ci-hash string

Return an integer hash value for string based on its current contents, ignoring case. This hash function is suitable for use with string-ci=? as an equivalence function.

symbol-hash symbol

Return an integer hash value for symbol.

SRFI-69 hash tables

To use these hash table functions in your Kawa program you must first:

(require 'srfi-69)

(require 'hash-table)

(import (srfi :69 basic-hash-tables))

Type constructors and predicate

make-hash-table [ equal? [ hash [ size-hint]]] → hash-table

Create a new hash table with no associations. The equal? parameter is a predicate that should accept two keys and return a boolean telling whether they denote the same key value; it defaults to the equal? function.

The hash parameter is a hash function, and defaults to an appropriate hash function for the given equal? predicate (see the Hashing section). However, an acceptable default is not guaranteed to be given for any equivalence predicate coarser than equal?, except for string-ci=?. (The function hash is acceptable for equal?, so if you use coarser equivalence than equal? other than string-ci=?, you must always provide the function hash yourself.) (An equivalence predicate c1 is coarser than a equivalence predicate c2 iff there exist values x and y such that (and (c1 x y) (not (c2 x y))).)

The size-hint parameter can be used to suggested an approriate initial size. This option is not part of the SRFI-69 specification (though it is handled by the reference implementation), so specifying that option might be unportable.

hash-table? obj → boolean

A predicate to test whether a given object obj is a hash table.

alist->hash-table alist [ equal? [ hash [ size-hint]]] → hash-table

Takes an association list alist and creates a hash table hash-table which maps the car of every element in alist to the cdr of corresponding elements in alist. The equal?, hash, and size-hint parameters are interpreted as in make-hash-table. If some key occurs multiple times in alist, the value in the first association will take precedence over later ones. (Note: the choice of using cdr (instead of cadr) for values tries to strike balance between the two approaches: using cadr would render this procedure unusable for cdr alists, but not vice versa.)

Reflective queries

hash-table-equivalence-function hash-table

Returns the equivalence predicate used for keys of hash-table.

hash-table-hash-function hash-table

Returns the hash function used for keys of hash-table.

Dealing with single elements

hash-table-ref hash-table key [ thunk ] → value

This procedure returns the value associated to key in hash-table. If no value is associated to key and thunk is given, it is called with no arguments and its value is returned; if thunk is not given, an error is signalled. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

hash-table-ref/default hash-table key default → value

Evaluates to the same value as (hash-table-ref hash-table key (lambda () default)). Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

hash-table-set! hash-table key value → void

This procedure sets the value associated to key in hash-table. The previous association (if any) is removed. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

hash-table-delete! hash-table key → void

This procedure removes any association to key in hash-table. It is not an error if no association for the key exists; in this case, nothing is done. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

hash-table-exists? hash-table key → boolean

This predicate tells whether there is any association to key in hash-table. Given a good hash function, this operation should have an (amortised) complexity of O(1) with respect to the number of associations in hash-table.

hash-table-update! hash-table key function [ thunk ] → void

Semantically equivalent to, but may be implemented more efficiently than, the following code:
(hash-table-set! hash-table key
                 (function (hash-table-ref hash-table key thunk)))

hash-table-update!/default hash-table key function default → void

Behaves as if it evaluates to (hash-table-update! hash-table key function (lambda () default)).

Dealing with the whole contents

hash-table-size hash-table → integer

Returns the number of associations in hash-table. This operation takes constant time.

hash-table-keys hash-table → list

Returns a list of keys in hash-table. The order of the keys is unspecified.

hash-table-values hash-table → list

Returns a list of values in hash-table. The order of the values is unspecified, and is not guaranteed to match the order of keys in the result of hash-table-keys.

hash-table-walk hash-table proc → void

proc should be a function taking two arguments, a key and a value. This procedure calls proc for each association in hash-table, giving the key of the association as key and the value of the association as value. The results of proc are discarded. The order in which proc is called for the different associations is unspecified.

hash-table-fold hash-table f init-value → final-value

This procedure calls f for every association in hash-table with three arguments: the key of the association key, the value of the association value, and an accumulated value, val. The val is init-value for the first invocation of f, and for subsequent invocations of f, the return value of the previous invocation of f. The value final-value returned by hash-table-fold is the return value of the last invocation of f. The order in which f is called for different associations is unspecified.

hash-table->alist hash-table → alist

Returns an association list such that the car of each element in alist is a key in hash-table and the corresponding cdr of each element in alist is the value associated to the key in hash-table. The order of the elements is unspecified.

The following should always produce a hash table with the same mappings as a hash table h:
(alist->hash-table (hash-table->alist h)
                        (hash-table-equivalence-function h)
                        (hash-table-hash-function h))

hash-table-copy hash-table → hash-table

Returns a new hash table with the same equivalence predicate, hash function and mappings as in hash-table.

hash-table-merge! hash-table1 hash-table2 → hash-table

Adds all mappings in hash-table2 into hash-table1 and returns the resulting hash table. This function may modify hash-table1 destructively.

Hash functions

The Kawa implementation always calls these hash functions with a single parameter, and expects the result to be within the entire (32-bit signed) int range, for compatibility with standard hashCode methods.

hash object [ bound ] → integer

Produces a hash value for object in the range from 0 (inclusive) tp to bound (exclusive).

If bound is not given, the Kawa implementation returns a value within the range (- (expt 2 32)) (inclusive) to (- (expt 2 32) 1) (inclusive). It does this by calling the standard hashCode method, and returning the result as is. (If the object is the Java null value, 0 is returned.) This hash function is acceptable for equal?.

string-hash string [ bound ] → integer

The same as hash, except that the argument string must be a string. (The Kawa implementation returns the same as the hash function.)

string-ci-hash string [ bound ] → integer

The same as string-hash, except that the case of characters in string does not affect the hash value produced. (The Kawa implementation returns the same the hash function applied to the lower-cased string.)

hash-by-identity object [ bound ] → integer

The same as hash, except that this function is only guaranteed to be acceptable for eq?. Kawa uses the identityHashCode method of java.lang.System.

Up: Documentation

Previous: Characters and text

Next: Eval and Environments

The Kawa Scheme Language

The Kawa Scheme Language

Data structures

Sequences

Lists

SRFI-1 list library

Vectors

Uniform vectors

Relationship with Java arrays

Bytevectors

Streams - lazy lists

Multi-dimensional Arrays

Hash tables

R6RS hash tables

Procedures

Inspection

Hash functions

SRFI-69 hash tables

Type constructors and predicate

Reflective queries

Dealing with single elements

Dealing with the whole contents

Hash functions