The Kawa Scheme Language

The Kawa Scheme Language

Syntax

Notation

The formal syntax for Kawa Scheme is written in an extended BNF. Non–terminals are written like-this. Case is insignificant for non–terminal names. Literal text (terminals) are written like this.

All spaces in the grammar are for legibility.

The following extensions to BNF are used to make the description more concise: thing^* or thing... both mean zero or more occurrences of thing, and thing^+ means at least one thing.

Some non-terminal names refer to the Unicode scalar values of the same name: character-tabulation (U+0009), linefeed (U+000A), carriage-return (U+000D), line-tabulation (U+000B), form-feed (U+000C), space (U+0020), next-line (U+0085), line-separator (U+2028), and paragraph-separator (U+2029).

Lexical and datum syntax

The syntax of Scheme code is organized in three levels:

  1. the lexical syntax that describes how a program text is split into a sequence of lexemes,

  2. the datum syntax, formulated in terms of the lexical syntax, that structures the lexeme sequence as a sequence of syntactic data, where a syntactic datum is a recursively structured entity,

  3. the program syntax formulated in terms of the datum syntax, imposing further structure and assigning meaning to syntactic data.

Syntactic data (also called external representations) double as a notation for objects, and the read and write procedures can be used for reading and writing syntactic data, converting between their textual representation and the corresponding objects. Each syntactic datum represents a corresponding datum value. A syntactic datum can be used in a program to obtain the corresponding datum value using quote.

Scheme source code consists of syntactic data and (non–significant) comments. Syntactic data in Scheme source code are called forms. (A form nested inside another form is called a subform.) Consequently, Scheme's syntax has the property that any sequence of characters that is a form is also a syntactic datum representing some object. This can lead to confusion, since it may not be obvious out of context whether a given sequence of characters is intended to be a representation of objects or the text of a program. It is also a source of power, since it facilitates writing programs such as interpreters or compilers that treat programs as objects (or vice versa).

A datum value may have several different external representations. For example, both #e28.000 and #x1c are syntactic data representing the exact integer object 28, and the syntactic data (8 13), ( 08 13 ), (8 . (13 . ())) all represent a list containing the exact integer objects 8 and 13. Syntactic data that represent equal objects (in the sense of equal?) are always equivalent as forms of a program.

Because of the close correspondence between syntactic data and datum values, we sometimes uses the term datum for either a syntactic datum or a datum value when the exact meaning is apparent from the context.

Lexical syntax

The lexical syntax determines how a character sequence is split into a sequence of lexemes, omitting non–significant portions such as comments and whitespace. The character sequence is assumed to be text according to the Unicode standard. Some of the lexemes, such as identifiers, representations of number objects, strings etc., of the lexical syntax are syntactic data in the datum syntax, and thus represent objects. Besides the formal account of the syntax, this section also describes what datum values are represented by these syntactic data.

The lexical syntax, in the description of comments, contains a forward reference to datum, which is described as part of the datum syntax. Being comments, however, these datums do not play a significant role in the syntax.

Case is significant except in representations of booleans, number objects, and in hexadecimal numbers specifying Unicode scalar values. For example, #x1A and #X1a are equivalent. The identifier Foo is, however, distinct from the identifier FOO.

Formal account

Interlexeme-space may occur on either side of any lexeme, but not within a lexeme.

Identifiers, ., numbers, characters, and booleans, must be terminated by a delimiter or by the end of the input.

lexeme ::= identifier | boolean | number
         | character | string
         | ( |  ) |  [ |  ] |  #(
         | ' | ` | , | ,@ | .
         | #' |  #` |  #, |  #,@
delimiter ::= ( |  ) |  [ | ] | " | ; | #
         | whitespace

((UNFINISHED))

Line endings

Line endings are significant in Scheme in single–line comments and within string literals. In Scheme source code, any of the line endings in line-ending marks the end of a line. Moreover, the two–character line endings carriage-return linefeed and carriage-return next-line each count as a single line ending.

In a string literal, a line-ending not preceded by a \ stands for a linefeed character, which is the standard line–ending character of Scheme.

Whitespace and comments

intraline-whitespace ::= space | character-tabulation
whitespace ::=  intraline-whitespace
         | linefeed | line-tabulation | form-feed
         | carriage-return | next-line
         | any character whose category is Zs, Zl, or Zp
line-ending ::= linefeed | carriage return
         | carriage-return linefeed | next-line
         | carriage-return next-line | line-separator
comment ::=  ; all subsequent characters up to a line-ending
                or paragraph-separator
         | nested-comment
         | #; interlexeme-space datum
         | shebang-comment
nested-comment ::=  #| comment-text comment-cont|#
comment-text ::= character sequence not containing #| or |#
comment-cont ::= nested-comment comment-text
atmosphere ::= whitespace | comment
interlexeme-space ::= atmosphere*

As a special case the characters #!/ are treated as starting a comment, but only at the beginning of file. These characters are used on Unix systems as an Shebang interpreter directive. The Kawa reader skips the entire line. If the last non-whitespace character is @backslashchar{} (backslash) then the following line is also skipped, and so on.

shebang-comment ::= #! absolute-filename text up to non-escaped line-ending

Whitespace characters are spaces, linefeeds, carriage returns, character tabulations, form feeds, line tabulations, and any other character whose category is Zs, Zl, or Zp. Whitespace is used for improved readability and as necessary to separate lexemes from each other. Whitespace may occur between any two lexemes, but not within a lexeme. Whitespace may also occur inside a string, where it is significant.

The lexical syntax includes several comment forms. In all cases, comments are invisible to Scheme, except that they act as delimiters, so, for example, a comment cannot appear in the middle of an identifier or representation of a number object.

A semicolon (;) indicates the start of a line comment. The comment continues to the end of the line on which the semicolon appears.

Another way to indicate a comment is to prefix a datum with #;, possibly with interlexeme-space before the datum. The comment consists of the comment prefix #; and the datum together. This notation is useful for “commenting out” sections of code.

Block comments may be indicated with properly nested #| and |# pairs.

#|
   The FACT procedure computes the factorial of a
   non-negative integer.
|#
(define fact
  (lambda (n)
    ;; base case
    (if (= n 0)
        #;(= n 1)
        1       ; identity of *
        (* n (fact (- n 1))))))

Identifiers

identifier ::= initial subsequent*
         | peculiar-identifier
initial ::= constituent | special-initial
         | inline-hex-escape
letter ::= a | b | c | ... | z
         | A | B | C | ... | Z
constituent ::= letter
         | any character whose Unicode scalar value is greater than
             127, and whose category is Lu, Ll, Lt, Lm, Lo, Mn,
             Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co

special-initial ::= ! | $ | % | & | * | / | < | =
         | > | ? | ^ | _ | ~
subsequent ::= initial | digit
         | any character whose category is Nd, Mc, or Me
         | special-subsequent
digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
oct-digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
hex-digit ::= digit
         | a | A | b | B | c | C | d | D | e | E | f | F
special-subsequent ::= + | - | . | @
escape-sequence ::= inline-hex-escape
         | \character-except-x
         | multi-escape-sequence
inline-hex-escape ::= \xhex-scalar-value;
hex-scalar-value ::= hex-digit+
multi-escape-sequence ::= |symbol-element^*|
symbol-element ::=  any character except | or @backslashchar{}
         | inline-hex-escape | mnemonic-escape | @backslashchar{}|

character-except-x ::= any character except x
peculiar-identifier ::= + | - | ... | -> subsequent^*

Most identifiers allowed by other programming languages are also acceptable to Scheme. In general, a sequence of letters, digits, and “extended alphabetic characters” is an identifier when it begins with a character that cannot begin a representation of a number object. In addition, +, -, and ... are identifiers, as is a sequence of letters, digits, and extended alphabetic characters that begins with the two–character sequence ->. Here are some examples of identifiers:

lambda         q                soup
list->vector   +                V17a
<=             a34kTMNs         ->-
the-word-recursion-has-many-meanings

Extended alphabetic characters may be used within identifiers as if they were letters. The following are extended alphabetic characters:

! $ % & * + - . / < = > ? @ ^ _ ~

Moreover, all characters whose Unicode scalar values are greater than 127 and whose Unicode category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd, Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co can be used within identifiers. In addition, any character can be used within an identifier when specified using an escape-sequence. For example, the identifier H\x65;llo is the same as the identifier Hello.

Kawa supports two additional non-R6RS ways of making identifiers using special characters, both taken from Common Lisp: Any character (except x) following a backslash is treated as if it were a letter; as is any character between a pair of vertical bars.

Identifiers have two uses within Scheme programs:

In contrast with older versions of Scheme, the syntax distinguishes between upper and lower case in identifiers and in characters specified via their names, but not in numbers, nor in inline hex escapes used in the syntax of identifiers, characters, or strings. The following directives give explicit control over case folding.

#!fold-case

#!no-fold-case

These directives may appear anywhere comments are permitted and are treated as comments, except that they affect the reading of subsequent data. The #!fold-case directive causes the read procedure to case-fold (as if by string-foldcase) each identifier and character name subsequently read from the same port. The #!no-fold-case directive causes the read procedure to return to the default, non-folding behavior.

Note that colon : is treated specially for colon notation in Kawa Scheme, though it is a special-initial in standard Scheme (R6RS).

Numbers

((INCOMPLETE))

number ::= ((TODO))
  | quantity
decimal ::= digitoptional-exponent
  | . digitoptional-exponent
  | digit. digitoptional-exponent

optional-exponent ::= empty
  | exponent-marker optional-sign digit+
exponent-marker ::= e | s | f | d | l

The letter used for the exponent in a floating-point literal determines its type:

e

Returns a gnu.math.DFloat - for example 12e2. Note this matches the default when there is no exponent-marker.

s or f

Returns a primitive float (or java.lang.Float when boxed as an object) - for example 12s2 or 12f2.

d

Returns a primitive double (or java.lang.Double when boxed) - for example 12d2.

l

Returns a java.math.BigDecimal - for example 12l2.

optional-sign ::= empty | + | -
digit-2 ::= 0 | 1
digit-8 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
digit-10 ::= digit
digit-16 ::= digit-10 | a | b | c | d | e | f

Datum syntax

The datum syntax describes the syntax of syntactic data in terms of a sequence of lexemes, as defined in the lexical syntax.

The following grammar describes the syntax of syntactic data in terms of various kinds of lexemes defined in the grammar in section “Lexical Syntax”:

datum ::= defining-datum
         | nondefining-datum
         | defined-datum
nondefining-datum ::= lexeme-datum
         | compound-datum

lexeme-datum ::= boolean | number
         | character | string | symbol
symbol ::= identifier
compound-datum ::= list | vector | uniform-vector | extended-string-literal | xml-literal
list ::= (datum*)
         | (datum. datum)
         | abbreviation
vector ::= #(datum*)
uniform-vector ::= TODO

Datum labels

datum-label ::= #indexnum=
defining-datum ::= datum-label^+nondefining-datum
defined-datum ::= #indexnum#
indexnum ::= digit^+

The lexical syntax #n=datum reads the same as datum, but also results in datum being labelled by n, which must a sequence of digits.

The lexical syntax #n# serves as a reference to some object labelled by #n=; the result is the same object (in the sense of eq?) as the #n=.

Together, these syntaxes permit the notation of structures with shared or circular substructure.

(let ((x (list 'a 'b 'c)))
  (set-cdr! (cddr x) x)
  x)    ⇒ #0=(a b c . #0#)

The scope of a datum label is the portion of the outermost datum in which it appears that is to the right of the label. Consequently, a reference #n# can occur only after a label #n=; it is an error to attempt a forward reference. In addition, it is an error if the reference appears as the labelled object itself (as in #n=#n#), because the object labelled by #n= is not well defined in this case.

Abbreviations

abbreviation ::= r6rs-abbreviation | kawa-abbreviation
r6rs-abbreviation ::= abbrev-prefix datum
abbrev-prefix ::= ' | ` | , | ,@
         | #' | #`
kawa-abbreviation ::= XXX

The following abbreviations are expanded at read-time:

'datum

means (quote datum).

`datum

means (quasiquote datum).

,datum

means (unquote datum).

,@datum

means (unquote-splicing datum).

#'datum

means (syntax datum).

#`datum

means (quasisyntax datum).

#,datum

means (unsyntax datum). This abbreviation is currently only recognized when nested inside an explicit #`datum form, because of a conflict with SRFI-10 named constructors.

#,@datum

means (unsyntax-splicing datum).

datum1:datum2

means ($lookup$ datum1 (quasiquote datum2)). See the section called “Property access using colon notation”.

[expression ...]

means ($bracket-list$ expression ...).

operator[expression ...]

means ($bracket-apply$ operator expression ...).

Hash-prefixed forms

A number of different special forms are indicated by an initial hash (number) symbols (#). Here is a table summarizing them.

Case is ignored for the character followed the #. Thus #x and #X are the same.

#:keyword

Guile-style keyword syntax (deprecated).

#\

Character literals.

#!

See the section called “Special named constants”.

#`datum

Equivalent to (quasisyntax datum). Convenience syntax for syntax-case macros.

#'datum

Equivalent to (syntax datum). Convenience syntax for syntax-case macros.

#,datum

Equivalent to (unsyntax datum). Currently only recognized when inside a #`template form. Convenience syntax for syntax-case macros.

#,(name datum ...)

Special named constructors. This syntax is deprecated, because it conflicts with unsyntax. It is only recognized when not in a #`template form.

#,@datum

Equivalent to (unsyntax-splicing datum).

#(

A vector.

#|

Start of nested-comment.

#/regex/

See the section called “Regular expressions”.

#<

See the section called “XML literals”.

#;datum

A datum comment - the datum is ignored. (An interlexeme-space may appear before the datum.)

#number=datum

A reference definition, allowing cyclic and shared structure. Equivalent to the datum, but also defines an association between the integer number and that datum, which can be used by a subsequent #number# form.

#number#

A back-reference, allowing cyclic and shared structure.

#b

A binary (base-2) number.

#d

A decimal (base-10) number.

#e

A prefix to treat the following number as exact.

#f
#false

The standard boolean false object.

#fn(number ...)

A uniform vector of floating-point numbers. The parameter n is a precision, which can be 32 or 64. See the section called “Uniform vectors”.

#i

A prefix to treat the following number as inexact.

#o

An octal (base-8) number.

#baser

A number in the specified base (radix).

#sn(number ...)

A uniform vector of signed integers. The parameter n is a precision, which can be 8, 16, 32, or 64. See the section called “Uniform vectors”.

#t
#true

The standard boolean true object.

#un(number ...)

A uniform vector of unsigned integers. The parameter n is a precision, which can be 8, 16, 32, or 64. See the section called “Uniform vectors”.

#x

A hexadecimal (base-16) number.

The follow named constructor forms are supported:

#,(path path)

#,(filepath path)

#,(URI path)

#,(symbol local-name [uri [prefix]])
#,(symbol local-name namespace)

#,(namespace uri [prefix])

#,(duration duration)

Primitive expression syntax

expression ::= literal-expression | variable-reference
  | procedure-call | TODO

Literal expressions

literal-expression ::= (quote datum)
  | ' datum
  | constant
constant ::= number | boolean | character | string

(quote datum) evaluates to datum, which may be any external representation of a Scheme object. This notation is used to include literal constants in Scheme code.

(quote a)               ⇒  a
(quote #(a b c))        ⇒  #(a b c)
(quote (+ 1 2))         ⇒  (+ 1 2)

(quote datum) may be abbreviated as 'datum. The two notations are equivalent in all respects.

’a                      ⇒  a
’#(a b c)               ⇒  #(a b c)
’()                     ⇒  ()
’(+ 1 2)                ⇒  (+ 1 2)
’(quote a)              ⇒  (quote a)
’’a                     ⇒  (quote a)

Numerical constants, string constants, character constants, bytevector constants, and boolean constants evaluate to themselves; they need not be quoted.

145932          ⇒  145932
#t              ⇒  #t
"abc"           ⇒  "abc"

Variable references

variable-reference ::= identifier

An expression consisting of a variable is a variable reference if it is not a macro use (see below). The value of the variable reference is the value stored in the location to which the variable is bound. It is a syntax violation to reference an unbound variable.

The following example examples assumes the base library has been imported:

(define x 28)
x   ⇒  28

Procedure calls

procedure-call ::= (operator operand …)
operator ::= expression
operand ::= expression

A procedure call consists of expressions for the procedure to be called and the arguments to be passed to it, with enclosing parentheses. A form in an expression context is a procedure call if operator is not an identifier bound as a syntactic keyword.

When a procedure call is evaluated, the operator and operand expressions are evaluated (in an unspecified order) and the resulting procedure is passed the resulting arguments.

(+ 3 4)                ⇒  7
((if #f + *) 3 4)      ⇒  12

Property access using colon notation

The colon notation accesses named parts (properties) of a value. It is used to get and set fields, call methods, construct compound symbols, and more. Evaluating the form owner:property evaluates the owner then it extracts the named property of the result.

property-access-abbreviation ::= property-owner-expression:property-name
property-owner-expression ::= expression
property-name ::= identifier | ,expression

The property-name is usually a literal name, but it can be an unquoted expression (i.e. following a ,), in which case the name is evaluated at run-time. No separators are allowed on either side of the colon.

The input syntax owner:part is translated by the Scheme reader to the internal representation ($lookup$ owner (quasiquote part)).

Part lookup rules

Evaluation proceeds as follows. First property-owner-expression is evaluated to yield a owner object. Evaluating the property-name yields a part name, which is a simple symbol: Either the literal identifier, or the result of evaluating the property-name expression. If the expression evaluates to a string, it is converted to a symbol, as if using string->symbol.

  • If the owner implements gnu.mapping.HasNamedParts, then the result is that of invoking the get method of the owner with the part name as a parameter.

    As a special case of this rule, if owner is a gnu.mapping.Namespace, then the result is the compound symbol in that namespace.

  • If owner is a java.lang.Class or a gnu.bytecode.ObjectType, the result is the static member named part (i.e. a static field, method, or member class).

  • If owner is a java.lang.Package object, we get the member class or sub-package named part.

  • Otherwise, we look for a named member (instance member or field).

    Note you can't use colon notation to invoke instance methods of a Class, because it will match a previous rule. For example if you want to invoke the getDeclaredMethod method of the java.util.List , you can't write (java.util.List:getDeclaredMethod because that will look for a static method in java.util.List.

If the colon form is on the left-hand-side of an assignment (set!), then the named part is modified as appropriate.

Specific cases

Some of these are depecated; more compact and readable forms are usually preferred.

Type literal

(type:<>)

Returns the type. Deprecated; usually you can just write:

type

Type cast

(type:@atchar{} expression)

Performs a cast. Deprecated; usually you can just write:

->type

Type test

(type:instanceof? expression)

Deprecated; usually you can just write:

(type? expression)

New object construction

(type:new arg ...)

Deprecated; usually you can just write:

(type arg ...)

Invoking methods

(instance:method-name arg ...)
(class:method-name instance arg ...)
(class:method-name arg ...)
(*:method-name instance arg ...)

For details see the section called “Calling Java methods from Scheme”.

Accessing fields

class:field-name
instance:field-name
(prefix:.field-name instance)

For details see the section called “Accessing object fields”.

Getting array length

(expression:.length)

Bodies

The body of a lambda, let, let*, let-values, let*-values, letrec, or letrec* expression, or that of a definition with a body consists of zero or more definitions or expressions followed by a final expression. (Standard Scheme requires that all definitions precede all expressions.)

body ::= statement ... expression
statement ::= definition | expression

Each identifier defined by a definition is local to the body. That is, the identifier is bound, and the region of the binding is the entire body. Example:

(let ((x 5))
  (define foo (lambda (y) (bar x y)))
  (define bar (lambda (a b) (+ (* a b) a)))
  (foo (+ x 3)))
⇒ 45

When begin, let-syntax, or letrec-syntax forms occur in a body prior to the first expression, they are spliced into the body. Some or all of the body, including portions wrapped in begin, let-syntax, or letrec-syntax forms, may be specified by a macro use.

An expanded body containing variable definitions can be converted into an equivalent letrec* expression. (If there is a definition following expressions you may need to convert the expressions to dummy definitions.) For example, the let expression in the above example is equivalent to

(let ((x 5))
  (letrec* ((foo (lambda (y) (bar x y)))
            (bar (lambda (a b) (+ (* a b) a))))
    (foo (+ x 3))))

Syntax and conditional compilation

cond-expand cond-expand-clause* [(else command-or-definition*)]

cond-expand-clause ::= (feature-requirement command-or-definition*)
feature-requirement ::= feature-identifier
  | (and feature-requirement*)
  | (or feature-requirement*)
  | (not feature-requirement)
feature-identifier ::= a symbol which is the name or alias of a SRFI

The cond-expand form tests for the existence of features at macro-expansion time. It either expands into the body of one of its clauses or signals an error during syntactic processing. cond-expand expands into the body of the first clause whose feature requirement is currently satisfied; the else clause, if present, is selected if none of the previous clauses is selected.

A feature requirement has an obvious interpretation as a logical formula, where the feature-identifier variables have meaning true if the feature corresponding to the feature identifier, as specified in the SRFI registry, is in effect at the location of the cond-expand form, and false otherwise. A feature requirement is satisfied if its formula is true under this interpretation.

Examples:

(cond-expand
    ((and srfi-1 srfi-10)
     (write 1))
    ((or srfi-1 srfi-10)
     (write 2))
    (else))
(cond-expand
  (command-line
   (define (program-name) (car (argv)))))

The second example assumes that command-line is an alias for some feature which gives access to command line arguments. Note that an error will be signaled at macro-expansion time if this feature is not present.

You can use java-6, java-7, java-8, or java-9 to check check if the underlying Java is a specific version or newer. For example the name java-7 matches for either Java 7, Java 8, or newer, as reported by System property "java.version".

You can use class-exists:ClassName to check if ClassName exists at compile-time.

features

Returns a list of feature identifiers which cond-expand treats as true. This not a complete list - for example class-exists:ClassName feature identifiers are not included. It is an error to modify this list. Here is example of what features might return:

(features)  ⇒
(complex exact-complex full-unicode java-7 java-6 kawa
 ratios srfi-0 srfi-4 srfi-6 srfi-8 srfi-9 srfi-11
 srfi-16 srfi-17 srfi-23 srfi-25 srfi-26 srfi-28 srfi-30
 srfi-39 string-normalize-unicode threads)

include path

Read the contents of the file at path as a sequence of forms, and treat the result as if the resulting forms were the forms of a begin.

include-relative path

Same as include, except that the path is relative to the path of the source-file containing the include-file.

Macros

Libraries and top–level programs can define and use new kinds of derived expressions and definitions called syntactic abstractions or macros. A syntactic abstraction is created by binding a keyword to a macro transformer or, simply, transformer.

The transformer determines how a use of the macro (called a macro use) is transcribed into a more primitive form.

Most macro uses have the form:

(keyword datum …)

where keyword is an identifier that uniquely determines the kind of form. This identifier is called the syntactic keyword, or simply keyword. The number of datums and the syntax of each depends on the syntactic abstraction.

Macro uses can also take the form of improper lists, singleton identifiers, or set! forms, where the second subform of the set! is the keyword:

(keyword datum … . datum)
keyword
(set! keyword datum)

The define-syntax, let-syntax and letrec-syntax forms create bindings for keywords, associate them with macro transformers, and control the scope within which they are visible.

The syntax-rules and identifier-syntax forms create transformers via a pattern language. Moreover, the syntax-case form allows creating transformers via arbitrary Scheme code.

Keywords occupy the same name space as variables. That is, within the same scope, an identifier can be bound as a variable or keyword, or neither, but not both, and local bindings of either kind may shadow other bindings of either kind.

Macros defined using syntax-rules and identifier-syntax are “hygienic” and “referentially transparent” and thus preserve Scheme's lexical scoping.

  • If a macro transformer inserts a binding for an identifier (variable or keyword) not appearing in the macro use, the identifier is in effect renamed throughout its scope to avoid conflicts with other identifiers.

  • If a macro transformer inserts a free reference to an identifier, the reference refers to the binding that was visible where the transformer was specified, regardless of any local bindings that may surround the use of the macro.

Macros defined using the syntax-case facility are also hygienic unless datum->syntax is used.

Kawa supports most of the syntax-case feature.

Syntax definitions are valid wherever definitions are. They have the following form:

define-syntax keyword transformer-spec

The keyword is a identifier, and transformer-spec is a function that maps syntax forms to syntax forms, usually an instance of syntax-rules. If the define-syntax occurs at the top level, then the top-level syntactic environment is extended by binding the keyword to to the specified transformer, but existing references to any top-level binding for keyword remain unchanged. Otherwise, it is a internal syntax definition, and is local to the body in which it is defined.

(let ((x 1) (y 2))
   (define-syntax swap!
     (syntax-rules ()
       ((swap! a b)
        (let ((tmp a))
          (set! a b)
          (set! b tmp)))))
   (swap! x y)
   (list x y))  ⇒ (2 1)

Macros can expand into definitions in any context that permits them. However, it is an error for a definition to define an identifier whose binding has to be known in order to determine the meaning of the definitoion itself, or of any predecing definiton that belongs to the same group of internal definitions.

define-syntax-case name (literals) (pattern expr) ...

A convenience macro to make it easy to define syntax-case-style macros. Defines a macro with the given name and list of literals. Each pattern has the form of a syntax-rules-style pattern, and it is matched against the macro invocation syntax form. When a match is found, the corresponding expr is evaluated. It must evaluate to a syntax form, which replaces the macro invocation.

(define-syntax-case macro-name (literals)
  (pat1 result1)
  (pat2 result2))

is equivalent to:

(define-syntax macro-name
  (lambda (form)
    (syntax-case form (literals)
      (pat1 result1)
      (pat2 result2))))

define-macro (name lambda-list) form ...

This form is deprecated. Functionally equivalent to defmacro.

defmacro name lambda-list form ...

This form is deprecated. Instead of

(defmacro (name ...)
  (let ... `(... ,exp ...)))

you should probably do:

(define-syntax-case name ()
  ((_ ...) (let #`(... #,exp ...))))

and instead of

(defmacro (name ... var ...) `(... var ...))

you should probably do:

(define-syntax-case name ()
  ((_ ... var ...) #`(... var ...))

Defines an old-style macro a la Common Lisp, and installs (lambda lambda-list form ...) as the expansion function for name. When the translator sees an application of name, the expansion function is called with the rest of the application as the actual arguments. The resulting object must be a Scheme source form that is futher processed (it may be repeatedly macro-expanded).

gentemp

Returns a new (interned) symbol each time it is called. The symbol names are implementation-dependent. (This is not directly macro-related, but is often used in conjunction with defmacro to get a fresh unique identifier.)

expand form

The result of evaluating form is treated as a Scheme expression, syntax-expanded to internal form, and then converted back to (roughly) the equivalent expanded Scheme form.

This can be useful for debugging macros.

To access this function, you must first (require 'syntax-utils).

(require 'syntax-utils)
(expand '(cond ((> x y) 0) (else 1))) ⇒ (if (> x y) 0 1)

Identifier predicates

identifier? obj

Return #t if obj is an identifier, i.e., a syntax object representing an identifier, and #f otherwise.

The identifier? procedure is often used within a fender to verify that certain subforms of an input form are identifiers, as in the definition of rec, which creates self–contained recursive objects, below.

(define-syntax rec
  (lambda (x)
    (syntax-case x ()
      ((_ x e)
       (identifier? #'x)
       #'(letrec ((x e)) x)))))

(map (rec fact
       (lambda (n)
         (if (= n 0)
             1
             (* n (fact (- n 1))))))
     '(1 2 3 4 5))    ⇒ (1 2 6 24 120)

(rec 5 (lambda (x) x))  ⇒ exception

The procedures bound-identifier=? and free-identifier=? each take two identifier arguments and return #t if their arguments are equivalent and #f otherwise. These predicates are used to compare identifiers according to their intended use as free references or bound identifiers in a given context.

bound-identifier=? id_1 id_2

id_1 and id_2 must be identifiers.

The procedure bound-identifier=? returns #t if a binding for one would capture a reference to the other in the output of the transformer, assuming that the reference appears within the scope of the binding, and #f otherwise.

In general, two identifiers are bound-identifier=? only if both are present in the original program or both are introduced by the same transformer application (perhaps implicitly, see datum->syntax).

The bound-identifier=? procedure can be used for detecting duplicate identifiers in a binding construct or for other preprocessing of a binding construct that requires detecting instances of the bound identifiers.

free-identifier=? id_1 id_2

id_1 and id_2 must be identifiers.

The free-identifier=? procedure returns #t if and only if the two identifiers would resolve to the same binding if both were to appear in the output of a transformer outside of any bindings inserted by the transformer. (If neither of two like–named identifiers resolves to a binding, i.e., both are unbound, they are considered to resolve to the same binding.)

Operationally, two identifiers are considered equivalent by free-identifier=? if and only the topmost matching substitution for each maps to the same binding or the identifiers have the same name and no matching substitution.

The syntax-case and syntax-rules forms internally use free-identifier=? to compare identifiers listed in the literals list against input identifiers.

(let ((fred 17))
  (define-syntax a
    (lambda (x)
      (syntax-case x ()
        ((_ id) #'(b id fred)))))
  (define-syntax b
    (lambda (x)
      (syntax-case x ()
        ((_ id1 id2)
         #`(list
             #,(free-identifier=? #'id1 #'id2)
             #,(bound-identifier=? #'id1 #'id2))))))
  (a fred))
    ⇒ (#t #f)

The following definition of unnamed let uses bound-identifier=? to detect duplicate identifiers.

(define-syntax let
  (lambda (x)
    (define unique-ids?
      (lambda (ls)
        (or (null? ls)
            (and (let notmem? ((x (car ls)) (ls (cdr ls)))
                   (or (null? ls)
                       (and (not (bound-identifier=? x (car ls)))
                            (notmem? x (cdr ls)))))
                 (unique-ids? (cdr ls))))))
    (syntax-case x ()
      ((_ ((i v) ...) e1 e2 ...)
       (unique-ids? #'(i ...))
       #'((lambda (i ...) e1 e2 ...) v ...)))))

The argument #'(i ...) to unique-ids? is guaranteed to be a list by the rules given in the description of syntax above.

With this definition of let:

(let ((a 3) (a 4)) (+ a a))    ⇒ syntax error

However,

(let-syntax
  ((dolet (lambda (x)
            (syntax-case x ()
              ((_ b)
               #'(let ((a 3) (b 4)) (+ a b)))))))
  (dolet a))
⇒ 7

since the identifier a introduced by dolet and the identifier a extracted from the input form are not bound-identifier=?.

Rather than including else in the literals list as before, this version of case explicitly tests for else using free-identifier=?.

(define-syntax case
  (lambda (x)
    (syntax-case x ()
      ((_ e0 ((k ...) e1 e2 ...) ...
          (else-key else-e1 else-e2 ...))
       (and (identifier? #'else-key)
            (free-identifier=? #'else-key #'else))
       #'(let ((t e0))
           (cond
            ((memv t '(k ...)) e1 e2 ...)
            ...
            (else else-e1 else-e2 ...))))
      ((_ e0 ((ka ...) e1a e2a ...)
          ((kb ...) e1b e2b ...) ...)
       #'(let ((t e0))
           (cond
            ((memv t '(ka ...)) e1a e2a ...)
            ((memv t '(kb ...)) e1b e2b ...)
            ...))))))

With either definition of case, else is not recognized as an auxiliary keyword if an enclosing lexical binding for else exists. For example,

(let ((else #f))
  (case 0 (else (write "oops"))))    ⇒ syntax error

since else is bound lexically and is therefore not the same else that appears in the definition of case.

Syntax-object and datum conversions

syntax->datum syntax-object

syntax-object->datum syntax-object

Strip all syntactic information from a syntax object and returns the corresponding Scheme datum.

Identifiers stripped in this manner are converted to their symbolic names, which can then be compared with eq?. Thus, a predicate symbolic-identifier=? might be defined as follows.

(define symbolic-identifier=?
  (lambda (x y)
    (eq? (syntax->datum x)
         (syntax->datum y))))

datum->syntax template-id datum [srcloc]

datum->syntax-object template-id datum

template-id must be a template identifier and datum should be a datum value.

The datum->syntax procedure returns a syntax-object representation of datum that contains the same contextual information as template-id, with the effect that the syntax object behaves as if it were introduced into the code when template-id was introduced.

If srcloc is specified (and neither #f or #!null), it specifies the file position (including line number) for the result. In that case it should be a syntax object representing a list; otherwise it is currently ignored, though future extensions may support other ways of specifying the position.

The datum->syntax procedure allows a transformer to “bend” lexical scoping rules by creating implicit identifiers that behave as if they were present in the input form, thus permitting the definition of macros that introduce visible bindings for or references to identifiers that do not appear explicitly in the input form. For example, the following defines a loop expression that uses this controlled form of identifier capture to bind the variable break to an escape procedure within the loop body. (The derived with-syntax form is like let but binds pattern variables.)

(define-syntax loop
  (lambda (x)
    (syntax-case x ()
      ((k e ...)
       (with-syntax
           ((break (datum->syntax #'k 'break)))
         #'(call-with-current-continuation
             (lambda (break)
               (let f () e ... (f)))))))))

(let ((n 3) (ls '()))
  (loop
    (if (= n 0) (break ls))
    (set! ls (cons 'a ls))
    (set! n (- n 1))))
⇒ (a a a)

Were loop to be defined as:

(define-syntax loop
  (lambda (x)
    (syntax-case x ()
      ((_ e ...)
       #'(call-with-current-continuation
           (lambda (break)
             (let f () e ... (f))))))))

the variable break would not be visible in e ....

The datum argument datum may also represent an arbitrary Scheme form, as demonstrated by the following definition of include.

(define-syntax include
  (lambda (x)
    (define read-file
      (lambda (fn k)
        (let ((p (open-file-input-port fn)))
          (let f ((x (get-datum p)))
            (if (eof-object? x)
                (begin (close-port p) '())
                (cons (datum->syntax k x)
                      (f (get-datum p))))))))
    (syntax-case x ()
      ((k filename)
       (let ((fn (syntax->datum #'filename)))
         (with-syntax (((exp ...)
                        (read-file fn #'k)))
           #'(begin exp ...)))))))

(include "filename") expands into a begin expression containing the forms found in the file named by "filename". For example, if the file flib.ss contains:

(define f (lambda (x) (g (* x x))))

and the file glib.ss contains:

(define g (lambda (x) (+ x x)))

the expression:

(let ()
  (include "flib.ss")
  (include "glib.ss")
  (f 5))

evaluates to 50.

The definition of include uses datum->syntax to convert the objects read from the file into syntax objects in the proper lexical context, so that identifier references and definitions within those expressions are scoped where the include form appears.

Using datum->syntax, it is even possible to break hygiene entirely and write macros in the style of old Lisp macros. The lisp-transformer procedure defined below creates a transformer that converts its input into a datum, calls the programmer's procedure on this datum, and converts the result back into a syntax object scoped where the original macro use appeared.

(define lisp-transformer
  (lambda (p)
    (lambda (x)
      (syntax-case x ()
        ((kwd . rest)
         (datum->syntax #'kwd
           (p (syntax->datum x))))))))

Convenience forms

with-syntax ((pattern expression)) body

The with-syntax form is used to bind pattern variables, just as let is used to bind variables. This allows a transformer to construct its output in separate pieces, then put the pieces together.

Each pattern is identical in form to a syntax-case pattern. The value of each expression is computed and destructured according to the corresponding pattern, and pattern variables within the pattern are bound as with syntax-case to the corresponding portions of the value within body.

The with-syntax form may be defined in terms of syntax-case as follows.

(define-syntax with-syntax
  (lambda (x)
    (syntax-case x ()
      ((_ ((p e0) ...) e1 e2 ...)
       (syntax (syntax-case (list e0 ...) ()
                 ((p ...) (let () e1 e2 ...))))))))

The following definition of cond demonstrates the use of with-syntax to support transformers that employ recursion internally to construct their output. It handles all cond clause variations and takes care to produce one-armed if expressions where appropriate.

(define-syntax cond
  (lambda (x)
    (syntax-case x ()
      ((_ c1 c2 ...)
       (let f ((c1 #'c1) (c2* #'(c2 ...)))
         (syntax-case c2* ()
           (()
            (syntax-case c1 (else =>)
             (((else e1 e2 ...) #'(begin e1 e2 ...))
              ((e0) #'e0)
              ((e0 => e1)
               #'(let ((t e0)) (if t (e1 t))))
              ((e0 e1 e2 ...)
               #'(if e0 (begin e1 e2 ...)))))
           ((c2 c3 ...)
            (with-syntax ((rest (f #'c2 #'(c3 ...))))
              (syntax-case c1 (=>)
                ((e0) #'(let ((t e0)) (if t t rest)))
                ((e0 => e1)
                 #'(let ((t e0)) (if t (e1 t) rest)))
                ((e0 e1 e2 ...)
                 #'(if e0
                        (begin e1 e2 ...)
                        rest)))))))))))

quasisyntax template

unsyntax

unsyntax-splicing

The quasisyntax form is similar to syntax, but it allows parts of the quoted text to be evaluated, in a manner similar to the operation of quasiquote.

Within a quasisyntax template, subforms of unsyntax and unsyntax-splicing forms are evaluated, and everything else is treated as ordinary template material, as with syntax.

The value of each unsyntax subform is inserted into the output in place of the unsyntax form, while the value of each unsyntax-splicing subform is spliced into the surrounding list or vector structure. Uses of unsyntax and unsyntax-splicing are valid only within quasisyntax expressions.

A quasisyntax expression may be nested, with each quasisyntax introducing a new level of syntax quotation and each unsyntax or unsyntax-splicing taking away a level of quotation. An expression nested within n quasisyntax expressions must be within n unsyntax or unsyntax-splicing expressions to be evaluated.

As noted in abbreviation, #`template is equivalent to (quasisyntax template), #,template is equivalent to (unsyntax template), and #,@template is equivalent to (unsyntax-splicing template). Note that for backwards compatibility, you should only use #,template inside a literal #`template form.

The quasisyntax keyword can be used in place of with-syntax in many cases. For example, the definition of case shown under the description of with-syntax above can be rewritten using quasisyntax as follows.

(define-syntax case
  (lambda (x)
    (syntax-case x ()
      ((_ e c1 c2 ...)
       #`(let ((t e))
           #,(let f ((c1 #'c1) (cmore #'(c2 ...)))
               (if (null? cmore)
                   (syntax-case c1 (else)
                     ((else e1 e2 ...)
                      #'(begin e1 e2 ...))
                     (((k ...) e1 e2 ...)
                      #'(if (memv t '(k ...))
                            (begin e1 e2 ...))])
                   (syntax-case c1 ()
                     (((k ...) e1 e2 ...)
                      #`(if (memv t '(k ...))
                            (begin e1 e2 ...)
                            #,(f (car cmore)
                                  (cdr cmore))))))))))))

Note: Any syntax-rules form can be expressed with syntax-case by making the lambda expression and syntax expressions explicit, and syntax-rules may be defined in terms of syntax-case as follows.

(define-syntax syntax-rules
  (lambda (x)
    (syntax-case x ()
      ((_ (lit ...) ((k . p) t) ...)
       (for-all identifier? #'(lit ... k ...))
       #'(lambda (x)
           (syntax-case x (lit ...)
             ((_ . p) #'t) ...))))))

Named quasi-literals

Traditional Scheme has only a few kinds of values, and thus only a few builtin kinds of literals. Modern Scheme allows defining new types, so it is desirable to have a mechanism for defining literal values for the new types.

Consider the URI type. You can create a new instance of a URI using a constructor function:

(URI "http://example.com/")

This isn't too bad, though the double-quote characters are an ugly distraction. However, if you need to construct the string it gets messy:

(URI (string-append base-uri "icon.png"))

Instead use can write:

&URI{http://example.com/}

or:

&URI{&[base-uri]icon.png}

This syntax is translated by the Scheme reader to the more familiar but more verbose equivalent forms:

($construct$:URI "http://example.com/")
($construct$:URI $<<$ base-uri $>>$ "icon.png")

So for this to work there just needs to be a definition of $construct$:URI, usually a macro. Normal scope rules apply; typically you'd define $construct$:URI in a module.

The names $<<$ and $>>$ are bound to unique zero-length strings. They are used to allow the implementation of $construct$:URI to determine which arguments are literal and which come from escaped expressions.

If you want to define your own $construct$:tag, or to read motivation and details, see the SRFI 108 specification.

extended-datum-literal ::=
    & cname @lbracechar{} [initial-ignorednamed-literal-part^* @rbracechar{}
  | & cname [ expression^* ]@lbracechar{} [initial-ignorednamed-literal-part^* @rbracechar{}
cname ::= identifier
named-literal-part ::=
    any character except &, @lbracechar{} or @rbracechar{}
  | @lbracechar{} named-literal-part^+ @rbracechar{}
  | char-ref
  | entity-ref
  | special-escape
  | enclosed-part
  | extended-datum-literal