The formal syntax for Kawa Scheme is written in an extended BNF.
Non–terminals are written like-this
. Case is insignificant
for non–terminal names.
Literal text (terminals) are written like this
.
All spaces in the grammar are for legibility.
The following extensions to BNF are used to make the
description more concise: thing
^* or thing
...
both mean zero or more occurrences of thing
,
and thing
^+ means at least one thing
.
Some non-terminal names refer to the Unicode scalar values of the same
name: character-tabulation
(U+0009), linefeed
(U+000A),
carriage-return
(U+000D), line-tabulation
(U+000B),
form-feed
(U+000C), space
(U+0020), next-line
(U+0085), line-separator
(U+2028), and paragraph-separator
(U+2029).
The syntax of Scheme code is organized in three levels:
the lexical syntax that describes how a program text is split into a sequence of lexemes,
the datum syntax, formulated in terms of the lexical syntax, that structures the lexeme sequence as a sequence of syntactic data, where a syntactic datum is a recursively structured entity,
the program syntax formulated in terms of the datum syntax, imposing further structure and assigning meaning to syntactic data.
Syntactic data (also called external representations) double as a
notation for objects, and the read
and
write
procedures can be used for reading and writing syntactic data,
converting between their textual representation and the corresponding
objects. Each syntactic datum represents a corresponding
datum value. A syntactic datum can be used in a program to obtain the
corresponding datum value using quote
.
Scheme source code consists of syntactic data and (non–significant) comments. Syntactic data in Scheme source code are called forms. (A form nested inside another form is called a subform.) Consequently, Scheme's syntax has the property that any sequence of characters that is a form is also a syntactic datum representing some object. This can lead to confusion, since it may not be obvious out of context whether a given sequence of characters is intended to be a representation of objects or the text of a program. It is also a source of power, since it facilitates writing programs such as interpreters or compilers that treat programs as objects (or vice versa).
A datum value may have several different external representations. For
example, both #e28.000
and #x1c
are syntactic data
representing the exact integer object 28, and the syntactic data
(8 13)
, ( 08 13 )
, (8 . (13 . ()))
all represent a
list containing the exact integer objects 8 and 13. Syntactic data that
represent equal objects (in the sense of equal?
)
are always equivalent as forms of a program.
Because of the close correspondence between syntactic data and datum values, we sometimes uses the term datum for either a syntactic datum or a datum value when the exact meaning is apparent from the context.
The lexical syntax determines how a character sequence is split into a sequence of lexemes, omitting non–significant portions such as comments and whitespace. The character sequence is assumed to be text according to the Unicode standard. Some of the lexemes, such as identifiers, representations of number objects, strings etc., of the lexical syntax are syntactic data in the datum syntax, and thus represent objects. Besides the formal account of the syntax, this section also describes what datum values are represented by these syntactic data.
The lexical syntax, in the description of comments, contains a forward
reference to datum
, which is described as part of the datum
syntax. Being comments, however, these datum
s do not play a
significant role in the syntax.
Case is significant except in representations of booleans, number
objects, and in hexadecimal numbers specifying Unicode scalar values.
For example, #x1A
and #X1a
are equivalent. The identifier
Foo
is, however, distinct from the identifier FOO
.
Interlexeme-space
may occur on either side of any lexeme, but not
within a lexeme.
Identifier
s, .
, number
s, character
s, and
boolean
s, must be terminated by a delimiter
or by the end
of the input.
lexeme
::=
identifier
| boolean
| number
| character
| string
| (
| )
| [
| ]
| #(
| '
| `
| ,
| ,@
| .
| #'
| #`
| #,
| #,@
delimiter
::=
(
| )
| [
| ]
| "
| ;
| #
| whitespace
((UNFINISHED))
Line endings are significant in Scheme in single–line comments
and within string literals.
In Scheme source code, any of the line endings in line-ending
marks the end of a line. Moreover, the two–character line endings
carriage-return
linefeed
and carriage-return
next-line
each count as a single line ending.
In a string literal, a line-ending
not preceded by a \
stands for a linefeed character, which is the standard line–ending
character of Scheme.
intraline-whitespace
::=
space
| character-tabulation
whitespace
::=
intraline-whitespace
| linefeed
| line-tabulation
| form-feed
| carriage-return
| next-line
| any character whose category is Zs, Zl, or Zp
line-ending
::=
linefeed
| carriage return
| carriage-return
linefeed
| next-line
| carriage-return
next-line
| line-separator
comment
::=
;
all subsequent characters up to a line-ending
or paragraph-separator
| nested-comment
| #;
interlexeme-space
datum
| shebang-comment
nested-comment
::=
#|
comment-text
comment-cont
* |#
comment-text
::=
character sequence not containing #|
or |#
comment-cont
::=
nested-comment
comment-text
atmosphere
::=
whitespace
| comment
interlexeme-space
::=
atmosphere
*
As a special case the characters #!/
are treated as starting a comment,
but only at the beginning of file. These characters are used on
Unix systems as an Shebang interpreter directive.
The Kawa reader skips the entire line.
If the last non-whitespace character is @backslashchar{}
(backslash) then the following line is also skipped, and so on.
Whitespace characters are spaces, linefeeds, carriage returns, character tabulations, form feeds, line tabulations, and any other character whose category is Zs, Zl, or Zp. Whitespace is used for improved readability and as necessary to separate lexemes from each other. Whitespace may occur between any two lexemes, but not within a lexeme. Whitespace may also occur inside a string, where it is significant.
The lexical syntax includes several comment forms. In all cases, comments are invisible to Scheme, except that they act as delimiters, so, for example, a comment cannot appear in the middle of an identifier or representation of a number object.
A semicolon (;
) indicates the start of a line comment. The
comment continues to the end of the line on which the semicolon appears.
Another way to indicate a comment is to prefix a datum
with #;
, possibly with
interlexeme-space
before the datum
. The comment consists
of the comment prefix #;
and the datum
together. This
notation is useful for “commenting out” sections of code.
Block comments may be indicated with properly nested #|
and
|#
pairs.
#| The FACT procedure computes the factorial of a non-negative integer. |# (define fact (lambda (n) ;; base case (if (= n 0) #;(= n 1) 1 ; identity of * (* n (fact (- n 1))))))
identifier
::=
initial
subsequent
*
| peculiar-identifier
initial
::=
constituent
| special-initial
| inline-hex-escape
letter
::=
a
| b
| c
| ... | z
| A
| B
| C
| ... | Z
constituent
::=
letter
| any character whose Unicode scalar value is greater than
127, and whose category is Lu, Ll, Lt, Lm, Lo, Mn,
Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co
special-initial
::=
!
| $
| %
| &
| *
| /
| <
| =
| >
| ?
| ^
| _
| ~
subsequent
::=
initial
| digit
| any character whose category is Nd, Mc, or Me
| special-subsequent
digit
::=
0
| 1
| 2
| 3
| 4
| 5
| 6
| 7
| 8
| 9
oct-digit
::=
0
| 1
| 2
| 3
| 4
| 5
| 6
| 7
hex-digit
::=
digit
| a
| A
| b
| B
| c
| C
| d
| D
| e
| E
| f
| F
special-subsequent
::=
+
| -
| .
| @
escape-sequence
::=
inline-hex-escape
| \
character-except-x
| multi-escape-sequence
inline-hex-escape
::=
\x
hex-scalar-value
;
hex-scalar-value
::=
hex-digit
+
multi-escape-sequence
::=
|
symbol-element
^*|
symbol-element
::=
any character except |
or @backslashchar{}
| inline-hex-escape
| mnemonic-escape
| @backslashchar{}|
character-except-x
::=
any character except x
peculiar-identifier
::=
+
| -
| ...
| ->
subsequent
^*
Most identifiers allowed by other programming languages are also
acceptable to Scheme. In general, a sequence of letters, digits, and
“extended alphabetic characters” is an identifier when it begins with
a character that cannot begin a representation of a number object. In
addition, +
, -
, and ...
are identifiers, as is a
sequence of letters, digits, and extended alphabetic characters that
begins with the two–character sequence ->
. Here are some
examples of identifiers:
lambda q soup list->vector + V17a <= a34kTMNs ->- the-word-recursion-has-many-meanings
Extended alphabetic characters may be used within identifiers as if they were letters. The following are extended alphabetic characters:
! $ % & * + - . / < = > ? @ ^ _ ~
Moreover, all characters whose Unicode scalar values are greater than
127 and whose Unicode category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co can be used within
identifiers. In addition, any character can be used within an
identifier when specified using an escape-sequence
. For example,
the identifier H\x65;llo
is the same as the identifier
Hello
.
Kawa supports two additional non-R6RS ways of making
identifiers using special characters, both taken from Common Lisp:
Any character (except x
) following a backslash is treated
as if it were a letter
;
as is any character between a pair of vertical bars.
Identifiers have two uses within Scheme programs:
Any identifier may be used as a variable or as a syntactic keyword.
When an identifier appears as or with in literal, it is being used to denote a symbol.
In contrast with older versions of Scheme, the syntax distinguishes between upper and lower case in identifiers and in characters specified via their names, but not in numbers, nor in inline hex escapes used in the syntax of identifiers, characters, or strings. The following directives give explicit control over case folding.
These directives may appear anywhere comments are permitted and are
treated as comments, except that they affect the reading of subsequent
data. The #!fold-case
directive causes the read
procedure to case-fold (as if by string-foldcase
) each
identifier and character name subsequently read from the same
port. The #!no-fold-case
directive causes the read
procedure to return to the default, non-folding behavior.
Note that colon :
is treated specially for
colon notation in Kawa Scheme,
though it is a special-initial
in standard Scheme (R6RS).
((INCOMPLETE))
number
::=
((TODO))
| quantity
decimal
::=
digit
+ optional-exponent
| .
digit
+ optional-exponent
| digit
+ .
digit
+ optional-exponent
optional-exponent
::=
empty
| exponent-marker
optional-sign
digit
+
exponent-marker
::=
e
| s
| f
| d
| l
The letter used for the exponent in a floating-point literal determines its type:
e
Returns a
gnu.math.DFloat
- for example12e2
. Note this matches the default when there is noexponent-marker
.-
s
orf
Returns a primitive
float
(orjava.lang.Float
when boxed as an object) - for example12s2
or12f2
.d
Returns a primitive
double
(orjava.lang.Double
when boxed) - for example12d2
.l
Returns a
java.math.BigDecimal
- for example12l2
.
The datum syntax describes the syntax of syntactic data in terms of a
sequence of lexeme
s, as defined in the lexical syntax.
The following grammar describes the syntax of syntactic data in terms of various kinds of lexemes defined in the grammar in section “Lexical Syntax”:
datum
::=
defining-datum
| nondefining-datum
| defined-datum
nondefining-datum
::=
lexeme-datum
| compound-datum
lexeme-datum
::=
boolean
| number
| character
| string
| symbol
symbol
::=
identifier
compound-datum
::=
list
| vector
| uniform-vector
| extended-string-literal
| xml-literal
list
::=
(
datum
*)
| (
datum
+ .
datum
)
| abbreviation
vector
::=
#(
datum
*)
uniform-vector
::=
TODO
datum-label
::=
#
indexnum
=
defining-datum
::=
datum-label
^+nondefining-datum
defined-datum
::=
#
indexnum
#
indexnum
::=
digit
^+
The lexical syntax #
reads the
same as n
=datum
datum
, but also results in datum
being
labelled by n
, which must a sequence of digits.
The lexical syntax #
serves as a reference to some
object labelled by n
##
; the result is the same object
(in the sense of n
=eq?
) as the #
.
n
=
Together, these syntaxes permit the notation of structures with shared or circular substructure.
(let ((x (list 'a 'b 'c))) (set-cdr! (cddr x) x) x) ⇒ #0=(a b c . #0#)
The scope of a datum label is the portion of the outermost
datum in which it appears that is to the right of the label.
Consequently, a reference #
can occur
only after a label n
##
;
it is an error to attempt a forward reference.
In addition, it is an error if the reference appears as the labelled
object itself (as in n
=#
), because the object
labelled by n
=#n
##
is not well defined in this case.
n
=
abbreviation
::=
r6rs-abbreviation
| kawa-abbreviation
r6rs-abbreviation
::=
abbrev-prefix
datum
abbrev-prefix
::=
'
| `
| ,
| ,@
| #'
| #`
kawa-abbreviation
::=
XXX
The following abbreviations are expanded at read-time:
-
'
datum
means
(quote
datum
)
.-
`
datum
means
(quasiquote
datum
)
.-
,
datum
means
(unquote
datum
)
.-
,@
datum
means
(unquote-splicing
datum
)
.-
#'
datum
means
(syntax
datum
)
.-
#`
datum
means
(quasisyntax
datum
)
.-
#,
datum
means
(unsyntax
datum
)
. This abbreviation is currently only recognized when nested inside an explicit#`
datum
form, because of a conflict with SRFI-10 named constructors.-
#,@
datum
means
(unsyntax-splicing
datum
)
.-
datum1
:
datum2
means
($lookup$
datum1
(quasiquote
datum2
))
. See the section called “Property access using colon notation”.-
[
expression
...]
means
($bracket-list$
expression
...)
.-
operator
[
expression
...]
means
($bracket-apply$
operator
expression
...)
.
A number of different special forms are indicated by an
initial hash (number) symbols (#
).
Here is a table summarizing them.
Case is ignored for the character followed the #
.
Thus #x
and #X
are the same.
-
#:
keyword
Guile-style keyword syntax (deprecated).
#\
#!
-
#`
datum
Equivalent to
(quasisyntax
. Convenience syntax for syntax-case macros.datum
)-
#'
datum
Equivalent to
(syntax
. Convenience syntax for syntax-case macros.datum
)-
#,
datum
Equivalent to
(unsyntax
. Currently only recognized when inside adatum
)#`
form. Convenience syntax for syntax-case macros.template
-
#,(
name
datum
...)
Special named constructors. This syntax is deprecated, because it conflicts with
unsyntax
. It is only recognized when not in a#`
form.template
-
#,@
datum
Equivalent to
(unsyntax-splicing
.datum
)#(
A vector.
#|
Start of nested-comment.
-
#/
regex
/
#<
-
#;
datum
A datum comment - the
datum
is ignored. (Aninterlexeme-space
may appear before thedatum
.)-
#
number
=
datum
A reference definition, allowing cyclic and shared structure. Equivalent to the
datum
, but also defines an association between the integernumber
and thatdatum
, which can be used by a subsequent#
form.number
#-
#
number
#
A back-reference, allowing cyclic and shared structure.
#b
A binary (base-2) number.
#d
A decimal (base-10) number.
#e
A prefix to treat the following number as exact.
#f
#false
The standard boolean false object.
-
#f
n
(
number
...)
A uniform vector of floating-point numbers. The parameter
n
is a precision, which can be 32 or 64. See the section called “Uniform vectors”.#i
A prefix to treat the following number as inexact.
#o
An octal (base-8) number.
-
#
base
r
A number in the specified
base
(radix).-
#s
n
(
number
...)
A uniform vector of signed integers. The parameter
n
is a precision, which can be 8, 16, 32, or 64. See the section called “Uniform vectors”.#t
#true
The standard boolean true object.
-
#u
n
(
number
...)
A uniform vector of unsigned integers. The parameter
n
is a precision, which can be 8, 16, 32, or 64. See the section called “Uniform vectors”.#x
A hexadecimal (base-16) number.
The follow named constructor forms are supported:
-
#,(path
path
)
-
#,(filepath
path
)
-
#,(URI
path
)
-
#,(symbol
local-name
[uri
[prefix
]])
-
#,(symbol
local-name
namespace
)
-
#,(namespace
uri
[prefix
])
-
#,(duration
duration
)
expression
::=
literal-expression
| variable-reference
| procedure-call
| TODO
literal-expression
::=
(quote
datum
)
| '
datum
| constant
constant
::=
number
| boolean
| character
| string
(quote
evaluates to datum
)datum
,
which may be any external representation of a Scheme object.
This notation is used to include literal constants in Scheme code.
(quote a) ⇒ a (quote #(a b c)) ⇒ #(a b c) (quote (+ 1 2)) ⇒ (+ 1 2)
(quote
may be abbreviated as datum
)'
.
The two notations are equivalent in all respects.
datum
’a ⇒ a ’#(a b c) ⇒ #(a b c) ’() ⇒ () ’(+ 1 2) ⇒ (+ 1 2) ’(quote a) ⇒ (quote a) ’’a ⇒ (quote a)
Numerical constants, string constants, character constants, bytevector constants, and boolean constants evaluate to themselves; they need not be quoted.
145932 ⇒ 145932 #t ⇒ #t "abc" ⇒ "abc"
variable-reference
::=
identifier
An expression consisting of a variable is a variable reference if it is not a macro use (see below). The value of the variable reference is the value stored in the location to which the variable is bound. It is a syntax violation to reference an unbound variable.
The following example examples assumes the base library has been imported:
(define x 28) x ⇒ 28
procedure-call
::=
(
operator
operand
…)
operator
::=
expression
operand
::=
expression
A procedure call consists of expressions for the procedure to be called
and the arguments to be passed to it, with enclosing parentheses. A
form in an expression context is a procedure call if operator
is
not an identifier bound as a syntactic keyword.
When a procedure call is evaluated, the operator and operand expressions are evaluated (in an unspecified order) and the resulting procedure is passed the resulting arguments.
(+ 3 4) ⇒ 7 ((if #f + *) 3 4) ⇒ 12
The colon notation accesses named parts (properties) of a value.
It is used to get and set fields, call methods, construct compound symbols,
and more.
Evaluating the form
evaluates the owner
:property
then it extracts the named owner
of the result.
property
property-access-abbreviation
::=
property-owner-expression
:
property-name
property-owner-expression
::=
expression
property-name
::=
identifier
| ,
expression
The property-name
is usually a literal name,
but it can be an unquoted expression
(i.e. following a ,
),
in which case the name is evaluated at run-time.
No separators are allowed on either side of the colon.
The input syntax
is translated by
the Scheme reader to the internal representation owner
:part
($lookup$
.
owner
(quasiquote part
))
Evaluation proceeds as follows.
First property-owner-expression
is
evaluated to yield a owner
object.
Evaluating the property-name
yields a part
name,
which is a simple symbol: Either
the literal identifier
, or the result of evaluating the
property-name expression
.
If the expression
evaluates to a string, it is converted to
a symbol, as if using string->symbol
.
-
If the
owner
implementsgnu.mapping.HasNamedParts
, then the result is that of invoking theget
method of theowner
with thepart
name as a parameter.As a special case of this rule, if
owner
is agnu.mapping.Namespace
, then the result is the compound symbol in that namespace. If
owner
is ajava.lang.Class
or agnu.bytecode.ObjectType
, the result is the static member namedpart
(i.e. a static field, method, or member class).If
owner
is ajava.lang.Package
object, we get the member class or sub-package namedpart
.-
Otherwise, we look for a named member (instance member or field).
Note you can't use colon notation to invoke instance methods of a
Class
, because it will match a previous rule. For example if you want to invoke thegetDeclaredMethod
method of thejava.util.List
, you can't write(java.util.List:getDeclaredMethod
because that will look for a static method injava.util.List
.
If the colon form is on the left-hand-side of an assignment (set!
),
then the named part is modified as appropriate.
(
instance
:
method-name
arg
...)
(
class
:
method-name
instance
arg
...)
(
class
:
method-name
arg
...)
(*:
method-name
instance
arg
...)
For details see the section called “Calling Java methods from Scheme”.
class
:
field-name
instance
:
field-name
(
prefix
:.
field-name
instance
)
For details see the section called “Accessing object fields”.
(
expression
:.length)
The body
of a lambda
, let
, let*
,
let-values
, let*-values
, letrec
, or letrec*
expression, or that of a definition with a body consists of zero or more
definitions or expressions followed by a final expression.
(Standard Scheme requires that all definitions precede all expressions.)
body
::=
statement
...expression
statement
::=
definition
|expression
Each identifier defined by a definition is local to the body
.
That is, the identifier is bound, and the region of the binding is the
entire body
.
Example:
(let ((x 5)) (define foo (lambda (y) (bar x y))) (define bar (lambda (a b) (+ (* a b) a))) (foo (+ x 3))) ⇒ 45
When begin
, let-syntax
, or letrec-syntax
forms
occur in a body prior to the first expression, they are spliced into the
body. Some or all of the body, including portions wrapped in
begin
, let-syntax
, or letrec-syntax
forms, may be
specified by a macro use.
An expanded body
containing variable definitions can be
converted into an equivalent letrec*
expression.
(If there is a definition following expressions you may need to
convert the expressions to dummy definitions.) For example,
the let
expression in the above example is equivalent to
(let ((x 5)) (letrec* ((foo (lambda (y) (bar x y))) (bar (lambda (a b) (+ (* a b) a)))) (foo (+ x 3))))
cond-expand
cond-expand-clause*
[(else
command-or-definition*
])
cond-expand-clause
::=
(
feature-requirement
command-or-definition
*)
feature-requirement
::=
feature-identifier
|(and
feature-requirement
*)
|(or
feature-requirement
*)
|(not
feature-requirement
)
feature-identifier
::=
a symbol which is the name or alias of a SRFI
The
cond-expand
form tests for the existence of features at macro-expansion time. It either expands into the body of one of its clauses or signals an error during syntactic processing.cond-expand
expands into the body of the first clause whose feature requirement is currently satisfied; theelse
clause, if present, is selected if none of the previous clauses is selected.A feature requirement has an obvious interpretation as a logical formula, where the
feature-identifier
variables have meaning true if the feature corresponding to the feature identifier, as specified in the SRFI registry, is in effect at the location of thecond-expand
form, and false otherwise. A feature requirement is satisfied if its formula is true under this interpretation.Examples:
(cond-expand ((and srfi-1 srfi-10) (write 1)) ((or srfi-1 srfi-10) (write 2)) (else))(cond-expand (command-line (define (program-name) (car (argv)))))The second example assumes that
command-line
is an alias for some feature which gives access to command line arguments. Note that an error will be signaled at macro-expansion time if this feature is not present.You can use
java-6
,java-7
,java-8
, orjava-9
to check check if the underlying Java is a specific version or newer. For example the namejava-7
matches for either Java 7, Java 8, or newer, as reported bySystem
property"java.version"
.You can use
class-exists:
to check ifClassName
exists at compile-time.
ClassName
Returns a list of feature identifiers which
cond-expand
treats as true. This not a complete list - for exampleclass-exists:
feature identifiers are not included. It is an error to modify this list. Here is example of whatClassName
features
might return:(features) ⇒ (complex exact-complex full-unicode java-7 java-6 kawa ratios srfi-0 srfi-4 srfi-6 srfi-8 srfi-9 srfi-11 srfi-16 srfi-17 srfi-23 srfi-25 srfi-26 srfi-28 srfi-30 srfi-39 string-normalize-unicode threads)
Read the contents of the file at
path
as a sequence of forms, and treat the result as if the resulting forms were the forms of abegin
.
Libraries and top–level programs can define and use new kinds of derived expressions and definitions called syntactic abstractions or macros. A syntactic abstraction is created by binding a keyword to a macro transformer or, simply, transformer.
The transformer determines how a use of the macro (called a macro use) is transcribed into a more primitive form.
Most macro uses have the form:
(keyword
datum
…)
where keyword
is an identifier that uniquely determines the kind
of form. This identifier is called the syntactic keyword, or
simply keyword. The number of datum
s and the syntax of
each depends on the syntactic abstraction.
Macro uses can also take the form of improper lists, singleton
identifiers, or set!
forms, where the second subform of the
set!
is the keyword:
(keyword
datum
… .datum
)keyword
(set!keyword
datum
)
The define-syntax
, let-syntax
and letrec-syntax
forms create bindings for keywords, associate them with macro
transformers, and control the scope within which they are visible.
The syntax-rules
and identifier-syntax
forms create
transformers via a pattern language. Moreover, the syntax-case
form allows creating transformers via arbitrary Scheme code.
Keywords occupy the same name space as variables. That is, within the same scope, an identifier can be bound as a variable or keyword, or neither, but not both, and local bindings of either kind may shadow other bindings of either kind.
Macros defined using syntax-rules
and identifier-syntax
are “hygienic” and “referentially transparent” and thus preserve
Scheme's lexical scoping.
If a macro transformer inserts a binding for an identifier (variable or keyword) not appearing in the macro use, the identifier is in effect renamed throughout its scope to avoid conflicts with other identifiers.
If a macro transformer inserts a free reference to an identifier, the reference refers to the binding that was visible where the transformer was specified, regardless of any local bindings that may surround the use of the macro.
Macros defined using the syntax-case
facility are also hygienic
unless datum->syntax
is used.
Kawa supports most of the syntax-case
feature.
Syntax definitions are valid wherever definitions are. They have the following form:
define-syntax
keyword
transformer-spec
The
keyword
is a identifier, andtransformer-spec
is a function that maps syntax forms to syntax forms, usually an instance ofsyntax-rules
. If thedefine-syntax
occurs at the top level, then the top-level syntactic environment is extended by binding thekeyword
to to the specified transformer, but existing references to any top-level binding forkeyword
remain unchanged. Otherwise, it is a internal syntax definition, and is local to thebody
in which it is defined.(let ((x 1) (y 2)) (define-syntax swap! (syntax-rules () ((swap! a b) (let ((tmp a)) (set! a b) (set! b tmp))))) (swap! x y) (list x y)) ⇒ (2 1)Macros can expand into definitions in any context that permits them. However, it is an error for a definition to define an identifier whose binding has to be known in order to determine the meaning of the definitoion itself, or of any predecing definiton that belongs to the same group of internal definitions.
define-syntax-case
name
(
literals
)
(
pattern
expr
)
...
A convenience macro to make it easy to define
syntax-case
-style macros. Defines a macro with the givenname
and list ofliterals
. Eachpattern
has the form of asyntax-rules
-style pattern, and it is matched against the macro invocation syntax form. When a match is found, the correspondingexpr
is evaluated. It must evaluate to a syntax form, which replaces the macro invocation.(define-syntax-case macro-name (literals) (pat1 result1) (pat2 result2))is equivalent to:
(define-syntax macro-name (lambda (form) (syntax-case form (literals) (pat1 result1) (pat2 result2))))
define-macro
(
name
lambda-list
)
form
...
This form is deprecated. Functionally equivalent to
defmacro
.
defmacro
name
lambda-list
form
...
This form is deprecated. Instead of
(defmacro (name
...) (let ... `(... ,exp
...)))you should probably do:
(define-syntax-casename
() ((_ ...) (let #`(... #,exp
...))))and instead of
(defmacro (name
...var
...) `(...var
...))you should probably do:
(define-syntax-casename
() ((_ ...var
...) #`(...var
...))Defines an old-style macro a la Common Lisp, and installs
(lambda
as the expansion function forlambda-list
form
...)name
. When the translator sees an application ofname
, the expansion function is called with the rest of the application as the actual arguments. The resulting object must be a Scheme source form that is futher processed (it may be repeatedly macro-expanded).
Returns a new (interned) symbol each time it is called. The symbol names are implementation-dependent. (This is not directly macro-related, but is often used in conjunction with
defmacro
to get a fresh unique identifier.)
The result of evaluating
form
is treated as a Scheme expression, syntax-expanded to internal form, and then converted back to (roughly) the equivalent expanded Scheme form.This can be useful for debugging macros.
To access this function, you must first
(require 'syntax-utils)
.(require 'syntax-utils) (expand '(cond ((> x y) 0) (else 1))) ⇒ (if (> x y) 0 1)
Return
#t
ifobj
is an identifier, i.e., a syntax object representing an identifier, and#f
otherwise.The
identifier?
procedure is often used within a fender to verify that certain subforms of an input form are identifiers, as in the definition ofrec
, which creates self–contained recursive objects, below.(define-syntax rec (lambda (x) (syntax-case x () ((_ x e) (identifier? #'x) #'(letrec ((x e)) x))))) (map (rec fact (lambda (n) (if (= n 0) 1 (* n (fact (- n 1)))))) '(1 2 3 4 5)) ⇒ (1 2 6 24 120) (rec 5 (lambda (x) x)) ⇒ exception
The procedures bound-identifier=?
and free-identifier=?
each take two identifier arguments and return #t
if their arguments
are equivalent and #f
otherwise. These predicates are used to
compare identifiers according to their intended use as free
references or bound identifiers in a given context.
id
_1 andid
_2 must be identifiers.The procedure
bound-identifier=?
returns#t
if a binding for one would capture a reference to the other in the output of the transformer, assuming that the reference appears within the scope of the binding, and#f
otherwise.In general, two identifiers are
bound-identifier=?
only if both are present in the original program or both are introduced by the same transformer application (perhaps implicitly, seedatum->syntax
).The
bound-identifier=?
procedure can be used for detecting duplicate identifiers in a binding construct or for other preprocessing of a binding construct that requires detecting instances of the bound identifiers.
id
_1 andid
_2 must be identifiers.The
free-identifier=?
procedure returns#t
if and only if the two identifiers would resolve to the same binding if both were to appear in the output of a transformer outside of any bindings inserted by the transformer. (If neither of two like–named identifiers resolves to a binding, i.e., both are unbound, they are considered to resolve to the same binding.)Operationally, two identifiers are considered equivalent by
free-identifier=?
if and only the topmost matching substitution for each maps to the same binding or the identifiers have the same name and no matching substitution.The
syntax-case
andsyntax-rules
forms internally usefree-identifier=?
to compare identifiers listed in the literals list against input identifiers.(let ((fred 17)) (define-syntax a (lambda (x) (syntax-case x () ((_ id) #'(b id fred))))) (define-syntax b (lambda (x) (syntax-case x () ((_ id1 id2) #`(list #,(free-identifier=? #'id1 #'id2) #,(bound-identifier=? #'id1 #'id2)))))) (a fred)) ⇒ (#t #f)The following definition of unnamed
let
usesbound-identifier=?
to detect duplicate identifiers.(define-syntax let (lambda (x) (define unique-ids? (lambda (ls) (or (null? ls) (and (let notmem? ((x (car ls)) (ls (cdr ls))) (or (null? ls) (and (not (bound-identifier=? x (car ls))) (notmem? x (cdr ls))))) (unique-ids? (cdr ls)))))) (syntax-case x () ((_ ((i v) ...) e1 e2 ...) (unique-ids? #'(i ...)) #'((lambda (i ...) e1 e2 ...) v ...)))))The argument
#'(i ...)
tounique-ids?
is guaranteed to be a list by the rules given in the description ofsyntax
above.With this definition of
let
:(let ((a 3) (a 4)) (+ a a)) ⇒ syntax errorHowever,
(let-syntax ((dolet (lambda (x) (syntax-case x () ((_ b) #'(let ((a 3) (b 4)) (+ a b))))))) (dolet a)) ⇒ 7since the identifier
a
introduced bydolet
and the identifiera
extracted from the input form are notbound-identifier=?
.Rather than including
else
in the literals list as before, this version ofcase
explicitly tests forelse
usingfree-identifier=?
.(define-syntax case (lambda (x) (syntax-case x () ((_ e0 ((k ...) e1 e2 ...) ... (else-key else-e1 else-e2 ...)) (and (identifier? #'else-key) (free-identifier=? #'else-key #'else)) #'(let ((t e0)) (cond ((memv t '(k ...)) e1 e2 ...) ... (else else-e1 else-e2 ...)))) ((_ e0 ((ka ...) e1a e2a ...) ((kb ...) e1b e2b ...) ...) #'(let ((t e0)) (cond ((memv t '(ka ...)) e1a e2a ...) ((memv t '(kb ...)) e1b e2b ...) ...))))))With either definition of
case
,else
is not recognized as an auxiliary keyword if an enclosing lexical binding forelse
exists. For example,(let ((else#f
)) (case 0 (else (write "oops")))) ⇒ syntax errorsince
else
is bound lexically and is therefore not the sameelse
that appears in the definition ofcase
.
syntax-object->datum
syntax-object
Strip all syntactic information from a syntax object and returns the corresponding Scheme datum.
Identifiers stripped in this manner are converted to their symbolic names, which can then be compared with
eq?
. Thus, a predicatesymbolic-identifier=?
might be defined as follows.(define symbolic-identifier=? (lambda (x y) (eq? (syntax->datum x) (syntax->datum y))))
datum->syntax
template-id
datum
[srcloc
]
datum->syntax-object
template-id
datum
template-id
must be a template identifier anddatum
should be a datum value.The
datum->syntax
procedure returns a syntax-object representation ofdatum
that contains the same contextual information astemplate-id
, with the effect that the syntax object behaves as if it were introduced into the code whentemplate-id
was introduced.If
srcloc
is specified (and neither#f
or#!null
), it specifies the file position (including line number) for the result. In that case it should be a syntax object representing a list; otherwise it is currently ignored, though future extensions may support other ways of specifying the position.The
datum->syntax
procedure allows a transformer to “bend” lexical scoping rules by creating implicit identifiers that behave as if they were present in the input form, thus permitting the definition of macros that introduce visible bindings for or references to identifiers that do not appear explicitly in the input form. For example, the following defines aloop
expression that uses this controlled form of identifier capture to bind the variablebreak
to an escape procedure within the loop body. (The derivedwith-syntax
form is likelet
but binds pattern variables.)(define-syntax loop (lambda (x) (syntax-case x () ((k e ...) (with-syntax ((break (datum->syntax #'k 'break))) #'(call-with-current-continuation (lambda (break) (let f () e ... (f))))))))) (let ((n 3) (ls '())) (loop (if (= n 0) (break ls)) (set! ls (cons 'a ls)) (set! n (- n 1)))) ⇒ (a a a)Were
loop
to be defined as:(define-syntax loop (lambda (x) (syntax-case x () ((_ e ...) #'(call-with-current-continuation (lambda (break) (let f () e ... (f))))))))the variable
break
would not be visible ine ...
.The datum argument
datum
may also represent an arbitrary Scheme form, as demonstrated by the following definition ofinclude
.(define-syntax include (lambda (x) (define read-file (lambda (fn k) (let ((p (open-file-input-port fn))) (let f ((x (get-datum p))) (if (eof-object? x) (begin (close-port p) '()) (cons (datum->syntax k x) (f (get-datum p)))))))) (syntax-case x () ((k filename) (let ((fn (syntax->datum #'filename))) (with-syntax (((exp ...) (read-file fn #'k))) #'(begin exp ...)))))))
(include "filename")
expands into abegin
expression containing the forms found in the file named by"filename"
. For example, if the fileflib.ss
contains:(define f (lambda (x) (g (* x x))))and the file
glib.ss
contains:(define g (lambda (x) (+ x x)))the expression:
(let () (include "flib.ss") (include "glib.ss") (f 5))evaluates to
50
.The definition of
include
usesdatum->syntax
to convert the objects read from the file into syntax objects in the proper lexical context, so that identifier references and definitions within those expressions are scoped where theinclude
form appears.Using
datum->syntax
, it is even possible to break hygiene entirely and write macros in the style of old Lisp macros. Thelisp-transformer
procedure defined below creates a transformer that converts its input into a datum, calls the programmer's procedure on this datum, and converts the result back into a syntax object scoped where the original macro use appeared.(define lisp-transformer (lambda (p) (lambda (x) (syntax-case x () ((kwd . rest) (datum->syntax #'kwd (p (syntax->datum x))))))))
with-syntax
((pattern
)expression
…
)
body
The
with-syntax
form is used to bind pattern variables, just aslet
is used to bind variables. This allows a transformer to construct its output in separate pieces, then put the pieces together.Each
pattern
is identical in form to asyntax-case
pattern. The value of eachexpression
is computed and destructured according to the correspondingpattern
, and pattern variables within thepattern
are bound as withsyntax-case
to the corresponding portions of the value withinbody
.The
with-syntax
form may be defined in terms ofsyntax-case
as follows.(define-syntax with-syntax (lambda (x) (syntax-case x () ((_ ((p e0) ...) e1 e2 ...) (syntax (syntax-case (list e0 ...) () ((p ...) (let () e1 e2 ...))))))))The following definition of
cond
demonstrates the use ofwith-syntax
to support transformers that employ recursion internally to construct their output. It handles allcond
clause variations and takes care to produce one-armedif
expressions where appropriate.(define-syntax cond (lambda (x) (syntax-case x () ((_ c1 c2 ...) (let f ((c1 #'c1) (c2* #'(c2 ...))) (syntax-case c2* () (() (syntax-case c1 (else =>) (((else e1 e2 ...) #'(begin e1 e2 ...)) ((e0) #'e0) ((e0 => e1) #'(let ((t e0)) (if t (e1 t)))) ((e0 e1 e2 ...) #'(if e0 (begin e1 e2 ...))))) ((c2 c3 ...) (with-syntax ((rest (f #'c2 #'(c3 ...)))) (syntax-case c1 (=>) ((e0) #'(let ((t e0)) (if t t rest))) ((e0 => e1) #'(let ((t e0)) (if t (e1 t) rest))) ((e0 e1 e2 ...) #'(if e0 (begin e1 e2 ...) rest)))))))))))
The
quasisyntax
form is similar tosyntax
, but it allows parts of the quoted text to be evaluated, in a manner similar to the operation ofquasiquote
.Within a
quasisyntax
template
, subforms ofunsyntax
andunsyntax-splicing
forms are evaluated, and everything else is treated as ordinary template material, as withsyntax
.The value of each
unsyntax
subform is inserted into the output in place of theunsyntax
form, while the value of eachunsyntax-splicing
subform is spliced into the surrounding list or vector structure. Uses ofunsyntax
andunsyntax-splicing
are valid only withinquasisyntax
expressions.A
quasisyntax
expression may be nested, with eachquasisyntax
introducing a new level of syntax quotation and eachunsyntax
orunsyntax-splicing
taking away a level of quotation. An expression nested within nquasisyntax
expressions must be within n unsyntax orunsyntax-splicing
expressions to be evaluated.As noted in
abbreviation
,#`
is equivalent totemplate
(quasisyntax
,template
)#,
is equivalent totemplate
(unsyntax
, andtemplate
)#,@
is equivalent totemplate
(unsyntax-splicing
. Note that for backwards compatibility, you should only usetemplate
)#,
inside a literaltemplate
#`
form.template
The
quasisyntax
keyword can be used in place ofwith-syntax
in many cases. For example, the definition ofcase
shown under the description ofwith-syntax
above can be rewritten usingquasisyntax
as follows.(define-syntax case (lambda (x) (syntax-case x () ((_ e c1 c2 ...) #`(let ((t e)) #,(let f ((c1 #'c1) (cmore #'(c2 ...))) (if (null? cmore) (syntax-case c1 (else) ((else e1 e2 ...) #'(begin e1 e2 ...)) (((k ...) e1 e2 ...) #'(if (memv t '(k ...)) (begin e1 e2 ...))]) (syntax-case c1 () (((k ...) e1 e2 ...) #`(if (memv t '(k ...)) (begin e1 e2 ...) #,(f (car cmore) (cdr cmore))))))))))))
Note: Any
syntax-rules
form can be expressed withsyntax-case
by making thelambda
expression andsyntax
expressions explicit, andsyntax-rules
may be defined in terms ofsyntax-case
as follows.(define-syntax syntax-rules (lambda (x) (syntax-case x () ((_ (lit ...) ((k . p) t) ...) (for-all identifier? #'(lit ... k ...)) #'(lambda (x) (syntax-case x (lit ...) ((_ . p) #'t) ...))))))
Traditional Scheme has only a few kinds of values, and thus only a few builtin kinds of literals. Modern Scheme allows defining new types, so it is desirable to have a mechanism for defining literal values for the new types.
Consider the URI
type.
You can create a new instance of a URI
using a
constructor function:
(URI "http://example.com/")
This isn't too bad, though the double-quote characters are an ugly distraction. However, if you need to construct the string it gets messy:
(URI (string-append base-uri "icon.png"))
Instead use can write:
&URI{http://example.com/}
or:
&URI{&[base-uri]icon.png}
This syntax is translated by the Scheme reader to the more familiar but more verbose equivalent forms:
($construct$:URI "http://example.com/") ($construct$:URI $<<$ base-uri $>>$ "icon.png")
So for this to work there just needs to be a definition
of $construct$:URI
, usually a macro.
Normal scope rules apply; typically you'd define $construct$:URI
in
a module.
The names $<<$
and $>>$
are bound to unique zero-length strings.
They are used to allow the implementation of $construct$:URI
to determine which arguments are literal and which come from
escaped expressions.
If you want to define your own $construct$:
,
or to read motivation and details, see the
SRFI 108 specification.
tag
extended-datum-literal
::=
&
cname
@lbracechar{}
[initial-ignored
] named-literal-part
^* @rbracechar{}
| &
cname
[
expression
^* ]@lbracechar{}
[initial-ignored
] named-literal-part
^* @rbracechar{}
cname
::=
identifier
named-literal-part
::=
any character except &
, @lbracechar{}
or @rbracechar{}
| @lbracechar{}
named-literal-part
^+ @rbracechar{}
| char-ref
| entity-ref
| special-escape
| enclosed-part
| extended-datum-literal