1 Introduction

Haskell is a general purpose, purely functional programming language incorporating many recent innovations in programming language design. Haskell provides higher-order functions, non-strict semantics, static polymorphic typing, user-defined algebraic datatypes, pattern-matching, list comprehensions, a module system, a monadic I/O system, and a rich set of primitive datatypes, including lists, arrays, arbitrary and fixed precision integers, and floating-point numbers. Haskell is both the culmination and solidification of many years of research on lazy functional languages.

This report defines the syntax for Haskell programs and an informal abstract semantics for the meaning of such programs. We leave as implementation dependent the ways in which Haskell programs are to be manipulated, interpreted, compiled, etc. This includes such issues as the nature of programming environments and the error messages returned for undefined programs (i.e. programs that formally evaluate to bottom).

1.1 Program Structure

In this section, we describe the abstract syntactic and semantic structure of Haskell, as well as how it relates to the organization of the rest of the report.

  1. At the topmost level a Haskell program is a set of modules, described in Section 5. Modules provide a way to control namespaces and to re-use software in large programs.

  2. The top level of a module consists of a collection of declarations, of which there are several kinds, all described in Section 4. Declarations define things such as ordinary values, datatypes, type classes, and fixity information.

  3. At the next lower level are expressions, described in Section 3. An expression denotes a value and has a static type; expressions are at the heart of Haskell programming "in the small."

  4. At the bottom level is Haskell's lexical structure, defined in Section 2. The lexical structure captures the concrete representation of Haskell programs in text files.

This report proceeds bottom-up with respect to Haskell's syntactic structure.

The sections not mentioned above are Section 6, which describes the standard built-in datatypes and classes in Haskell, and Section 7, which discusses the I/O facility in Haskell (i.e. how Haskell programs communicate with the outside world). Also, there are several appendices describing the Prelude, the concrete syntax, literate programming, the specification of derived instances, and pragmas supported by most Haskell compilers.

Examples of Haskell program fragments in running text are given in typewriter font:


 let x = 1
     z = x+y
 in  z+1
"Holes" in program fragments representing arbitrary pieces of Haskell code are written in italics, as in if e1 then e2 else e3. Generally the italicized names are mnemonic, such as e for expressions, d for declarations, t for types, etc.

1.2 The Haskell Kernel

Haskell has adopted many of the convenient syntactic structures that have become popular in functional programming. In all cases, their formal semantics can be given via translation into a proper subset of Haskell called the Haskell kernel. It is essentially a slightly sugared variant of the lambda calculus with a straightforward denotational semantics. The translation of each syntactic structure into the kernel is given as the syntax is introduced. This modular design facilitates reasoning about Haskell programs and provides useful guidelines for implementors of the language.

1.3 Values and Types

An expression evaluates to a value and has a static type. Values and types are not mixed in Haskell. However, the type system allows user-defined datatypes of various sorts, and permits not only parametric polymorphism (using a traditional Hindley-Milner type structure) but also ad hoc polymorphism, or overloading (using type classes).

Errors in Haskell are semantically equivalent to bottom. Technically, they are not distinguishable from nontermination, so the language includes no mechanism for detecting or acting upon errors. Of course, implementations will probably try to provide useful information about errors.

1.4 Namespaces

Haskell provides a lexical syntax for infix operators (either functions or constructors). To emphasize that operators are bound to the same things as identifiers, and to allow the two to be used interchangeably, there is a simple way to convert between the two: any function or constructor identifier may be converted into an operator by enclosing it in backquotes, and any operator may be converted into an identifier by enclosing it in parentheses. For example, x + y is equivalent to (+) x y, and f x y is the same as x `f` y. These lexical matters are discussed further in Section 2.

There are six kinds of names in Haskell: those for variables and constructors denote values; those for type variables, type constructors, and type classes refer to entities related to the type system; and module names refer to modules. There are three constraints on naming:

  1. Names for variables and type variables are identifiers beginning with lowercase letters; the other four kinds of names are identifiers beginning with uppercase letters.
  2. Constructor operators are operators beginning with ":"; variable operators are operators not beginning with ":".
  3. An identifier must not be used as the name of a type constructor and a class in the same scope.
These are the only constraints; for example, Int may simultaneously be the name of a module, class, and constructor within a single scope.

1.5 Layout

In the syntax given in the rest of the report, declaration lists are always preceded by the keyword where, let, do, or of, and are enclosed within curly braces ({ }) with the individual declarations separated by semicolons (;). For example, the syntax of a let expression is:

let { decl1 ; decl2 ; ... ; decln [;] } in exp

Haskell permits the omission of the braces and semicolons by using layout to convey the same information. This allows both layout-sensitive and -insensitive styles of coding, which can be freely mixed within one program. Because layout is not required, Haskell programs can be straightforwardly produced by other programs.

The layout (or "off-side") rule takes effect whenever the open brace is omitted after the keyword where, let, do, or of. When this happens, the indentation of the next lexeme (whether or not on a new line) is remembered and the omitted open brace is inserted (the whitespace preceding the lexeme may include comments). For each subsequent line, if it contains only whitespace or is indented more, then the previous item is continued (nothing is inserted); if it is indented the same amount, then a new item begins (a semicolon is inserted); and if it is indented less, then the declaration list ends (a close brace is inserted). A close brace is also inserted whenever the syntactic category containing the declaration list ends; that is, if an illegal lexeme is encountered at a point where a close brace would be legal, a close brace is inserted. The layout rule matches only those open braces that it has inserted; an explicit open brace must be matched by an explicit close brace. Within these explicit open braces, no layout processing is performed for constructs outside the braces, even if a line is indented to the left of an earlier implicit open brace.

Given these rules, a single newline may actually terminate several declaration lists. Also, these rules permit:


f x = let a = 1; b = 2 
          g y = exp2
       in exp1
making a, b and g all part of the same declaration list.

To facilitate the use of layout at the top level of a module (an implementation may allow several modules may reside in one file), the keyword module and the end-of-file token are assumed to occur in column 0 (whereas normally the first column is 1). Otherwise, all top-level declarations would have to be indented.

See also Section B.3.

As an example, Figure 1 shows a (somewhat contrived) module and Figure 2 shows the result of applying the layout rule to it. Note in particular: (a) the line beginning }};pop, where the termination of the previous line invokes three applications of the layout rule, corresponding to the depth (3) of the nested where clauses, (b) the close braces in the where clause nested within the tuple and case expression, inserted because the end of the tuple was detected, and (c) the close brace at the very end, inserted because of the column 0 indentation of the end-of-file token.


module AStack( Stack, push, pop, top, size ) where
data Stack a = Empty 
             | MkStack a (Stack a)

push :: a -> Stack a -> Stack a
push x s = MkStack x s

size :: Stack a -> Integer
size s = length (stkToLst s)  where
           stkToLst  Empty         = []
           stkToLst (MkStack x s)  = x:xs where xs = stkToLst s

pop :: Stack a -> (a, Stack a)
pop (MkStack x s)
  = (x, case s of r -> i r where i x = x) -- (pop Empty) is an error

top :: Stack a -> a
top (MkStack x s) = x                     -- (top Empty) is an error

Figure 1

A sample program


module AStack( Stack, push, pop, top, size ) where
{data Stack a = Empty 
             | MkStack a (Stack a)

;push :: a -> Stack a -> Stack a
;push x s = MkStack x s

;size :: Stack a -> Integer
;size s = length (stkToLst s)  where
           {stkToLst  Empty         = []
           ;stkToLst (MkStack x s)  = x:xs where {xs = stkToLst s

}};pop :: Stack a -> (a, Stack a)
;pop (MkStack x s)
  = (x, case s of {r -> i r where {i x = x}}) -- (pop Empty) is an error

;top :: Stack a -> a
;top (MkStack x s) = x                        -- (top Empty) is an error
}

Figure 2

Sample program with layout expanded

When comparing indentations for standard Haskell programs, a fixed-width font with this tab convention is assumed: tab stops are 8 characters apart (with the first tab stop in column 9), and a tab character causes the insertion of enough spaces (always >= 1) to align the current position with the next tab stop. Particular implementations may alter this rule to accommodate variable-width fonts and alternate tab conventions, but standard Haskell programs must observe this rule.

Next section: Lexical Structure
The Haskell 1.3 Report