Haskell is a functional programming language named after Haskell Curry, the developer of the lambda calculus upon which the language is based. It was standardised in early 1999 with the introduction of Haskell 98, mostly solving the problem which had previously arisen of multiple and incompatible implementations of Haskell.
The lineage of Haskell can be traced back to the 1920s, when Haskell Curry and Alonzo Church were developing the lambda calculus, which formed a partial basis of Lisp in the 1960s. Lisp is not a pure functional language as it still retains variable assignments, but it is very well-known and powerful, and is traditionally regarded as a good language for developing Artificial Intelligence applications.
In the late 1960s, the first purely functional language was developed by Peter Landin and called ISWIM. This was followed in 1978 by John Backus' FP, with Robin Milner's ML arriving at roughly the same time. ML is regarded as the first modern functional language, with type inference and polymorphic types, both important in Haskell. In the early 1980s, David Turner developed a so-called 'lazy functional language' called Miranda. This was a commercial product.
Finally, in 1988, Haskell was developed by a committee of researchers. Haskell was then standardised in 1999 as Haskell 98, and given a logo suspiciously like that used for Nintendo's Nintendo 64 games console.
Haskell is one of the very few pure functional programming languages. Functional Programming is a paradigm of program design (others include Procedural Programming and Object-Oriented Programming). It focusses on functions as the basic unit of a program. A Haskell program is essentially a function, which in turn calls other functions, and so on down to functions which call bottom-level compiler or interpreter functions only.
Haskell is a lazy functional language, in that it only evaluates what it needs to - unnecessary evaluation is never performed. This of course saves memory and processor time, but also allows certain functions like repeat to return infinite lists without causing problems, because only the part of the list that is actually used will ever be produced.
What can Haskell do?
Many things. As well as performing mathematical and other manipulations on lists (which are supported at the grassroots level by the language), Haskell can be used for graphics, sound, text processing and more or less anything else you can think of. Microsoft is funding a research project into using Haskell for graphics and animation. If you're using Haskell on a Linux or similar UNIX system, GTK-Haskell allows you to develop graphical user interfaces for your Haskell programs using the GTK+ libraries.
Developing Haskell Programs
There are numerous tools available for developing programs in Haskell, but the most popular is a system called Hugs. Hugs is a text-based interactive system that allows evaluation of Haskell expressions at the command line. It also supports loading of a Haskell script file, which contains one or more function and/or type definitions. Once a script is loaded, the functions in it can be executed - this is how Hugs deals with loading programs.
Because Hugs is an interpreter, it runs Haskell considerably slower than the language is capable of running. However, it is a very useful environment for developing and testing Haskell programs due to its interactive nature. Hugs is available for most popular operating systems including GNU/Linux, Apple Mac OS and Microsoft Windows. It should also compile on most UNIX-like operating systems, as the source code is available for download.
Haskell compilers are also available. The most popular of these is probably the Glasgow Haskell Compiler (GHC), which is itself written in Haskell! Code compiled by GHC runs much faster than code executed within Hugs. GHC lends itself well to large-scale Haskell development, and for tasks such as graphics and animation.
Another Haskell compiler worth mentioning is nhc98, which produces extremely compact code.
Basic Types in Haskell
Haskell has a number of basic types built into the language, including:
- Int (fixed precision integer)
- Integer (arbitrary precision integer)
- Float (floating-point value)
- Bool (Boolean value - either True or False)
- Char (Unicode character).
In addition, Haskell supports lists and tuple types. Lists are, as the name suggests, lists of values. These values must be all the same type, but can be any type, including other lists. Lists can be of any length, and are a very important feature of Haskell - virtually nothing is possible without using at least one list. The order of elements in a list is important, and elements do not have to be unique. Lists are delimited by [ and ], with elements separated by commas, thus [1,2,3,4,5] is a valid list of type [Int] (a list of items which are of the type Int).
Tuples are essentially a way to group variables together. They are declared to be a specific size and to contain specific types in specific places. For example, you could declare a tuple type (Int, Int) which would contain two integers, or a tuple (Int, Float, [Bool]) which would contain an integer, a floating-point number and a list of Boolean values. Tuple types are used for many things, but one important one is to return multiple values from functions - you just put them in a tuple and Haskell thinks you're returning a single value, which is fine. Lists of tuples are also a very effective way to store data such as key/value pairs, as it is fairly trivial to write functions which retrieve data from such lists, provided that the keys are unique.
As the basic unit of any Haskell program, functions deserve a good look. Functions in Haskell can be large and complex, but good style is considered to be to write lots of small functions which call each other rather than large and complex functions. The latter case can in many cases be impossible, as Haskell contains no looping constructs like those seen in languages like C++ or Pascal - looping in Haskell is done using recursive functions. Sometimes, though, when it might seem best to write multiple functions it is actually possible to use lambda functions to produce a more efficient result.
Basic Function Definitions
A function is defined in two parts: firstly the type, then the implementation. The type is always one line - the implementation is sometimes two or more, especially with recursive functions and programs using IO functions. A typical definition for a function that takes one parameter might look like this:
even :: Int -> Bool
even n = if n `mod` 2 == 0 then True else False
The first line of the definition gives the type - the :: operator means "has type". The -> indicates that what is input on the left hand side is output on the right. It should be noted that Haskell functions can take functions as arguments, and return other functions as results. This can be extremely useful, as we will see. The second line of the definition uses an if...then...else statement to check if the input parameter is even or not. It is not the intent of this Entry to teach Haskell, so other than noting that in Haskell the else part is compulsory in all situations, no more shall be mentioned of the implementation.
While it is possible to use tuple types to pass multiple parameters to a function:
add :: (Int, Int) -> Int
add (x,y) = x + y
It is not generally considered to be a good way to program. Instead, functions taking multiple arguments should be defined in Curried form (named, as is the language, after Haskell Curry). Curried functions basically take their arguments one at a time, creating intermediate functions along the way - and hence the usefulness of Curried functions. It is possible to pass too few parameters to a Curried function, in which case the result is an intermediate function that can be given a name, and passed the remaining parameters at leisure. Among other benefits, playing with functions in this way can help make a program much more readable.
A Curried function is defined like this:
add :: Int -> Int -> Int
add x y = x + y
Note that the implementation is the same - only the type has changed. Therefore this version of the add function performs an identical function to the first. However, because the first requires a tuple to be passed to it, they are called in different ways1:
> add (1,2)
For the first, and:
> add 1 2
For the second. If nothing else, the Curried form requires less typing. However, using the features of Curried functions we can define a new function called add2 that adds two to every number passed to it. While it would be easy to define this in terms of + or even the non-Curried add, defining it in Curried form is a lot more elegant, and for more complex functions it is a lot more practical.
add2 :: Int -> Int
add2 = add 2
This new function takes one integer parameter and returns that number incremented by 2. This is a trivial example, but there are more complex applications where it is very, very useful.
Probably the most important programming technique in Haskell is the use of recursive functions. These are functions that are defined in terms of themselves, usually ones which process each element of a list to produce a final result. For example:
incall :: [Int] -> [Int]
incall  = 
incall (x:xs) = (x + 1): incall xs
Which takes a list of integers and adds one to each element. It is defined recursively in two parts - the first (after the type definition) is the base case - i.e. what happens when we process (in this case) the empty list. For the incall function we simply return the empty list and exit. The second implementation line defines the recursive case, which takes one parameter, a list. The list parameter is placed as (x:xs) instead of just xs because the : operator takes the first element off the list, calling it x, removing the need to use the head and tail functions within the function definition itself. The way the function works is simple - it adds one to the head of the list, then concatenates it to the front of the list produced by calling incall on the remainder of the list.
Interestingly enough, this function can also be defined using list comprehension (which is reduced to recursion in the compiler anyway, but can be easier to read).
incall :: [Int] -> [Int]
incall xs = [x + 1 | x <- xs]
This notation may be familiar to those who have studied set theory and encountered set comprehensions. In this example, it constructs the list of all x+1 such that x is taken from the list xs.
The same function can also be defined in terms of the higher-order function foldr, which encapsulates simple recursion. To do this we need to use what is called a lambda function, which is basically a temporary, unnamed function that's created for a specific purpose - frequently for defining functions using things like foldr.
incall :: [Int] -> [Int]
incall = foldr (\x y -> x+1:y) 
Lambda functions are, as in the example above, normally used in conjunction with higher-order functions to avoid declaring a new function that would be used only once (and which would be less efficient). They are declared using \, followed by parameters, which are not typed - the Haskell system calculates which types are permissible by looking at the implementation of the function. The -> operator indicates the start of the function itself. Lambda functions can be nested, producing an effect much like Currying, but this is not usually necessary.
Type Classes and Polymorphic Types
Most types in Haskell (certainly all the predefined types) belong to one or more type classes which group types together based on which operations can be performed on them. For example, the type class Num contains types which represent numbers of some sort, so it includes Int and Float. The type class Eq contains all the types which can be tested for equality. These type classes, combined with polymorphic types, are fantastically useful for defining functions at the most abstract level possible.
C++ programmers using later revisions of the language will probably have come across templates, whereby a class is defined in terms of an unknown data type, which is filled in at compile time, and which allows classes to be defined to work with multiple types without writing multiple classes. Haskell's polymorphic types allow a vaguely similar thing to occur, in that functions do not have to be declared with a particular type, but can have the types filled in at run time. This works by using what might be considered to be variables for the types in the function definition. For example, the definition of the head function, which returns the first element of a list, is as follows:
head :: [a] -> a
head (x:_) = x
As you can see, the type definition of the function declares it as taking a list of type a and returning a value of type a. This means that it can take any list whatsoever, but it will always return a value of the same type as that which is inside the list. Therefore head will always work on all lists, even those populated with user-defined data types.
Type classes can be used to restrict the domain of types allowed in a function that uses polymorphic types. This is especially useful with functions that test for equality, as the Eq class can be used to restrict accepted types to those which can be tested for equality. Below is the definition of a function which takes a list of any type that can be checked for equality, and recursively checks the first element to see if it's equal to any other in the list. If it is it is discarded and the function calls itself on the rest of the list, if not the function still calls itself on the rest of the list but concatenates the first element to the result of that function. In this way it tests each element for equality with the rest of the list, and all duplicates are removed. It works on any list containing a type that is a member of Eq.
removedups :: Eq a => [a] -> [a]
removedups  = 
removedups (x:xs) = if any (==x) xs then removedups xs else x:removedups xs
The final thing to look at in this Entry is the realm of user-defined data types in Haskell, and specifically recursive data types, which have many uses.
Datatypes in Haskell can be defined using either the type or the data directive. type basically creates an alias, so you could define
type IntPairs = [(Int,Int)]
Which would then cause Haskell to treat every occurrence of IntPairs as [(Int,Int)]. This is quite simple and useful for making programs more readable and less prone to errors.
The data directive is more powerful, and allows definition of new types with new values. For example, if you were to define the built-in type Bool, it would look like this:
data Bool = True | False
Which means that Bool takes either True or False as values. It should be noted that names used for data type values or names have no meaning to the system, and only gain meaning through the functions which act upon them (some of which are, admittedly, built into the system).
Anyone who's ever studied context-free languages might recognise the syntax used above as being similar to a context-free grammar. This is because it is, and Haskell also supports some of the more complex ways of declaring a grammar, including the use of parameters. You could define a data type to represent basic geometric shapes, for example, which might look like this:
data Shape = Circle Float | Square Float
This would create a type Shape which can be either a circle, having a single Float parameter (which might be the radius or the diameter), or a square, which has a single Float parameter to specify the side length.
This is all very well and useful, and slightly reminiscent of inheritance trees in object-oriented languages (perhaps with an abstract ancestor class called Shape and two descendants, Circle and Square). However, this is not the real power of Haskell's data types. That lies in recursive data types.
Recursive data types, like recursive functions, are data types which are defined in terms of themselves. The most obvious application of this is for representing context-free languages in your program, as you can convert a context-free grammar almost directly into Haskell.
data Prop = Const Bool | Var Char | And Prop Prop | Or Prop Prop
The code above defines a data type called Prop, which stores basic logical propositions consisting of Boolean constants, variables represented by characters, the and operator (conjunction) and the or operator (disjunction). Note how And and Or are defined taking Prop as their parameters - this allows propositions to be nested in a tree-like structure. Eventually it would descend to a Const or Var directive which would terminate the tree, thus preventing infinite recursion which would be impossible to implement.
Through the use of a suitable set of functions, and extension of the data type itself, the Prop data type could be used for evaluating logical propositions in some way based on the values of their variables. It would be conceivably possible to use a recursive data type to represent an entire Java program, as the language specification is published as a context-free grammar. This would be extremely complicated, but it is one of the reasons Haskell can be used to write parsers and compilers without too much difficulty.
Learning more about Haskell
If you're interested in learning more about Haskell, a good place to start is haskell.org, which contains resources, compilers, interpreters and various other useful things for Haskell programmers.
The author is indebted to Dr. Graham Hutton of the University of Nottingham, UK, for writing and lecturing the Functional Programming module in the School of Computer Science and IT, which is where the knowledge to write this Entry comes from.