Introduction to FunctionalProgrammingJohn [email protected] December 1997


PrefaceThese are the lecture notes accompanying the course Introduction to FunctionalProgramming, which I taught at Cambridge University in the academic year1996/7.This course has mainly been taught in previous years by Mike Gordon. I haveretained the basic structure of his course, with a blend of theory and practice,and have borrowed heavily in what follows from his own lecture notes, availablein book form as Part II of (Gordon 1988). I have also been influenced by thosewho have taught related courses here, such as Andy Gordon and Larry Paulsonand, in the chapter on types, by Andy Pitts’s course on the subject.The large chapter on examples is not directly examinable, though studying itshould improve the reader’s grasp of the early parts and give a better idea abouthow ML is actually used.Most chapters include some exercises, either invented specially for this courseor taken from various sources. They are normally intended to require a littlethought, rather than just being routine drill. Those I consider fairly difficult aremarked with a (*).These notes have not yet been tested extensively and no doubt contain variouserrors and obscurities. I would be grateful for constructive criticism from anyreaders.John Harrison ([email protected]).ii

Plan of the lecturesThis chapter indicates roughly how the material is to be distributed over a courseof twelve lectures, each of slightly less than one hour.1. Introduction and Overview Functional and imperative programming:contrast, pros and cons. General structure of the course: how lambda calculus turns out to be a general programming language. Lambda notation:how it clarifies variable binding and provides a general analysis of mathematical notation. Currying. Russell’s paradox.2. Lambda calculus as a formal system Free and bound variables. Substitution. Conversion rules. Lambda equality. Extensionality. Reductionand reduction strategies. The Church-Rosser theorem: statement and consequences. Combinators.3. Lambda calculus as a programming language Computability background; Turing completeness (no proof). Representing data and basic operations: truth values, pairs and tuples, natural numbers. The predecessoroperation. Writing recursive functions: fixed point combinators. Let expressions. Lambda calculus as a declarative language.4. Types Why types? Answers from programming and logic. Simply typedlambda calculus. Church and Curry typing. Let polymorphism. Mostgeneral types and Milner’s algorithm. Strong normalization (no proof),and its negative consequences for Turing completeness. Adding a recursionoperator.5. ML ML as typed lambda calculus with eager evaluation. Details of evaluation strategy. The conditional. The ML family. Practicalities of interacting with ML. Writing functions. Bindings and declarations. Recursive andpolymorphic functions. Comparison of functions.6. Details of ML More about interaction with ML. Loading from files. Comments. Basic data types: unit, booleans, numbers and strings. Built-inoperations. Concrete syntax and infixes. More examples. Recursive typesand pattern matching. Examples: lists and recursive functions on lists.iii

iv7. Proving programs correct The correctness problem. Testing and verification. The limits of verification. Functional programs as mathematicalobjects. Examples of program proofs: exponential, GCD, append and reverse.8. Effective ML Using standard combinators. List iteration and other usefulcombinators; examples. Tail recursion and accumulators; why tail recursionis more efficient. Forcing evaluation. Minimizing consing. More efficientreversal. Use of ‘as’. Imperative features: exceptions, references, arraysand sequencing. Imperative features and types; the value restriction.9. ML examples I: symbolic differentiation Symbolic computation. Datarepresentation. Operator precedence. Association lists. Prettyprintingexpressions. Installing the printer. Differentiation. Simplification. Theproblem of the ‘right’ simplification.10. ML examples II: recursive descent parsing Grammars and the parsingproblem. Fixing ambiguity. Recursive descent. Parsers in ML. Parsercombinators; examples. Lexical analysis using the same techniques. Aparser for terms. Automating precedence parsing. Avoiding backtracking.Comparison with other techniques.11. ML examples III: exact real arithmetic Real numbers and finite representations. Real numbers as programs or functions. Our representation ofreals. Arbitrary precision integers. Injecting integers into the reals. Negation and absolute value. Addition; the importance of rounding division.Multiplication and division by integers. General multiplication. Inverse anddivision. Ordering and equality. Testing. Avoiding reevaluation throughmemo functions.12. ML examples IV: Prolog and theorem proving Prolog terms. Casesensitive lexing. Parsing and printing, including list syntax. Unification.Backtracking search. Prolog examples. Prolog-style theorem proving. Manipulating formulas; negation normal form. Basic prover; the use of continuations. Examples: Pelletier problems and whodunit.

Contents1 Introduction1.1 The merits of functional programming . . . . . . . . . . . . . . .1.2 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Lambda calculus2.1 The benefits of lambda notation . . .2.2 Russell’s paradox . . . . . . . . . . .2.3 Lambda calculus as a formal system .2.3.1 Lambda terms . . . . . . . . .2.3.2 Free and bound variables . . .2.3.3 Substitution . . . . . . . . . .2.3.4 Conversions . . . . . . . . . .2.3.5 Lambda equality . . . . . . .2.3.6 Extensionality . . . . . . . . .2.3.7 Lambda reduction . . . . . .2.3.8 Reduction strategies . . . . .2.3.9 The Church-Rosser theorem .2.4 Combinators . . . . . . . . . . . . . 4 Types4.1 Typed lambda calculus . . . . . . . . . . . . . . . . . . . . . . . .4.1.1 The stock of types . . . . . . . . . . . . . . . . . . . . . .4.1.2 Church and Curry typing . . . . . . . . . . . . . . . . . .38394041.3 Lambda calculus as a programming language3.1 Representing data in lambda calculus . . . . .3.1.1 Truth values and the conditional . . .3.1.2 Pairs and tuples . . . . . . . . . . . . .3.1.3 The natural numbers . . . . . . . . . .3.2 Recursive functions . . . . . . . . . . . . . . .3.3 Let expressions . . . . . . . . . . . . . . . . .3.4 Steps towards a real programming language .3.5 Further reading . . . . . . . . . . . . . . . . .v.

viCONTENTS4. Formal typability rules4.1.4 Type preservation . . .Polymorphism . . . . . . . . .4.2.1 Let polymorphism . .4.2.2 Most general types . .Strong normalization . . . . .5 A taste of ML5.1 Eager evaluation . . . . . . . . .5.2 Consequences of eager evaluation5.3 The ML family . . . . . . . . . .5.4 Starting up ML . . . . . . . . . .5.5 Interacting with ML . . . . . . .5.6 Bindings and declarations . . . .5.7 Polymorphic functions . . . . . .5.8 Equality of functions . . . . . . .6 Further ML6.1 Basic datatypes and operations6.2 Syntax of ML phrases . . . . .6.3 Further examples . . . . . . . .6.4 Type definitions . . . . . . . . .6.4.1 Pattern matching . . . .6.4.2 Recursive types . . . . .6.4.3 Tree structures . . . . .6.4.4 The subtlety of recursive.424344454647.505053545455565860. . . . . . . . . . . . . . . . . . . . . .types.636466687071737678objects. . . . . . . . . . . . . . . . .818384858687.9494969698101102102104.7 Proving programs correct7.1 Functional programs as mathematical7.2 Exponentiation . . . . . . . . . . . .7.3 Greatest common divisor . . . . . . .7.4 Appending . . . . . . . . . . . . . . .7.5 Reversing . . . . . . . . . . . . . . .8 Effective ML8.1 Useful combinators . . . . . . . . . . .8.2 Writing efficient code . . . . . . . . . .8.2.1 Tail recursion and accumulators8.2.2 Minimizing consing . . . . . . .8.2.3 Forcing evaluation . . . . . . .8.3 Imperative features . . . . . . . . . . .8.3.1 Exceptions . . . . . . . . . . . .8.3.2 References and arrays . . . . . .

CONTENTSvii8. . . . . . . . . . . . . . . . . . . . . . . . . . . 105Interaction with the type system . . . . . . . . . . . . . . 1069 Examples9.1 Symbolic differentiation . . . . . . . .9.1.1 First order terms . . . . . . . .9.1.2 Printing . . . . . . . . . . . . .9.1.3 Derivatives . . . . . . . . . . .9.1.4 Simplification . . . . . . . . . .9.2 Parsing . . . . . . . . . . . . . . . . . .9.2.1 Recursive descent parsing . . .9.2.2 Parser combinators . . . . . . .9.2.3 Lexical analysis . . . . . . . . .9.2.4 Parsing terms . . . . . . . . . .9.2.5 Automatic precedence parsing .9.2.6 Defects of our approach . . . .9.3 Exact real arithmetic . . . . . . . . . .9.3.1 Representation of real numbers9.3.2 Arbitrary-precision integers . .9.3.3 Basic operations . . . . . . . .9.3.4 General multiplication . . . . .9.3.5 Multiplicative inverse . . . . . .9.3.6 Ordering relations . . . . . . . .9.3.7 Caching . . . . . . . . . . . . .9.4 Prolog and theorem proving . . . . . .9.4.1 Prolog terms . . . . . . . . . .9.4.2 Lexical analysis . . . . . . . . .9.4.3 Parsing . . . . . . . . . . . . .9.4.4 Unification . . . . . . . . . . . .9.4.5 Backtracking . . . . . . . . . .9.4.6 Examples . . . . . . . . . . . .9.4.7 Theorem proving . . . . . . . 31135136138138141142142143144146147149

Chapter 1IntroductionPrograms in traditional languages, such as FORTRAN, Algol, C and Modula-3,rely heavily on modifying the values of a collection of variables, called the state. Ifwe neglect the input-output operations and the possibility that a program mightrun continuously (e.g. the controller for a manufacturing process), we can arriveat the following abstraction. Before execution, the state has some initial valueσ, representing the inputs to the program, and when the program has finished,the state has a new value σ 0 including the result(s). Moreover, during execution,each command changes the state, which has therefore proceeded through somefinite sequence of values:σ σ0 σ1 σ2 · · · σn σ 0For example in a sorting program, the state initially includes an array ofvalues, and when the program has finished, the state has been modified in such away that these values are sorted, while the intermediate states represent progresstowards this goal.The state is typically modified by assignment commands, often written inthe form v E or v : E where v is a variable and E some expression. Thesecommands can be executed in a sequential manner by writing them one after theother in the program, often separated by a semicolon. By using statements likeif and while, one can execute these commands conditionally, and repeatedly,depending on other properties of the current state. The program amounts to a setof instructions on how to perform these state changes, and therefore this style ofprogramming is often called imperative or procedural. Correspondingly, the traditional languages intended to support it are known as imperative or procedurallanguages.Functional programming represents a radical departure from this model. Essentially, a functional program is simply an expression, and execution meansevaluation of the expression.1 We can see how this might be possible, in gen1Functional programming is often called ‘applicative programming’ since the basic mecha-1

2CHAPTER 1. INTRODUCTIONeral terms, as follows. Assuming that an imperative program (as a whole) isdeterministic, i.e. the output is completely determined by the input, we can saythat the final state, or whichever fragments of it are of interest, is some function of the initial state, say σ 0 f (σ).2 In functional programming this viewis emphasized: the program is actually an expression that corresponds to themathematical function f . Functional languages support the construction of suchexpressions by allowing rather powerful functional constructs.Functional programming can be contrasted with imperative programming either in a negative or a positive sense. Negatively, functional programs do notuse variables — there is no state. Consequently, they cannot use assignments,since there is nothing to assign to. Furthermore the idea of executing commandsin sequence is meaningless, since the first command can make no difference tothe second, there being no state to mediate between them. Positively however,functional programs can use functions in much more sophisticated ways. Functions can be treated in exactly the same way as simpler objects like integers: theycan be passed to other functions as arguments and returned as results, and ingeneral calculated with. Instead of sequencing and looping, functional languagesuse recursive functions, i.e. functions that are defined in terms of themselves. Bycontrast, most traditional languages provide poor facilities in these areas. C allows some limited manipulation of functions via pointers, but does not allow oneto create new functions dynamically. FORTRAN does not even support recursionat all.To illustrate the distinction between imperative and functional programming,the factorial function might be coded imperatively in C (without using C’s unusual assignment operations) as:int fact(int{ int x 1;while (n { x x *n n }return x;}n)0)n;1;whereas it would be expressed in ML, the functional language we discuss later,as a recursive function:let rec fact n if n 0 then 1else n * fact(n - 1);;nism is the application of functions to arguments.2Compare Naur’s remarks (Raphael 1966) that he can write any program in a single statement Output P rogram(Input).

1.1. THE MERITS OF FUNCTIONAL PROGRAMMING3In fact, this sort of definition can be used in C too. However for more sophisticated uses of functions, functional languages stand in a class by themselves.1.1The merits of functional programmingAt first sight, a language without variables or sequencing might seem completelyimpractical. This impression cannot be dispelled simply by a few words here.But we hope that by studying the material that follows, readers will gain anappreciation of how it is possible to do a lot of interesting programming in thefunctional manner.There is nothing sacred about the imperative style, familiar though it is.Many features of imperative languages have arisen by a process of abstractionfrom typical computer hardware, from machine code to assemblers, to macro assemblers, and then to FORTRAN and beyond. There is no reason to supposethat such languages represent the most palatable way for humans to communicate programs to a machine. After all, existing hardware designs are not sacredeither, and computers are supposed to do our bidding rather than conversely.Perhaps the right approach is not to start from the hardware and work upwards,but to start with programming languages as an abstract notation for specifyingalgorithms, and then work down to the hardware (Dijkstra 1976). Actually, thistendency can be detected in traditional languages too. Even FORTRAN allowsarithmetical expressions to be written in the usual way. The programmer is notburdened with the task of linearizing the evaluation of subexpressions and findingtemporary storage for intermediate results.This suggests that the idea of developing programming languages quite different from the traditional imperative ones is certainly defensible. However, toemphasize that we are not merely proposing change for change’s sake, we shouldsay a few words about why we might prefer functional programs to imperativeones.Perhaps the main reason is that functional programs correspond more directlyto mathematical objects, and it is therefore easier to reason about them. In orderto get a firm grip on exactly what programs mean, we might wish to assign anabstract mathematical meaning to a program or command — this is the aimof denotational semantics (semantics meaning). In imperative languages, thishas to be done in a rather indirect way, because of the implicit dependencyon the value of the state. In simple imperative languages, one can associatea command with a function Σ Σ, where Σ is the set of possible values forthe state. That is, a command takes some state and produces another state.It may fail to terminate (e.g. while true do x : x), so this function mayin general be partial. Alternative semantics are sometimes preferred, e.g. interms of predicate transformers (Dijkstra 1976). But if we add features that canpervert the execution sequence in more complex ways, e.g. goto, or C’s break

4CHAPTER 1. INTRODUCTIONand continue, even these interpretations no longer work, since one commandcan cause the later commands to be skipped. Instead, one typically uses a morecomplicated semantics based on continuations.By contrast functional programs, in the words of Henson (1987), ‘wear theirsemantics on their sleeves’.3 We can illustrate this using ML. The basic datatypeshave a direct interpretation as mathematical objects. Using the standard notationof [[X]] for ‘the semantics of X’, we can say for example that [[int]] Z. Now theML function fact defined by:let rec fact n if n 0 then 1else n * fact(n - 1);;has one argument of type int, and returns a value of type int, so it can simplybe associated with an abstract partial function Z Z:([[fact]](n) n! if n 0 otherwise(Here denotes undefinedness, since for negative arguments, the programfails to terminate.) This kind of simple interpretation, however, fails in nonfunctional programs, since so-called ‘functions’ might not be functions at all inthe mathematical sense. For example, the standard C library features a functionrand(), which returns different, pseudo-random values on successive calls. Thissort of thing can be implemented by using a local static variable to rememberthe previous result, e.g:int rand(void){ static int n 0;return n 2147001325 * n 715136305;}Thus, one can see the abandonment of variables and assignments as the logicalnext step after the abandonment of goto, since each step makes the semanticssimpler. A simpler semantics makes reasoning about programs more straightforward. This opens up more possibilities for correctness proofs, and for provablycorrect transformations into more efficient programs.Another potential advantage of functional languages is the following. Since theevaluation of expressions has no side-effect on any state, separate subexpressionscan be evaluated in any order without affecting each other. This means thatfunctional programs may lend themselves well to parallel implementation, i.e.the computer can automatically farm out different subexpressions to different3More: denotational semantics can be seen as an attempt to turn imperative languages intofunctional ones by making the state explicit.

1.2. OUTLINE5processors. By contrast, imperative programs often impose a fairly rigid order ofexecution, and even the limited interleaving of instructions in modern pipelinedprocessors turns out to be complicated and full of technical problems.Actually, ML is not a purely functional programming language; it does havevariables and assignments if required. Most of the time, we will work inside thepurely functional subset. But even if we do use assignments, and lose some of thepreceding benefits, there are advantages too in the more flexible use of functionsthat languages like ML allow. Programs can often be expressed in a very conciseand elegant style using higher-order functions (functions that operate on otherfunctions).4 Code can be made more general, since it can be parametrized evenover other functions. For example, a program to add up a list of numbers and aprogram to multiply a list of numbers can be seen as instances of the same program, parametrized by the pairwise arithmetic operation and the correspondingidentity. In one case it is given and 0 and in the other case, and 1.5 Finally,functions can also be used to represent infinite data in a convenient way — forexample we will show later how to use functions to perform exact calculationwith real numbers, as distinct from floating point approximations.At the same time, functional programs are not without their problems. Sincethey correspond less directly the the eventual execution in hardware, it can bedifficult to reason about their exact usage of resources such as time and space.Input-output is also difficult to incorporate neatly into a functional model, thoughthere are ingenious techniques based on infinite sequences.It is up to readers to decide, after reading this book, on the merits of thefunctional style. We do not wish to enforce any ideologies, merely to point outthat there are different ways of looking at programming, and that in the rightsituations, functional programming may have considerable merits. Most of ourexamples are chosen from areas that might loosely be described as ‘symboliccomputation’, for we believe that functional programs work well in such applications. However, as always one should select the most appropriate tool for thejob. It may be that imperative programming, object-oriented programming orlogic programming are more suited to certain tasks. Horses for courses.1.2OutlineFor those used to imperative programming, the transition to functional programming is inevitably difficult, whatever approach is taken. While some will be im4Elegance is subjective and conciseness is not an end in itself. Functional languages, andother languages like APL, often create a temptation to produce very short tricky code whichis elegant to cognoscenti but obscure to outsiders.5This parallels the notion of abstraction in pure mathematics, e.g. that the additive andmultiplicative structures over numbers are instances of the abstract notion of a monoid. Thissimilarly avoids duplication and increases elegance.

6CHAPTER 1. INTRODUCTIONpatient to get quickly to real programming, we have chosen to start with lambdacalculus, and show how it can be seen as the theoretical underpinning for functional languages. This has the merit of corresponding quite well to the actualhistorical line of development.So first we introduce lambda calculus, and show how what was originallyintended as a formal logical system for mathematics turned out to be a completelygeneral programming language. We then discuss why we might want to add typesto lambda calculus, and show how it can be done. This leads us into ML, which isessentially an extended and optimized implementation of typed lambda calculuswith a certain evaluation strategy. We cover the practicalities of basic functionalprogramming in ML, and discuss polymorphism and most general types. Wethen move on to more advanced topics including exceptions and ML’s imperativefeatures. We conclude with some substantial examples, which we hope provideevidence for the power of ML.Further readingNumerous textbooks on ‘functional programming’ include a general introductionto the field and a contrast with imperative programming — browse through afew and find one that you like. For example, Henson (1987) contains a goodintroductory discussion, and features a similar mixture of theory and practiceto this text. A detailed and polemical advocacy of the functional style is givenby Backus (1978), the main inventor of FORTRAN. Gordon (1994) discusses theproblems of incorporating input-output into functional languages, and some solutions. Readers interested in denotational semantics, for imperative and functionallanguages, may look at Winskel (1993).

Chapter 2Lambda calculusLambda calculus is based on the so-called ‘lambda notation’ for denoting functions. In informal mathematics, when one wants to refer to a function, one usuallyfirst gives the function an arbitrary name, and thereafter uses that name, e.g.Suppose f : R R is defined by:(f (x) 0if x 022x sin(1/x ) if x 6 0Then f 0 (x) is not Lebesgue integrable over the unit interval [0, 1].Most programming languages, C for example, are similar in this respect: wecan define functions only by giving them names. For example, in order to usethe successor function (which adds 1 to its argument) in nontrivial ways (e.g.consider a pointer to it), then even though it is very simple, we need to name itvia some function definition such as:int suc(int n){ return n 1;}In either mathematics or programming, this seems quite natural, and generally works well enough. However it can get clumsy when higher order functions(functions that manipulate other functions) are involved. In any case, if we wantto treat functions on a par with other mathematical objects, the insistence onnaming is rather inconsistent. When discussing an arithmetical expression builtup from simpler ones, we just write the subexpressions down, without needing togive them names. Imagine if we always had to deal with arithmetic expressionsin this way:Define x and y by x 2 and y 4 respectively. Then xx y.7

8CHAPTER 2. LAMBDA CALCULUSLambda notation allows one to denote functions in much the same way as anyother sort of mathematical object. There is a mainstream notation sometimesused in mathematics for this purpose, though it’s normally still used as part ofthe definition of a temporary name. We can writex 7 t[x]to denote the function mapping any argument x to some arbitrary expression t[x],which usually, but not necessarily, contains x (it is occasionally useful to “throwaway” an argument). However, we shall use a different notation developed byChurch (1941):λx. t[x]which should be read in the same way. For example, λx. x is the identity functionwhich simply returns its argument, while λx. x2 is the squaring function.The symbol λ is completely arbitrary, and no significance should be read intoit. (Indeed one often sees, particularly in French texts, the alternative notation[x] t[x].) Apparently it arose by a complicated process of evolution. Originally,the famous Principia Mathematica (Whitehead and Russell 1910) used the ‘hat’notation t[x̂] for the function of x yielding t[x]. Church modified this to x̂. t[x],but since the typesetter could not place the hat on top of the x, this appeared as x. t[x], which then mutated into λx. t[x] in the hands of another typesetter.2.1The benefits of lambda notationUsing lambda notation we can clear up some of the confusion engendered byinformal mathematical notation. For example, it’s common to talk sloppily about‘f (x)’, leaving context to determine whether we mean f itself, or the result ofapplying it to particular x. A further benefit is that lambda notation givesan attractive analysis of practically the whole of mathematical notation. If westart with variables and constants, and build up expressions using just lambdaabstraction and application of functions to arguments, we can represent verycomplicated mathematical expressions.We will use the conventional notation f (x) for the application of a function fto an argument x, except that, as is traditional in lambda notation, the bracketsmay be omitted, allowing us to write just f x. For reasons that will becomeclear in the next paragraph, we assume that function application associates tothe left, i.e. f x y means (f (x))(y). As a shorthand for λx. λy. t[x, y] we will useλx y. t[x, y], and so on. We also assume that the scope of a lambda abstractionextends as far to the right as possible. For example λx.x y means λx.(x y) ratherthan (λx. x) y.

2.1. THE BENEFITS OF LAMBDA NOTATION9At first sight, we need some special notation for functions of several arguments.However there is a way of breaking down such applications into ordinary lambdanotation, called currying, after the logician Curry (1930). (Actually the devicehad previously been used by both Frege (1893) and Schönfinkel (1924), but it’seasy to understand why the corresponding appellations haven’t caught the publicimagination.) The idea is to use expressions like λx y. x y. This may beregarded as a function R (R R), so it is said to be a ‘higher order function’or ‘functional’ since when applied to one argument, it yields another function,which then accepts the second argument. In a sense, it takes its arguments oneat a time rather than both together. So we have for example:(λx y. x y) 1 2 (λy. 1 y) 2 1 2Observe that function application is assumed to associate to the left in lambdanotation precisely because currying is used so much.Lambda notation is particularly helpful in providing a unified treatment ofbound variables. Variables in mathematics normally express the dependency ofsome expression on the value of that variable; for example, the value of x2 2depends on the value of x. In such contexts, we will

contrast, pros and cons. General structure of the course: how lambda cal-culus turns out to be a general programming language. Lambda notation: how it clarifies variable binding and provides a general analysis of mathe-matical notation. Currying. Russell’s paradox. 2. Lambda calculus as