
Transcription
Writing a Lexical Analyzer in HaskellToday– (Finish up last Thursday) User-defined datatypes– (Finish up last Thursday) Lexicographical analysis for punctuation andkeywords in Haskell– Regular languages and lexicographical analysis part IThis week– HW2: Due tonight– PA1: It is due in 6 days!– PA2 has been posted. We are starting to cover concepts needed for PA2.CS453 LectureRegular Languages and Lexical Analysis1
User-defined Datatypes in HaskellKindof like enumerate types but can have fieldsdata Bool False Truedata Shape Point Rect Int Int Int Int Circle IntCan derive handy propertiesdata Color Blue Red Yellow deriving (Show)main print Yellowdata Color Blue Red Yellow deriving (Show,Eq)if (Yellow Blue) then . else .Constructors can be used in pattern matchingfoo :: Shape - Stringfoo Point “Point”foo Rect p1 p2 p3 p4 “Rect “ (show p1) .CS453 LectureRegular Languages and Lexical Analysis2
Structure of a Typical CompilerAnalysisSynthesischaracter streamlexical analysistokens“words”syntactic analysisAST“sentences”semantic analysisannotated ASTIR code generationIRoptimizationIRcode generationtarget languageinterpreterCS453 LectureRegular Languages and Lexical Analysis3
Tokens for Example MeggyJava programimport meggy.Meggy;class PA3Flower {public static void main(String[] whatever){{// Upper left petal, clockwiseMeggy.setPixel( (byte)2, (byte)4, Meggy.Color.VIOLET );Meggy.setPixel( (byte)2, (byte)1, Meggy.Color.VIOLET); }}Tokens: TokenImportKW, TokenMeggyKW, TokenSemi,TokenClassKW, TokenID ”PA3Flower”, TokenLBrace, CS453 LectureRegular Languages and Lexical Analysis4
Some Lexical Analysis with Haskell (why is this broken?)module Lexer whereimport Data.Char-- needed for isSpace functiondata Token TokenIfKW TokenComma-- TODO: constructors for all other tokensderiving (Show,Eq)lexer :: String - [Token]lexer [] []lexer (‘i’:’f’:rest) TokenIfKW : lexer rest-- TODO: patterns for other keyword and punctuation tokenslexer (c:rest) if isSpace c then lexer rest else lexer (c:rest)CS453 LectureRegular Languages and Lexical Analysis5
General Approach for Lexical AnalysisRegular LanguagesFinite State Machines–DFAs: Deterministic Finite Automata–Complications when doing lexical analysis– NFAs: Non Deterministic Finite State AutomataFrom Regular Expressions to NFAsFrom NFAs to DFAsCS453 LectureRegular Languages and Lexical Analysis6
About The Slides on Languages and Finite AutomataSlides Originally Developed by Prof. Costas Busch (2004)– Many thanks to Prof. Busch for developing the original slide set.Adapted with permission by Prof. Dan Massey (Spring 2007)– Subsequent modifications, many thanks to Prof. Massey for CS 301 slidesAdapted with permission by Prof. Michelle Strout (Spring 2011)– Adapted for use in CS 453Adapted by Wim Bohm( added regular expr à NFA à DFA, Spr2012)Added slides from Profs. Christian Colberg and Saumya Debray (Fall 2016)CS453 LectureRegular Languages and Lexical Analysis7
LanguagesA language is a set of strings(sometimes called sentences)String: A finite sequence of lettersExamples: “cat”, “dog”, “house”, Defined over a fixed alphabet:Σ {a, b, c, , z}CS453 LectureRegular Languages and Lexical Analysis8
Empty StringA string with no letters: εObservations:ε 0εw wε wεabba abbaε abbaCS453 LectureRegular Languages and Lexical Analysis9
Regular ExpressionsRegular expressions describe regular languagesYou have probably seen them in OSs / editorsExample:(a (b)(c)) *describes the language L((a (b)(c))*) {ε,a,bc,aa,abc,bca,.}CS453 LectureRegular Languages and Lexical Analysis10
Recursive Definition for Specifying Regular ExpressionsPrimitive regular expressions:where α Σ, some alphabet , ε, αGiven regular expressions r1 and r2r1 r2r1 r2r1 *Are regular expressions(r1 )CS453 LectureRegular Languages and Lexical Analysis11
Regular operatorschoice: A Bconcatenation: A Brepetition:A*grouping:A (A)a string from L(A) or from L(B)a string from L(A) followed by astring from L(B)0 or more concatenations of stringsfrom L(A)1 or moreConcatenation has precedence over choice: A B C vs. (A B)CMore syntactic sugar, used in scanner generators:[abc] means a or b or c[\t\n ] means tab, newline, or space[a-z] means a,b,c, , or zCS453 LectureRegular Languages and Lexical Analysis12
Example Regular Expressions and Regular DefinitionsRegular definition:name : regular expressionname can then be used in other regular expressionsKeywords “print”, “while”Operations: “ ”, “-”, “*”Identifiers:let : [a-zA-Z] // chose from a to z or A to Zdig : [0-9]id : let (let dig)*Numbers: dig dig dig*CS453 LectureRegular Languages and Lexical Analysis13
Finite Automaton, or Finite State Machine (FSM)InputStringOutputFiniteAutomatonCS453 LectureRegular Languages and Lexical AnalysisString14
Finite State MachineInputStringFiniteAutomatonCS453 LectureOutput“Accept”or“Reject”Regular Languages and Lexical Analysis15
State Transition Graphabba -Finite Acceptera, bq5bq0 aaabq1 b q2 b q3 ainitialstatestateCS453 LecturetransitionRegular Languages and Lexical Analysisa, bq4finalstate“accept”16
Initial Configurationa b b aInput Stringa, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq417
Reading the Inputa b b aa, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq418
a b b aa, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq419
a b b aa, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq420
a b b aa, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq421
Input finisheda b b aa, bq5bq0 aaabq1 b q2 b q3 aa, bq4Output: “accept”CS453 LectureRegular Languages and Lexical Analysis22
String Rejectiona b aa, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq423
a b aa, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq424
a b aa, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq425
a b aa, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq426
Input finisheda b aa, bbq0 aCS453 LectureOutput:q5 “reject”a, baabq1 b q2 b q3 aRegular Languages and Lexical Analysisq427
The Empty Stringεa, b bq0 aCS453 Lectureq5aabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq428
εa, b bq0 aOutput:CS453 Lecture“reject”q5aabq1 b q2 b q3 aa, bq4Would it be possible to accept theempty string?Regular Languages and Lexical Analysis29
Another Examplea a ba, baq0CS453 Lecturebq1a, bRegular Languages and Lexical Analysisq230
a a ba, baq0CS453 Lecturebq1a, bRegular Languages and Lexical Analysisq231
a a ba, baq0CS453 Lecturebq1a, bRegular Languages and Lexical Analysisq232
a a ba, baq0CS453 Lecturebq1a, bRegular Languages and Lexical Analysisq233
Input finisheda a baq0CS453 LectureOutput: “accept”bq1a, bRegular Languages and Lexical Analysisa, bq234
Rejectionb a ba, baq0CS453 Lecturebq1a, bRegular Languages and Lexical Analysisq235
b a ba, baq0CS453 Lecturebq1a, bRegular Languages and Lexical Analysisq236
b a ba, baq0CS453 Lecturebq1a, bRegular Languages and Lexical Analysisq237
b a ba, baq0CS453 Lecturebq1a, bRegular Languages and Lexical Analysisq238
Input finishedb a bWhich strings are accepted?a, baq0bq1a, bq2Output: “reject”CS453 LectureRegular Languages and Lexical Analysis39
FormalitiesDeterministic Finite Automaton (DFA)M (Q, Σ, δ , q0 , F )Q : set of statesΣ : input alphabetδ : transition functionq0 : initial stateF : set of final (accepting) statesCS453 LectureRegular Languages and Lexical Analysis40
ΣInput AlphabetΣ {a, b}a, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq441
QSet of StatesQ {q0 , q1, q2 , q3 , q4 , q5 }a, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq442
q0Initial Statea, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq443
FSet of Final StatesF {q4 }a, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq444
δTransition Functionδ :Q Σ Qa, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq445
δ (q0 , a ) q1a, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq446
δ (q0 , b ) q5a, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq447
δ (q2 , b ) q3a, bq5bq0 aCS453 Lectureaabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq448
Transition Function / Tableδq0q1q2q3q4q5aq1q5q5q4q5q5CS453 Lecturebq5q2q3q5q5q5 bq0 aδa, bq5aabq1 b q2 b q3 aRegular Languages and Lexical Analysisa, bq449
Complications1. "1234" is an NUMBER but what about the “123” in “1234”or the “23”, etc. Also, the scanner must recognize many tokens,not one, only stopping at end of file.2. "if" is a keyword or reserved word IF, but "if" is also defined bythe reg. exp. for identifier ID. We want to recognize IF.3. We want to discard white space and comments.4. "123" is a NUMBER but so is "235" and so is "0", just as"a" is an ID and so is "bcd”, we want to recognize a token,but add attributes to it.CS453 LectureRegular Languages and Lexical Analysis50
Before Next TimeHW2: Due tonight!PA1: It is due in 6 days. Should be almost done.Read Chapters 2 and 3 in the online book.CS453 LectureRegular Languages and Lexical Analysis51
CS453 Lecture Regular Languages and Lexical Analysis 1 Writing a Lexical Analyzer in Haskell Today - (Finish up last Thursday) User-defined datatypes - (Finish up last Thursday) Lexicographical analysis for punctuation and keywords in Haskell - Regular languages and lexicographical analysis part I This week - HW2: Due tonight