A bit of background information at the start of this post seems prudent. I am currently a PhD student at the University of Kent in Canterbury, England. My supervisor is Simon Thompson as I work on developing type refactorings with the Haskell Refactorer (HaRe). I am a moderately experienced in functional programming from my bachelor’s degree but have not dealt with Haskell much before arriving here.
Currently my longer term goal is to start implementing simple type refactorings in the new GHC compatible version of HaRe found here: https://github.com/alanz/HaRe. One thing that I kept being hung up on when it came to comprehending HaRe’s code is monads. I’ve been messing around in Haskell for a little over a term and really felt that the state monad in particular was causing me some major hangups and this deficiency of mine should be addressed more directly.
Simon suggested I write an evaluator to a simple arithmetic expression language that supported variable assignment e.g. ((<x:2 + 5) – x) => 5. This can use the state monad fairly easily as the values of each variable needs to be passed on from the evaluation from one sub-expression to the next.
The rest of this blog post will be a journal entry of sorts that details the issues I ran across implementing a monadic parser. I don’t really see this post as being useful for other people but could be interesting as it runs through certain problems that people may have when trying to understand the state monad. Other than that I have added all of my code to github for people to look at.
Attempt 1: Crash and Burn
I’m not going to go into much detail about my first attempt at parsing. I was still very confused at what all the parts of the state monad did and how they worked together. I had everything working but I was doing parsing and evaluation in one big step and nothing I had written could be described as monadic. As I understood it at this point I would write a function that would take one step of evaluation wrap that inside of a state monad and then… well I hadn’t really ever figured that out yet. This didn’t stop me from spending several days pouring over state monad tutorials trying to get anything to work. Eventually Simon and I had a meeting and I began to get it.
Attempt 2: More monadic still no monads
Several frustrating days of ugly Haskell code with little mental understanding behind it later I met with Simon and I think that my code helped Simon see what my issues were. Most of our time was spent writing a non monadic parser and evaluator which can be found here.
First we started out by really nailing down exactly how evaluation works and we tweaked the syntax a bit. Now each expression and sub-expression will be wrapped with parentheses, and assignment statements are now prepended with the ‘<‘ character. The type of the expression is fairly straightforward:
|data Expr = Var Char|
|| N Int|
|| Add Expr Expr|
|| Sub Expr Expr|
|| Assign Char Expr|
Now with the expression language in a much more recursive style began writing a simple parser (to be fair it was mostly Simon with me watching). Our parser is of type:
|parse :: String -> (Expr,String)|
This parser takes in a string that represents some expression finds the next complete sub-expression in that string and returns a tuple with the expression and a string with the characters that still need to be parsed. It is very simple in that it just assumes that there is nothing wrong with the expression. For example if the assignment character (‘<‘ ) is not followed by a single character for a variable name, a colon, and finally a single digit for a value of that variable the parser will throw an error. Once that was finished I think the most helpful part of writing the parser was seeing the big where clause at the end of it. It was becoming much clearer for me to see how you would thread all of that together in a do block.
I think that the evaluator was much easier for me to see monadically. I think what made it much simpler for me was that we were working on a data structure rather than a string and you just needed to pattern match on each of the constructors of type Expr. Again the structure of the non monadic code was written in a sufficiently monadic way that I could see how to rewrite eval using the state monad.
Attempt 3: Monads (Finally)
The job of translating the simple eval and parser code was surprisingly simple. The first step I took was deciding on the types of the state monads:
The evalState was easy to see which type should be the state and which type should be the result. The whole idea behind this project was that by parsing a language with re-assignable variables the state monad would have to keep track of those changes. The env type is just a list of (Char,Int) tuples that correspond to variables and their associated values. Since eval had such an obvious type I decided to start rewriting eval first.
Monadic eval has all of the same patterns that the other eval did and many of them are trivial.
|eval :: Expr -> EvalSt Int|
|eval (N n) = return n|
|eval (Add e1 e2) = do|
|v1 <- eval e1|
|v2 <- eval e2|
|return (v1 + v2)|
|eval (Sub e1 e2) = do|
|v1 <- eval e1|
|v2 <- eval e2|
|return (v1 – v2)|
These cases really just involve matching one of the Expr constructors and either just returning the value contained, or recursively calling eval on any expressions that are parameters to the constructor then applying addition or subtraction on that evaluation. The more interesting cases of eval are the Var and Assign constructors because for both of these access the state of env.
|eval :: Expr -> EvalSt Int|
|eval (Var v) = do|
|env <- get|
|return (head [val | (x,val) <- env, x==v])|
|eval (Assign x e) = do|
|v <- eval e|
|env <- get|
|put $ (x,v):env|
These constructors access the stateful env for either getting the value of a variable (Var) or updating the env (Assign). One of the parts of the state monad that confused me for a while was the use of get and put. Succinctly the way I understand them now is that inside of a do block get returns the shared state that is in the monad (in this case which is type Env) and put updates the state with the provided argument. The Var pattern of eval just has to get the state and find the appropriate entry to return the current value of a variable. Assign has a few more steps first we evaluate an expression to assign the value of the expression to the variable. Then, using get again, the environment variable is retrieved. Finally the new variable, value tuple is cons-ed to the front of the env list. We can get away with just appending the tuple without replacing the old entry because of the call to head in the var constructor.
Overall rewriting eval was very straight forward and I didn’t run into any problems. It made a nice preface to rewriting parse as it helped me understand how to use get and put in particular. I found parse to be a little more of a challenge though nothing major, just a nice step up from eval.
Conceptually parser was a little more difficult to understand how its three parts fit together. I knew that the parser would take in a string as input but I had a little bit of trouble figuring out what was going to be the stateful part of the monad and what the result was. In hindsight I should have realized that since the goal of the parser was to translate a string into a expression that the return type needed to be an expression but I think I was still wrapping my head around how to translate the big where clause into separate do blocks. After starting to write parse with the type:
|type ParseState = State Expr|
|parse :: String -> ParseState String|
I pretty quickly noticed that this didn’t make much sense. I had imagined that the state was the expression being built up from nothing and that the return value would be the string of what was left to be parsed. I’m not quite sure how I ever thought this was correct but I caught myself fairly quickly and redid parse so that it now looks like this:
|type ParseState = State String|
|parse :: String -> ParseState Expr|
|| isAlpha(ch) = do|
|return (Var ch)|
|| isDigit(ch) = do|
|return (N (fromEnum ch – fromEnum '0'))|
|| ch == '(' = do|
|e1 <- parse(chs)|
|rest1 <- get|
|let (op:rest2) = rest1|
|e2 <- parse(rest2)|
|rest3 <- get|
|let (')':restFinal) = rest3|
|return $ apply op e1 e2|
|| ch =='<' = do|
|let (v:':':rest) = chs|
|e <- parse(rest)|
|restFinal <- get|
|return (Assign v e)|
|where apply '+' = Add|
|apply '–' = Sub|
The parser really just looks at what character is next and guards based on that. I’m not going to step through this in detail as it seems fairly self-explanatory.
Some Final Thoughts
I thought I would quickly finish up this posting with my thoughts on the state monad as of now. I think that the key part of this project for me was understanding how state was represented within the monad. The first type parameter of the state monad is some object that you will be updating through the calculation that you are performing. Updating the state involves getting the state object with get then you use put to update the monad with the new object. Finally at this point I am more in agreement than not with the ideas in the “Monad Tutorial Fallacy”, it wasn’t until I started trying to actually implement something that some of the ideas behind State started to click.