Haskell basics

Haskell is a functional language, that means that everything is a function. For every input you give to a function, it gives you same output every time. It is called that these functions are pure.
Functions which are not pure have some side effects, for example changing global variable in Imperative language like C or Python.

Lists, map, foldr,

List comrpehension

[(a,b)| a<-[1..5],b<-["dog","cat","crocodile"]]

map

 map (\arg1 -> arg1*arg1) [1..10]

zip

zip [1..5] ["baf","haf","meow","kek","kuk"]

zipWith

zipWith (\ x y -> (x*2,y,y++y,"WHATEVER")) [1..5] ["baf","haf","meow","kek","kuk"]

foldl,foldr

Example with brontosaurus

foldl (\x y -> x++[y] ) [] ['b','r','o','n','t','o','s','a','u','r','u','s']

Output:

"brontosaurus"

Language basics

You can use let to define local functions and local variables of type:


    numbeeers::Int->Int
    numbeeers num = let oneLess int = int - 1
                        twoLess int = int - 2
                    in (oneLess num) + (twoLess num)

Where is used to define local functions and variables at the end of the function.


    numbeeers::Int->Int
    numbeeers num =  (oneLess num) + (twoLess num)
            where oneLess int = int - 1
                  twoLess int = int - 2

Guards is a case like structure, it goes from top to bottom and compares conditions.


    doYouLikeCake::String->String
    doYouLikeCake answer
        | answer == "Yes" = "Good boi."
        | answer == "No"  = "No cake for you."
        | otherwise       = "Sort it out, Yes or No?"

Recursion.

Map

Folds

Haskell type system, static typing, type variables, typeclasses

 :set -fbreak-on-exception

 :abandon

HTML parser

Let's define html data type following way:


data HTML = Html  [HTML] | Head   [HTML] | Body  [HTML] | P String  | Script  String deriving (Show)

Here is a function that takes this data and creates list of tags and strings


    parse::HTML->[String]
    parse (Html body) = ["<html>"]++( concat $ map parse body ) ++ ["</html>"]
    parse (Head body) = ["<head>"]++( concat $ map parse body ) ++ ["</head>"]
    parse (Body body) = ["<body>"]++( concat $ map parse body ) ++ ["</body>"]
    parse (P    body) = ["<p>"]++[body]++ ["</p>"]
    parse (Script body) = ["<script>"]++[body]++["</head>"]

So let's create sample data:

let web1 = Html [Head [Script "javascript"],Body [P "Dog is colorful.", P "What is this article"] ]

Applying parse on web1 gets:


["<html>","<head>","<script>","javascript","</head>","</head>","<body>","<p>","Dog is colorful.","</p>","<p>","What is this article","</p>","</body>","</html>"]

Now we need to print it to output:


print2::[String]->IO()
print2 list = do if ( length list >0)
                    then  do putStrLn $ head list
                             print2 $ tail list
                    else  do return ()

And output is:

<html>
<head>
<script>
javascript
</head>
</head>
<body>
<p>
Dog is colorful.
</p>
<p>
What is this article
</p>
</body>
</html>

Okay...but this is rather dull, lets add padding so it looks like real html!



    parse::HTML->Int->[(Int,String)]
    parse (Html body) level = [(level,"<html>")]++( concat $ map (\x -> parse x (level+1)) body  ) ++ [(level,"</html>")]
    parse (Head body) level = [(level,"<head>")]++( concat $ map (\x -> parse x (level+1)) body ) ++ [(level,"</head>")]
    parse (Body body) level = [(level,"<body>")]++( concat $ map (\x -> parse x (level+1)) body ) ++ [(level,"</body>")]
    parse (P    body) level = [(level,"<p>")]++[(level+1,body)]++ [(level,"</p>")]
    parse (Script body) level = [(level,"<script>")]++[(level+1,body)] ++ [(level,"</head>")]


    print2::[(Int,String)]->IO()
    print2 list = do if ( length list >0)
                        then  do putStrLn $  (padding $ fst $ head list)++(snd $ head list)
                                 print2 $ tail list
                        else  do return ()


    padding::Int->String
    padding num = take newNum $ repeat ' '
        where newNum = num*4

Output is:


<html>
    <head>
        <script>
            javascript
        </head>
    </head>
    <body>
        <p>
            Dog is colorful.
        </p>
        <p>
            What is this article
        </p>
    </body>
</html>

Huffman coding

Huffman coding is works like this. 1.) scan file you want to compress and compute list of (character,frequency) 2.) create huffman tree, which is binary tree with leaves that are characters and their frequencies 3.) encode original file with this tree

Simple database

We will create simple database.

Binary Trees

Binary tree is declared like this:


    data BT a = Leaf a | (BST a) (BST a) deriving (Show)

N-ary trees

N-ary trees can have arbitrary numbers of descendants, and are defined like this:


data NT a = N a [NT a] deriving Show

--number a number
aNumber::   a ->            -- thing to number
            Int->           -- starting number
            ((a,            -- thing
            Int),           -- number
            Int)            -- next number


aNumber a aNum = ((a,aNum),aNum+1)

--number a tree

nNumber:: NT a   ->  -- thing to number
          Int    ->  -- starting number
         (NT (a,Int) -- numbered thing
         ,Int)       -- next number

nNumber (N a list) i0 = (  N nThing numberedList ,i2)
    where   (nThing,i1)= aNumber a i0
            ( numberedList, i2) = ntsNumber list i1



-- -number a list of trees
ntsNumber:: [NT a] -- thing to number
            -> Int -- starting ntsNumber
            -> ([NT (a,Int)], -- thing to number
            Int    -- next following number
            )
ntsNumber []     i0 = ([],i0)
ntsNumber (x:xs) i0 = (tree:treeList,i2)
    where   (tree,i1)      = nNumber x i0
            (treeList,i2) = ntsNumber xs i1

KMP

Aho Corasick

Graph representation

We will use following data:


    a b 2
    a c 4
    b a 2
    b d 3
    b g 4
    c a 4
    c d 5
    c e 6
    d b 3
    d c 5
    d f 6
    e c 6
    e f 7
    f d 6
    f e 7
    f g 9
    g b 4
    g f 9

Lets transform this into haskell friendly form, line by line,

    a b 2

    ('a','b',2)

Then we'll have list of these triplets, as edge list with weights.
Here is how we do it:


    import System.IO
    import Data.List.Split

    main::IO()
    main = do
        inh <- openFile "data01.txt" ReadMode
        outh <- openFile "output.txt" WriteMode
        hPutChar outh '['
        mainloop "" inh outh
        hClose inh
        hClose outh


    mainloop::String->Handle -> Handle -> IO ()
    mainloop nextWord  inh outh =
        do  ineof <- hIsEOF inh
            if ineof
                then do hPutChar outh ']'
                        return ()
                else do if (nextWord=="")
                            then  do inpStr <- hGetLine inh
                                     hPutStr outh $ transform inpStr
                                     mainloop inpStr inh outh

                            else do  hPutChar outh ','
                                     inpStr <- hGetLine inh
                                     hPutStr outh $ transform inpStr
                                     mainloop inpStr inh outh


    transform::String->String
    transform inpStr = let parts =  splitOn " " inpStr
                       in "("++(parts!!0)++","++(parts!!1)++","++(parts!!2)++")"

Output:

[('a','b',2),('a','c',4),('b','a',2),('b','d',3),('b','g',4),('c','a',4),('c','d',5),('c','e',6),('d','b',3),('d','c',5),('d','f',6),('e','c',6),('e','f',7),('f','d',6),('f','e',7),('f','g',9),('g','b',4),('g','f',9)]

Hmm it would be better to have graph in Adjacency list format.

elToAl is a function that does exactly that.


    elToAl::(Eq a,Integral b)=>[(a,a,b)]->[(a,[(a,b)])]
    elToAl edgeList = elToAl2 edgeList []


    elToAl2::(Eq a,Integral b)=>[(a,a,b)]->[(a,[(a,b)])]->[(a,[(a,b)])]
    elToAl2 [] acc = acc
    elToAl2 ((from,to,dist):es) acc =elToAl2 es  (addToAl (from,to,dist) acc)

    -- takes first edge and add it to adjacency list, return new adjacency list
    addToAl::(Eq a,Integral b)=>(a,a,b)->[(a,[(a,b)])]->[(a,[(a,b)])]
    addToAl (from,to,dist) adjList
        | (length adjList) == 0 = [(from,[(to,dist)])]
        | from==from2 = (from2,neighs++[(to,dist)]):(tail adjList)
        | from/=from2 = (from2,neighs):(addToAl (from,to,dist) (tail adjList) )
        | otherwise  =  error "Whut"
        where (from2,neighs)= head adjList

And here is resulting adjacency list

[('a',[('b',2),('c',4)]),('b',[('a',2),('d',3),('g',4)]),('c',[('a',4),('d',5),('e',6)]),('d',[('b',3),('c',5),('f',6)]),('e',[('c',6),('f',7)]),('f',[('d',6),('e',7),('g',9)]),('g',[('b',4),('f',9)])]

Dijkstra algorithm


    --nodes have to have additional info, so lets create custom type for them

    data Node =  Node   { from :: Char
                        , adjList:: [(Char,Int)]
                        , dist :: Int
                        , prev:: Char
                        } deriving (Show)
    type EdgeList = [(Char,[(Char,Int)])]


    -- data MinQueue
    type Queue = [Node]

    --
    -- dijsktra::EdgeList->Queue->Node->Node->([Node],Int)
    -- dijkstra

    dijkstra::EdgeList->Queue->Node->Node->([Node],Int)
    dijkstra edgeList queue source target = if (length queue) > 0 -- || (u/=target)
        then    let
                    newEdgeList  =  (changeDistances queue edgeList)
                in dijkstra newEdgeList (newQueue)  source target


        else (findPath edgeList source target )




        where   (u,newQueue) = minNode queue
                initializeNodes = initialize  edgeList
                source2 = Node ( from source) ( adjList source) (0) (prev source)

    changeDistances::Queue->EdgeList->EdgeList
    changeDistances queue edgelist = edgelist --TODO

    findPath::EdgeList->Node->Node->([Node],Int)
    findPath  edgeList source target = ([source],4) --TODO

    minNode::Queue->(Node,Queue)
    minNode queue = (head queue,queue) --TODO

    initialize::EdgeList->[Node]
    initialize edgeList = map (\x ->  Node (fst x) (snd x) (maxBound) (' ') ) edgeList