Introduction to Haskell - Type Classes and User Defined Data Types

July 10, 2015

[Published in Open Source For You (OSFY) magazine, October 2014 edition.]

Haskell is purely a functional programming language and it enforces strictness with the use of types. In this article, we shall explore type classes and user defined data types.

Consider the elem function that takes an element of a type, a list, and returns ‘true’ if the element is a member of the list; and if not, it returns ‘false’. For example:

ghci> 2 `elem` [1, 2, 3]
True
ghci> 5 `elem` [1, 2, 3]
False

It’s type signature is shown below:

ghci> :t elem
elem :: Eq a => a -> [a] -> Bool

The type signature states that the type variable ‘a’ must be an instance of class ‘Eq’. The class constraint is specified after the ’::’ symbol and before the ‘=>’ symbol in the type signature. The elem function will thus work for all types that are instances of the Eq class.

The word ‘class’ has a different meaning in functional programming. A type class is a parameterised interface that defines functions. A type that is an instance of a type class needs to implement the defined functions of the type class. The Eq class defines functions to assert if two type values are equal or not. Its definition in Haskell is as follows:

class Eq a where
    (==), (/=)           :: a -> a -> Bool

    x /= y               = not (x == y)
    x == y               = not (x /= y)

The keyword class is used to define a type class. This is followed by the name of the class (starting with a capital letter). A type variable (‘a’ here) is written after the class name. Two functions are listed in this class for finding if two values of a type are equal or not. A minimal definition for the two functions is also provided. This code is available in libraries/ghc-prim/GHC/Classes.hs in the GHC (Glasgow Haskell Compiler) source code.

The example works for integers ‘2’ and ‘5’ in the example above, because they are of type Int, which is an instance of Eq. Its corresponding definition is given below:

instance Eq Int where
    (==) = eqInt
    (/=) = neInt

The keyword instance is used in the definition followed by the name of the class Eq, and a specific type Int. It uses two primitive functions eqInt and neInt for checking if the given integers are equal or not. The detailed definition is available in libraries/ghc-prim/GHC/Classes.hs in the GHC source code.

There are a number of pre-defined type classes available in the Haskell platform.

The Ord type class denotes types that can be compared. The compare function will need to be implemented by types that want to be instances of this class. The resultant values of ‘compare’ are GT, LT, or EQ. For example:

ghci> 'p' > 'q'
False
ghci> 3 > 2
True

Its type class definition is as follows:

class (Eq a) => Ord a where
    compare              :: a -> a -> Ordering
    (<), (<=), (>), (>=) :: a -> a -> Bool
    max, min             :: a -> a -> a

    compare x y = if x == y then EQ
                  else if x <= y then LT
                  else GT

    x <  y = case compare x y of { LT -> True;  _ -> False }
    x <= y = case compare x y of { GT -> False; _ -> True }
    x >  y = case compare x y of { GT -> True;  _ -> False }
    x >= y = case compare x y of { LT -> False; _ -> True }

    max x y = if x <= y then y else x
    min x y = if x <= y then x else y

The Ord type class needs to be a sub-class of the Eq class because we should be able to test for equality of two values if they need to be compared. This is also defined as a constraint in the class definition. Seven functions are provided and a minimal definition given in the code snippet. The instance definitions for Char and Int types are available from libraries/ghc-prim/GHC/Classes.hs in the GHC source code.

The Enum type class is for types whose values can be listed in an order for which you can find predecessor and successor elements. For example:

ghci> succ 'a'
b
ghci> pred EQ
LT

The class definition for Enum is given below:

class Enum a where
    succ                :: a -> a
    pred                :: a -> a
    toEnum              :: Int -> a
    fromEnum            :: a -> Int

    enumFrom            :: a -> [a]
    enumFromThen        :: a -> a -> [a]
    enumFromTo          :: a -> a -> [a]
    enumFromThenTo      :: a -> a -> a -> [a]

    succ                   = toEnum . (+ 1)  . fromEnum
    pred                   = toEnum . (subtract 1) . fromEnum
    enumFrom x             = map toEnum [fromEnum x ..]
    enumFromThen x y       = map toEnum [fromEnum x, fromEnum y ..]
    enumFromTo x y         = map toEnum [fromEnum x .. fromEnum y]
    enumFromThenTo x1 x2 y = map toEnum [fromEnum x1, fromEnum x2 .. fromEnum y]

The instance for type Ordering for the Enum class is as follows:

instance Enum Ordering where
  succ LT = EQ
  succ EQ = GT
  succ GT = error "Prelude.Enum.Ordering.succ: bad argument"

  pred GT = EQ
  pred EQ = LT
  pred LT = error "Prelude.Enum.Ordering.pred: bad argument"

  toEnum n | n == 0 = LT
           | n == 1 = EQ
           | n == 2 = GT
  toEnum _ = error "Prelude.Enum.Ordering.toEnum: bad argument"

  fromEnum LT = 0
  fromEnum EQ = 1
  fromEnum GT = 2

  -- Use defaults for the rest
  enumFrom     = boundedEnumFrom
  enumFromThen = boundedEnumFromThen

You can find the definition and instance definitions for Char and Ordering in libraries/base/GHC/Enum.lhs in the GHC source code.

The Show type class lists a show function to display or print data. For example:

ghci> show 3.1415
"3.1415
ghci> show True
"True"

The above code works for both ‘Float’ and ‘Bool’ because there are instance definitions for each in the Show type class.

The read function for the ‘Read’ type class takes as input a ‘String’ and converts it to an appropriate data type, if possible. For example:

ghci> read "1" + 2.0
3.0
ghci> read "False" || True
True

You will find the class definitions and instances for Show and Read in libraries/base/GHC/Show.lhs and libraries/base/GHC/Read.lhs respectively. The .lhs file is a literate Haskell source file in which you can combine both text and code. You can also find the definition for a class, a function or type inside GHCi using ’:i’. For example:

ghci> :i Eq
class Eq a where
  (==) :: a -> a -> Bool
  (/=) :: a -> a -> Bool
  	-- Defined in `GHC.Classes'
instance Eq Integer -- Defined in `integer-gmp:GHC.Integer.Type'
instance Eq Ordering -- Defined in `GHC.Classes'
instance Eq Int -- Defined in `GHC.Classes'
instance Eq Float -- Defined in `GHC.Classes'
instance Eq Double -- Defined in `GHC.Classes'
instance Eq Char -- Defined in `GHC.Classes'
instance Eq Bool -- Defined in `GHC.Classes'
...

ghci> :i read
read :: Read a => String -> a 	-- Defined in `Text.Read'

ghci> :i Int
data Int = GHC.Types.I# GHC.Prim.Int# 	-- Defined in `GHC.Types'
instance Bounded Int -- Defined in `GHC.Enum'
instance Enum Int -- Defined in `GHC.Enum'
instance Eq Int -- Defined in `GHC.Classes'
instance Integral Int -- Defined in `GHC.Real'
instance Num Int -- Defined in `GHC.Num'
instance Ord Int -- Defined in `GHC.Classes'
instance Read Int -- Defined in `GHC.Read'
instance Real Int -- Defined in `GHC.Real'
instance Show Int -- Defined in `GHC.Show'

Let’s suppose you input the following in a GHCi prompt:

ghci> read "3"

<interactive>:5:1:
    No instance for (Read a0) arising from a use of `read'
    The type variable `a0' is ambiguous
    Possible fix: add a type signature that fixes these type variable(s)
    Note: there are several potential instances:
      instance Read () -- Defined in `GHC.Read'
      instance (Read a, Read b) => Read (a, b) -- Defined in `GHC.Read'
      instance (Read a, Read b, Read c) => Read (a, b, c)
        -- Defined in `GHC.Read'
      ...plus 25 others
    In the expression: read "3"
    In an equation for `it': it = read "3"

The interpreter does not know what type to convert ‘3’ to, and hence you will need to explicitly specify the type:

ghci> read "3" :: Int
3
ghci> read "3" :: Float
3.0

A type synonym is an alias that you can use for a type. ‘String’ in the Haskell platform is an array of characters defined using the type keyword:

type String = [Char]

You can also create a new user data type using the data keyword. Consider a Weekday data type that has the list of days in a week:

data Weekday = Monday
             | Tuesday
             | Wednesday
             | Thursday
             | Friday
             | Saturday
             | Sunday

The data keyword is followed by the name of the data type, starting with a capital letter. After the ‘equal to’ (‘=’) sign, the various value constructors are listed. The different constructors are separated by a pipe (‘|’) symbol.

If you load the above data type in GHCi, you can test the value constructors:

ghci> :t Monday
Monday :: Weekday

Each value constructor can have many type values. The user defined data type can also derive from type classes. Since the primitive data types already derive from the basic type classes, the user defined data types can also be derived. Otherwise, you will need to write instance definitions for the same. The following is an example for a user data type ‘Date’ that derives from the ‘Show’ type class for displaying the date:

data Date = Int String Int deriving (Show)

Loading the above in GHCi, you get:

ghci> Date 3 "September" 2014 
Date 3 "September" 2014

The above code will work even if we swap the year and day because the syntax is correct but the semantics are not!

ghci> Date 2014 "September" 3 
Date 2014 "September" 3

You can also use the record syntax that can give you helper functions:

data Date = Date { day :: Int
                 , month :: String         
                 , year :: Int
                 }

This gives you three helper functions to retrieve the day, month and year from a ‘Date’.

ghci> let d = Date {day = 14, month = "September", year = 2014}
ghci> day d
14
ghci> month d
"September"

You can also make data type definition more explicit with types:

data Date = Date Day Month Year deriving (Show)

type Year = Int

type Day = Int

data Month = January 
     | February
     | March
     | April
     | May
     | June
     | July
     | August
     | September
     | October
     | November
     | December
     deriving (Show)

Loading the above in GHCi, you can use:

ghci> Date 3 September 2014
Date 3 September 2014

To support printing the date in a specific format, you can implement an instance for the ‘Show’ type class. You can also add a check to ensure that the day is within a range, and the year and day cannot be swapped:

instance Show Date where
    show (Date d m y)
         | d > 0 && d <= 31 = (show d ++ " " ++ show m ++ " " ++ show y)
         | otherwise = error "Invalid day"

Loading the code in GHCi, and running the following:

ghci> show (Date 3 September 2014)
"3 September 2014"
ghci> show (Date 2014 September 2)
"*** Exception: Invalid day

Suppose, you wish to support different Gregorian date formats, you can define a data type GregorianDate as follows:

data GregorianDate = DMY Day Month Year | YMD Year Month Day

You can also define your own type classes for functions that define their own behaviour. For example, if you wish to dump the output of a date that is separated by dashes, you can write a ‘Dashed’ class with a dash function.

class Dashed a where
    dash :: a -> String

instance Dashed Date where
    dash (Date d m y) = show d ++ "-" ++ show m ++ "-" ++ show y

Testing the above in GHCi will give the following output:

ghci> dash (Date 14 September 2014)
"14-September-2014"

Haskell allows you to define recursive data types also. A parameterized list is defined as:

data List a = Empty | Cons a (List a) deriving (Show)

Lists for the above definition can be created in GHCi, using the following commands:

ghci> Empty
Empty
ghci> (Cons 3 (Cons 2 (Cons 1 Empty)))
(Cons 3 (Cons 2 (Cons 1 Empty)))

2015 haskell

Introduction to Haskell - More Functions

June 22, 2015

[Published in Open Source For You (OSFY) magazine, September 2014 edition.]

In the third article in the series, we will focus on more Haskell functions, conditional constructs and their usage.

A function in Haskell has the function name followed by arguments. An infix operator function has operands on either side of it. A simple infix add operation is shown below:

*Main> 3 + 5
8

If you wish to convert an infix function to a prefix function, it must be enclosed within parenthesis:

*Main> (+) 3 5
8

Similarily, if you wish to convert a prefix function into an infix function, you must enclose the function name within backquotes(`). The elem function takes an element and a list, and returns true if the element is a member of the list:

*Main> 3 `elem` [1, 2, 3]
True
*Main> 4 `elem` [1, 2, 3]
False

Functions can also be partially applied in Haskell. A function that subtracts ten from a given number can be defined as:

diffTen :: Integer -> Integer
diffTen = (10 -)

Loading the file in GHCi and passing three as an argument yields:

*Main> diffTen 3
7

Haskell exhibits polymorphism. A type variable in a function is said to be polymorphic if it can take any type. Consider the last function that returns the last element in an array. Its type signature is:

*Main> :t last
last :: [a] -> a

The ‘a’ in the above snippet refers to a type variable and can represent any type. Thus, the last function can operate on a list of integers or characters (string):

*Main> last [1, 2, 3, 4, 5]
5
*Main> last "Hello, World"
'd'

You can use a where clause for local definitions inside a function, as shown in the following example, to compute the area of a circle:

areaOfCircle :: Float -> Float
areaOfCircle radius = pi * radius * radius
  where pi = 3.1415

Loading it in GHCi and computing the area for radius 1 gives:

*Main> areaOfCircle 1
3.1415

You can also use the let expression with the in statement to compute the area of a circle:

areaOfCircle :: Float -> Float
areaOfCircle radius = let pi = 3.1415 in pi * radius * radius

Executing the above with input radius 1 gives:

*Main> areaOfCircle 1
3.1415

Indentation is very important in Haskell as it helps in code readability - the compiler will emit errors otherwise. You must make use of white spaces instead of tab when aligning code. If the let and in constructs in a function span multiple lines, they must be aligned vertically as shown below:

compute :: Integer -> Integer -> Integer
compute x y =
    let a = x + 1
        b = y + 2
    in
      a * b

Loading the example with GHCi, you get the following output:

*Main> compute 1 2
8

Similarily, the if and else constructs must be neatly aligned. The else statement is mandatory in Haskell. For example:

sign :: Integer -> String
sign x =
    if x > 0 
    then "Positive"
    else
        if x < 0
        then "Negative"
        else "Zero"

Running the example with GHCi, you get:

*Main> sign 0
"Zero"
*Main> sign 1
"Positive"
*Main> sign (-1)
"Negative"

The case construct can be used for pattern matching against possible expression values. It needs to be combined with the of keyword. The different values need to be aligned and the resulting action must be specified after the ’->’ symbol for every case. For example:

sign :: Integer -> String
sign x =
    case compare x 0 of
      LT -> "Negative"
      GT -> "Positive"
      EQ -> "Zero"

The compare function compares two arguments and returns LT if the first argument is lesser than the second, GT if the first argument is greater than the second, and EQ if both are equal. Executing the above example, you get:

*Main> sign 2
"Positive"
*Main> sign 0
"Zero"
*Main> sign (-2)
"Negative"

The sign function can also be expressed using guards (‘|’) for readability. The action for a matching case must be specified after the ‘=’ sign. You can use a default guard with the otherwise keyword:

sign :: Integer -> String
sign x
    | x > 0 = "Positive"
    | x < 0 = "Negative"
    | otherwise = "Zero"

The guards have to be neatly aligned:

*Main> sign 0
"Zero"
*Main> sign 3
"Positive"
*Main> sign (-3)
"Negative"

There are three very important higher order functions in Haskell — map, filter, and fold.

The map function takes a function and a list, and applies the function to each and every element of the list. Its type signature is:

*Main> :t map
map :: (a -> b) -> [a] -> [b]

The first function argument accepts an element of type ‘a’ and returns an element of type ‘b’. An example on adding two to every element in a list can be implemented using map:

*Main> map (+ 2) [1, 2, 3, 4, 5]
[3,4,5,6,7]

The filter function accepts a predicate function for evaluation, and a list, and returns the list with those elements that satisfy the predicate. For example:

*Main> filter (> 0) [-2, -1, 0, 1, 2]
[1,2]

Its type signature is:

filter :: (a -> Bool) -> [a] -> [a]

The predicate function for filter takes as its first argument an element of type ‘a’ and returns True or False.

The fold function performs cumulative operation on a list. It takes as arguments a function, an accumulator (starting with an initial value) and a list. It cumulatively aggregates the computation of the function on the accumulator value as well as each member of the list. There are two types of folds — left and right fold.

*Main> foldl (+) 0 [1, 2, 3, 4, 5]
15
*Main> foldr (+) 0 [1, 2, 3, 4, 5]
15

Their type signatures are, respectively:

*Main> :t foldl
foldl :: (a -> b -> a) -> a -> [b] -> a
*Main> :t foldr
foldr :: (a -> b -> b) -> b -> [a] -> b

The way the fold is evaluated among the two types is different and is demonstrated below:

*Main> foldl (+) 0 [1, 2, 3]
6
*Main> foldl (+) 1 [2, 3]
6
*Main> foldl (+) 3 [3]
6

It can be represented as ‘f (f (f a b1) b2) b3’ where ‘f’ is the function, ‘a’ is the accumulator value, and ‘b1’, ‘b2’ and ‘b3’ are the elements of the list. The parenthesis is accumulated on the left for a left fold. The computation looks like:

*Main> (+) ((+) ((+) 0 1) 2) 3
6
*Main> (+) 0 1
1
*Main> (+) ((+) 0 1) 2
3
*Main> (+) ((+) ((+) 0 1) 2) 3
6

With the recursion, the expression is constructed and evaluated only when the expression is finally formed. It can thus cause stack overflow or never complete when working with infinite lists. The foldr evaluation looks like this:

*Main> foldr (+) 0 [1, 2, 3]
6
*Main> foldr (+) 0 [1, 2] + 3
6
*Main> foldr (+) 0 [1] + 2 + 3
6

It can be represented as ‘f b1 (f b2 (f b3 a))’ where ‘f’ is the function, ‘a’ is the accumulator value, and ‘b1’, ‘b2’ and ‘b3’ are the elements of the list. The computation looks like:

*Main> (+) 1 ((+) 2 ((+) 3 0)) 
6
*Main> (+) 3 0
3
*Main> (+) 2 ((+) 3 0)
5
*Main> (+) 1 ((+) 2 ((+) 3 0))
6

There are some statements like condition checking where ‘f b1’ can be computed even without requiring the subsequent arguments, and hence the foldr function can work with infinite lists. There is also a strict version of foldl (foldl’) that forces the computation before proceeding with the recursion.

If you want a reference to a matched pattern, you can use the as pattern syntax. The tail function accepts an input list and returns everything except the head of the list. You can write a tailString function that accepts a string as input and returns the string with the first character removed:

tailString :: String -> String
tailString "" = ""
tailString input@(x:xs) = "Tail of " ++ input ++ " is " ++ xs

The entire matched pattern is represented by input in the above code snippet.

Functions can be chained to create other functions. This is called as ‘composing’ functions. The mathematical definition is as under:

(f o g)(x) = f(g(x))

This dot (.) operator has the highest precedence and is left-associative. If you want to force an evaluation, you can use the function application operator ($) that has the second highest precedence and is right-associative. For example:

*Main>  (reverse ((++) "yrruC " (unwords ["skoorB", "lleksaH"])))
"Haskell Brooks Curry"

You can rewrite the above using the function application operator that is right-associative:

Prelude> reverse $ (++) "yrruC " $ unwords ["skoorB", "lleksaH"]
"Haskell Brooks Curry"

You can also use the dot notation to make it even more readable, but the final argument needs to be evaluated first; hence, you need to use the function application operator for it:

*Main> reverse . (++) "yrruC " . unwords $ ["skoorB", "lleksaH"]
"Haskell Brooks Curry"

2015 haskell

Haskell Functions

May 25, 2015

[Published in Open Source For You (OSFY) magazine, August 2014 edition.]

This second article in the series on Haskell explores a few functions.

Consider the function sumInt to compute the sum of two integers. It is defined as:

sumInt :: Int -> Int -> Int
sumInt x y = x + y

The first line is the type signature where the function name, arguments and return types are separated using a double colon (::). The arguments and the return types are separated by the symbol (->). Thus, the above type signature tells us that the sum function takes two arguments of type Int and returns an Int. Note that the function names must always begin with the letters of the alphabet in lower case. The names are usually written in CamelCase style.

You can create a Sum.hs Haskell source file using your favourite text editor, and load the file on to the Glasgow Haskell Compiler interpreter (GHCi) using the following code:

$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.

Prelude> :l Sum.hs
[1 of 1] Compiling Main             ( Sum.hs, interpreted )
Ok, modules loaded: Main.

*Main> :t sumInt
sumInt :: Int -> Int -> Int

*Main> sumInt 2 3
5

If we check the type of sumInt with arguments, we get:

*Main> :t sumInt 2 3
sumInt 2 3 :: Int

*Main> :t sumInt 2
sumInt 2 :: Int -> Int

The value of sumInt 2 3 is an Int as defined in the type signature. We can also partially apply the function sumInt with one argument and its return type will be Int -> Int. In other words, sumInt 2 takes an integer and will return an integer with 2 added to it.

Every function in Haskell takes only one argument. So, we can think of the sumInt function as one that takes an argument and returns a function that takes another argument and computes their sum. This return function can be defined as a sumTwoInt function that adds a 2 to an Int using the sumInt function, as shown below:

sumTwoInt :: Int -> Int
sumTwoInt x = sumInt 2 x

The ‘=’ sign in Haskell signifies a definition and not a variable assignment as seen in imperative programming languages. We can thus omit the ‘x’ on either side and the code becomes even more concise:

sumTwoInt :: Int -> Int
sumTwoInt = sumInt 2

By loading Sum.hs again in the GHCi prompt, we get the following:

*Main> :l Sum.hs
[1 of 1] Compiling Main             ( Sum.hs, interpreted )
Ok, modules loaded: Main.

*Main> :t sumTwoInt
sumTwoInt :: Int -> Int

*Main> sumTwoInt 3
5

Let us look at some examples of functions that operate on lists. Consider list ‘a’ which is defined as [1, 2, 3, 4, 5] (a list of integers) in the Sum.hs file (re-load the file in GHCi before trying the list functions).

a :: [Int]
a = [1, 2, 3, 4, 5]

The head function returns the first element of a list:

*Main> head a
1

*Main> :t head
head :: [a] -> a

The tail function returns everything except the first element from a list:

*Main> tail a
[2,3,4,5]

*Main> :t tail
tail :: [a] -> [a]

The last function returns the last element of a list:

*Main> last a
5

*Main> :t last
last :: [a] -> a

The init function returns everything except the last element of a list:

*Main> init a
[1,2,3,4]

*Main> :t init
init :: [a] -> [a]

The length function returns the length of a list:

*Main> length a
5

*Main> :t length
length :: [a] -> Int

The take function picks the first ‘n’ elements from a list:

*Main> take 3 a
[1,2,3]

*Main> :t take
take :: Int -> [a] -> [a]

The drop function drops ‘n’ elements from the beginning of a list, and returns the rest:

*Main> drop 3 a
[4,5]

*Main> :t drop
drop :: Int -> [a] -> [a]

The zip function takes two lists and creates a new list of tuples with the respective pairs from each list. For example:

*Main> let b = ["one", "two", "three", "four", "five"]

*Main> zip a b
[(1,"one"),(2,"two"),(3,"three"),(4,"four"),(5,"five")]

*Main> :t zip
zip :: [a] -> [b] -> [(a, b)]

The let expression defines the value of ‘b’ in the GHCi prompt. You can also define it in a way that’s similar to the definition of the list ‘a’ in the source file.

The lines function takes input text and splits it at newlines:

*Main> let sentence = "First\nSecond\nThird\nFourth\nFifth"

*Main> lines sentence
["First","Second","Third","Fourth","Fifth"]

*Main> :t lines
lines :: String -> [String]

The words function takes input text and splits it on white space:

*Main> words "hello world"
["hello","world"]

*Main> :t words
words :: String -> [String]

The map function takes a function and a list and applies the function to every element in the list:

*Main> map sumTwoInt a
[3,4,5,6,7]

*Main> :t map
map :: (a -> b) -> [a] -> [b]

The first argument to map is a function which is enclosed within parenthesis in the type signature (a -> b). This function takes an input of type ‘a’ and returns an element of type ‘b’. Thus, when operating over a list [a], it returns a list of type [b].

Recursion provides a means of looping in functional programming languages. The factorial of a number, for example, can be computed in Haskell, using the following code:

factorial :: Int -> Int
factorial 0 = 1
factorial n = n * factorial (n-1)

The definition of factorial with different input use cases is called as pattern matching on the function. On running the above example with GHCi, you get:

*Main> factorial 0
1
*Main> factorial 1
1
*Main> factorial 2
2
*Main> factorial 3
6
*Main> factorial 4
24
*Main> factorial 5
120

Functions operating on lists can also be called recursively. To compute the sum of a list of integers, you can write the sumList function as:

sumList :: [Int] -> Int
sumList [] = 0
sumList (x:xs) = x + sumList xs

The notation *(x:xs) represents a list, where ‘x’ is the first element in the list, and ‘xs’ is the rest of the list. On running sumList with GHCi, you get the following:

*Main> sumList []
0
*Main> sumList [1,2,3]
6

Sometimes, you will need a temporary function for a computation, which you will not need to use elsewhere. You can then write an anonymous function. A function to increment an input value can be defined as:

*Main> (\x -> x + 1) 3
4

Such functions are called as Lambda functions, and the ‘\’ represents the notation for the symbol Lambda. Another example is given below:

*Main> map (\x -> x * x) [1, 2, 3, 4, 5]
[1,4,9,16,25]

It is a good practice to write the type signature of the function first when composing programs, and then write the body of the function. Haskell is a functional programming language, and understanding the use of functions is very important.

2015 haskell

Introduction to Haskell

May 10, 2015

[Published in Open Source For You (OSFY) magazine, July 2014 edition.]

Haskell, a free and open source programming language, is the outcome of 20 years of research. It has all the advantages of functional programming and an intuitive syntax based on mathematical notation. This article flags off a series in which we will explore Haskell at length.

Haskell is a statically typed, general purpose programming language. Code written in Haskell can be compiled and also used with an interpreter. The static typing helps detect plenty of compile time bugs. The type system in Haskell is very powerful and can automatically infer types. Functions are treated as first-class citizens and you can pass them around as arguments. It is a pure functional language and employs lazy evaluation. It also supports procedural and strict evaluation similar to other programming paradigms.

Haskell code is known for its brevity and is very concise. The latest language standard is Haskell 2010. The language supports many extensions, and has been gaining wide-spread interest in the industry due to its capability to run algorithms on multi-core systems. It has support for concurrency because of the use of software transactional memory. Haskell allows you to quickly create prototypes with its platform and tools. Hoogle and Hayoo API search engines are available to query and browse the list of Haskell packages and libraries. The entire set of Haskell packages are available in Hackage.

The Haskell Platform contains all the software required to get you started on it. On GNU/Linux, you can use your distribution package manager to install the same. On Fedora, for example, you can use the following command:

# yum install haskell-platform

On Ubuntu, you can use the following:

# apt-get install haskell-platform

On Windows, you can download and run HaskellPlatform-2013.2.0.0-setup.exe from the Haskell platform web site and follow the instructions for installation.

For Mac OS X, download either the 32-bit or 64-bit .pkg file, and click on either to proceed with the installation.

The most popular Haskell interpreter is the Glasgow Haskell Compiler (GHC). To use its interpreter, you can run ghci from the command prompt on your system:

$ ghci
GHCi, version 7.6.3: http://www.haskell.org/ghc/  :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Prelude>

The Prelude prompt indicates that the basic Haskell library modules have been imported for your use.

To exit from GHCi, type :quit in the Prelude prompt:

Prelude> :quit
Leaving GHCi.

The basic data types used in Haskell are discussed below.

A Char data type is for a Unicode character. You can view the type using the command :type at the GHCi prompt:

Prelude> :type 's'
's' :: Char

The ’::’ symbol is used to separate the expression on the left with the data type on the right.

A Bool data type represents a logical value of either True or False.

Prelude> :type True
True :: Bool

Signed numbers with a fixed width are represented by the Int data type. The Integer type is used for signed numbers that do not have a fixed width.

Prelude> 5
5

The Double and Float types are used to represent decimal values. The Double type has better precision for floating point numbers:

Prelude> 3.0
3.0

The basic data types can be combined to form composite types. There are two widely used composite types in Haskell, namely, lists and tuples. A list is a collection of elements of the same data type enclosed within square parenthesis. A list of characters is shown below:

Prelude> :type ['a', 'b', 'c']
['a', 'b', 'c'] :: [Char]

The static typing in Haskell produces errors during compile or load time (in GHCi) when you mix data types inside a list. For example:

Prelude> ['a', 1, 2]

<interactive>:7:7:
    No instance for (Num Char) arising from the literal `1'
    Possible fix: add an instance declaration for (Num Char)
    In the expression: 1
    In the expression: ['a', 1, 2]
    In an equation for `it': it = ['a', 1, 2]

You can have a list of lists as long as they contain the same data type:

Prelude> :type [['a'], ['b', 'c']]
[['a'], ['b', 'c']] :: [[Char]]

A tuple is an ordered list of elements with a fixed size, enclosed within parenthesis, where each element can be of a different data type. For example:

Prelude> :type ('t', True)
('t', True) :: (Char, Bool)

Note that the tuple with type (Char, Bool) is different from the tuple with type (Bool, Char).

Prelude> :t (False, 'f')
(False, 'f') :: (Bool, Char)

Haskell originates from the theory of Lambda calculus, which was developed by Alonzo Church to formally study mathematics. In 1958, John McCarthy created Lisp, that relates programming with Lambda calculus. Robin Milner created a functional programming language called ML (meta language) for automated proofs of mathematical theorems in 1970. During the 1980s, there were a number of lazy functional programming languages scattered across the research community. Miranda was a very popular proprietary programming language released by Research Software Ltd in 1985.

A need arose to unify the different research developments, for which a committee was formed and the first version of the standard was released in 1990. It was called Haskell 1.0, after the mathematician and logician, Haskell Brooks Curry. Subsequently, there were four revisions made - 1.1, 1.2, 1.3 and 1.4. In 1997, the Haskell 98 report was released. In 2009, the Haskell 2010 standard was published and is the latest standard as on date. It has Foreign Function Interface (FFI) bindings to interface with other programming languages. The Hugs interpreter is useful for teaching, while the Glasgow Haskell Compiler (GHC) is very popular. The paper by John Hughes on “Why Functional Programming matters?” in as excellent paper to read. A number of software companies in the industry have begun to use Haskell in production systems.

We shall be exploring more features, constructs and use of the language in future articles.

References

[1] Haskell. http://haskell.org/

[2] Haskell 2010. http://www.haskell.org/haskellwiki/Haskell_2010

[3] Hoogle. http://www.haskell.org/hoogle/

[4] Hayoo. http://holumbus.fh-wedel.de/hayoo/hayoo.html

[5] Hackage. http://hackage.haskell.org/

[6] Haskell Platform. http://www.haskell.org/platform/

[7] Glasgow Haskell Compiler. http://www.haskell.org/ghc/

[8] Alonzo Church. http://www-groups.dcs.st-and.ac.uk/history/Mathematicians/Church.html

[9] John McCarthy. http://www-formal.stanford.edu/jmc/

[10] Lisp. http://en.wikipedia.org/wiki/Lisp_%28programming_language%29

[11] Robin Milner. http://www.cl.cam.ac.uk/archive/rm135/

[12] Miranda. http://miranda.org.uk/

[13] Haskell 1.0. http://www.haskell.org/definition/haskell-report-1.0.ps.gz

[14] Haskell Brooks Curry. http://www-history.mcs.st-andrews.ac.uk/Biographies/Curry.html

[15] Hugs. http://www.haskell.org/hugs/

[16] “Why Functional Programming matters?” http://www.cse.chalmers.se/~rjmh/Papers/whyfp.html

[17] Why functional programming? Why Haskell?. http://book.realworldhaskell.org/read/why-functional-programming-why-haskell.html

2015 haskell

PostgreSQL Quick Reference

April 7, 2015

Command	Functionality
$ sudo /etc/init.d/postgresql start	Start server (Ubuntu)
$ psql -U postgres	Connect
postgres=# \l	Show databases
postgres=# \h	Help
postgres=# CREATE DATABASE jerry;	Create database
postgres=# DROP DATABASE jerry;	Delete database
postgres=# SET search_path TO schema;	Use schema
$ psql -U postgres -d	Use database
postgres=# \c test	Change database
postgres=# \du	List users
postgres=# \d	List tables
postgres=# CREATE SCHEMA sausalito;	Create schema
postgres=# \dn	List schema
postgres=# DROP SCHEMA sausalito;	Drop schema
postgres=# SELECT * FROM sausalito.employees;	Select rows
postgres=# CREATE TABLE sausalito.employees (id INT);	Create table
postgres=# INSERT INTO sausalito.employees VALUES (1);	Insert record
postgres=# UPDATE sausalito.employees SET id = 4 WHERE id = 2;	Update table record
postgres=# DELETE FROM sausalito.employees WHERE id = 3;	Delete record
postgres=# DROP TABLE sausalito.employees;	Drop table
postgres=# \q	Quit from session

2015 foss documentation

Installation of NixOS 14.12

March 8, 2015

Boot from LiveCD ( nixos-graphical-14.12.140.0dbc415-x86_64-linux.iso ), with 40 GB virtual disk, and login as root (no password required).

nixos login: root

[root@nixos:~]#

Start the KDE environment using the following command:

[root@nixos:~]# start display-manager

You can then add the English Dvorak layout (optional) by selecting ‘System Settings’ -> ‘Input Devices’ -> ‘Keyboard settings’ -> ‘Layouts’ -> ‘Configure layouts’ -> ‘Add’ and use the label (dvo) for the new layout. Check that networking works as shown below:

[root@nixos:~]# ifconfig

[root@nixos:~]# ping -c3 www.google.com

You can now partition the disk using ‘fdisk /dev/sda’ and create two partitions (39 GB /dev/sda1 and swap on /dev/sda2). Create the filesystems, and turn on swap using the following commands:

# mkfs.ext4 -L nixos /dev/sda1

# mkswap -L swap /dev/sda2
# swapon /dev/sda2

Generate a basic system configuration file with nixos-generate-config:

# mount /dev/disk/by-label/nixos /mnt

# nixos-generate-config --root /mnt

Update /mnt/etc/nixos/configuration.nix with new packages that you need as illustrated below:

# Edit this configuration file to define what should be installed on
# your system.  Help is available in the configuration.nix(5) man page
# and in the NixOS manual (accessible by running ‘nixos-help’).

{ config, pkgs, ... }:

{
  imports =
    [ # Include the results of the hardware scan.
      ./hardware-configuration.nix
    ];

  # Use the GRUB 2 boot loader.
  boot.loader.grub.enable = true;
  boot.loader.grub.version = 2;
  # Define on which hard drive you want to install Grub.
  boot.loader.grub.device = "/dev/sda";

  # networking.hostName = "nixos"; # Define your hostname.
  networking.hostId = "56db3cd3";
  # networking.wireless.enable = true;  # Enables wireless.

  # Select internationalisation properties.
  # i18n = {
  #   consoleFont = "lat9w-16";
  #   consoleKeyMap = "us";
  #   defaultLocale = "en_US.UTF-8";
  # };

  # List packages installed in system profile. To search by name, run:
  # $ nix-env -qaP | grep wget
  environment.systemPackages = with pkgs; [
    wget emacs24 git python gnuplot notmuch
    haskellPackages.pandoc

    # Installing texlive is slow and incomplete on NixOS
    # (pkgs.texLiveAggregationFun { paths = [ pkgs.texLive pkgs.texLiveExtra pkgs.texLiveBeamer ]; })    
  ];
  # texLive tetex lmodern
  
  # List services that you want to enable:

  # Enable the OpenSSH daemon.
  services.openssh.enable = true;

  # Enable CUPS to print documents.
  # services.printing.enable = true;

  # Enable the X11 windowing system.
  services.xserver.enable = true;
  services.xserver.layout = "us";
  # services.xserver.xkbOptions = "eurosign:e";

  # Enable the KDE Desktop Environment.
  services.xserver.displayManager.kdm.enable = true;
  services.xserver.desktopManager.kde4.enable = true;


  # Define a user account. Don't forget to set a password with ‘passwd’.
  users.extraUsers.apollo = {
    home = "/home/apollo";
    extraGroups = [ "wheel" ];
    useDefaultShell = true;
    isNormalUser = true;
    uid = 1000;
  };

}

Install NixOS to hard disk:

# nixos-install

setting root password...
Enter new UNIX password: ***
Retype new UNIX password: ***

passwd: password updated successfully
installation finished!

You can now reboot into the system:

# reboot

After you login to the console, set a password for the ‘apollo’ user. A screenshot of the desktop is shown below:

2015 foss install

HDL Complexity Tool

February 7, 2015

[Published in Electronics For You (EFY) magazine, June 2014 edition.] Source

HCT stands for HDL Complexity Tool, where HDL stands for Hardware Description Language. HCT provides scores that represent the complexity of modules present in integrated circuit (IC) designs. It is written in Perl and released under the GPLv3 and LGPLv3 license. It employs McCabe Cyclomatic Complexity that uses the control flow graph of the program source code to determine the complexity.

There are various factors for measuring the complexity of HDL models such as size, nesting, modularity, and timing. The measured metrics can help designers in refactoring their code, and also help managers to plan project schedules, and allocate resources, accordingly. You can run the tool from the GNU/Linux terminal for Verilog, VHDL, and CDL (Computer Design Language) files or directory sources. HCT can be installed on Fedora using the command:

$ sudo yum install hct

After installation, consider the example project of uart2spi written in Verilog, which is included in this month’s EFY DVD. It implements a simple core for a UART interface, and an internal SPI bus. The uart2spi folder contains rtl/spi under the file directory in your PC: /home/guest/uart2spi/trunk/rtl/spi. Run the HCT tool on the rtl/spi Verilog sources as follows:

$ hct rtl/spi

We get the output:

Directory: /home/guest/uart2spi/trunk/rtl/spi

verilog, 4 file(s)
+--------------------+--------------+------+-------+----------+--------+
| FILENAME           | MODULE       | IO   | NET   | MCCABE   | TIME   |
+--------------------+--------------+------+-------+----------+--------+
| spi_ctl.v                           20     1       1          0.1724 |
|                      spi_ctl        20     1       1                 |
+----------------------------------------------------------------------+
| spi_core.v                          0      0       1          0.0076 |
|                      spi_core       0      0       1                 |
+----------------------------------------------------------------------+
| spi_cfg.v                           0      0       1          0.0076 |
|                      spi_cfg        0      0       1                 |
+----------------------------------------------------------------------+
| spi_if.v                            15     3       1          0.0994 |
|                      spi_if         15     3       1                 |
+----------------------------------------------------------------------+

The output includes various attributes that are described below:

FILENAME is the file that is being parsed. The parser uses the file name extension to recognize the programming language.
MODULE refers to the specific module present in the file. A file can contain many modules.
IO refers to the input/output registers used in the module.
NET includes the network entities declared in the given module. For Verilog, it can be ‘wire’, ‘tri’, ‘supply0’ etc.
MCCABE provides the McCabe Cyclomatic Complexity of the module or file.
TIME refers to the time taken to process the file.

A specific metric can be excluded from the output using the “–output-exclude=LIST” option. For example, type the following command on a GNU/Linux terminal:

$ hct --output-exclude=TIME rtl/spi

The output will be;

Directory: /home/guest/uart2spi/trunk/rtl/spi

verilog, 4 file(s)
+----------------------+----------------+--------+---------+-----------+
| FILENAME             | MODULE         | IO     | NET     | MCCABE    |
+----------------------+----------------+--------+---------+-----------+
| spi_ctl.v                               20       1         1         |
|                        spi_ctl          20       1         1         |
+----------------------------------------------------------------------+
| spi_core.v                              0        0         1         |
|                        spi_core         0        0         1         |
+----------------------------------------------------------------------+
| spi_cfg.v                               0        0         1         |
|                        spi_cfg          0        0         1         |
+----------------------------------------------------------------------+
| spi_if.v                                15       3         1         |
|                        spi_if           15       3         1         |
+----------------------------------------------------------------------+

If you want only the score to be listed, you can remove the MODULE listing with the “–output-no-modules” option:

$ hct --output-no-modules rtl/spi

Directory: /home/guest/uart2spi/trunk/rtl/spi

verilog, 4 file(s)
+-----------------------+---------+----------+-------------+-----------+
| FILENAME              | IO      | NET      | MCCABE      | TIME      |
+-----------------------+---------+----------+-------------+-----------+
| spi_ctl.v               20        1          1             0.16803   |
+----------------------------------------------------------------------+
| spi_core.v              0         0          1             0.007434  |
+----------------------------------------------------------------------+
| spi_cfg.v               0         0          1             0.00755   |
+----------------------------------------------------------------------+
| spi_if.v                15        3          1             0.097721  |
+----------------------------------------------------------------------+

The tool can be run on individual files, or recursively on subdirectories with the “-R” option. The output the entire uart2spi project sources is given below:

$ hct -R rtl

Directory: /home/guest/uart2spi/trunk/rtl/uart_core

verilog, 4 file(s)
+--------------------+--------------+------+-------+----------+--------+
| FILENAME           | MODULE       | IO   | NET   | MCCABE   | TIME   |
+--------------------+--------------+------+-------+----------+--------+
| uart_rxfsm.v                        10     0       1          0.1379 |
|                      uart_rxfsm     10     0       1                 |
+----------------------------------------------------------------------+
| clk_ctl.v                           0      0       1          0.0146 |
|                      clk_ctl        0      0       1                 |
+----------------------------------------------------------------------+
| uart_core.v                         18     1       1          0.1291 |
|                      uart_core      18     1       1                 |
+----------------------------------------------------------------------+
| uart_txfsm.v                        9      0       1          0.1129 |
|                      uart_txfsm     9      0       1                 |
+----------------------------------------------------------------------+

Directory: /home/guest/uart2spi/trunk/rtl/top

verilog, 1 file(s)
+--------------------+--------------+------+-------+----------+--------+
| FILENAME           | MODULE       | IO   | NET   | MCCABE   | TIME   |
+--------------------+--------------+------+-------+----------+--------+
| top.v                               16     0       1          0.0827 |
|                      top            16     0       1                 |
+----------------------------------------------------------------------+

Directory: /home/guest/uart2spi/trunk/rtl/spi

verilog, 4 file(s)
+--------------------+--------------+------+-------+----------+--------+
| FILENAME           | MODULE       | IO   | NET   | MCCABE   | TIME   |
+--------------------+--------------+------+-------+----------+--------+
| spi_ctl.v                           20     1       1          0.1645 |
|                      spi_ctl        20     1       1                 |
+----------------------------------------------------------------------+
| spi_core.v                          0      0       1          0.0074 |
|                      spi_core       0      0       1                 |
+----------------------------------------------------------------------+
| spi_cfg.v                           0      0       1          0.0073 |
|                      spi_cfg        0      0       1                 |
+----------------------------------------------------------------------+
| spi_if.v                            15     3       1          0.0983 |
|                      spi_if         15     3       1                 |
+----------------------------------------------------------------------+

Directory: /home/guest/uart2spi/trunk/rtl/lib

verilog, 1 file(s)
+--------------------+--------------+------+-------+----------+--------+
| FILENAME           | MODULE       | IO   | NET   | MCCABE   | TIME   |
+--------------------+--------------+------+-------+----------+--------+
| registers.v                         5      0       1          0.0382 |
|                      bit_register   5      0       1                 |
+----------------------------------------------------------------------+

Directory: /home/guest/uart2spi/trunk/rtl/msg_hand

verilog, 1 file(s)
+--------------------+--------------+------+-------+----------+--------+
| FILENAME           | MODULE       | IO   | NET   | MCCABE   | TIME   |
+--------------------+--------------+------+-------+----------+--------+
| uart_msg_handler.v                  0      0       1          0.0192 |
|                      uart_m~ndler   0      0       1                 |
+----------------------------------------------------------------------+

The default behaviour is to dump the output to the terminal. It can be redirected to a file with the “–output-file=FILE” option. You can also specify an output file format, such as “csv” with the “–output-format=FORMAT” option:

$ hct --output-file=/home/guest/project-metrics.csv --output-format=csv rtl/spi 

$ cat /home/guest/project-metrics.csv

Directory: /home/guest/uart2spi/trunk/rtl/spi

verilog, 4 file(s)

 FILENAME    , MODULE    , IO   , NET  , MCCABE  , SLOC  , COMMENT_LINES  , TIME
 spi_ctl.v   ,           , 20   , 1    , 1       , 110   , 48             , 0.1644
             , spi_ctl   , 20   , 1    , 1       , 68    , 6              ,
 spi_core.v  ,           , 0    , 0    , 1       , 46    , 43             , 0.0073
             , spi_core  , 0    , 0    , 1       , 4     , 1              ,
 spi_cfg.v   ,           , 0    , 0    , 1       , 46    , 43             , 0.0075
             , spi_cfg   , 0    , 0    , 1       , 4     , 1              ,
 spi_if.v    ,           , 15   , 3    , 1       , 80    , 44             , 0.0948
             , spi_if    , 15   , 3    , 1       , 38    , 2              ,

There are various yyparse options that are helpful to understand the lexical parsing of the source code. They can be invoked using the following command:

$ hct --yydebug=NN sources

The NN options and their meaning is listed below:

0x01	Lexical tokens
0x02	Information on States
0x04	Shift, reduce, accept driver actions
0x08	Dump of the parse stack
0x16	Tracing for error recovery
0x31	Complete output for debugging

HCT can also be used with VHDL, and Cyclicity CDL (Cycle Description Language) programs. For VHDL, the filenames must end with a .vhdl extension. You can rename .vhd files recursively in a directory (in Bash, for example) using the following script:

for file in `find $1 -name "*.vhd"`
do
  mv $file ${file/.vhd/.vhdl}
done

The “$1” refers to the project source directory that is passed as an argument to the script. Let us take the example of sha256 core written in VHDL, which is also included in this month’s EFY DVD. The execution of HCT on the sha256core project is as follows:

 $  hct rtl

Directory: /home/guest/sha256core/trunk/rtl

vhdl, 6 file(s)
+--------------------+--------------+------+-------+----------+--------+
| FILENAME           | MODULE       | IO   | NET   | MCCABE   | TIME   |
+--------------------+--------------+------+-------+----------+--------+
| sha_256.vhdl                        29     0       1          0.9847 |
|                      sha_256        29     0       1                 |
+----------------------------------------------------------------------+
| sha_fun.vhdl                        1      1       1          0.3422 |
|                                     1      1       1                 |
+----------------------------------------------------------------------+
| msg_comp.vhdl                       20     0       1          0.4169 |
|                      msg_comp       20     0       1                 |
+----------------------------------------------------------------------+
| dual_mem.vhdl                       7      0       3          0.0832 |
|                      dual_mem       7      0       3                 |
+----------------------------------------------------------------------+
| ff_bank.vhdl                        3      0       2          0.0260 |
|                      ff_bank        3      0       2                 |
+----------------------------------------------------------------------+
| sh_reg.vhdl                         19     0       1          0.6189 |
|                      sh_reg         19     0       1                 |
+----------------------------------------------------------------------+

The “-T” option enables the use of threads to speed up computation. The LZRW1 (Lempel–Ziv Ross Williams) compressor core project implements a lossless data compression algorithm. The output of HCT on this project, without threading and with threads enabled, is shown below:

$ time hct HDL

Directory: /home/guest/lzrw1-compressor-core/trunk/hw/HDL

vhdl, 8 file(s)
...
real	0m3.725s
user	0m3.612s
sys     0m0.013s

$ time hct HDL -T

Directory: /home/guest/lzrw1-compressor-core/trunk/hw/HDL

vhdl, 8 file(s)
...
real	0m2.301s
user	0m7.029s
sys     0m0.051s

The supported input options for HCT can be viewed with the “-h” option.

The invocation of HCT can be automated, rechecked for each code check-in that happens to a project repository. The complexity measure is thus recorded periodically. The project team will then be able to monitor, analyse the complexity of each module and decide on any code refactoring strategies.

2015 foss

GNU Unified Parallel C

January 20, 2015

[Published in Open Source For You (OSFY) magazine, May 2014 edition.]

This article guides readers through the installation of GNU Unified Parallel C, which is designed for high performance computing on large scale parallel machines.

GNU Unified Parallel C is an extension to the GNU C compiler (GCC), which supports execution of Unified Parallel C (UPC) programs. UPC uses the Partitioned Global Address Space (PGAS) model for its implementation. The current version of UPC is 1.2, and a 1.3 draft specification is available. GNU UPC is released under the GPL license, while, the UPC specification is released under the new BSD license. To install it on Fedora, you need to first install the gupc repository:

$ sudo yum install http://www.gccupc.org/pub/pkg/rpms/gupc-fedora-18-1.noarch.rpm

You can then install the gupc RPM using the following command:

$ sudo yum install gupc-gcc-upc

The installation directory is /usr/local/gupc. You will also require the numactl (library for tuning Non-Uniform Memory Access machines) development packages:

$ sudo yum install numactl-devel numactl-libs

To add the installation directory to your environment, install the environment-modules package:

$ sudo yum install environment-modules

You can then load the gupc module with:

# module load gupc-x86_64

Consider the following simple ‘hello world’ example:

#include <stdio.h>

int main()
{
   printf("Hello World\n");
   return 0;
}

You can compile it using:

# gupc hello.c -o hello

Then run it with:

# ./hello -fupc-threads-5

Hello World
Hello World
Hello World
Hello World
Hello World

The argument -fupc-threads-N specifies the number of threads to be run. The program can also be executed using:

# ./hello -n 5

The gupc compiler provides a number of compile and run-time options. The ’-v’ option produces a verbose output of the compilation steps. It also gives information on GNU UPC. An example of such an output is shown below:

# gupc hello.c -o hello -v

Driving: gupc -x upc hello.c -o hello -v -fupc-link
Using built-in specs.
COLLECT_GCC=gupc
COLLECT_LTO_WRAPPER=/usr/local/gupc/libexec/gcc/x86_64-redhat-linux/4.8.0/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ...
Thread model: posix
gcc version 4.8.0 20130311 (GNU UPC 4.8.0-3) (GCC) 
COLLECT_GCC_OPTIONS='-o' 'hello' '-v' '-fupc-link' '-mtune=generic' '-march=x86-64'
...
GNU UPC (GCC) version 4.8.0 20130311 (GNU UPC 4.8.0-3) (x86_64-redhat-linux)
	compiled by GNU C version 4.8.0 20130311 (GNU UPC 4.8.0-3),
        GMP version 5.0.5, MPFR version 3.1.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
...
#include "..." search starts here:
#include <...> search starts here:
 /usr/local/gupc/lib/gcc/x86_64-redhat-linux/4.8.0/include
 /usr/local/include
 /usr/local/gupc/include
 /usr/include
End of search list.
GNU UPC (GCC) version 4.8.0 20130311 (GNU UPC 4.8.0-3) (x86_64-redhat-linux)
	compiled by GNU C version 4.8.0 20130311 (GNU UPC 4.8.0-3), 
        GMP version 5.0.5, MPFR version 3.1.1, MPC version 0.9
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 9db6d080c84dee663b5eb4965bf5012f
COLLECT_GCC_OPTIONS='-o' 'hello' '-v' '-fupc-link' '-mtune=generic' '-march=x86-64'
 as -v --64 -o /tmp/cccSYlmb.o /tmp/ccTdo4Ku.s
...
COLLECT_GCC_OPTIONS='-o' 'hello' '-v' '-fupc-link' '-mtune=generic' '-march=x86-64'
...

The -g option will generate debug information. To output debugging symbol information in DWARF-2 (Debugging With Attributed Record Formats), use the -dwarf-2-upc option. This can be used with GDB-UPC, a GNU debugger that supports UPC.

The -fupc-debug option will also generate filename and the line numbers in the output.

The optimization levels are similar to the ones supported by GCC: ’-O0’, ’-O1’, ’-O2’, and ’-O3’.

Variables that are shared among threads are declared using the ‘shared’ keyword. Examples include:

shared int i;
shared int a[THREADS];
shared char *p;

‘THREADS’ is a reserved keyword that represents the number of threads that will get executed run-time. Consider a simple vector addition example:

#include <upc_relaxed.h>
#include <stdio.h>

shared int a[THREADS];
shared int b[THREADS];
shared int vsum[THREADS];

int
main()
{
  int i;

  /* Initialization */
  for (i=0; i<THREADS; i++) {
    a[i] = i + 1;               /* a[] = {1, 2, 3, 4, 5}; */
    b[i] = THREADS - i;         /* b[] = {5, 4, 3, 2, 1}; */
  }

  /* Computation */
  for (i=0; i<THREADS; i++)
    if (MYTHREAD == i % THREADS)
      vsum[i] = a[i] + b[i];

  upc_barrier;

  /* Output */
  if (MYTHREAD == 0) {
    for (i=0; i<THREADS; i++)
      printf("%d ", vsum[i]);
  }

  return 0;
}

‘MYTHREAD’ indicates the thread that is currently running. upc_barrier is a blocking synchronization primitive that ensures that all threads complete before proceeding further. Only one thread is required to print the output, and THREAD 0 is used for the same. The program can be compiled, and executed using:

# gupc vector_addition.c -o vector_addition
# ./vector_addition -n 5

6 6 6 6 6

The computation loop in the above code can be simplified with the upc_forall statement:

#include <upc_relaxed.h>
#include <stdio.h>

shared int a[THREADS];
shared int b[THREADS];
shared int vsum[THREADS];

int
main()
{
  int i;

  /* Initialization */
  for (i=0; i<THREADS; i++) {
    a[i] = i + 1;               /* a[] = {1, 2, 3, 4, 5}; */
    b[i] = THREADS - i;         /* b[] = {5, 4, 3, 2, 1}; */
  }

  /* Computation */
  upc_forall(i=0; i<THREADS; i++; i)
      vsum[i] = a[i] + b[i];

  upc_barrier;

  if (MYTHREAD == 0) {
    for (i=0; i<THREADS; i++)
      printf("%d ", vsum[i]);
  }

  return 0;
}

The upc_forall construct is similar to a for loop, except, that it accepts a fourth parameter, the affinity field. It indicates the thread on which the computation runs. It can be an integer that is internally represented as integer % THREADS, or it can be an address corresponding to a thread. The program can be compiled and tested with:

# gupc upc_vector_addition.c -o upc_vector_addition
# ./upc_vector_addition -n 5

6 6 6 6 6

The same example can also be implemented using shared pointers:

#include <upc_relaxed.h>
#include <stdio.h>

shared int a[THREADS];
shared int b[THREADS];
shared int vsum[THREADS];

int
main()
{
  int i;
  shared int *p1, *p2;

  p1 = a;
  p2 = b;

  /* Initialization */
  for (i=0; i<THREADS; i++) {
    *(p1 + i) = i + 1;          /* a[] = {1, 2, 3, 4, 5}; */
    *(p2 + i) = THREADS - i;    /* b[] = {5, 4, 3, 2, 1}; */
  }

  /* Computation */
  upc_forall(i=0; i<THREADS; i++, p1++, p2++; i)
      vsum[i] = *p1 + *p2;

  upc_barrier;

  if (MYTHREAD == 0)
	for (i = 0; i < THREADS; i++)
		printf("%d ", vsum[i]);

  return 0;
}

# gupc pointer_vector_addition.c -o pointer_vector_addition
# ./pointer_vector_addition -n 5

6 6 6 6 6

Memory can also be allocated dynamically. The upc_all_alloc function will allocate collective global memory that is shared among threads. A collective function will be invoked by every thread. The upc_global_alloc function will allocate non-collective global memory which will be different for all threads in the shared address space. The upc_alloc function will allocate local memory for a thread. Their respective declarations are as follows:

shared void *upc_all_alloc (size_t nblocks, size_t nbytes);
shared void *upc_global_alloc (size_t nblocks, size_t nbytes);
shared void *upc_alloc (size_t nbytes);

To protect access to shared data, you can use the following synchronization locks:

void upc_lock (upc_lock_t *l)
int upc_lock_attempt (upc_lock_t *l)
void upc_unlock(upc_lock_t *l)

There are two types of barriers for synchronizing code. The upc_barrier construct is blocking. The non-blocking barrier uses upc_notify (non-blocking), and upc_wait (blocking) constructs. For example:

#include <upc_relaxed.h>
#include <stdio.h>

int
main()
{
  int i;

  for (i=0; i<THREADS; i++) {
    upc_notify;

    if (i == MYTHREAD)
      printf("Thread: %d\n", MYTHREAD);

    upc_wait;
  }

  return 0;
}

The corresponding output is shown below:

# gupc count.c -o count
# ./count -n 5

Thread:  0
Thread:  1
Thread:  2
Thread:  3
Thread:  4

You can refer the GUPC user guide for more information.

2015 foss

Drawtiming

December 26, 2014

[Published in Electronics For You (EFY) magazine, January 2014 edition.] Source

Drawtiming is a free/open source software tool that can be used to generate digital circuit timing diagrams. It is released under the GNU General Public License (GPL). The tool accepts text file as an input from the user and generates image output in different file formats such as PNG, JPEG, PostScript, and PDF. The input needs to adhere to a specific syntax to describe the different signals, their inter-relationships, and transitions as described in the next section with examples.

Before you can try the examples, you need to install the drawtiming tool in your GNU/Linux system. Installation steps may be different from system to system. To install this software in Ubuntu 12.10, just click on Ubuntu Software Centre option from desktop and search for Drawtiming to install it.

For a basic input command example to generate a clock for three periods, the syntax is written as:

CLOCK=0.
CLOCK=1.
CLOCK=0.

A period (.) is used to mark the end of each clock cycle. The first clock period lists and sets the inital states of the signals. Save the above three-line syntax as a .txt file. Here, we name the file as ‘basic.txt’ and save it in a folder named as, say, EFYdraw. Note that the syntax is case sensitive. To generate the timing diagram, open EFYdraw from the terminal window using ‘cd’ command and enter the following command against the prompt, and you will get the output diagram as shown in Fig. 1.

$ drawtiming -o basic.jpg basic.txt

Figure 1.

The tick keyword can also indicate a clock cycle. The following syntax creates a single period clock as shown below:

CLOCK=tick.

The output diagram is shown in Fig. 2.

Figure 2.

The order in which the signal commands are listed determines the order in which they appear in the output. Multiple signals are separated by commas as shown below:

CLOCK=tick, FLAG=0.

The output diagram is shown in Fig. 3.

Figure 3.

A change in one signal that triggers a change in another can be expressed using arrow notation (=>). In the following example (Fig. 4), when the address latch enable (ALE) signal changes state, the read signal (RD) changes its state in the subsequent cycle. The syntax is given below:

ALE=tick,    RD=0.
ALE => RD=1, ALE=0.

Figure 4.

If the transition is in the same clock cycle, it can be indicated in the same clock period. For instance, both CLK and CLK2 change in the second clock period:

CLK=0,    CLK2=0.
CLK=tick, CLK2=1.

The output is shown in Fig. 5.

Figure 5.

If there are multiple signal changes, you can separate them with a comma. For example, LED is turned on when both the chip select (CS) and output enable (OE) signals change state; then the syntax is written as:

CS=0, OE=0, LED=Z.
CS=1, OE=1.
CS, OE => LED=ON.

The output is as shown in Fig. 6.

Figure 6.

A semicolon can be used to separate a list of dependency signal changes for the same clock period, which is illustrated in the third clock period (refer Fig. 7) in the following example. The syntax is as given below:

CS=0, OE=0, LED=Z, DATA=Z.
CS=1, OE=1.
CS, OE => LED=ON;
DATA=DATA.

Figure 7.

Drawtiming has numerous command line options which can be seen by using the -h option. The supported output image formats are listed in ImageMagick website. A PDF output of the basic.txt example can be created by specifying the .pdf filename extension as shown below:

$ drawtiming -o basic.pdf basic.txt

Here, the output will be the same as shown in Fig. 1, only file format is changed to .pdf.

The output image can be scaled with the -x option. To generate an image that is twice the default size, you can use the command as given below:

$ drawtiming -x 2 -o basic-twice.jpg basic.txt

The output is as shown in Fig. 8.

Figure 1.

Figure 8.

If you want to be explicit about the pixel width and height, the -p option followed by width ‘x’ height can be specified as given below:

$ drawtiming -p 250x75 -o basic-p.jpg basic.txt

The output is as shown in Fig. 9.

Figure 1.

Figure 9.

The default cell width and height are 64 and 48 pixels, respectively. The -w and -c options are used to modify them:

$ drawtiming -c 100 -o basic-c.jpg basic.txt

The output is as shown in Fig. 10.

Figure 1.

Figure 10.

The line width can be changed using the -l option. For example:

$ drawtiming -l 3 -o basic-l.jpg basic.txt

The output is as shown in Fig. 11.

Figure 1.

Figure 11.

The default font size is 25 points. You can change it with the -f option. You can also specify a different font using the –font option.

$ drawtiming -f 10 -o basic-f.jpg basic.txt

The output is as shown in Fig. 12.

Figure 1.

Figure 12.

Timing delays can be displayed in the output using -tD> construct between any two signals. In a simple I/O write cycle, the minimum time between a valid address and the I/O write (IOW) strobe is shown as 92 units (minimum) as follows:

ADDRESS="VALID ADDRESS", DATA="", IOW=1.
DATA="", IOW=0.
ADDRESS -92min> IOW;
DATA="VALID DATA".
IOW=1.
IOW -516min> DATA;
ADDRESS="", DATA="".

The output is as shown in Fig. 13.

Figure 13.

For a real-world example, let us consider the write cycle of a 68008 microprocessor. During the write cycle, the CPU first asserts the read/write (RW) signal for write operation, followed by the address and data signals. The data strobe (DS) signal is then asserted, and the CPU waits for the receiver to acknowledge the same, as indicated by the data transfer acknowledge (DTACK) signal. The syntax for this ‘write’ cycle is as given below:

CLK=0, ADDRESS=Z, AS=1, RW=X, DS=1, DATA=X, DTACK=1.
CLK=1, RW=1, DATA=Z.
CLK=0, ADDRESS="VALID ADDR".
CLK=1, AS=0, RW=0.
CLK=0, DATA="VALID DATA".
CLK=1, DS=0.
CLK=0, DTACK=0.
CLK=1.
CLK=0, AS=1, DS=1, DTACK=1.
CLK=1, ADDRESS="", RW=1, DATA="".
.

The output is as shown in Fig. 14.

Figure 14.

The 68008 read cycle is similar to the write cycle, except that the read signal in RW is asserted, and the DS signal is asserted one clock period earlier as shown below:

CLK=0, ADDRESS=Z, AS=1, RW=X, DS=1, DATA=X, DTACK=1.
CLK=1, RW=1, DATA=Z.
CLK=0, ADDRESS="VALID ADDR".
CLK=1, AS=0, DS=0.
CLK=0.
CLK=1.
CLK=0, DTACK=0.
CLK=1.
CLK=0, AS=1, DS=1, DATA="DATA", DTACK=1.
CLK=1, ADDRESS="", DATA=Z.
.

The output is as shown in Fig. 15.

Figure 15.

2014 foss

GNU Parallel

November 27, 2014

[Published in Open Source For You (OSFY) magazine, October 2013 edition.]

GNU Parallel is a tool for running jobs in parallel in a Bash environment. The job can be a single command or a script, with variable arguments. The simultaneous execution can occur on remote machines as well. Released under the GPLv3+ license, you can install it on Fedora using the following command:

$ sudo yum install parallel

After installation, you need to remove ’–tollef’ from the /etc/parallel/config file, if it is present. This option will be permanently removed in future releases.

GNU Parallel takes a command and a list of arguments for processing. The arguments are provided in the command line after the notation ’:::’, and the command is executed for each argument. For example:

$ parallel echo ::: alpha beta gamma
alpha
beta
gamma

You can pass multiple arguments to GNU parallel, and it will run the command for every combination of the input, as shown below:

$ parallel echo ::: 0 1 ::: 0 1
0 0
0 1
1 0
1 1

The order in the output may be different. The tool provides a number of replacement string options. The default string ‘{}’ represents the input:

$ parallel echo {} ::: /tmp
/tmp

The replacement string ‘{/}’ removes everything up to and including the last forward slash:

$ parallel echo {/} ::: /tmp/stdio.h
stdio.h

If you want to return the path only, use the ‘{//}’ string:

$ parallel echo {//} ::: /tmp/stdio.h
/tmp

The string ‘{.}’ removes any filename extension:

$ parallel echo {.} ::: /tmp/stdio.h
/tmp/stdio

The output of a GNU Parallel command may not necessarily be in the order in which the input arguments are listed. For example:

$ parallel sleep {}\; echo {} ::: 5 2 1 4 3
1
2
4
3
5

If you wish to enforce the order of execution, use the ’-k’ option, as shown below:

$ parallel -k sleep {}\; echo {} ::: 5 2 1 4 3
5
2
1
4
3

A test script, for example, may need to be run ‘N’ times for the same argument. This can be accomplished with the following code:

$ seq 10 | parallel -n0 echo "Hello, World"
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World
Hello, World

The ’-n’ option represents the maximum number of arguments in the command line.

The commands that will get executed by GNU Parallel can be observed with the ’–dry-run’ option, as illustrated below:

$ parallel --dry-run -k sleep {}\; echo {} ::: 5 2 1 4 3
sleep 5; echo 5
sleep 2; echo 2
sleep 1; echo 1
sleep 4; echo 4
sleep 3; echo 3

The ’–eta’ option will give an estimate on the time it will take to complete a job:

$ parallel --eta -k sleep {}\; echo {} ::: 5 2 1 4 3

Computers / CPU cores / Max jobs to run
1:local / 4 / 4

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 5s 1left 1500avg  local:1/4/100%/1.0s 
2
1
4
3
ETA: 1s 0left 1.00avg  local:0/5/100%/1.0s

Suppose you have a large number of log files that you wish to zip and archive, you can run the gzip command in Parallel, as shown below:

$ parallel gzip ::: *.log

To unzip them all, you can use the following command:

$ parallel gunzip ::: *.gz

The ‘convert’ command is useful to transform image files. High resolution images can be scaled to a lower resolution using the following command:

$ convert -resize 512x384 file.jpg file_web.jpg

If you have a large number of files that you wish to resize, you can parallelize the task, as shown below:

$ find . -name '*.jpg' | parallel convert -resize 512x384 {} {}_web.jpg

GNU Parallel with wget can help in parallel downloads of large Linux kernel releases, as shown below:

$ parallel wget ::: www.kernel.org/pub/linux/kernel/v3.x/linux-3.11.tar.xz \
                    www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.10.tar.xz

The URLs can also be stored in a text file (“input.txt”), and passed as an argument to Parallel:

$ parallel -a input.txt wget

The file “input.txt” contains:

https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.11.tar.xz
https://www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.10.tar.xz

The downloaded kernel images can also be extracted in Parallel:

$ find . -name \*.tar.xz | parallel tar xvf

A ‘for’ loop in a Bash script can be parallelised. In the following script, the file sizes of all the text files are printed:

#!/bin/sh

for file in `ls *.txt`; do
  ls -lh "$file"
done | cut -d' ' -f 5

The parallelized version is as follows:

$ ls *.txt | parallel "ls -lh {}" | cut -d' ' -f 5

The number of CPUs and cores in your system can be listed with GNU Parallel:

$ parallel --number-of-cpus
1
$ parallel --number-of-cores
4

The ’-j’ option specifies the number of jobs to be run in parallel. If the value 0 is given, GNU Parallel will try to start as many jobs as possible. The ‘+ N’ option with ’-j’ adds N jobs to the CPU cores. For example:

$ find . -type f -print | parallel -j+2 ls -l {}

The input to GNU parallel can also be provided in a tabular format. Suppose you want to run ping tests for different machines, you can have a text file with the first column indicating the ping count, and the second column listing the hostname or the IP address. For example:

$ cat hosts.txt 
1 127.0.0.1
2 localhost

You can run the tests in parallel using the following code:

$ parallel -a hosts.txt --colsep ' ' ping -c {1} {2}

PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.074 ms

--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.074/0.074/0.074/0.000 ms

PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from localhost.localdomain (127.0.0.1): icmp_seq=2 ttl=64 time=0.065 ms

--- localhost.localdomain ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.035/0.050/0.065/0.015 ms

GNU Parallel can also execute jobs on remote machines, for which you need to first test that ssh works:

$ SERVER1=localhost
$ ssh $SERVER1 echo "Eureka"
guest@localhost's password: 
Eureka

You can then invoke commands or scripts to be run on SERVER1, as shown below:

$  parallel -S $SERVER1 echo "Eureka from " ::: $SERVER1
guest@localhost's password: 
Eureka from localhost

Files can also be transferred to remote machines using the ’–transfer’ option. Rsync is used internally for the transfer. An example is shown below:

$  parallel -S $SERVER1 --transfer cat ::: /tmp/host.txt 
guest@localhost's password: 
1 127.0.0.1
2 localhost

Refer to the GNU Parallel tutorial and manual page for more options and examples.

2014 foss

« OLDER POSTSNEWER POSTS »