Friends, functions in Pig come in four types:
1. Eval function
– A function that takes one or more expressions and returns another expression.
– Some eval functions are aggregate functions, such as MAX, which take a bag and return a single value.
– Some functions are algebraic, which means that the result of the function may be calculated incrementally.
– In MapReduce terms, algebraic functions make use of the combiner and are much more efficient to calculate.
– Supports UDFs: import org.apache.pig.EvalFunc, extend EvalFunc, and override the exec method.
2. Filter function
– A special kind of eval function that returns a logical boolean result.
– Used with the FILTER operator to remove unwanted rows.
– Ex: IsEmpty
– Supports UDFs: import org.apache.pig.FilterFunc, extend FilterFunc, and override the exec method.
3. Load function
– Loads the data into a relation from external storage
– Supports UDFs: import org.apache.pig.LoadFunc and extend LoadFunc. Instead of exec, you override a different set of methods, such as setLocation, getInputFormat, prepareToRead, and getNext.
4. Store function
– Specifies how to save the contents of a relation to external storage
– Ex: PigStorage, which loads data from delimited text files, can also store data in the same format.
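To make the "algebraic" idea from item 1 more concrete, here is a minimal sketch in plain Java of how an average can be calculated incrementally. It does not use any Pig classes: the names AlgebraicAvgSketch, Partial, initial, combine, and finish are made up for this illustration, and skipping nulls is an assumption of the sketch. In Pig itself, an algebraic UDF implements the org.apache.pig.Algebraic interface, whose getInitial, getIntermed, and getFinal methods play roughly these three roles.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration in plain Java; none of these names come from Pig.
final class AlgebraicAvgSketch {

    // Partial result passed between stages: a running sum and a count.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Initial stage (map side): summarise one chunk of the input.
    static Partial initial(List<Double> chunk) {
        double sum = 0.0;
        long count = 0;
        for (Double value : chunk) {
            if (value != null) { // assumption of this sketch: nulls are skipped
                sum += value;
                count++;
            }
        }
        return new Partial(sum, count);
    }

    // Intermediate stage (combiner): merge several partial results into one.
    static Partial combine(List<Partial> partials) {
        double sum = 0.0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // Final stage (reduce side): turn the merged partial into the average.
    static Double finish(Partial merged) {
        if (merged.count == 0) {
            return null; // no values seen, so there is no average
        }
        return merged.sum / merged.count;
    }

    public static void main(String[] args) {
        Partial a = initial(Arrays.asList(1.0, 2.0, 3.0)); // first chunk
        Partial b = initial(Arrays.asList(4.0, 5.0));      // second chunk
        System.out.println(finish(combine(Arrays.asList(a, b)))); // prints 3.0
    }
}
```

Because each chunk is reduced to a small (sum, count) pair before anything is shuffled, far less data has to travel to the reducer, which is why the combiner makes such functions cheaper to calculate.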
A detailed list is given below:
|Category|Pig built-in function|Description|
|---|---|---|
|Eval|AVG|Calculates the average (mean) value of entries in a bag.|
|Eval|CONCAT|Concatenates byte arrays or character arrays together.|
|Eval|COUNT|Calculates the number of non-null entries in a bag.|
|Eval|COUNT_STAR|Calculates the number of all entries in a bag, including nulls.|
|Eval|DIFF|Calculates the set difference of two bags. If the two arguments are not bags, returns a bag containing both if they differ; otherwise, returns an empty bag.|
|Eval|SIZE|For character arrays, the number of characters; for byte arrays, the number of bytes; for containers (tuple, bag, map), the number of entries.|
|Eval|SUM|Calculates the sum of the values of entries in a bag.|
|Eval|TOBAG|Converts one or more expressions to individual tuples, which are then put in a bag.|
|Eval|TOKENIZE|Tokenizes a character array into a bag of its constituent words.|
|Eval|TOMAP|Converts an even number of expressions to a map of key-value pairs.|
|Eval|TOP|Calculates the top n tuples in a bag.|
|Eval|TOTUPLE|Converts one or more expressions to a tuple.|
|Filter|IsEmpty|Tests whether a bag or map is empty.|
|Load/Store|PigStorage|Loads or stores relations using a field-delimited text format; the delimiter defaults to a tab character.|
|Load/Store|BinStorage|Loads or stores relations from or to binary files in a Pig-specific format that uses Hadoop Writable objects.|
|Load/Store|TextLoader|Loads relations from a plain-text format.|
|Load/Store|JsonLoader, JsonStorage|Loads or stores relations from or to a JSON format.|
|Load/Store|HBaseStorage|Loads or stores relations from or to HBase.|
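The difference between COUNT and COUNT_STAR in the table above is easy to mix up, so here is a small sketch of it in plain Java. It is only a simplified model, not Pig's implementation: the bag is represented as a List with one value per entry, and the names CountSketch, count, and countStar are made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two counting functions; not Pig's own code.
final class CountSketch {

    // Models COUNT as described in the table: only non-null entries are counted.
    static long count(List<?> bag) {
        long n = 0;
        for (Object entry : bag) {
            if (entry != null) {
                n++;
            }
        }
        return n;
    }

    // Models COUNT_STAR as described in the table: every entry is counted, nulls included.
    static long countStar(List<?> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        List<Integer> bag = Arrays.asList(3, null, 7, null, 9);
        System.out.println(count(bag));     // prints 3
        System.out.println(countStar(bag)); // prints 5
    }
}
```

So if your data may contain nulls and you want the total number of rows, COUNT_STAR is the one to reach for.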