Friends, functions in Pig come in four types:
1. Eval function
– A function that takes one or more expressions and returns another expression.
– Some eval functions are aggregate functions, such as MAX, which operate on a bag and return a single value.
– Some aggregate functions are algebraic, which means that the result of the function may be calculated incrementally.
– In MapReduce terms, algebraic functions make use of the combiner and are much more efficient to calculate.
– Supports UDFs: import org.apache.pig.EvalFunc, extend EvalFunc, and override the exec method.
2. Filter function
– A special kind of eval function that returns a logical Boolean result.
– Used with the FILTER operator to remove unwanted rows.
– Ex: IsEmpty
– Supports UDFs: import org.apache.pig.FilterFunc, extend FilterFunc, and override the exec method.
3. Load function
– Loads the data into a relation from external storage
– Supports UDFs: import org.apache.pig.LoadFunc and extend LoadFunc; instead of exec, you override several other methods, such as setLocation, getInputFormat, prepareToRead, and getNext.
4. Store function
– Specifies how to save the contents of a relation to external storage
– Ex: PigStorage, which loads data from delimited text files, can also store data in the same format.
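To make the "algebraic" point from the Eval function notes more concrete, here is a minimal sketch in plain Java of how an average can be calculated incrementally. It does not use the Pig API; the class name AlgebraicAvgSketch and the methods initial, merge and finish are hypothetical names chosen for illustration, loosely mirroring the initial, intermediate and final stages that a combiner-friendly function is split into.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: plain Java, no Pig classes involved.
class AlgebraicAvgSketch {
    // Partial result a combiner could pass along: a running sum and a count.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Initial stage: summarise one chunk of input values (e.g. one map task's data).
    static Partial initial(List<Double> chunk) {
        double sum = 0;
        for (double v : chunk) {
            sum += v;
        }
        return new Partial(sum, chunk.size());
    }

    // Intermediate stage: merge two partial results without revisiting the raw values.
    static Partial merge(Partial a, Partial b) {
        return new Partial(a.sum + b.sum, a.count + b.count);
    }

    // Final stage: turn the merged partial result into the average.
    static double finish(Partial p) {
        return p.count == 0 ? Double.NaN : p.sum / p.count;
    }

    public static void main(String[] args) {
        Partial p1 = initial(Arrays.asList(1.0, 2.0, 3.0));
        Partial p2 = initial(Arrays.asList(4.0, 5.0));
        System.out.println(finish(merge(p1, p2))); // prints 3.0
    }
}
```

Because only the small (sum, count) pairs need to be merged, most of the work can happen before the data reaches the reducer, which is why algebraic functions are cheaper to calculate.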
Detailed list is given below:
Pig built-in functions

| Category | Function | Description |
|---|---|---|
| Eval | AVG | Calculates the average (mean) value of entries in a bag. |
| Eval | CONCAT | Concatenates byte arrays or character arrays together. |
| Eval | COUNT | Calculates the number of non-null entries in a bag. |
| Eval | COUNT_STAR | Calculates the number of entries in a bag, including those that are null. |
| Eval | DIFF | Calculates the set difference of two bags. If the two arguments are not bags, returns a bag containing both if they are not equal; otherwise, returns an empty bag. |
| Eval | MAX | Calculates the maximum value of entries in a bag. |
| Eval | MIN | Calculates the minimum value of entries in a bag. |
| Eval | SIZE | Calculates the size of a value. For character arrays, it is the number of characters; for byte arrays, the number of bytes; for containers (tuple, bag, map), the number of entries. |
| Eval | SUM | Calculates the sum of the values of entries in a bag. |
| Eval | TOBAG | Converts one or more expressions to individual tuples, which are then put in a bag. |
| Eval | TOKENIZE | Tokenizes a character array into a bag of its constituent words. |
| Eval | TOMAP | Converts an even number of expressions to a map of key-value pairs. |
| Eval | TOP | Calculates the top n tuples in a bag. |
| Eval | TOTUPLE | Converts one or more expressions to a tuple. |
| Filter | IsEmpty | Tests whether a bag or map is empty. |
| Load/Store | PigStorage | Loads or stores relations using a field-delimited text format; the field delimiter defaults to a tab character. |
| Load/Store | BinStorage | Loads or stores relations from or to binary files in a Pig-specific format that uses Hadoop Writable objects. |
| Load/Store | TextLoader | Loads relations from a plain-text format. |
| Load/Store | JsonLoader, JsonStorage | Loads or stores relations from or to a JSON format. |
| Load/Store | HBaseStorage | Loads or stores relations from or to HBase. |
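
The difference between COUNT and COUNT_STAR in the list above is easy to miss, so here is a small plain-Java sketch of the described behaviour. It is not Pig code: the class CountSemanticsSketch and its methods count and countStar are hypothetical, and a bag is modelled as a simple List that may contain nulls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Illustration only: a "bag" is modelled here as a List that may hold nulls.
class CountSemanticsSketch {
    // Like COUNT as described above: only non-null entries are counted.
    static long count(List<?> bag) {
        return bag.stream().filter(Objects::nonNull).count();
    }

    // Like COUNT_STAR as described above: every entry is counted, nulls included.
    static long countStar(List<?> bag) {
        return bag.size();
    }

    public static void main(String[] args) {
        List<Integer> bag = Arrays.asList(7, null, 3, null, 5);
        System.out.println(count(bag));     // prints 3
        System.out.println(countStar(bag)); // prints 5
    }
}
```

If your data has missing values, the two functions give different answers, so pick the one that matches what you actually want to count.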