Friends, functions in Pig come in four types:
1. Eval function
– A function that takes one or more expressions and returns another expression.
– Some functions are aggregate functions, such as MAX.
– Some functions are algebraic, which means that the result of the function may be calculated incrementally.
– In MapReduce terms, algebraic functions make use of the combiner and are much more efficient to calculate.
– Supports UDFs: import org.apache.pig.EvalFunc, extend EvalFunc and override the exec method.
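To show why the combiner helps, here is a plain-Java sketch of how an algebraic average can be split into phases. It does not use Pig's classes. The names AlgebraicAvgSketch, initial, intermed and finalValue are hypothetical and were chosen to mirror the three phases. In Pig itself, an EvalFunc opts into this behaviour by also implementing the org.apache.pig.Algebraic interface, whose getInitial, getIntermed and getFinal methods name the classes that run each phase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Pig-free illustration of the algebraic (Initial/Intermed/Final) pattern for AVG.
final class AlgebraicAvgSketch {

    // Partial state passed between phases: a running sum and a count of non-null values.
    static final class Partial {
        final double sum;
        final long count;
        Partial(double sum, long count) { this.sum = sum; this.count = count; }
    }

    // "Initial" phase (map side): summarize one chunk of input values.
    static Partial initial(List<Double> chunk) {
        double sum = 0;
        long count = 0;
        for (Double v : chunk) {
            if (v != null) { // Pig's AVG ignores nulls, so skip them here too
                sum += v;
                count++;
            }
        }
        return new Partial(sum, count);
    }

    // "Intermediate" phase (combiner): merge partial results; safe to run any number of times.
    static Partial intermed(List<Partial> partials) {
        double sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // "Final" phase (reduce side): turn the merged partial into the average, or null if empty.
    static Double finalValue(Partial p) {
        return p.count == 0 ? null : p.sum / p.count;
    }

    // Run all three phases over pre-split chunks, as map tasks and a combiner would.
    static Double averageInChunks(List<List<Double>> chunks) {
        List<Partial> partials = new ArrayList<>();
        for (List<Double> chunk : chunks) {
            partials.add(initial(chunk));
        }
        return finalValue(intermed(partials));
    }

    public static void main(String[] args) {
        List<List<Double>> chunks = List.of(List.of(1.0, 2.0), List.of(3.0, 4.0, 5.0));
        System.out.println(averageInChunks(chunks)); // prints 3.0
    }
}
```

The partial state is a (sum, count) pair rather than a partial average. Averaging the two chunk averages above (1.5 and 4.0) would give 2.75 instead of the correct 3.0.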
2. Filter function
– Returns a Boolean result.
– Used with FILTER to remove unwanted rows.
– Example: IsEmpty.
– Supports UDFs: import org.apache.pig.FilterFunc, extend FilterFunc and override the exec method.
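A filter function is essentially an eval function whose result is a Boolean (in Pig, FilterFunc extends EvalFunc<Boolean>). FILTER keeps only the rows for which that result is true. The sketch below models this in plain Java without Pig on the classpath. The Row type and the IS_EMPTY predicate are hypothetical stand-ins for a relation with a bag field and for IsEmpty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, Pig-free model of a filter function and the FILTER operator.
final class FilterFuncSketch {

    // Hypothetical row: a name plus a "bag" of items.
    static final class Row {
        final String name;
        final List<String> items;
        Row(String name, List<String> items) { this.name = name; this.items = items; }
    }

    // Mirrors IsEmpty on a bag: true when the bag has no entries.
    static final Predicate<Row> IS_EMPTY = row -> row.items.isEmpty();

    // Like FILTER ... BY cond: keep only the rows for which the predicate returns true.
    static List<Row> filter(List<Row> rows, Predicate<Row> cond) {
        List<Row> kept = new ArrayList<>();
        for (Row r : rows) {
            if (cond.test(r)) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("a", List.of("x", "y")),
                new Row("b", List.of()),
                new Row("c", List.of("z")));
        // Same idea as filtering BY NOT IsEmpty(items): drop rows whose bag is empty.
        for (Row r : filter(rows, IS_EMPTY.negate())) {
            System.out.println(r.name); // prints a, then c
        }
    }
}
```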
3. Load function
– Loads data from external storage into a relation.
– Supports UDFs: import org.apache.pig.LoadFunc and extend LoadFunc, but instead of exec, override methods such as setLocation, getInputFormat, prepareToRead and getNext.
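Pig drives a loader by calling getNext repeatedly and stops when it returns null. The plain-Java sketch below models only that read loop for delimited text. LoaderSketch and loadAll are hypothetical names. A BufferedReader stands in for the Hadoop RecordReader that Pig passes to prepareToRead, and setLocation and getInputFormat are left out because they only configure the Hadoop input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical, Pig-free model of a LoadFunc's read loop (not Pig's actual API).
final class LoaderSketch {
    private final String delimiter;
    private BufferedReader reader;

    LoaderSketch(String delimiter) { this.delimiter = delimiter; }

    // Stands in for prepareToRead: Pig supplies a RecordReader; here a BufferedReader is used.
    void prepareToRead(BufferedReader reader) { this.reader = reader; }

    // Stands in for getNext: return one "tuple" per call, or null once the input is exhausted.
    List<String> getNext() throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return null;
        }
        // limit -1 keeps trailing empty fields instead of dropping them
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    // Drain the loader the way the runtime would, collecting every tuple.
    static List<List<String>> loadAll(String text, String delimiter) throws IOException {
        LoaderSketch loader = new LoaderSketch(delimiter);
        loader.prepareToRead(new BufferedReader(new StringReader(text)));
        List<List<String>> tuples = new ArrayList<>();
        List<String> t;
        while ((t = loader.getNext()) != null) {
            tuples.add(t);
        }
        return tuples;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(loadAll("alice\t30\nbob\t25\n", "\t")); // prints [[alice, 30], [bob, 25]]
    }
}
```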
4. Store function
– Specifies how to save the contents of a relation to external storage.
– Example: PigStorage, which loads data from delimited text files, can also store data in the same format.
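A store function is roughly the mirror image of a loader. In Pig, a custom one extends org.apache.pig.StoreFunc and implements methods such as getOutputFormat, setStoreLocation, prepareToWrite and putNext, with putNext called once per tuple. The sketch below models only the formatting step for a tab-delimited layout similar to PigStorage's default. StoreSketch, formatTuple and storeAll are hypothetical names, and fields are assumed to be non-null strings.

```java
import java.util.List;

// Hypothetical, Pig-free model of a delimited store function (not Pig's StoreFunc API).
final class StoreSketch {

    // Stands in for putNext: format one tuple as a single delimited line.
    static String formatTuple(List<String> tuple, String delimiter) {
        return String.join(delimiter, tuple);
    }

    // Write every tuple of a "relation", one line per tuple.
    static String storeAll(List<List<String>> relation, String delimiter) {
        StringBuilder out = new StringBuilder();
        for (List<String> tuple : relation) {
            out.append(formatTuple(tuple, delimiter)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> relation = List.of(List.of("alice", "30"), List.of("bob", "25"));
        System.out.print(storeAll(relation, "\t")); // two tab-delimited lines
    }
}
```

Because each tuple becomes one delimited line, a loader that splits lines on the same delimiter recovers the original fields, which is why the same format works in both directions.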
A detailed list of the built-in functions is given below:
Pig Built-in Functions
| Type | Function | Description |
|---|---|---|
| Eval | AVG | Calculates the average (mean) of the values of entries in a bag |
| | CONCAT | Concatenates byte arrays or character arrays together |
| | COUNT | Counts the number of non-null entries in a bag |
| | COUNT_STAR | Counts all entries in a bag, including nulls |
| | DIFF | Calculates the difference between two bags: the tuples that appear in one bag but not the other. If the two arguments are not bags, returns an empty bag if they are equal; otherwise, returns a bag containing both |
| | MAX | Calculates the maximum value of entries in a bag |
| | MIN | Calculates the minimum value of entries in a bag |
| | SIZE | For character arrays, the number of characters; for byte arrays, the number of bytes; for containers (tuple, bag, map), the number of entries |
| | SUM | Calculates the sum of the values of entries in a bag |
| | TOBAG | Converts one or more expressions to individual tuples, which are then placed in a bag |
| | TOKENIZE | Tokenizes a character array into a bag of its constituent words |
| | TOMAP | Converts an even number of expressions to a map of key-value pairs |
| | TOP | Returns the top n tuples from a bag |
| | TOTUPLE | Converts one or more expressions to a tuple |
| Filter | IsEmpty | Tests whether a bag or map is empty |
| Load/Store | PigStorage | Loads or stores relations using a field-delimited text format; the default delimiter is the tab character |
| | BinStorage | Loads or stores relations from or to binary files in a Pig-specific format that uses Hadoop Writable objects |
| | TextLoader | Loads relations from a plain-text format |
| | JsonLoader, JsonStorage | Loads or stores relations from or to a JSON format |
| | HBaseStorage | Loads or stores relations from or to HBase |
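A few of the table entries depend on edge cases that are easy to misread. The plain-Java sketch below models COUNT versus COUNT_STAR, and DIFF applied to two non-bag values. BuiltinSemanticsSketch and its methods are hypothetical names. Each bag entry is modeled as a single value (Pig's COUNT actually checks the first field of each tuple), and DIFF's result is shown without the tuple wrapping that Pig applies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical, Pig-free models of a few built-in semantics (not Pig's implementation).
final class BuiltinSemanticsSketch {

    // COUNT: number of non-null entries in a bag.
    static long count(List<Object> bag) {
        long n = 0;
        for (Object v : bag) {
            if (v != null) {
                n++;
            }
        }
        return n;
    }

    // COUNT_STAR: number of entries, nulls included.
    static long countStar(List<Object> bag) {
        return bag.size();
    }

    // DIFF on two non-bag values: empty bag if they are equal, otherwise a bag holding both.
    static List<Object> diffScalars(Object a, Object b) {
        List<Object> out = new ArrayList<>();
        if (!Objects.equals(a, b)) {
            out.add(a);
            out.add(b);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> bag = Arrays.asList(3, null, 7, null);
        System.out.println(count(bag));            // prints 2
        System.out.println(countStar(bag));        // prints 4
        System.out.println(diffScalars("x", "x")); // prints []
        System.out.println(diffScalars("x", "y")); // prints [x, y]
    }
}
```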