Pig – Functions

Friends, functions in Pig come in four types:
1. Eval function
– A function that takes one or more expressions and returns another expression.
– Some eval functions are aggregate functions, such as MAX.
– Some functions are algebraic, which means that the result of the function may be calculated incrementally.
– In MapReduce terms, algebraic functions make use of the combiner and are much more efficient to calculate.
– Supports UDFs: import org.apache.pig.EvalFunc, extend EvalFunc and override the exec method.
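To make the "algebraic" point concrete, here is a plain-Java sketch of how an average can be computed incrementally. It is not Pig's API. Each chunk of input stands in for the values one map task sees, and is reduced to a partial (sum, count) pair. The combiner step merges partials, and only the final step divides. The class and method names (AlgebraicAvg, initial, intermediate, finalValue) are hypothetical. In real Pig, a UDF opts into this by implementing the org.apache.pig.Algebraic interface, whose getInitial, getIntermed and getFinal methods name the classes used for each phase.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of an algebraic AVG (hypothetical names, not Pig's Algebraic API). */
class AlgebraicAvg {

    /** Partial aggregate passed between phases: running sum and count. */
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        Partial merge(Partial other) {
            return new Partial(sum + other.sum, count + other.count);
        }
    }

    /** Initial phase: summarise one chunk; nulls are skipped (assumed to mirror AVG ignoring nulls). */
    static Partial initial(List<Double> chunk) {
        double sum = 0;
        long count = 0;
        for (Double v : chunk) {
            if (v != null) {
                sum += v;
                count++;
            }
        }
        return new Partial(sum, count);
    }

    /** Intermediate (combiner) phase: merge partials without computing the mean yet. */
    static Partial intermediate(List<Partial> partials) {
        Partial acc = new Partial(0, 0);
        for (Partial p : partials) {
            acc = acc.merge(p);
        }
        return acc;
    }

    /** Final phase: divide once at the end; null when there were no values. */
    static Double finalValue(Partial p) {
        return p.count == 0 ? null : p.sum / p.count;
    }

    /** Runs all three phases over a list of chunks. */
    static Double average(List<List<Double>> chunks) {
        List<Partial> partials = new ArrayList<>();
        for (List<Double> chunk : chunks) {
            partials.add(initial(chunk));
        }
        return finalValue(intermediate(partials));
    }

    public static void main(String[] args) {
        List<List<Double>> chunks = List.of(List.of(1.0, 2.0, 3.0), List.of(4.0, 5.0));
        System.out.println(average(chunks)); // 3.0
    }
}
```

The design point is carrying (sum, count) rather than a per-chunk mean. Averaging the two chunk means in the example would give (2.0 + 4.5) / 2 = 3.25 instead of the correct 3.0.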

2. Filter function
– Returns a logical Boolean result.
– Used with FILTER to remove unwanted rows.
– Ex: IsEmpty
– Supports UDFs: import org.apache.pig.FilterFunc, extend FilterFunc and override the exec method.
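A filter function is just a Boolean test applied row by row. The plain-Java sketch below models that idea; it is not Pig's FilterFunc API. Here isEmpty mirrors what IsEmpty checks, with a bag modelled as a Java Collection, and filter plays the part of the FILTER operator. The class and method names are hypothetical. In a real UDF, the test would live in the exec method of a FilterFunc subclass.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Plain-Java model of a filter function (hypothetical names, not Pig's FilterFunc API). */
class FilterSketch {

    /** Models IsEmpty: true for an empty bag (a Collection here) or an empty map. */
    static boolean isEmpty(Object value) {
        if (value instanceof Collection) {
            return ((Collection<?>) value).isEmpty();
        }
        if (value instanceof Map) {
            return ((Map<?, ?>) value).isEmpty();
        }
        // Assumption of this model: anything else is rejected rather than guessed at.
        throw new IllegalArgumentException("expected a bag or map, got " + value);
    }

    /** Models the FILTER operator: keep only rows for which the Boolean test is true. */
    static <T> List<T> filter(List<T> rows, Predicate<? super T> keep) {
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (keep.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Each row is reduced to just its bag. Keeping non-empty bags is the analogue of
        // the Pig Latin statement  B = FILTER A BY NOT IsEmpty(items);  (A, B, items hypothetical)
        List<List<String>> rows = List.of(List.of("a", "b"), List.of(), List.of("c"));
        List<List<String>> kept = filter(rows, bag -> !isEmpty(bag));
        System.out.println(kept); // [[a, b], [c]]
    }
}
```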

3. Load function
– Loads the data into a relation from external storage
– Supports UDFs: import org.apache.pig.LoadFunc and extend LoadFunc. Instead of a single exec method, you override several methods: setLocation, getInputFormat, prepareToRead and getNext.
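Unlike eval and filter UDFs, a load function is driven through a sequence of calls. The sketch below is a hypothetical, simplified model of that sequence, not Pig's real signatures. The actual LoadFunc methods take Hadoop types: setLocation receives a Job, and prepareToRead receives a RecordReader and a PigSplit. getInputFormat is omitted here because there is no Hadoop InputFormat in this model. An in-memory map stands in for external storage, and lines are split on tabs. The one contract kept from Pig is that getNext returns null once the input is exhausted.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of the LoadFunc call sequence; not Pig's real signatures. */
class LoaderSketch {

    private final Map<String, String> storage; // stand-in for external storage: location -> contents
    private String location;
    private BufferedReader reader;

    LoaderSketch(Map<String, String> storage) {
        this.storage = storage;
    }

    /** Model of setLocation: remember where the data lives (the path given to LOAD). */
    void setLocation(String location) {
        this.location = location;
    }

    /** Model of prepareToRead: open a reader over the data at the chosen location. */
    void prepareToRead() {
        String contents = storage.get(location);
        if (contents == null) {
            throw new IllegalStateException("no data at " + location);
        }
        reader = new BufferedReader(new StringReader(contents));
    }

    /** Model of getNext: one tuple per call, or null once the input is exhausted. */
    List<String> getNext() throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return null; // end of input
        }
        return Arrays.asList(line.split("\t", -1)); // tab-delimited fields, trailing empties kept
    }

    /** Drives the loader as the framework would: set up, then call getNext until it returns null. */
    static List<List<String>> loadAll(LoaderSketch loader, String location) {
        loader.setLocation(location);
        loader.prepareToRead();
        List<List<String>> tuples = new ArrayList<>();
        try {
            for (List<String> t = loader.getNext(); t != null; t = loader.getNext()) {
                tuples.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tuples;
    }

    public static void main(String[] args) {
        // Hypothetical location and contents.
        LoaderSketch loader = new LoaderSketch(Map.of("/data/users.txt", "alice\t30\nbob\t25\n"));
        System.out.println(loadAll(loader, "/data/users.txt")); // [[alice, 30], [bob, 25]]
    }
}
```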

4. Store function
– Specifies how to save the contents of a relation to external storage
– Ex: PigStorage, which loads data from delimited text files, can also store data in the same format.
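PigStorage can act as both loader and storer because the text format is symmetric. A stored tuple is one line of delimiter-separated fields, and loading splits the line back apart. The plain-Java model below shows that round trip with a tab as the default delimiter; the class name is hypothetical and this is not Pig code. In Pig Latin, a different delimiter is passed as an argument, for example USING PigStorage(','). In this model every loaded field comes back as a String.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;
import java.util.regex.Pattern;

/** Plain-Java model of a PigStorage-style delimited text format (hypothetical name, not Pig code). */
class DelimitedFormat {

    private final String delimiter;

    DelimitedFormat(String delimiter) {
        this.delimiter = delimiter;
    }

    /** No-argument form uses a tab, matching PigStorage's default delimiter. */
    DelimitedFormat() {
        this("\t");
    }

    /** Store side: one tuple becomes one line of delimiter-separated fields. */
    String store(List<?> tuple) {
        StringJoiner line = new StringJoiner(delimiter);
        for (Object field : tuple) {
            line.add(String.valueOf(field));
        }
        return line.toString();
    }

    /** Load side: split a line back into fields; -1 keeps trailing empty fields. */
    List<String> load(String line) {
        return Arrays.asList(line.split(Pattern.quote(delimiter), -1));
    }

    public static void main(String[] args) {
        DelimitedFormat tsv = new DelimitedFormat();
        String line = tsv.store(List.of("alice", 30, "NYC"));
        System.out.println(line.replace("\t", "<TAB>")); // alice<TAB>30<TAB>NYC
        System.out.println(tsv.load(line));              // [alice, 30, NYC]
    }
}
```

The delimiter is passed through Pattern.quote because String.split takes a regular expression. Without quoting, a delimiter such as | would be treated as regex alternation.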

A detailed list is given below:

Pig Built-in Functions

Eval
– AVG: Calculates the average (mean) value of entries in a bag
– CONCAT: Concatenates byte arrays or character arrays together
– COUNT: Calculates the number of non-null entries in a bag
– COUNT_STAR: Calculates the number of all entries in a bag, including nulls
– DIFF: Calculates the set difference of two bags, returning the tuples that appear in one bag but not the other. If the two arguments are not bags, returns a bag containing both if they are not equal; otherwise, returns an empty bag
– MAX: Calculates the maximum value of entries in a bag
– MIN: Calculates the minimum value of entries in a bag
– SIZE: For character arrays, the number of characters; for byte arrays, the number of bytes; for containers (tuple, bag, map), the number of entries
– SUM: Calculates the sum of the values of entries in a bag
– TOBAG: Converts one or more expressions to individual tuples, which are then put in a bag
– TOKENIZE: Tokenizes a character array into a bag of its constituent words
– TOMAP: Converts an even number of expressions to a map of key-value pairs
– TOP: Calculates the top n tuples in a bag
– TOTUPLE: Converts one or more expressions to a tuple

Filter
– IsEmpty: Tests whether a bag or map is empty

Load/Store
– PigStorage: Loads or stores relations using a field-delimited text format; the delimiter defaults to a tab character
– BinStorage: Loads or stores relations from or to binary files in a Pig-specific format that uses Hadoop Writable objects
– TextLoader: Loads relations from a plain-text format
– JsonLoader, JsonStorage: Load or store relations from or to a JSON format
– HBaseStorage: Loads or stores relations from or to HBase
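A few rows of the list are easy to misread, so here is a plain-Java model of their semantics. The class and method names are hypothetical. A bag is modelled as a List, and entries as single values rather than tuples. COUNT skips null entries, while COUNT_STAR counts everything. The Pig documentation describes DIFF as returning the tuples found in one bag but not the other, which is a symmetric difference. For two non-bag arguments, it returns both values when they differ and an empty bag when they are equal. This model uses sets, so duplicates within a bag collapse; that detail is an assumption. Output order is not meaningful, since bags are unordered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Plain-Java models of a few built-ins (hypothetical names); a bag is modelled as a List. */
class BuiltinModels {

    /** COUNT model: entries that are not null. */
    static long count(List<?> bag) {
        return bag.stream().filter(Objects::nonNull).count();
    }

    /** COUNT_STAR model: every entry, nulls included. */
    static long countStar(List<?> bag) {
        return bag.size();
    }

    /** DIFF model for two bags: distinct entries present in exactly one of them. */
    static <T> List<T> diff(List<T> bag1, List<T> bag2) {
        Set<T> s1 = new LinkedHashSet<>(bag1);
        Set<T> s2 = new LinkedHashSet<>(bag2);
        List<T> out = new ArrayList<>();
        for (T t : s1) {
            if (!s2.contains(t)) {
                out.add(t);
            }
        }
        for (T t : s2) {
            if (!s1.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    /** DIFF model for two non-bag values: both if they differ, an empty result if equal. */
    static List<Object> diffValues(Object a, Object b) {
        if (Objects.equals(a, b)) {
            return new ArrayList<>();
        }
        return Arrays.asList(a, b);
    }

    public static void main(String[] args) {
        List<String> bag = Arrays.asList("a", null, "b");
        System.out.println(count(bag) + " " + countStar(bag));                // 2 3
        System.out.println(diff(List.of("a", "b", "c"), List.of("b", "d"))); // [a, c, d]
        System.out.println(diffValues(1, 1));                                 // []
        System.out.println(diffValues(1, 2));                                 // [1, 2]
    }
}
```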
