HBase shell commands

  1. describe 'tablename' :
    Displays the metadata of a particular table.
    Example :

    describe 'employee' 
    
    {NAME => 'address', BLOOMFILTER => 'ROW', VERSIONS => '5', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

    {NAME => 'personal_info', BLOOMFILTER => 'ROW', VERSIONS => '2', IN_MEMORY => 'true', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}

    {NAME => 'professional_info', BLOOMFILTER => 'ROW', VERSIONS => '3', IN_MEMORY => 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', COMPRESSION => 'GZ', MIN_VERSIONS => '0', BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
  2. disable 'employee' :
    Always disable a table before performing any DDL operation on it, as in the example below.
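    For example, using the employee table described above, you would typically disable it, run the DDL change, and then re-enable it:

    disable 'employee'
    # ... perform the DDL operation, e.g. an alter ...
    enable 'employee'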
  3. alter 'employee', {NAME => 'col_fam1', COMPRESSION => 'GZ'} :
    With the alter command you can add column families on the fly as well as set different parameters on a column family, such as keeping it IN_MEMORY, setting compression for a particular column family, setting the number of versions, and so on; a couple of examples are sketched below.
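    A few sketches using the employee table described above (the parameter values are illustrative):

    hbase> alter 'employee', {NAME => 'address', VERSIONS => 10}
    hbase> alter 'employee', {NAME => 'personal_info', IN_MEMORY => 'true'}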
  4. list :
    Lists all the tables stored in HBase.
    Example :
    list
    ["employee", "table1", "user_hcat_load_table"]
  5. enable_all 't.*' :
    Enables all the tables matching the given regex.
  6. exists 'student' :
    Checks whether the student table exists or not; see the example below.
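    The shell replies with a message similar to the following (exact wording may vary slightly between HBase versions):

    exists 'student'
    Table student does exist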
  7. show_filters :
    Shows all the available filters in HBase; sample output is shown below.
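    The output is a plain list of the filter class names available in your HBase version, for example (truncated, illustrative):

    show_filters
    ColumnPrefixFilter
    TimestampsFilter
    PageFilter
    ValueFilter
    PrefixFilter
    ...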
  8. alter_status :
    Gets the status of the alter command; it indicates the number of regions of the table that have received the updated schema. Pass a table name, as in the example below.
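    Example (using the same table name t1 as the other examples):

    hbase> alter_status 't1'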
  9. alter_async :
    Alter column family schema, does not wait for all regions to receive the
    schema changes. Pass table name and a dictionary specifying new column
    family schema. Dictionaries are described on the main help command output.
    Dictionary must include name of column family to alter.
    To change or add the 'f1' column family in table 't1' from defaults
    to instead keep a maximum of 5 cell VERSIONS, do:

    hbase> alter_async 't1', NAME => 'f1', VERSIONS => 5

    To delete the 'f1' column family in table 't1', do:

    hbase> alter_async 't1', NAME => 'f1', METHOD => 'delete'

    or a shorter version:

    hbase> alter_async 't1', 'delete' => 'f1'
    You can also change table-scope attributes like MAX_FILESIZE,
    MEMSTORE_FLUSHSIZE, READONLY, and DEFERRED_LOG_FLUSH.

    For example, to change the maximum file size of a region to 128 MB, do:

    hbase> alter 't1', METHOD => 'table_att', MAX_FILESIZE => '134217728'

    There could be more than one alteration in one command:

    hbase> alter 't1', {NAME => 'f1'}, {NAME => 'f2', METHOD => 'delete'}

    To check if all the regions have been updated, use alter_status <table_name>

  10. count :
    Counts the number of rows in a table. The current count is shown every 1000 rows by default; you can increase this interval as well as set scan caching on the count scan.
    hbase> count 't1', INTERVAL => 100000
    hbase> count 't1', CACHE => 1000
    hbase> count 't1', INTERVAL => 10, CACHE => 1000

  11. delete :
    Put a delete cell value at specified table/row/column and optionally
    timestamp coordinates. Deletes must match the deleted cell's
    coordinates exactly. When scanning, a delete cell suppresses older
    versions. To delete a cell from 't1' at row 'r1' under column 'c1'
    marked with the time 'ts1', do:

    hbase> delete 't1', 'r1', 'c1', ts1
  12. deleteall :
    Delete all cells in a given row; pass a table name, row, and optionally
    a column and timestamp. Examples:
    hbase> deleteall 't1', 'r1'
    hbase> deleteall 't1', 'r1', 'c1'
    hbase> deleteall 't1', 'r1', 'c1', ts1

  13. get :
    Get row or cell contents; pass table name, row, and optionally
    a dictionary of column(s), timestamp, timerange and versions.
    Examples:
    hbase> get 't1', 'r1'
    hbase> get 't1', 'r1', {TIMERANGE => [ts1, ts2]}
    hbase> get 't1', 'r1', {COLUMN => 'c1'}
    hbase> get 't1', 'r1', {COLUMN => ['c1', 'c2', 'c3']}
    hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1}
    hbase> get 't1', 'r1', {COLUMN => 'c1', TIMERANGE => [ts1, ts2], VERSIONS => 4}
    hbase> get 't1', 'r1', {COLUMN => 'c1', TIMESTAMP => ts1, VERSIONS => 4}
    hbase> get 't1', 'r1', {FILTER => "ValueFilter(=, 'binary:abc')"}
    hbase> get 't1', 'r1', 'c1'
    hbase> get 't1', 'r1', 'c1', 'c2'
    hbase> get 't1', 'r1', ['c1', 'c2']
  14. put :
    Put a cell 'value' at specified table/row/column and optionally
    timestamp coordinates. To put a cell value into table 't1' at
    row 'r1' under column 'c1' marked with the time 'ts1', do:

    hbase> put 't1', 'r1', 'c1', 'value', ts1

  15. scan :
    Scan a table; pass table name and optionally a dictionary of scanner
    specifications. Scanner specifications may include one or more of:
    TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH,
    COLUMNS, or CACHE. If no columns are specified, all columns will be scanned.
    To scan all members of a column family, leave the qualifier empty as in
    'col_family:'.

    The filter can be specified in two ways:
    1. Using a filterString – more information on this is available in the
    Filter Language document attached to the HBASE-4176 JIRA
    2. Using the entire package name of the filter.

    Some examples:
    hbase> scan '.META.'
    hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
    hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
    hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
    hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND
    (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
    hbase> scan 't1', {FILTER =>
    org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
  16. truncate :
    Disables, drops and recreates the specified table.
    Examples:
    hbase> truncate 't1'

    Reference :
    https://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/

    https://www.cloudera.com/documentation/enterprise/5-5-x/topics/admin_hbase_filtering.html

Elasticsearch – Aggregation

Elasticsearch Aggregation provides capability similar to the RDBMS GROUP BY operator.
Facets provide a great way to aggregate data within a document set context. This context is defined by the executed query in combination with the different levels of filters that can be defined (filtered queries, top-level filters, and facet level filters). While powerful, their implementation is not designed from the ground up to support complex aggregations and is thus limited.
An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents.
There are many different types of aggregation, each with its own purpose and output. To better understand these types, it is best to break them down into two families.
1. Bucketing
– A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion
– When the aggregation is executed, all the bucket criteria are evaluated on every document in the context, and when a criterion matches, the document is considered to “fall in” the relevant bucket
– By the end of the aggregation process, we’ll end up with a list of buckets – each one with a set of documents that “belong” to it.
2. Metric
– Aggregations that keep track of and compute metrics over a set of documents.
The different kinds of aggregations are listed below; a minimal example request is sketched after the list:
1. Min Aggregation
2. Max Aggregation
3. Sum Aggregation
4. Avg Aggregation
5. Stats Aggregation
6. Extended Stats Aggregation
7. Value Count Aggregation
8. Percentiles Aggregation
9. Percentile Ranks Aggregation
10. Cardinality Aggregation
11. Geo Bounds Aggregation
12. Top hits Aggregation
13. Scripted Metric Aggregation
14. Global Aggregation
15. Filter Aggregation
16. Filters Aggregation
17. Missing Aggregation
18. Nested Aggregation
19. Reverse nested Aggregation
20. Children Aggregation
21. Terms Aggregation
22. Significant Terms Aggregation
23. Range Aggregation
24. Date Range Aggregation
25. IPv4 Range Aggregation
26. Histogram Aggregation
27. Date Histogram Aggregation
28. Geo Distance Aggregation
29. GeoHash grid Aggregation
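
As a minimal sketch of how the two families combine in one request (the index name logs, the field names bytes and status, and the host localhost:9200 are illustrative assumptions, not values from this post), the query below runs one metric aggregation (avg) and one bucketing aggregation (terms):

curl -XPOST 'http://localhost:9200/logs/_search' -d '
{
  "size": 0,
  "aggs": {
    "avg_bytes": { "avg": { "field": "bytes" } },
    "by_status": { "terms": { "field": "status" } }
  }
}'

The terms aggregation returns one bucket per distinct status value, each with its key and document count, while avg_bytes returns a single value computed over the whole document set. On recent Elasticsearch versions, also pass -H 'Content-Type: application/json'.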

Using nginx to provide authentication to Elasticsearch / Kibana

Friends, authentication and authorization are always important requirements in the development of any application.

In this post I am going to show you how to provide authentication for Elasticsearch / Kibana using the nginx server.

Steps are given below:

1. Install the nginx server

You can follow the link given below for reference.

https://www.digitalocean.com/community/tutorials/how-to-install-nginx-on-ubuntu-14-04-lts
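
On Ubuntu (the distribution covered by the tutorial above), the installation is typically just:

sudo apt-get update
sudo apt-get install nginx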

2. Create a configuration file named kibana.conf (or elasticsearch.conf) under /etc/nginx/conf.d (the nginx configuration directory)

3. Add the following code to kibana.conf

server {
    listen 80;
    server_name yourdomain.com;  ## Replace with your domain name
    location / {
        auth_basic "Restricted";
        auth_basic_user_file /etc/nginx/conf.d/kibana.htpasswd;
        proxy_pass http://yourdomain.com:5601;  ## Replace with your Kibana instance; Kibana runs on port 5601 (for Elasticsearch use port 9200)
    }
}

4. Create a kibana.htpasswd file under the /etc/nginx/conf.d directory

5. Run the following command to generate the username / password for authentication

sudo htpasswd -c /etc/nginx/conf.d/kibana.htpasswd bhavesh

It will ask for a password to set for the username bhavesh. Enter it. The username and the encrypted password are stored in the specified file, separated by a colon.
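
The file then contains one entry per user in the form username:encrypted-password, for example (the password portion is shown as a placeholder):

bhavesh:<encrypted password generated by htpasswd>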

6. Restart nginx, as shown below
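
For example, on Ubuntu you can first test the configuration and then restart the service:

sudo nginx -t
sudo service nginx restart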

7. Point your browser to yourdomain.com and verify that you are prompted for the username and password.
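
You can also check from the command line with curl; using the example user created above, the request should succeed only after the correct password is entered:

curl -u bhavesh http://yourdomain.com/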

The above steps provide authentication. One can provide authorization based on indices using the Shield plugin. Otherwise you can follow the link below, which describes nginx tricks for multi-role authorization:

https://www.elastic.co/blog/playing-http-tricks-nginx