
Bigdata Videos

HBase Install and Basic Commands [Hadoop video lecture]


This post walks through an HBase installation, step by step.   [column-oriented DB]


Prerequisite: a running HDFS cluster








[Referred to the sites below and downloaded the HBase tar.gz file]


http://hbase.apache.org/

http://www.apache.org/dyn/closer.cgi/hbase/

http://hbase.apache.org/book/quickstart.html



# vi /etc/profile 


export HBASE_HOME=/home/hadoop/hbase

export PATH=$PATH:$JAVA_HOME/bin:$ANT_HOME:$HADOOP_HOME/bin:$HBASE_HOME/bin


# source /etc/profile   

# cat hbase-env.sh 

# The java implementation to use.  Java 1.6 required.


 export JAVA_HOME=/usr/local/java

 export HBASE_CLASSPATH=/home/hadoop/hbase/conf

 export HBASE_MANAGES_ZK=true


# The maximum amount of heap to use, in MB. Default is 1000.

# export HBASE_HEAPSIZE=1000


# Extra Java runtime options.

# Below are what we set by default.  May only work with SUN JVM.

# For more on why as well as other possible settings,

# see http://wiki.apache.org/hadoop/PerformanceTuning


export HBASE_OPTS="-XX:+UseConcMarkSweepGC"


# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.


# This enables basic gc logging to the .out file.

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"


# This enables basic gc logging to its own file.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"


# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"


# Uncomment one of the below three options to enable java garbage collection logging for the client processes.


# This enables basic gc logging to the .out file.

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"


# This enables basic gc logging to its own file.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"


# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.

# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .

# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"


# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.

# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="

# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.



# Uncomment and adjust to enable JMX exporting

# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.

# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html

#

# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"

# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"

# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"

# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"

# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"


# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.

# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers


# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.

# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters


# Extra ssh options.  Empty by default.

# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"


# Where log files are stored.  $HBASE_HOME/logs by default.

# export HBASE_LOG_DIR=${HBASE_HOME}/logs


# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers 

# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"

# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"

# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"

# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"


# A string representing this instance of hbase. $USER by default.

# export HBASE_IDENT_STRING=$USER


# The scheduling priority for daemon processes.  See 'man nice'.

# export HBASE_NICENESS=10


# The directory where pid files are stored. /tmp by default.

# export HBASE_PID_DIR=/var/hadoop/pids


# Seconds to sleep between slave commands.  Unset by default.  This

# can be useful in large clusters, where, e.g., slave rsyncs can

# otherwise arrive faster than the master can service them.

# export HBASE_SLAVE_SLEEP=0.1


# Tell HBase whether it should manage its own instance of Zookeeper or not.

# export HBASE_MANAGES_ZK=true



[hadoop@h001 conf]$ vi hbase-site.xml 

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<configuration>


 <property> 

  <name>hbase.rootdir</name>

  <value>hdfs://192.168.73.71:9000/hbase</value> 

 </property>


 <property> 

  <name>hbase.master</name> 

  <value>192.168.73.71:60000</value> 

 </property> 


 <property> 

  <name>hbase.zookeeper.quorum</name> 

  <value>192.168.73.71,192.168.73.72,192.168.73.73,192.168.73.74</value> 

 </property> 


 <property> 

  <name>hbase.zookeeper.property.dataDir</name> 

  <value>/home/hadoop/zk_data</value> 

 </property> 


 <property> 

  <name>hbase.cluster.distributed</name> 

  <value>true</value> 

 </property> 


 <property> 

  <name>dfs.support.append</name> 

  <value>true</value> 

 </property> 


 <property> 

  <name>dfs.datanode.max.xcievers</name> 

  <value>4096</value> 

 </property>


</configuration>
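Before starting the cluster, it can be worth sanity-checking that hbase-site.xml parses and contains the properties you expect. A minimal sketch using Python's standard library (the embedded XML mirrors the configuration above; in practice you would read $HBASE_HOME/conf/hbase-site.xml instead):

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the hbase-site.xml above; read the real file in practice.
conf_xml = """
<configuration>
 <property>
  <name>hbase.rootdir</name>
  <value>hdfs://192.168.73.71:9000/hbase</value>
 </property>
 <property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
 </property>
</configuration>
"""

def load_hbase_conf(xml_text):
    """Return {name: value} for every <property> element."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value")
            for p in root.iter("property")}

conf = load_hbase_conf(conf_xml)
print(conf["hbase.rootdir"])   # hdfs://192.168.73.71:9000/hbase
```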


[hadoop@h001 conf]$ cat regionservers 

192.168.73.72
192.168.73.73
192.168.73.74


$ tar cvzf hbase.tar.gz hbase          # pack the configured HBase directory
$ scp hbase.tar.gz hadoop@h002:.       # copy it to a region server node
$ ssh h002 tar xvzf hbase.tar.gz       # unpack remotely (repeat for the other nodes)

$ ./bin/start-hbase.sh


$ ./bin/hbase shell


[hadoop@h001 hbase]$ jps

3855 HQuorumPeer  <-- ZooKeeper

3935 HMaster      <-- HBase master

4134 Jps

3137 NameNode

3333 JobTracker


http://h001:60010

http://h001:50070/dfshealth.jsp

http://h001:50030/jobtracker.jsp


[hadoop@h001 hbase]$ ./bin/hbase shell

HBase Shell; enter 'help<RETURN>' for list of supported commands.

Type "exit<RETURN>" to leave the HBase Shell

Version 0.94.7, r1471806, Wed Apr 24 18:48:26 PDT 2013



hbase(main):001:0> help

HBase Shell, version 0.94.7, r1471806, Wed Apr 24 18:48:26 PDT 2013

Type 'help "COMMAND"', (e.g. 'help "get"' -- the quotes are necessary) for help on a specific command.

Commands are grouped. Type 'help "COMMAND_GROUP"', (e.g. 'help "general"') for help on a command group.


COMMAND GROUPS:

  Group name: general

  Commands: status, version, whoami


  Group name: ddl

  Commands: alter, alter_async, alter_status, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, is_disabled, is_enabled, list, show_filters


  Group name: dml

  Commands: count, delete, deleteall, get, get_counter, incr, put, scan, truncate


  Group name: tools

  Commands: assign, balance_switch, balancer, close_region, compact, flush, hlog_roll, major_compact, move, split, unassign, zk_dump


  Group name: replication

  Commands: add_peer, disable_peer, enable_peer, list_peers, remove_peer, start_replication, stop_replication


  Group name: snapshot

  Commands: clone_snapshot, delete_snapshot, list_snapshots, restore_snapshot, snapshot


  Group name: security

  Commands: grant, revoke, user_permission


SHELL USAGE:

Quote all names in HBase Shell such as table and column names.  Commas delimit

command parameters.  Type <RETURN> after entering a command to run it.

Dictionaries of configuration used in the creation and alteration of tables are

Ruby Hashes. They look like this:


  {'key1' => 'value1', 'key2' => 'value2', ...}


and are opened and closed with curley-braces.  Key/values are delimited by the

'=>' character combination.  Usually keys are predefined constants such as

NAME, VERSIONS, COMPRESSION, etc.  Constants do not need to be quoted.  Type

'Object.constants' to see a (messy) list of all constants in the environment.


If you are using binary keys or values and need to enter them in the shell, use

double-quote'd hexadecimal representation. For example:


  hbase> get 't1', "key\x03\x3f\xcd"

  hbase> get 't1', "key\003\023\011"

  hbase> put 't1', "test\xef\xff", 'f1:', "\x01\x33\x40"


The HBase shell is the (J)Ruby IRB with the above HBase-specific commands added.

For more on the HBase Shell, see http://hbase.apache.org/docs/current/book.html



hbase(main):002:0> help 'create'
Create table; pass table name, a dictionary of specifications per
column family, and optionally a dictionary of table configuration.
Dictionaries are described below in the GENERAL NOTES section.
Examples:

  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
  hbase> # The above in shorthand would be the following:
  hbase> create 't1', 'f1', 'f2', 'f3'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
  hbase> create 't1', 'f1', {SPLITS => ['10', '20', '30', '40']}
  hbase> create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}
  hbase> # Optionally pre-split the table into NUMREGIONS, using
  hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}

hbase(main):003:0> 

hbase(main):039:0> create 'ta1', 'cf1'


hbase(main):035:0> describe 'ta1'


DESCRIPTION                                                      ENABLED

 'ta1', {NAME => 'cf1', DATA_BLOCK_ENCODING => 'NONE',           false
 BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
 VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0',
 TTL => '2147483647', KEEP_DELETED_CELLS => 'false',
 BLOCKSIZE => '65536', IN_MEMORY => 'false',
 ENCODE_ON_DISK => 'true', BLOCKCACHE => 'true'}

1 row(s) in 0.0660 seconds
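Note the VERSIONS => '3' attribute in the describe output: each cell in this column family retains up to three timestamped versions, and older ones are evicted. How that retention behaves can be modeled with a small illustrative sketch (not the HBase API; the VersionedCell class is hypothetical):

```python
# Illustrative model of a cell in a column family with VERSIONS => '3':
# at most the 3 newest (timestamp, value) pairs are retained per cell.
class VersionedCell:
    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        self.versions = []  # (timestamp, value) pairs, newest first

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]  # evict versions beyond the limit

    def get(self):
        """Latest value, like a plain 'get' in the shell."""
        return self.versions[0][1] if self.versions else None

cell = VersionedCell(max_versions=3)
for ts in (1, 2, 3, 4):
    cell.put(ts, "v%d" % ts)
print(cell.get())            # v4
print(len(cell.versions))    # 3
```

After four puts only three versions survive; the timestamp-1 version has been evicted.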



hbase(main):039:0> disable 'ta1'
hbase(main):036:0> is_enabled 'ta1'

false                                                                           

0 row(s) in 0.0160 seconds



hbase(main):037:0> enable 'ta1'

0 row(s) in 2.1680 seconds



hbase(main):038:0> is_enabled 'ta1'

true                                                                            

0 row(s) in 0.0170 seconds

hbase(main):044:0> drop 'ta1'


ERROR: Table ta1 is enabled. Disable it first.


Here is some help for this command:

Drop the named table. Table must first be disabled: e.g. "hbase> drop 't1'"




hbase(main):045:0> disable 'ta1'

0 row(s) in 2.1170 seconds



hbase(main):046:0> drop 'ta1'

0 row(s) in 1.1630 seconds


==> On drop, the /hbase/ta1 directory that had been created in HDFS is deleted as well
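The disable-then-drop requirement seen above can be thought of as a small state machine: a table must be disabled before it can be dropped, which is exactly why the first drop failed. An illustrative Python sketch (not the HBase API; the Table class here is hypothetical):

```python
# Hypothetical model of the shell's guard: drop is refused while enabled.
class Table:
    def __init__(self, name):
        self.name = name
        self.enabled = True    # tables are created enabled
        self.dropped = False

    def disable(self):
        self.enabled = False

    def enable(self):
        self.enabled = True

    def drop(self):
        if self.enabled:
            raise RuntimeError(
                "Table %s is enabled. Disable it first." % self.name)
        self.dropped = True  # in HBase this also removes /hbase/<name> in HDFS

t = Table("ta1")
try:
    t.drop()
except RuntimeError as e:
    print(e)        # Table ta1 is enabled. Disable it first.
t.disable()
t.drop()
print(t.dropped)    # True
```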


hbase(main):047:0> list

TABLE                                                                           

hbase009                                                                        

regionsplit_table                                                               

test                                                                            

textxx                                                                          

4 row(s) in 0.0570 seconds



hbase(main):048:0> create 'table01', 'cf'
0 row(s) in 1.1100 seconds


hbase(main):049:0> put 'table01', 'row001', 'cf:a', 'value i wanna put'
0 row(s) in 0.1260 seconds


hbase(main):050:0> put 'table01', 'row001', 'cf:b', 'value b i wanna put'
0 row(s) in 0.0200 seconds


hbase(main):051:0> put 'table01', 'row001', 'cf:c', 'value c i wanna put'
0 row(s) in 0.0150 seconds


hbase(main):052:0> put 'table01', 'row002', 'cf:2a', 'value 2a i wanna put'
0 row(s) in 0.0120 seconds


hbase(main):053:0> put 'table01', 'row003', 'cf:3a', 'value 3a i wanna put'
0 row(s) in 0.0260 seconds


hbase(main):054:0> scan 'table01'

ROW                   COLUMN+CELL                                               
 row001               column=cf:a, timestamp=1372602441690, value=value i wanna put
 row001               column=cf:b, timestamp=1372602450824, value=value b i wanna put
 row001               column=cf:c, timestamp=1372602456583, value=value c i wanna put                                                     
 row002               column=cf:2a, timestamp=1372602470758, value=value 2a i wanna put                                                   
 row003               column=cf:3a, timestamp=1372602481567, value=value 3a i wanna put                                                   
3 row(s) in 0.1360 seconds
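The scan output reflects HBase's logical data model: a table is a sorted map from row key to columns (family:qualifier), each holding a timestamped value. A scan walks rows in key order; a get looks up one row. An illustrative sketch of that model in Python (not the HBase client API; the scan/get helpers are hypothetical):

```python
# Logical model: row key -> {"family:qualifier": (timestamp, value)},
# mirroring the table01 contents built up with the put commands above.
table01 = {
    "row001": {"cf:a": (1372602441690, "value i wanna put"),
               "cf:b": (1372602450824, "value b i wanna put"),
               "cf:c": (1372602456583, "value c i wanna put")},
    "row002": {"cf:2a": (1372602470758, "value 2a i wanna put")},
    "row003": {"cf:3a": (1372602481567, "value 3a i wanna put")},
}

def scan(table):
    """Yield (row, column, timestamp, value) in row-key order, like 'scan'."""
    for row in sorted(table):
        for col, (ts, val) in sorted(table[row].items()):
            yield row, col, ts, val

def get(table, row, columns=None):
    """Return one row's cells, optionally limited to columns, like 'get'."""
    cells = table.get(row, {})
    if columns is not None:
        cells = {c: v for c, v in cells.items() if c in columns}
    return cells

print(len(list(scan(table01))))   # 5 cells across 3 rows
print(sorted(get(table01, "row001")))   # ['cf:a', 'cf:b', 'cf:c']
```

This matches the shell: the scan reported 5 cells but "3 row(s)", because the row count is over distinct row keys, not cells.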



hbase(main):067:0> get 'table01', 'row001'


COLUMN                            CELL                                                                                           
 cf:a                             timestamp=1372602441690, value=value i wanna put                                               
 cf:b                             timestamp=1372602450824, value=value b i wanna put                                             
 cf:c                             timestamp=1372602456583, value=value c i wanna put                                             
3 row(s) in 0.0240 seconds

hbase(main):068:0> get 'table01', 'row001', 'cf:a'


COLUMN                            CELL                                                                                           
 cf:a                             timestamp=1372602441690, value=value i wanna put                                               
1 row(s) in 0.0220 seconds

hbase(main):070:0> get 'table01', 'row001', 'cf:a', 'cf:b'


COLUMN                            CELL                                                                                           
 cf:a                             timestamp=1372602441690, value=value i wanna put                                               
 cf:b                             timestamp=1372602450824, value=value b i wanna put                                             
2 row(s) in 0.0270 seconds

hbase(main):071:0> get 'table01', 'row001', ['cf:a', 'cf:b']

COLUMN                            CELL                                                                                           
 cf:a                             timestamp=1372602441690, value=value i wanna put                                               
 cf:b                             timestamp=1372602450824, value=value b i wanna put                                             
2 row(s) in 0.0330 seconds



hbase(main):075:0> import java.util.Date 

=> Java::JavaUtil::Date


hbase(main):076:0> Date.new(1372602456583).toString()

=> "Sun Jun 30 23:27:36 KST 2013"
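The same conversion works outside the shell, since HBase cell timestamps are simply milliseconds since the Unix epoch; the JRuby Date.new call above is equivalent to this Python sketch:

```python
from datetime import datetime, timezone

# HBase timestamps are epoch milliseconds; divide by 1000 for seconds.
ts_ms = 1372602456583
dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
print(dt.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2013-06-30 14:27:36 UTC
```

14:27:36 UTC is 23:27:36 KST (UTC+9), matching the shell output above.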




A table can also be created pre-split from the command line (here test_table, 3 regions, column family f1, using the HexStringSplit algorithm):

$ hbase org.apache.hadoop.hbase.util.RegionSplitter test_table HexStringSplit -c 3 -f f1
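HexStringSplit places region boundaries uniformly over the hexadecimal key space. Roughly what it computes can be sketched as follows (an illustrative approximation, not the RegionSplitter source; the real tool works on the same principle over the 8-hex-digit key space):

```python
# Approximate HexStringSplit: divide the 32-bit hex key space
# [00000000, ffffffff] into n uniform regions and return the
# n-1 boundary keys that separate them.
def hex_string_split(n_regions):
    space = 2 ** 32
    step = space // n_regions
    return [format(i * step, "08x") for i in range(1, n_regions)]

print(hex_string_split(3))   # ['55555555', 'aaaaaaaa']
```

So -c 3 yields two split points, giving three regions of roughly equal key range.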




[hadoop@h001 hbase]$ ps auxk -rss | less                      # processes sorted by resident memory

[hadoop@h001 hbase]$ jmap -heap <pid>                         # JVM heap summary for a given pid
[hadoop@h001 hbase]$ lsof -u hadoop | wc -l                   # open file descriptors held by user hadoop
[hadoop@h001 hbase]$ ps -o pid,comm,user,thcount -u hadoop    # thread count per process for user hadoop



[Referred to http://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/]

[Referred to the book 'HBase 클러스터 구축과 관리' (HBase Cluster Construction and Management)]