[Referred to the sites below]
http://hadoopbook.com/code.html
The book’s example code is available from GitHub at http://github.com/tomwhite/hadoop-book/
The code for the third edition is at https://github.com/tomwhite/hadoop-book/tree/3e
A sample of the NCDC weather dataset that is used throughout the book can be found at https://github.com/tomwhite/hadoop-book/tree/master/input/ncdc/all
[hadoop@h001 MaxTemperaturebyMonth]$ javac -classpath /home/hadoop/hadoop/hadoop-core-1.0.4.jar -d . *.java
[ "-d ." sets the destination directory for the compiled class files; "*.java" selects the source files to compile ]
[hadoop@h001 javafolder]$ jar -cvf ./FindMax.jar ./*.class
added manifest
adding: MaxTemperature.class(in = 1418) (out= 800)(deflated 43%)
adding: MaxTemperatureMapper.class(in = 1876) (out= 804)(deflated 57%)
adding: MaxTemperatureReducer.class(in = 1660) (out= 704)(deflated 57%)
cf. To unpack a jar file: [hadoop@h001 Temp]$ jar xf FindMaxTemperature.jar
[hadoop@h001 hadoop]$ ./bin/hadoop jar FindMax.jar MaxTemperature /user/hadoop/wx/ /user/hadoop/wx/out
[ If the classes are declared in a package, pass the fully qualified class name instead, e.g. kr.jacob.mr.MaxTemperature ]
13/07/05 19:45:19 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/05 19:45:19 INFO input.FileInputFormat: Total input paths to process : 2
13/07/05 19:45:20 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/07/05 19:45:20 WARN snappy.LoadSnappy: Snappy native library not loaded
13/07/05 19:45:20 INFO mapred.JobClient: Running job: job_201307051837_0001
13/07/05 19:45:21 INFO mapred.JobClient: map 0% reduce 0%
13/07/05 19:45:38 INFO mapred.JobClient: map 50% reduce 0%
13/07/05 19:45:44 INFO mapred.JobClient: map 100% reduce 0%
13/07/05 19:45:53 INFO mapred.JobClient: map 100% reduce 100%
13/07/05 19:45:58 INFO mapred.JobClient: Job complete: job_201307051837_0001
13/07/05 19:45:58 INFO mapred.JobClient: Counters: 29
13/07/05 19:45:58 INFO mapred.JobClient: Job Counters
13/07/05 19:45:58 INFO mapred.JobClient: Launched reduce tasks=1
13/07/05 19:45:58 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=24398
13/07/05 19:45:58 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/07/05 19:45:58 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/07/05 19:45:58 INFO mapred.JobClient: Launched map tasks=2
13/07/05 19:45:58 INFO mapred.JobClient: Data-local map tasks=2
13/07/05 19:45:58 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=12526
13/07/05 19:45:58 INFO mapred.JobClient: File Output Format Counters
13/07/05 19:45:58 INFO mapred.JobClient: Bytes Written=18
13/07/05 19:45:58 INFO mapred.JobClient: FileSystemCounters
13/07/05 19:45:58 INFO mapred.JobClient: FILE_BYTES_READ=144425
13/07/05 19:45:58 INFO mapred.JobClient: HDFS_BYTES_READ=1777370
13/07/05 19:45:58 INFO mapred.JobClient: FILE_BYTES_WRITTEN=353220
13/07/05 19:45:58 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=18
13/07/05 19:45:58 INFO mapred.JobClient: File Input Format Counters
13/07/05 19:45:58 INFO mapred.JobClient: Bytes Read=1777168
13/07/05 19:45:58 INFO mapred.JobClient: Map-Reduce Framework
13/07/05 19:45:58 INFO mapred.JobClient: Map output materialized bytes=144431
13/07/05 19:45:58 INFO mapred.JobClient: Map input records=13130
13/07/05 19:45:58 INFO mapred.JobClient: Reduce shuffle bytes=144431
13/07/05 19:45:58 INFO mapred.JobClient: Spilled Records=26258
13/07/05 19:45:59 INFO mapred.JobClient: Map output bytes=118161
13/07/05 19:45:59 INFO mapred.JobClient: Total committed heap usage (bytes)=336338944
13/07/05 19:45:59 INFO mapred.JobClient: CPU time spent (ms)=5610
13/07/05 19:45:59 INFO mapred.JobClient: Combine input records=0
13/07/05 19:45:59 INFO mapred.JobClient: SPLIT_RAW_BYTES=202
13/07/05 19:45:59 INFO mapred.JobClient: Reduce input records=13129
13/07/05 19:45:59 INFO mapred.JobClient: Reduce input groups=2
13/07/05 19:45:59 INFO mapred.JobClient: Combine output records=0
13/07/05 19:45:59 INFO mapred.JobClient: Physical memory (bytes) snapshot=430219264
13/07/05 19:45:59 INFO mapred.JobClient: Reduce output records=2
13/07/05 19:45:59 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2167259136
13/07/05 19:45:59 INFO mapred.JobClient: Map output records=13129
[hadoop@h001 hadoop]$
[hadoop@h001 MaxTemperaturebyMonth]$ hadoop fs -cat /user/hadoop/wx/out/part-r-00000
1901 317
1902 244
>> After changing the key from Year to Month, check how the input and output record counts of Map, Combine, and Reduce change.
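The Year-to-Month change probably comes down to how many characters of the observation date the mapper takes as its key. Below is a minimal sketch in plain Java (no Hadoop dependency) of that idea. It assumes the fixed-width NCDC offsets used in the book's mapper (date starting at column 15, signed temperature at columns 87-91, quality code at column 92). The class name MonthKeySketch and the helper syntheticRecord are hypothetical, and the record is synthetic filler, not real NCDC data.

```java
class MonthKeySketch {
    static final int MISSING = 9999;

    // Year key: the first four characters of the YYYYMMDD observation date.
    static String yearKey(String line) {
        return line.substring(15, 19);
    }

    // Month key: two more characters of the same date, giving YYYYMM.
    static String monthKey(String line) {
        return line.substring(15, 21);
    }

    // Signed temperature in tenths of a degree Celsius; a leading '+' is skipped.
    static int airTemperature(String line) {
        if (line.charAt(87) == '+') {
            return Integer.parseInt(line.substring(88, 92));
        }
        return Integer.parseInt(line.substring(87, 92));
    }

    // A reading is kept only if it is not the missing marker and has a good quality code.
    static boolean isValid(String line) {
        String quality = line.substring(92, 93);
        return airTemperature(line) != MISSING && quality.matches("[01459]");
    }

    // Hypothetical helper: builds a 93-character filler record with only the
    // date, temperature, and quality fields set (assumed offsets, synthetic data).
    static String syntheticRecord(String yyyymmdd, String signedTemp) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 93; i++) {
            sb.append('0');
        }
        sb.replace(15, 23, yyyymmdd);
        sb.replace(87, 92, signedTemp);
        sb.setCharAt(92, '1');
        return sb.toString();
    }

    public static void main(String[] args) {
        String line = syntheticRecord("19010715", "+0317");
        if (isValid(line)) {
            System.out.println(yearKey(line) + "\t" + airTemperature(line));
            System.out.println(monthKey(line) + "\t" + airTemperature(line));
        }
    }
}
```

In an actual mapper, only the key extraction line would need to change; the reducer can stay as it is because it just takes the maximum per key.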
13/07/05 20:09:08 INFO mapred.JobClient: Map-Reduce Framework
13/07/05 20:09:08 INFO mapred.JobClient: Map output materialized bytes=170689
13/07/05 20:09:08 INFO mapred.JobClient: Map input records=13130
13/07/05 20:09:08 INFO mapred.JobClient: Reduce shuffle bytes=170689
13/07/05 20:09:08 INFO mapred.JobClient: Spilled Records=26258
13/07/05 20:09:08 INFO mapred.JobClient: Map output bytes=144419
13/07/05 20:09:08 INFO mapred.JobClient: Total committed heap usage (bytes)=336338944
13/07/05 20:09:08 INFO mapred.JobClient: CPU time spent (ms)=4870
13/07/05 20:09:08 INFO mapred.JobClient: Combine input records=0
13/07/05 20:09:08 INFO mapred.JobClient: SPLIT_RAW_BYTES=202
13/07/05 20:09:08 INFO mapred.JobClient: Reduce input records=13129
13/07/05 20:09:08 INFO mapred.JobClient: Reduce input groups=24
13/07/05 20:09:08 INFO mapred.JobClient: Combine output records=0
13/07/05 20:09:08 INFO mapred.JobClient: Physical memory (bytes) snapshot=435335168
13/07/05 20:09:08 INFO mapred.JobClient: Reduce output records=24
13/07/05 20:09:08 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2167193600
13/07/05 20:09:08 INFO mapred.JobClient: Map output records=13129
[hadoop@h001 MaxTemperaturebyMonth]$ hadoop fs -cat /user/hadoop/wxout02/part-r-00000
190101 44
190102 17
190103 50
190104 194
190105 256
190106 278
190107 317
190108 283
190109 211
190110 156
190111 89
190112 117
190201 33
190202 117
190203 44
190204 83
190205 211
190206 239
190207 244
190208 206
190209 183
190210 106
190211 94
190212 50
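Comparing the two runs: Map output records stays at 13129, while Reduce input groups and Reduce output records go from 2 to 24 (one per distinct key). Map output bytes grows from 118161 to 144419, a difference of 26258, which is consistent with two extra key characters for each of the 13129 records. Combine input records is 0 in both runs, which suggests no combiner was set. The sketch below imitates the group-then-max step in plain Java to show why only the group count moves. It is not Hadoop code; the class name CounterSketch and the five sample records are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CounterSketch {
    // Hypothetical map output: (YYYYMM key, temperature) pairs, not real job data.
    static List<String[]> sampleMapOutput() {
        List<String[]> records = new ArrayList<>();
        records.add(new String[] {"190101", "-11"});
        records.add(new String[] {"190101", "44"});
        records.add(new String[] {"190107", "317"});
        records.add(new String[] {"190207", "244"});
        records.add(new String[] {"190207", "150"});
        return records;
    }

    // Groups records by the first keyLength characters of the key
    // (4 = year, 6 = month) and keeps the maximum temperature per group.
    static Map<String, Integer> maxByKey(List<String[]> mapOutput, int keyLength) {
        Map<String, Integer> max = new TreeMap<>();
        for (String[] kv : mapOutput) {
            String key = kv[0].substring(0, keyLength);
            max.merge(key, Integer.parseInt(kv[1]), Math::max);
        }
        return max;
    }

    public static void main(String[] args) {
        List<String[]> records = sampleMapOutput();
        for (int keyLength : new int[] {4, 6}) {
            Map<String, Integer> result = maxByKey(records, keyLength);
            System.out.println("key length " + keyLength
                    + ": reduce input records=" + records.size()
                    + ", reduce input groups=" + result.size()
                    + ", reduce output records=" + result.size());
        }
    }
}
```

With the sample data, the record count is 5 for both key lengths, while the group count is 2 for year keys and 3 for month keys, the same pattern as the 2 versus 24 groups in the job counters.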