Installing a Hadoop Distributed Cluster

2011. 2. 24. 16:30 · General Programming

Let's set up a Hadoop distributed cluster.

* Environment

- Ubuntu 10.04 64-bit

- Hadoop 0.20.2

- Note: the installation path and options must be identical on the master and the slave.

1. Install Hadoop under the /home/app directory.

/home/app/hadoop0.20.2

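2. Install ssh and rsync.

Skip this step if they are already installed.

$ sudo apt-get install ssh (the master uses ssh to communicate with the slave)

$ sudo apt-get install rsync (used by the Hadoop scripts to synchronize files between nodes)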
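
Before moving on, it can help to confirm that both tools are actually on the PATH of each machine. The following is a minimal sketch, not part of the original setup; `missing_tools` is a hypothetical helper name, and it only reports commands that the shell cannot find.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper: print the name of each command
# given as an argument that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Prints nothing when both tools are installed.
missing_tools ssh rsync
```

Any name that is printed still needs to be installed with apt-get as shown above.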

3. Edit /etc/hosts as follows (use the IP addresses of your own machines).

  # The entries must be identical on the master and the slave.
    (Do not use characters such as _ in a hostname; it will not be recognized.)
            http://en.wikipedia.org/wiki/Hostname
         The following form is recommended.

192.168.0.203    hadoop.master    # Hadoop master server
192.168.0.202    hadoop.slave     # Hadoop slave server
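
Since the entries have to match on every machine, one way to compare them is to look up each name in the hosts file on both nodes and check that the same IP comes back. This is only a sketch under that assumption; `lookup_host` is a hypothetical helper, and the demo runs against a temporary file rather than the real /etc/hosts.

```shell
#!/bin/sh
# lookup_host is a hypothetical helper: print the IP address mapped to a
# hostname in a hosts-format file ("<ip> <name> [alias...]" per line).
lookup_host() {
    hosts_file=$1
    name=$2
    awk -v n="$name" '
        $1 ~ /^#/ { next }                      # skip comment lines
        { for (i = 2; i <= NF; i++)             # names start at field 2
              if ($i == n) { print $1; exit } }
    ' "$hosts_file"
}

# Demo against a temporary file holding the two entries from this post.
tmp_hosts=$(mktemp)
printf '%s\n' '192.168.0.203 hadoop.master' '192.168.0.202 hadoop.slave' > "$tmp_hosts"
lookup_host "$tmp_hosts" hadoop.master
lookup_host "$tmp_hosts" hadoop.slave
rm -f "$tmp_hosts"
```

On a real node, `lookup_host /etc/hosts hadoop.master` can be run on both the master and the slave and the results compared by eye.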

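3. Edit the following three XML files in hadoop/conf as shown below. (In each file, the <property> element goes inside the <configuration> root element.)

Master (hadoop.master)

* conf/core-site.xml

<property>

<name>fs.default.name</name>

<value>hdfs://hadoop.master:9100</value>

</property>

* conf/hdfs-site.xml

<property>

<name>dfs.replication</name>

</property>

* conf/mapred-site.xml

<property>

<name>mapred.job.tracker</name>

<value>hadoop.master:9101</value> (start-all.sh starts the JobTracker on the master, so this points to the master)

</property>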

Slave (hadoop.slave)

: The settings are basically the same as on the master.
* conf/core-site.xml

<property>

<name>fs.default.name</name>

<value>hdfs://hadoop.master:9100</value> (must point to the Master)

</property>

* conf/hdfs-site.xml

<property>

<name>dfs.replication</name>

</property>

* conf/mapred-site.xml

<property>

<name>mapred.job.tracker</name>

<value>hadoop.master:9101</value>

</property>
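
Because the same three files are needed on every node, they can also be generated from a script. The sketch below is one hedged way to do it: `write_site_file` is a hypothetical helper, the files are written to a scratch directory rather than the live hadoop/conf, the replication factor of 1 is an assumption (this post gives no value), and the JobTracker address defaults to hadoop.master:9101 on the assumption that the JobTracker runs on the master, where start-all.sh starts it.

```shell
#!/bin/sh
# write_site_file is a hypothetical helper: write one Hadoop *-site.xml file
# holding a single <property> inside the <configuration> root element.
write_site_file() {
    file=$1; name=$2; value=$3
    cat > "$file" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>$name</name>
    <value>$value</value>
  </property>
</configuration>
EOF
}

# Assumption: write to a scratch directory unless CONF_DIR is set.
CONF_DIR=${CONF_DIR:-$(mktemp -d)}
# Assumption: replication factor 1; the post does not state a value.
REPLICATION=${REPLICATION:-1}
# Assumption: the JobTracker runs on the master; override to change it.
JOBTRACKER=${JOBTRACKER:-hadoop.master:9101}

write_site_file "$CONF_DIR/core-site.xml"   fs.default.name    hdfs://hadoop.master:9100
write_site_file "$CONF_DIR/hdfs-site.xml"   dfs.replication    "$REPLICATION"
write_site_file "$CONF_DIR/mapred-site.xml" mapred.job.tracker "$JOBTRACKER"
echo "wrote config files to $CONF_DIR"
```

After reviewing the generated files, copy them into hadoop/conf on both the master and the slave so that the two stay identical.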

4. Generate the authentication key (SSH key pair)

# ssh-keygen -t rsa

Generating public/private rsa key pair.

Enter file in which to save the key (/root/.ssh/id_rsa):

/root/.ssh/id_rsa already exists.

Overwrite (y/n)? y

Enter passphrase (empty for no passphrase):

Enter same passphrase again:

Your identification has been saved in /root/.ssh/id_rsa.

Your public key has been saved in /root/.ssh/id_rsa.pub.

The key fingerprint is:

a5:f9:0d:96:77:57:8d:0c:c4:70:0f:19:5a:f2:d0:3e root@ubuntu

============================================

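# cd /root/.ssh

# cp id_rsa.pub authorized_keys

Now copy the generated key to every data node.

(That is, copy the auth key generated on the master to the slave server.)

# scp /root/.ssh/authorized_keys root@devcluster02:/nutch/home/.ssh/authorized_keys

(devcluster02 and /nutch/home are an example host and home directory; substitute your own slave host, such as hadoop.slave, and the home directory of the account that runs Hadoop there.)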
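
Copying the whole authorized_keys file overwrites whatever keys the slave already trusts. As an alternative, the public key can be appended only when it is missing. The following is a minimal sketch of that idea; `authorize_key` is a hypothetical helper, and the key in the demo is a placeholder string, not a real key.

```shell
#!/bin/sh
# authorize_key is a hypothetical helper: append the public key in $1 to the
# authorized_keys file $2 unless that exact line is already present.
authorize_key() {
    pub_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    key=$(cat "$pub_file")
    # -F fixed string, -x whole-line match, -q quiet
    if ! grep -qxF -e "$key" "$auth_file"; then
        printf '%s\n' "$key" >> "$auth_file"
    fi
    # Keep the file private; sshd may ignore key files writable by others.
    chmod 600 "$auth_file"
}

# Demo in a scratch directory with a placeholder key.
demo_dir=$(mktemp -d)
echo 'ssh-rsa PLACEHOLDER-NOT-A-REAL-KEY root@ubuntu' > "$demo_dir/id_rsa.pub"
authorize_key "$demo_dir/id_rsa.pub" "$demo_dir/authorized_keys"
authorize_key "$demo_dir/id_rsa.pub" "$demo_dir/authorized_keys"   # adds nothing
rm -rf "$demo_dir"
```

On the slave, this would be called with the master's id_rsa.pub and the ~/.ssh/authorized_keys file of the account that runs Hadoop.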

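7. Format the namenode.

: Run this on the Master server only. (Keep in mind that the slave side is handled automatically, so do not run it there.)

 bin/hadoop namenode -format

# Existing files on the server's disk are not deleted (do not be misled by the word "format"); the command only initializes the HDFS metadata directory of the namenode. Re-running it on a namenode that is already in use does discard the existing HDFS namespace.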
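
To avoid running the format step by accident on a namenode that is already initialized, a script can first check whether the storage directory holds metadata. This is a sketch under stated assumptions: `is_formatted` and `format_once` are hypothetical helpers, the check relies on the current/VERSION file that a formatted name directory contains, and the directory to test (dfs.name.dir, assumed to default to a path under /tmp/hadoop-<user>/dfs/name) must be supplied by the caller. The sketch prints the command instead of running it.

```shell
#!/bin/sh
# is_formatted is a hypothetical helper: succeed if the given namenode
# storage directory already contains current/VERSION.
is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# format_once prints what it would do instead of running the command.
format_once() {
    name_dir=$1
    if is_formatted "$name_dir"; then
        echo "skip: $name_dir is already formatted"
    else
        echo "run: bin/hadoop namenode -format"
    fi
}

# Demo with a scratch directory standing in for dfs.name.dir.
demo=$(mktemp -d)
format_once "$demo/name"              # fresh directory
mkdir -p "$demo/name/current"
: > "$demo/name/current/VERSION"
format_once "$demo/name"              # now looks formatted
rm -rf "$demo"
```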

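8. Start Hadoop on the Master server.

(Keep in mind that starting the Master also starts the slave automatically: start-all.sh uses ssh to start the daemons on every host listed in conf/slaves.)

bin/start-all.sh

Processes started automatically on the Slave (check them with jps):

- TaskTracker

- DataNode
* After starting from the Master, watch the data/dfs/current directory on the slave server to see the data held on the Master being replicated to it.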
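
The jps check can be scripted so that a missing daemon stands out. The following is a minimal sketch; `missing_daemons` is a hypothetical helper that reads jps-style output ("<pid> <name>" per line) from standard input, and the sample text in the demo uses made-up PIDs rather than output captured from a real node.

```shell
#!/bin/sh
# missing_daemons is a hypothetical helper: read jps-style output on stdin
# and print each expected daemon name (given as arguments) that is absent.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # Compare against the second column only, matching the whole name.
        if ! printf '%s\n' "$jps_output" | awk '{ print $2 }' | grep -qx "$daemon"; then
            echo "$daemon"
        fi
    done
}

# Demo with sample text (made-up PIDs); prints nothing because both are present.
sample='2301 DataNode
2467 TaskTracker
2590 Jps'
printf '%s\n' "$sample" | missing_daemons DataNode TaskTracker
```

On the slave the real check would be `jps | missing_daemons DataNode TaskTracker`; an empty result means both processes are running.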

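9. Miscellaneous

* Opening ports

Open the following ports (9100 and 9101 are the ports set in the configuration files above):

9100, 9101, 9000, 9001, 50010, 50020, 50030, 50060, 50070, 50075, 50090, 50470, 50475

ex>

iptables -A INPUT -p tcp -j ACCEPT (opens all TCP ports)

iptables -A INPUT -p tcp --dport 9000 -j ACCEPT (opens port 9000)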
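
Typing one rule per port is error-prone, so the rules can be generated in a loop. This is only a sketch: `print_rules` is a hypothetical helper that prints the iptables commands instead of executing them, so the output can be reviewed first. The demo uses the two ports configured in this post (9100, 9101) and the two web UI ports (50070, 50030); pass the full port list in the same way.

```shell
#!/bin/sh
# print_rules is a hypothetical helper: print (do not execute) one iptables
# ACCEPT rule for each TCP port given as an argument.
print_rules() {
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
}

# HDFS and JobTracker ports from this post, plus the two web UI ports.
print_rules 9100 9101 50070 50030
```

To apply the rules, review the printed lines and then run them as root, for example by piping the output to sh.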

10. Check through the web UI (connect to the Master address)

http://192.168.0.203:50070 => NameNode status

http://192.168.0.203:50030 => JobTracker status
