Thursday, April 12, 2012

Building a Beowulf cluster with Ubuntu 11.04 Server

This lesson documents how to build a small Beowulf cluster, something I have wanted to do for quite some time. A Beowulf cluster is a cluster of identical (or nearly identical) computers linked together in such a way as to permit parallel processing.

When I say nearly identical, I mean each computer should be roughly comparable in terms of speed and memory capacity. This helps ensure consistent performance across the cluster. The one exception is the master computer. If you have one computer which is better than the others, it should be your master node, as this is where you will compile and run programs from. It will also be the only one to need a keyboard and monitor permanently attached once everything is up and running.

I have a 1.3 GHz AMD with a 20 GB hard drive and 256 MB RAM with shared video memory as my slave node, and a 1.6 GHz P4 with an 80 GB hard drive, 256 MB RAM, and a 32 MB dedicated video card as the master.

Since this is Advanced Linux, I am assuming that you know something of Ubuntu and/or Linux in general and are comfortable navigating it, installing software, compiling from source code, and editing files. Also, before editing system files, namely those under the /etc directory that we will be working with, always back them up before making changes.

I am using Ubuntu Server 11.04 and the MPICH2 clustering software. Right now, I only have two PCs but keep in mind the more machines you have, the more powerful your cluster will be. I will assume two machines designated as master_node and slave_node with a common user mpi. Replace with your own system names as appropriate.

Now, the actual setup:

The first step is to do a fresh install of Ubuntu Server 11.04 on two blank machines. As part of the install process, you are prompted to create at least one user for the system, and this user will have administrative access via the sudo command. The root account is locked by default, so everything will be done with this default user account. Keep in mind that Ubuntu Server is a command-line-only system, so all of this will be done from a text prompt. You also need to be sure you have identical user accounts on all machines for this to work.

I am using NFS to provide a common directory to all machines, which also means I only need to configure and install the MPICH2 software in one location. I am also using a router with DHCP to configure the network interfaces. If you use a switch or crossover cables, you will probably need to configure the network interfaces manually.

There are some things that need to be set up on each machine.

Open the /etc/hosts file of each machine and add the IP address and hostname of every machine.
Also take out the line which associates the machine's hostname with a loopback address (Ubuntu typically maps the hostname to 127.0.1.1). Leave the line which says 127.0.0.1 localhost. This ensures that the loopback address only refers to the local machine.

Ex.:

127.0.0.1 localhost
192.168.1.105 master_node
192.168.1.103 slave_node

There may be other lines in this file, but these must be present for the cluster to function. Replace these IP addresses with your own.
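As a sketch, the same two entries can be generated with a small loop and then appended with sudo tee -a /etc/hosts; the IP addresses and hostnames are the examples from above, so substitute your own:

```shell
# Print the cluster's hosts entries; pipe the output through
# `sudo tee -a /etc/hosts` to actually append them to the file.
for entry in "192.168.1.105 master_node" "192.168.1.103 slave_node"; do
  echo "$entry"
done
```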

Also modify the /etc/hosts.allow file and add the line ALL : ALL. This permits connections from any host to any service; it is acceptable on a trusted private network, but don't do it on a machine exposed to the Internet.

With those steps out of the way, we can move on.

To configure the machines that will act as slave nodes:

Install package nfs-common (sudo apt-get install nfs-common)
Install package openssh-server (sudo apt-get install openssh-server)


If these were already installed, Ubuntu will simply tell you they are already the newest version.
Running the install commands just to be sure doesn't hurt anything.

Once nfs and the ssh server are installed, configuration of the slave nodes can proceed.

The first thing to do is set up an nfs mount in the /etc/fstab file. I am using the home directory of the default user on my main node as an nfs share, which is one reason why user accounts need to be identical across all machines. This way, every machine has the exact same home directory.

Change to the /etc directory and modify the fstab file.

Append a line like this to the end of the file so the NFS directory will be mounted:

master_node:/home/mpi /home/mpi nfs user,exec


That sets up the slave nodes to receive the directory exported by the master node.

Change back to your home directory.

Now set up a public key for passwordless SSH logins. The reason for passwordless SSH is to be able to use all the nodes without having to log in to each one every time you run MPICH2. The running of the cluster should be automatic across all nodes.

ssh-keygen -t dsa will generate a private/public key pair

When prompted for a passphrase, hit enter to leave it blank.

Change to the .ssh directory and do the following:

cat id_dsa.pub >> authorized_keys
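A sketch of that step, run in a scratch directory with an obvious stand-in key string so it is safe to try anywhere (on the real system the files live in ~/.ssh and the key comes from ssh-keygen); tightening the file's permissions is worth doing, since sshd normally refuses an authorized_keys file that others can write:

```shell
# Append the public key to authorized_keys and tighten permissions.
dir=$(mktemp -d)
echo "ssh-dss AAAA-stand-in-key mpi@master_node" > "$dir/id_dsa.pub"
cat "$dir/id_dsa.pub" >> "$dir/authorized_keys"
chmod 600 "$dir/authorized_keys"
cat "$dir/authorized_keys"
```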

Now configure openssh by modifying the /etc/ssh/sshd_config file.

Change to the /etc/ssh directory and open the sshd_config file
Make the following changes to these lines:

RSAAuthentication yes
PubkeyAuthentication yes

Uncomment this line:
AuthorizedKeysFile %h/.ssh/authorized_keys

Uncomment the PasswordAuthentication line and change yes to no:
PasswordAuthentication no

Set the UsePAM line to no
Also set StrictModes to no
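Taken together, the relevant lines of /etc/ssh/sshd_config should end up reading as follows once the edits are made (every other line in the file is left alone):

```
RSAAuthentication yes
PubkeyAuthentication yes
AuthorizedKeysFile %h/.ssh/authorized_keys
PasswordAuthentication no
UsePAM no
StrictModes no
```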

Issue sudo /etc/init.d/ssh restart to restart the SSH server.

If you do this on each machine you intend to use as a slave node, they should be all set.


Configuring the master node.

Install the NFS server on the master node and configure the export:

Install package nfs-kernel-server


Add this line to the /etc/exports file:

/home/mpi *(rw,insecure,sync)

Run sudo exportfs -r to export the directory to all slave nodes.

Installing MPICH2

Make sure you install the package build-essential on the main node, otherwise you will have no build tools or compilers.

Download MPICH2

http://www.mcs.anl.gov/research/projects/mpich2/downloads/index.php?s=downloads

I used stable version 1.4.1p1

Unpack the tar file (tar xzf mpich2-1.4.1p1.tar.gz).

There are actually two different ways to build MPICH2. The typical ./configure, make, sudo make install works just fine, but the MPICH2 docs recommend this:

./configure 2>&1 | tee c.txt
make 2>&1 | tee m.txt
make install 2>&1 | tee mi.txt

It makes no difference to the software, but it will matter if you have trouble building MPICH2 and seek support from the main site. Running the commands this way generates the files c.txt, m.txt, and mi.txt, which the developers will expect you to have in order to diagnose the problem.

Make a new directory in your home directory for the MPICH2 install. I used mpich2.

Change into the mpich2-1.4.1p1 directory and run ./configure with the options you need.

For my system the command was

./configure --prefix=/home/mpi/mpich2 --disable-f77 --disable-fc

The --disable-fc and --disable-f77 options were used since I don't have Fortran compilers installed. The --prefix option tells the install where to place the program files. Other options can be found in the MPICH2 README file.


Once the configuration is done, simply do make followed by sudo make install, or use the alternate above if you wish.

Everything should now be in place.

The last steps to setting everything up are to put the mpich2 bin directory on the path so that its programs can be found by the system:

export PATH=/home/mpi/mpich2/bin:$PATH
export LD_LIBRARY_PATH=/home/mpi/mpich2/lib:$LD_LIBRARY_PATH

To make the PATH change permanent, edit the PATH line in /etc/environment to include /home/mpi/mpich2/bin. Note that sudo echo ... >> /etc/environment does not work, because the redirection is performed by your own non-root shell; use sudo nano /etc/environment (or sudo tee -a) instead.
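A quick sketch to confirm the change took effect in the current shell; the directory is the --prefix chosen above, so adjust it if yours differs:

```shell
# Prepend the MPICH2 bin directory, then show the first PATH component.
export PATH=/home/mpi/mpich2/bin:$PATH
echo "$PATH" | cut -d: -f1
```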

Everything should now be installed and ready to go.

To test this, use the following commands; both should report paths under /home/mpi/mpich2/bin:

which mpirun
which mpiexec

Very last thing to do is set up a hosts file for the cluster in the user directory.

The file should be named hosts and should be set up as follows:

One line for each machine in the network listed by hostname

Ex:

master_node
slave_node

To test the cluster, run the example program cpi

mpiexec -f hosts -n 2 ./mpich2-1.4.1p1/examples/cpi

-f hosts tells MPICH2 which host file, and thus which machines, to use. -n is the number of processes to run; this is usually equal to the number of machines available, but it doesn't have to be.

If all has gone well, there should be a listing of each process and where it ran followed by the program output.

If not, let me know and I can help you work it out.

These steps should work as is on any recent version of Ubuntu and probably most other Debian based distributions. Other distributions will differ in some details, but I can provide advice for many distributions.

Wednesday, April 13, 2011

Using NFS with iptables

Having recently learned how to solve this problem myself, I thought I would share it. I recently switched from Ubuntu Desktop and Ubuntu Server to CentOS 5.5 on both my Linux machines, and I was having some trouble getting the NFS share to mount past the server's firewall. It turns out this is due to some extra configuration needed on the server side. This works on CentOS and should work on any other distribution based on Red Hat Enterprise Linux, as well as Red Hat itself and probably Fedora.

Some people recommend simply turning the firewall off on the server, but this is a bad idea and really not the proper way to go about it as far as I am concerned, especially if maintaining security is vital to the systems in question.

So, here is what you should do. You will need to be working with root access to do this.

First, you need to modify the /etc/sysconfig/nfs file so that the required services use fixed ports rather than dynamically assigned ones, as dynamic ports cannot be protected by port-filtering firewalls such as iptables. Assigning dynamic ports is the default behavior of the portmapper, so this must be changed. Add something similar to the numbers below to the end of the /etc/sysconfig/nfs file. These numbers are taken from my own CentOS 5.5 NFS server and work just fine.

LOCKD_TCPPORT=32803
LOCKD_UDPPORT=32769
MOUNTD_PORT=892
STATD_PORT=662

This establishes fixed ports for these services.

Now you will need to restart the portmap and nfs services with the following commands:

service portmap restart
service nfs restart



Add the following lines to the /etc/sysconfig/iptables file, ensuring that they appear before the final LOG and DROP lines for the RH-Firewall-1-INPUT chain. Also, replace the IP address portion of these statements with addresses that apply to your network.

-A RH-Firewall-1-INPUT -s 192.168.1.0/24 -m state --state NEW -p udp --dport 111 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.1.0/24 -m state --state NEW -p tcp --dport 111 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.1.0/24 -m state --state NEW -p tcp --dport 2049 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.1.0/24 -m state --state NEW -p tcp --dport 32803 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.1.0/24 -m state --state NEW -p udp --dport 32769 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.1.0/24 -m state --state NEW -p tcp --dport 892 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.1.0/24 -m state --state NEW -p udp --dport 892 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.1.0/24 -m state --state NEW -p tcp --dport 662 -j ACCEPT
-A RH-Firewall-1-INPUT -s 192.168.1.0/24 -m state --state NEW -p udp --dport 662 -j ACCEPT
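Since the rules differ only in protocol and port, they can also be generated with a small loop; this is just a sketch that prints the same nine lines (subnet as in the example above), which you can then paste into /etc/sysconfig/iptables:

```shell
# Emit one ACCEPT rule per protocol:port pair used by the NFS services.
net="192.168.1.0/24"
for pair in udp:111 tcp:111 tcp:2049 tcp:32803 udp:32769 \
            tcp:892 udp:892 tcp:662 udp:662; do
  proto=${pair%%:*}
  port=${pair##*:}
  echo "-A RH-Firewall-1-INPUT -s $net -m state --state NEW -p $proto --dport $port -j ACCEPT"
done
```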

If you look closely, you will see that what we are doing is configuring the iptables firewall to accept connections to the services previously configured.

Finally, restart the iptables service:

service iptables restart

Assuming your NFS shares are configured properly, you should now have access to them.

Thursday, April 7, 2011

Designing a hard disk layout

Designing a hard disk layout is a good way to tailor a system to your particular needs. Linux allows for the creation of multiple partitions, and each partition can be used as a different mount point for various directories of the file system.

Most default Linux installs use a generic partition scheme which usually creates two partitions, one for the Linux filesystem and one used as a swap space, similar to virtual memory under Windows. This is generally sufficient for most users, but better control and system performance can be gained with a well designed layout.

Here is a brief overview of the movable Linux directories and what they contain:

/boot: contains the system's critical boot files.
/home: contains the data files and home directories for each user on the system.
/mnt: used as a mount point for removable media.
/media: similar to /mnt.
/opt: contains Linux files and programs associated with third-party software.
/tmp: contains temporary files created by ordinary users.
/usr: contains most Linux program and data files.
/usr/local: contains programs and files unique to a particular installation.
/var: holds files associated with the day-to-day functioning of the computer.

The other directories (/etc, /bin, /sbin, /lib, and /dev) should never be placed on separate partitions, as they are critical to the functioning of the base system and should reside under the main Linux partition.

The best time to set up partitions is when doing a fresh installation or when re-formatting and re-installing. While it may be possible to re-work an installed system from a live CD or rescue disk, I doubt this would be a good idea.

With all that out of the way, let's look at some possible layouts.

For a basic home system, make a /boot partition, a / (root) partition, a /home partition, and a swap partition. The /boot partition can be ext2, while the others should be ext3, ext4, or some other type of journaling filesystem. Having a separate /home partition allows you to re-install the rest of the operating system without having to reformat and lose the information on the /home partition. When re-installing, simply format /boot and / and leave /home alone.
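For that layout, the resulting /etc/fstab entries might look something like this; the device names and filesystem choices are examples, and yours will vary:

```
/dev/sda1  /boot  ext2  defaults  0 2
/dev/sda2  /      ext4  defaults  0 1
/dev/sda3  /home  ext4  defaults  0 2
/dev/sda4  none   swap  sw        0 0
```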

As for sizes, /boot should be 50-100 MB, the swap partition is generally 1.5 to 2 times system RAM, and the / and /home partitions should each be half of the remaining space. This arrangement should give a Desktop system good performance and reliability, with plenty of space for both user files and software installations/updates.
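As a worked example of the swap sizing rule, here is the arithmetic for a machine with 256 MB of RAM (the RAM figure is just an example, and the 1.5x-2x multiplier is a rule of thumb, not a hard requirement):

```shell
# Rule of thumb: swap = 1.5x to 2x system RAM.
ram_mb=256
swap_min=$((ram_mb * 3 / 2))
swap_max=$((ram_mb * 2))
echo "swap: ${swap_min}-${swap_max} MB"
```

So a 256 MB machine would get a swap partition somewhere in the 384-512 MB range.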

For servers, a similar arrangement can be used, but /var should also be a separate partition as it is often used by various server programs. Also, on a server /home likely does not need to be as large as the / or /var partitions since servers generally don't make large use of /home.

Keep in mind that these are guidelines and a starting point for your options. Also, if you plan to create more than four partitions, you will have to use logical partitions for some or all of them, as a disk with a traditional MBR partition table only allows four primary partitions.

Welcome

This blog is for more advanced topics concerning the use and configuration of the Linux operating system. Unlike my basic lessons blog, this one assumes that you already have some knowledge of Linux and computers in general. You do not need to be a Linux guru or computer expert, but you should understand computers and operating systems well enough to be comfortable making changes and experimenting.