Distributed and Parallel Systems

Wednesday, May 16, 2012

Configuring the Ganglia Nodes

Configuration if you are on the VPN
Configuration files and manual

Configuration outside the VPN
Configuration files and manual

Contains everything needed to install Ganglia on the cluster nodes.

Ganglia

Ganglia was chosen to monitor the cluster because of its advantages and because it has been widely tested on many clusters that are currently in production.
The main components of Ganglia are:

gmond: daemon that collects and distributes the state of the node. It must run on ALL nodes.
gmetad: acts as a parser; it obtains the data from the gmond daemons and processes it. It runs only on the front-end node, that is, the node we communicate with.
web front-end: a PHP web interface that shows the state of the cluster graphically. It is installed on the node that runs gmetad. It is written for php4 and does not display well under php5. The latest php4 versions for openSUSE can be obtained at http://download.opensuse.org/repositories/home:/michal-m:/php4/openSUSE_10.2/ (it can be added as an RPM-MetaData channel in Smart).

INSTALLATION

sudo apt-get install rrdtool librrds-perl librrd2-dev php5-gd

wget http://downloads.sourceforge.net/ganglia/ganglia-3.0.7.tar.gz
tar xzf ganglia-3.0.7.tar.gz

cd ganglia-3.0.7
./configure --with-gmetad
make
sudo make install

sudo mkdir -p /var/www/ganglia
sudo cp -r web/* /var/www/ganglia

gmond must be installed on ALL the machines to be monitored.
On the cluster's front-end machine we install gmetad and the web front-end. To build gmetad, add the option to the configure command: ./configure --with-gmetad. The gmetad configuration file, gmetad.conf, can be found in the corresponding directory of the source tree and can be copied to /etc. Installing the web front-end only requires unpacking its sources into the web server's DocumentRoot.
Optionally, we can install gexec to run the same command on several or all of the cluster's machines simultaneously.
CONFIGURATION
gmond: the gmond configuration file is /etc/gmond.conf. It can be created with gmond -t > /etc/gmond.conf, which writes a default configuration file. The most important settings in this file are:

Inside globals:
daemonize = yes # run as a system daemon
gexec = yes # only if gexec is installed
Inside cluster:
name = "name" # name of the cluster; it is better to set one explicitly
udp_send_channel: used to send the host's information. All the nodes of the same cluster must have the same configuration here. Data can be sent by multicast or unicast, and you can define as many send channels as you want (several unicast, several multicast, ...).
Multicast: use a multicast IP address. Give each cluster a different multicast IP, for example 239.2.11.72, 239.2.11.73, ... The address assigned here must also be set in udp_recv_channel. The port does not need to be changed and is best left as it is. Example:
udp_send_channel {
    mcast_join = 239.2.11.71
    port       = 8649
}
Unicast: the information is sent directly to the head node. This can be more useful in cases where multicast causes problems.
udp_send_channel {
    host = 192.168.3.4
    port = 8649
}
udp_recv_channel: for the node in charge of receiving all the information and creating the XML file. All nodes can have it, or only some. Generally it goes on the head node (the one with gmetad and the web interface).

Multicast
udp_recv_channel {
    mcast_join = 239.2.11.71
    bind       = 239.2.11.71
    port       = 8649
}
Unicast
udp_recv_channel {
    port   = 8649 # must match the port used in the unicast udp_send_channel
    family = inet4 # or inet6
}
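In unicast mode the sending nodes and the receiving node have to agree on the port. As a minimal sketch, the hypothetical helper below (gmond_unicast_channels is not part of Ganglia, and the address is only an example) prints a matching pair of stanzas that could be pasted into gmond.conf:

```shell
#!/bin/bash
# Hypothetical helper, not part of Ganglia: print matching unicast
# send/receive stanzas for gmond.conf.
# $1 = address of the head node, $2 = UDP port (defaults to 8649).
gmond_unicast_channels() {
    local head_node="$1"
    local port="${2:-8649}"
    cat <<EOF
udp_send_channel {
    host = ${head_node}
    port = ${port}
}
udp_recv_channel {
    port = ${port}
}
EOF
}

# Example: stanzas for a head node at 192.168.3.4 (illustrative address)
gmond_unicast_channels 192.168.3.4
```

Generating both stanzas from the same variable avoids the common mistake of sending to one port and listening on another.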
Next comes a list of the metrics that gmond will collect. The default configuration is fine, although you can change how often each metric is collected and add or remove metrics.
More information at: http://ganglia.sourceforge.net/docs/gmond.conf.html

gmetad: the main thing to configure is the set of sources from which it gathers information. Each source has the form
data_source "name" [polling interval] address1:port address2:port ...
There can be as many sources as desired; add one line like the one above per source. For example
data_source "Proteus" 192.168.2.4 192.168.2.5
data_source "Servidor" 127.0.0.1
In the first source there are 2 redundant IPs, in case one of the machines goes down. The addresses are separated by spaces, not commas.
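To make the data_source format concrete, here is a small sketch. The helper name data_source_line is hypothetical, and the 15-second interval and the explicit :8649 ports are only illustrative values:

```shell
#!/bin/bash
# Hypothetical helper: print one gmetad data_source line.
# Usage: data_source_line NAME INTERVAL HOST[:PORT]...
data_source_line() {
    local name="$1"
    local interval="$2"
    shift 2
    # "$*" joins the remaining arguments (the hosts) with single spaces
    printf 'data_source "%s" %s %s\n' "$name" "$interval" "$*"
}

# Example: two redundant hosts for the same cluster
data_source_line Proteus 15 192.168.2.4:8649 192.168.2.5:8649
```

The output can be appended to gmetad.conf, one line per cluster.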
You have to create the directory where gmetad will store the data from the different nodes, in round-robin database format. This directory is given by the rrd_rootdir attribute in the gmetad.conf configuration file. By default it is /var/lib/ganglia/rrds, and it must be owned by the user and group nobody (nobody:nobody).
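A minimal sketch of creating that directory follows. The function name make_rrd_dir is hypothetical. It changes ownership only when run as root and only if the given user exists, and it sets just the user, because the group name differs between distributions (some use nogroup instead of nobody):

```shell
#!/bin/bash
# Hypothetical helper: create the RRD directory used by gmetad.
# $1 = directory (e.g. /var/lib/ganglia/rrds), $2 = owning user (e.g. nobody)
make_rrd_dir() {
    local dir="$1"
    local owner="$2"
    mkdir -p "$dir" || return 1
    # chown needs root; skip it otherwise so the function stays harmless
    if [ "$(id -u)" -eq 0 ] && id "$owner" > /dev/null 2>&1; then
        chown -R "$owner" "$dir"
    fi
}

# Typical call, run as root:
#   make_rrd_dir /var/lib/ganglia/rrds nobody
```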
web front-end
You can choose among the available templates.

Temperature monitoring

By default, Ganglia does not monitor temperature, but we can add this using the gmetric tool that ships with it. This tool lets you add any metrics you want from the command line. More information: http://ganglia.sourceforge.net/docs/ganglia.html#gmetric
To measure the temperature we will use the sensors program. The first step is to configure it with sensors-detect. This program looks for the kinds of sensors our machine has (temperature, fan speed, voltage, ...) and creates the file /etc/sysconfig/lm_sensors, which lists the kernel modules to load to handle them. If the sensors are not supported by the kernel, we would have to find the corresponding sources and recompile the kernel. The file /etc/init.d/lm_sensors is also created to load these modules automatically at system startup.
All that remains is to send the sensor information periodically. To do so, we add an entry to the system crontab (/etc/crontab, whose entries include a user field) such as:
*/5 * * * * root /admin/ganglia/temp.sh
where temp.sh is a script that takes the output of sensors and uses gmetric to send it:
#!/bin/bash
# Send CPU temperatures and fan speeds to Ganglia through gmetric.
GMETRIC=/usr/bin/gmetric
SENSORS=/usr/bin/sensors
# only send metrics if gmond is running
if /sbin/service gmond status > /dev/null 2>&1; then
    # send cpu temperatures (bytes 12-16 of each "CPU Temp" line)
    count=0
    for temp in $("${SENSORS}" | grep "CPU Temp" | cut -b 12-16); do
        "$GMETRIC" -t float -n "cpu${count}_temp" -u "C" -v "$temp"
        count=$((count + 1))
    done
    # send cpu fan speed (bytes 12-15 of each "CPU_Fan" line)
    count=0
    for fan in $("${SENSORS}" | grep CPU_Fan | cut -b 12-15); do
        "$GMETRIC" -t uint32 -n "cpu${count}_fan" -u "RPM" -v "$fan"
        count=$((count + 1))
    done
fi
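The script above extracts values by fixed byte columns, which depends on the exact layout that sensors prints on a given machine. As a hedged alternative, the sketch below extracts the number with a pattern instead. The function name parse_cpu_temps is hypothetical, and the sample line is an assumed output format, not one captured from a real node:

```shell
#!/bin/bash
# Hypothetical helper: read `sensors` output on stdin and print the numeric
# value of every line that starts with "CPU Temp:", one value per line.
parse_cpu_temps() {
    sed -n 's/^CPU Temp:[[:space:]]*+\{0,1\}\([0-9][0-9.]*\).*/\1/p'
}

# Example with assumed sample output (label and layout vary per motherboard)
printf 'CPU Temp:    +45.0 C  (high = +60.0 C)\nMB Temp:     +38.0 C\n' | parse_cpu_temps
```

In temp.sh, the values could then be read with a loop such as `for temp in $("$SENSORS" | parse_cpu_temps)`, provided the label really is "CPU Temp" on your hardware.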

Wednesday, May 9, 2012

Activity Report

This week we worked on the interface for the cluster portal and integrated a web-based SSH client:

http://en.wikipedia.org/wiki/Web-based_SSH
http://liftoffsoftware.com/Products/GateOne
http://code.google.com/p/shellinabox/

Tests were also run on PVM.

It was installed and the examples were run.

(A tutorial on installing PVM is in progress.) To compile: Code, Slave, Compile

Nominations:

Juan Carlos http://jcecdps.blogspot.mx/2012/05/dps-class-benchmarks.html
Osvaldo http://4imedio.blogspot.mx/2012/05/reporte-semana-14-paralelos.html

Parallel Processing Clusters & PVM

PVM (Parallel Virtual Machine) is a software package that permits a heterogeneous collection of Unix and/or Windows computers hooked together by a network to be used as a single large parallel computer. Thus large computational problems can be solved more cost effectively by using the aggregate power and memory of many computers. The software is very portable. The source, which is available free through netlib, has been compiled on everything from laptops to CRAYs.

PVM enables users to exploit their existing computer hardware to solve much larger problems at minimal additional cost. Hundreds of sites around the world are using PVM to solve important scientific, industrial, and medical problems in addition to PVM's use as an educational tool to teach parallel programming. With tens of thousands of users, PVM has become the de facto standard for distributed computing world-wide.

PVM is freely available network clustering software (http://www.csm.ornl.gov/pvm/) that provides a scalable network for parallel processing. Developed at the Oak Ridge National Lab and similar in purpose to the Beowulf cluster, PVM supports applications written in Fortran and C/C++. In this article, I explain how to set up parallel processing clusters and present C++ applications that demonstrate multiple tasks executing in parallel.

Setting up PVM-based parallel processing clusters is straightforward and can be done with existing workstations that are also used for other purposes. There is no need to dedicate computers to the cluster; the only requirements are that the workstations must be on a network and use UNIX/Linux. PVM creates a single logical host from multiple workstations and uses message passing for task communication and synchronization.

My motivation for setting up a parallel processing cluster was to provide a system that students could use for coursework and research projects in parallel processing. My specific goals were to set up a working cluster and demonstrate with test software that multiple tasks could execute in parallel using the cluster.

Why Use PVM?

Granted, there is other software, most notably Beowulf, for clustering workstations together for parallel processing. So why PVM? The main reasons I decided to use PVM were that it is freely available, requires no special hardware, is portable, and supports many UNIX/Linux platforms. The fact that I could use Linux workstations that were already available in our computer lab, without dedicating those machines to PVM, was a major advantage.

Other important PVM features include:

A PVM cluster can be heterogeneous, combining workstations of different architectures. For example, Intel-based computers, Sun SPARC workstations, and Cray supercomputers could all be in the same cluster. Also, workstations from different types of networks could be combined into one cluster.

PVM is scalable. The cluster can become more robust and powerful by just adding additional workstations to the cluster.

PVM can be configured dynamically by using the PVM console utility or under program control using the PVM API. For example, workstations can be added or deleted while the cluster is operational.

PVM supports both the SPMD and MPMD parallel processing models. SPMD is single program/multiple data. With PVM, multiple copies of the same task can be spawned to execute on different sets of data. MPMD is multiple program/multiple data. With PVM, different tasks can be spawned to execute with their own set of data.

How PVM Works

A PVM background task is installed on each workstation in the cluster. The pvm daemon (pvmd) is used for interhost communication. Each pvmd communicates with the other pvm daemons via User Datagram Protocol (UDP). PVM tasks written using the PVM API communicate with pvmd via Transmission Control Protocol (TCP). Parallel-executing PVM tasks can also communicate with each other using TCP. Communication between tasks using UDP or TCP is commonly referred to as "communication using sockets."

The pvmd task also acts as a task scheduler for user-written PVM tasks using available workstations (hosts) in the cluster. In addition, each pvmd manages the list of tasks that are running on its host computer. When a parent task spawns a child task, the parent can specify which host computer the child task runs on, or it can leave the choice of host to the PVM task scheduler.

A PVM console utility gives users access to the PVM cluster. Users can spawn new tasks, check the cluster configuration, and change the cluster using the PVM console utility. For example, a typical cluster change would be to add/delete a workstation to/from the cluster. Other console commands list all the current tasks that are running on the cluster. The halt command kills all pvm daemons running on the cluster. In short, halt essentially shuts the cluster down.

The PVM console utility can be started from any workstation in the cluster. For example, if workstations in the cluster are separated by some physical distance, access to the cluster may be from different locations. However, when the cluster is shut down, the first use of the PVM console utility restarts the PVM software on the cluster. The machine on which the first use of the console utility occurs is the "master host." The console utility starts the pvmd running on the master host, then starts pvmd running on all the other workstations in the cluster. The original pvmd (running on the master host) can stop or start the pvm daemon on the other machines in the cluster. All console output from PVM tasks is directed to the master host. Any machine in the cluster can be a master host. Once the cluster is started up, only one machine in the cluster is considered the master host.
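As a rough sketch of how the master host can bring up the other daemons, the pvm console accepts a hostfile with one host name per line. The helper name write_pvm_hostfile and the node names below are hypothetical placeholders:

```shell
#!/bin/bash
# Hypothetical helper: write a PVM hostfile, one host name per line.
# $1 = path of the hostfile, remaining arguments = host names.
write_pvm_hostfile() {
    local file="$1"
    shift
    printf '%s\n' "$@" > "$file"
}

# Usage (host names are placeholders, not real machines):
#   write_pvm_hostfile "$HOME/pvm_hosts" node01 node02 node03
#   pvm "$HOME/pvm_hosts"    # starts pvmd locally (master host) and on the listed hosts
#
# Some commands available at the pvm> console prompt:
#   conf            list the hosts currently in the virtual machine
#   add <host>      add a workstation to the cluster
#   delete <host>   remove a workstation from the cluster
#   ps -a           list the tasks running on all hosts
#   halt            stop every pvmd, shutting the cluster down
```

The machine where `pvm` is first run becomes the master host described above.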

http://www.csm.ornl.gov/pvm/

http://pvm-plus-plus.sourceforge.net/

http://www.parawiki.org/index.php/PVM

http://www.itec.uni-klu.ac.at/~harald/PVM/pvm_guide.html

http://en.wikipedia.org/wiki/Parallel_Virtual_Machine

Examples

Example2