
Cloudera Hadoop RHEL/CentOS 6 Install Guide

This guide contains everything you need to get a basic Hadoop cluster up and running. It is intended as a condensed and easy-to-understand supplement to the official documentation, so lengthy descriptions are omitted. For the full documentation, see Cloudera’s Install Guide.

Whether you want to start with a basic two-node cluster, or add hundreds or even thousands of nodes, the concepts here apply. Adding nodes can be done at any time without interrupting the cluster’s workflow, so as long as you have two machines, you’re ready to begin installing.



Performance tuning: Intel 10-gigabit NIC

By default, Linux networking is configured for best reliability, not performance. With a 10GbE adapter, this is especially apparent. The kernel’s send/receive buffers, TCP memory allocations, and packet backlog are much too small for optimal performance. This is where a little testing & tuning can give your NIC a big boost.

There are three performance-tuning changes you can make, as listed in the Intel ixgb driver documentation. Here they are in order of greatest impact:

  1. Enabling jumbo frames on your local host(s) and switch.
  2. Using sysctl to tune kernel settings.
  3. Using setpci to tune PCI settings for the adapter.
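As a rough sketch of the first two items, the function below only prints the commands it would run, so it is safe to try on any machine. The interface name, MTU, and buffer sizes are illustrative assumptions, not values taken from the Intel documentation; the setpci step is left out because its register values depend on the adapter.

```shell
#!/bin/sh
# Dry-run sketch: prints tuning commands instead of executing them.
# Interface name, MTU and buffer sizes are assumptions for illustration.

print_tuning_commands() {
    iface="$1"    # hypothetical interface name, e.g. eth2
    mtu="$2"      # 9000 is a common jumbo-frame MTU; the switch must allow it too
    bufmax="$3"   # maximum socket buffer size in bytes

    # 1. Jumbo frames on the local host
    echo "ip link set dev $iface mtu $mtu"

    # 2. Kernel send/receive buffers, TCP memory and packet backlog
    echo "sysctl -w net.core.rmem_max=$bufmax"
    echo "sysctl -w net.core.wmem_max=$bufmax"
    echo "sysctl -w net.ipv4.tcp_rmem=\"4096 87380 $bufmax\""
    echo "sysctl -w net.ipv4.tcp_wmem=\"4096 65536 $bufmax\""
    echo "sysctl -w net.core.netdev_max_backlog=30000"
}

print_tuning_commands eth2 9000 16777216
```

Review the printed commands, test one change at a time, and only then run them as root or persist the sysctl values in /etc/sysctl.conf.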



Automating GIS Metadata Conversion

Any sysadmin working in an earth-science organization may find themselves learning the skills of the programmers around them in order to help out with everyday tasks. This is one such example, where automating conversion of the XML metadata proved helpful to GIS analysts and sysadmins alike.

FGDC to ISO-19115 Conversion on Import (GeoNetwork)

  1. Download the csdgm2iso19115-2.xslt stylesheet from the FGDC website.
  2. Replace the root element “gmi:MI_Metadata” with “gmd:MD_Metadata”
    (the root element sits directly beneath <xsl:template match="/">,
    so place <gmd:MD_Metadata> right beneath that line).
  3. Completely remove the gmi namespace to allow compatibility with GeoNetwork.
  4. On the <xsl:stylesheet> element, add the following attributes so Saxon can use the necessary types:
    xmlns:saxon="" saxon:allow-all-built-in-types="yes"
  5. Save the file, then copy it to /usr/local/geonetwork/web/geonetwork/xsl/conversion/import/FGDC_to_ISO19115.xsl
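Steps 2 and 3 can be partly scripted. The sketch below is one possible approach, assuming the stylesheet declares the namespace with an xmlns:gmi="..." attribute; the function name is hypothetical. It renames the root element and drops the namespace declaration, but any other gmi:-prefixed elements in the stylesheet still need to be handled by hand, and the Saxon attributes from step 4 are not added.

```shell
#!/bin/sh
# convert_stylesheet SRC DEST
# Sketch only: renames the root element and removes the xmlns:gmi declaration.
# Other gmi:-prefixed elements are NOT touched and must be reviewed manually.

convert_stylesheet() {
    src="$1"
    dest="$2"
    sed -e 's/gmi:MI_Metadata/gmd:MD_Metadata/g' \
        -e 's/[[:space:]]*xmlns:gmi="[^"]*"//g' \
        "$src" > "$dest"
}

# Example usage (paths as given in the steps above):
#   convert_stylesheet csdgm2iso19115-2.xslt FGDC_to_ISO19115.xsl
#   cp FGDC_to_ISO19115.xsl \
#      /usr/local/geonetwork/web/geonetwork/xsl/conversion/import/FGDC_to_ISO19115.xsl
```

Diff the output against the original before copying it into GeoNetwork, so you can see exactly what the two substitutions changed.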

This will allow you to select FGDC_to_ISO19115 from the Stylesheet drop-down menu in your GeoNetwork web interface when you import files. When this is selected, GeoNetwork can import directories of FGDC metadata and automatically convert them to ISO19115.

For the lazy, you can find the above file, FGDC_to_ISO19115.xsl, pre-edited and ready for use with GeoNetwork.



Increasing mobility and availability of laptop data with DRBD

I tried this out recently as an experiment and fun home project. I wanted to switch seamlessly between the mobile workstation my employer had given me and the desktop workstation I preferred to use at home.

I was able to create an exact replica of my laptop data, including its environment and its account- and application-specific settings, that would sync automatically with my desktop workstation whenever the laptop was connected to my home network.

This was really useful for me, since I could just close my laptop and be completely set up with my entire work environment when I sat down at my desktop. On top of that, it provided a backup copy of all my latest work, so that if one machine were to fail, I’d suffer no downtime or data loss. It tickled my cluster-admin side to see yet another single point of failure eliminated.

That said, this is purely for fun and not recommended for everyone…



Getting more from yum with RPMforge and EPEL Repos

RPMforge and EPEL repos can give your CentOS or RHEL install the extra functionality you need. They provide useful packages like git, clusterssh, htop, and thousands more, which makes them a must-have for anyone who needs software beyond the base repositories.

When adding extra repositories, it’s always a good idea to put some protections in place. The plugin ‘yum-protectbase’ can help avoid potential package conflicts that arise when adding 3rd-party repositories.

yum -y install yum-protectbase

Download, install, and import keys from RPMforge & EPEL. (CentOS/RHEL 6, 64-bit)

rpm --import # import keys
rpm -K rpmforge-release-0.5.2-2.el6.rf.*.rpm # verify package integrity
rpm -i rpmforge-release-0.5.2-2.el6.rf.*.rpm # install
# install EPEL
rpm -Uvh

Or for CentOS/RHEL 5…

rpm --import
rpm -K rpmforge-release-0.5.2-2.el5.rf.*.rpm
rpm -i rpmforge-release-0.5.2-2.el5.rf.*.rpm
rpm -Uvh

Then, go through your /etc/yum.repos.d/ directory and edit the repo config files, adding protect=1 to every section of CentOS-Base.repo, and protect=0 to every section of epel.repo. The RPMforge repo should already contain the necessary protect=0 statements.

# example
name=Extra Packages for Enterprise Linux 6 - $basearch
mirrorlist=$basearch
failovermethod=priority
protect=0
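If you would rather not edit each section by hand, the edit can be sketched with GNU sed. The helper name below is hypothetical, and the one-line form of the "a" (append) command is a GNU extension, which is what CentOS and RHEL ship. Back up the repo files before trying it.

```shell
#!/bin/sh
# add_protect FILE VALUE
# Sketch: drops any existing protect= lines, then adds "protect=VALUE"
# directly after every [section] header in the given repo file.
# Relies on GNU sed (-i and the one-line "a" command).

add_protect() {
    file="$1"
    value="$2"
    sed -i -e '/^protect[[:space:]]*=/d' \
           -e '/^\[.*\]$/a protect='"$value" "$file"
}

# Example usage (run as root):
#   add_protect /etc/yum.repos.d/CentOS-Base.repo 1
#   add_protect /etc/yum.repos.d/epel.repo 0
```

Because existing protect= lines are removed first, running the function twice does not pile up duplicate entries.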

Your system can now safely use RPMforge and EPEL.


Notes on Pacemaker Resource Types and Constraints

The keyword for a standard resource is “primitive”.
You can add a primitive by entering the “configure” sub-menu of the CRM shell.

    crm configure

    crm(live)configure# primitive ClusterIP \
    ocf:heartbeat:IPaddr2 params ip="" cidr_netmask="24"

Available Resource Agent (RA) Classes:
– heartbeat
– lsb
– ocf
– stonith

How to specify classes at the CRM shell:

crm(live)configure# primitive <some_name> <class>:<provider>:<script> <params>

What operations (op) exist for primitive resources?
– monitor
– stop
– start
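To show where an operation fits into a primitive definition, here is a small dry-run sketch that assembles the command and prints it rather than running it. The helper name, the 30s interval, and the 192.0.2.10 address (from the documentation-only range) are assumptions for illustration.

```shell
#!/bin/sh
# primitive_cmd NAME AGENT INTERVAL PARAM...
# Dry-run sketch: prints a "crm configure primitive" command that includes
# a monitor operation. Nothing is sent to the cluster.

primitive_cmd() {
    name="$1"       # resource name
    agent="$2"      # <class>:<provider>:<script>
    interval="$3"   # how often the monitor operation runs
    shift 3
    echo "crm configure primitive $name $agent params $* op monitor interval=$interval"
}

# Placeholder address and interval; substitute your own values.
primitive_cmd ClusterIP ocf:heartbeat:IPaddr2 30s ip=192.0.2.10 cidr_netmask=24
```

The start and stop operations are attached the same way, with "op start" or "op stop" followed by their own settings such as a timeout.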

Resource and constraint types:
* primitive
* master/slave
* colocation
* group

Syntax, by type:

   crm configure primitive <name> <class>:<provider>:<script>
   crm configure ms <name> <resource>
   crm configure colocation <name> <score>: <resource1> <resource2>
   crm configure group <name> <resource1> <resource2> …

Colocation and group, in more detail:

Colocation is a constraint that tells Pacemaker which resources to run together. It can specify any number of resources, and a score of either Inf: or -Inf: to specify whether they should always run at the same location, or never at the same location.

Inf: means “infinity points”, referring to how many points a constraint has. Pacemaker weighs its decisions about where to run or move something by these points, so a constraint with infinity points is always honored.
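A small dry-run sketch of both cases follows. It prints the crm commands instead of running them; the helper name, constraint names, and resource names (WebSite, ClusterIP, Database) are hypothetical.

```shell
#!/bin/sh
# colocation_cmd ID SCORE RESOURCE...
# Dry-run sketch: prints a "crm configure colocation" command.
# SCORE is inf (always together), -inf (never together) or a number.

colocation_cmd() {
    id="$1"
    score="$2"
    shift 2
    case "$score" in
        inf|-inf|[0-9]*|-[0-9]*) ;;   # loose sanity check on the score
        *) echo "invalid score: $score" >&2; return 1 ;;
    esac
    echo "crm configure colocation $id $score: $*"
}

# Always run the web site on the node that holds the cluster IP.
colocation_cmd website-with-ip inf WebSite ClusterIP

# Never run the web site and the database on the same node.
colocation_cmd never-together -inf WebSite Database
```

A finite score such as 100 expresses a preference rather than a hard rule, which Pacemaker weighs against other constraints.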

Groups are a simplified way to keep multiple resources running together. Members of a group start in the order listed and stop in reverse order. You can tell a group to start in a particular location, before or after some other service is already running. Grouping resources makes them easier to manage.
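For comparison with the colocation syntax, here is a dry-run sketch of a group definition. The helper, group, and resource names are hypothetical, and the command is printed rather than executed.

```shell
#!/bin/sh
# group_cmd NAME RESOURCE...
# Dry-run sketch: prints a "crm configure group" command.

group_cmd() {
    name="$1"
    shift
    if [ "$#" -lt 1 ]; then
        echo "a group needs at least one member" >&2
        return 1
    fi
    echo "crm configure group $name $*"
}

# ClusterIP is listed first, so it would start before WebSite.
group_cmd WebGroup ClusterIP WebSite
```

Once a group exists, other constraints can refer to the group name instead of listing each member.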