29 June, 2009

Bacula Installation Notes

Bacula is especially hard to configure as there are many options. My plan was to automatically take backups from various hosts. These might be users' machines, in which case /home (for Linux hosts) or My Documents (for Windows hosts) would be backed up, depending on the operating system. What made it especially hard was the need to take server backups. The servers host many websites, as well as other lab services. This created the need to take automatic (and consistent) backups of each website and its associated database. The solution I devised was a set of scripts that allow me to
take LVM snapshots and then back up these snapshots.

I had to write a number of scripts so that this would scale to many hosts,
and I also found the mylvmbackup script extremely useful. The Bacula conf files are
a work in progress, especially the effort to automate the various processes. Here I just document (for my own sake) the input and output of the scripts I wrote. For Bacula terminology, see the end of this post.


The first problem when installing Bacula is that the new version (3.0.x) is not yet officially packaged (although an unofficial PPA package exists). It seems that the Ubuntu server team will be preparing an official PPA package, but nothing has been done yet. I decided to use the old version and upgrade in a few months as the new version becomes
available. (In fact, I tried the PPA version and it would not work.)

To enable SSL follow the steps below:

apt-get build-dep bacula
apt-get install build-essential libssl-dev fakeroot devscripts
apt-get source bacula
cd bacula-2.2.4
(edit debian/rules to add the openssl option)
dch -i -Djaunty
fakeroot dpkg-buildpackage

and then install the deb packages to commence the installation.
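For reference, the debian/rules edit mentioned above amounts to passing Bacula's OpenSSL option to configure. A sketch (the exact variable holding the configure flags differs between package versions, so check your rules file; the variable name below is illustrative):

```
# debian/rules (excerpt, illustrative): append the OpenSSL switch to the
# flags that get passed to ./configure
CONF_ALL += --with-openssl=yes
```

Only where to add the switch varies; --with-openssl itself is Bacula's upstream configure option.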

Then answer the installer's questions (I created a separate DB user with access privileges to the Bacula catalog).



Add here stuff about the pools and how to create them.....
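In the meantime, a minimal Pool resource in bacula-dir.conf looks roughly like the following (the name and retention values are illustrative):

```
Pool {
  Name = Default
  Pool Type = Backup
  Recycle = yes                  # reuse Volumes once their retention expires
  AutoPrune = yes                # prune expired Jobs/Files automatically
  Volume Retention = 365 days
}
```

New Volumes are then labeled and assigned to the pool from bconsole with the label command.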


Backup Websites (or other applications that have a file and a db part)

Step 1: Download mylvmbackup and mylvmbackup.conf
Edit and place them in the bacula scripts directory (diffs follow)

mylvmbackup
19,26d18
<
< #
< # Note I have edited two things here.
< # a. $configfile to point to the actual file. Due to a bug I could not pass it as an option
< # b. removed the default user from being the root (since the my.cnf will be used).
< # c. and of course, I edited the file mylvmbackup.conf
<
<
45c37
< my $configfile = "/etc/bacula/scripts/mylvmbackup.conf";
---
> my $configfile = "/etc/mylvmbackup.conf";
116c108
< }
---
> }
411c403
<   $user = '';
---
>   $user = 'root';

mylvmbackup.conf

16c16
< user=
---
> user=root
18c18
< host=localhost
---
> host=
21c21
< mycnf=/etc/mysql/my.cnf
---
> mycnf=/etc/my.cnf
27,28c27,28
< vgname=
< lvname=
---
> vgname=mysql
> lvname=data
30c30
< lvsize=10G
---
> lvsize=5G
88c88
< skip_hooks=1
---
> skip_hooks=0


Step 2: On the host with the website, insert the following (commented out) into the file daemon (client) configuration file:
# WebSite {
#  Name = "Joomla_Website"
#  dbuser ="joomuser"; dbpassword ="dbpasswd"
#  dbname ="Joomla"; dbdir = "/path to db";
#  dbvgname="dbvgname"; dblvname="database"; dbxfs=0;
#  webdir ="/path to website";
#  webvgname="dbwebname";weblvname="websites"; webxfs=0;
# }

With the following information:

REQUIRED
Name: Unique name to identify the database
dbuser: User name to access the database
dbpassword: Password to access the database
dbname: Name of the database (used for the dump in the non-lvm case)
dbdir:
      In the non lvm case, full path to dir where the temp sql dump will be placed.
      In the lvm case, the relative path (in the lv) where the db is located.

OPTIONAL
If the optional values are provided, an LVM snapshot is used.

Database options
dbvgname: Name of the volume group where the database resides.
dblvname: The name of the logical volume where the database resides.
dbxfs: Set to 1 if the snapshot volume has the xfs filesystem.

Website Data Directory options
webdir: Directory where the data files reside.
       In the non-lvm case, this should be the actual directory.
       In the lvm case, the relative path (in the lv) where the website is located.
       If not specified, no website backup will be taken.
webvgname: The name of the volume group where the data dir resides
weblvname: The logical volume name where the data dir resides.
webxfs: Set to 1 if the snapshot volume has the xfs filesystem.

Also copy into the scripts directory the scripts mylvmbackup (see note above),
backup_website, and
backup_website_awk

The awk script scans the file for configuration information, and the backup_website (sh)
script does the actual work. The script is invoked as

backup_website <mode_of_operation> <jobname>

mode_of_operation has three possible choices: snapshot, release, filelist
jobname: the job name as created by Bacula.
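The intent is to hook the script into the corresponding Job resource in the director configuration; a sketch (the Job name is illustrative; %n is Bacula's substitution character for the job name):

```
Job {
  Name = "Joomla_Website_Backup"
  ...
  # take the LVM snapshots before the backup and release them afterwards
  Client Run Before Job = "/etc/bacula/scripts/backup_website snapshot %n"
  Client Run After Job  = "/etc/bacula/scripts/backup_website release %n"
}
```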

Step 3: In the director I use the following





Terminology

1. Glossary on data storage schemes
Volume: A Volume is a single physical tape (or possibly a single file) on which Bacula will write your backup data.
Pools: Pools group together Volumes so that a backup is not restricted to the length of a single Volume (tape).
Label: Before Bacula will read or write a Volume, the physical Volume must have a Bacula software label so that Bacula can be sure the correct Volume is mounted.
Console: The program that interfaces to the Director allowing the user or system administrator to control Bacula.

2. There are a number of daemons used to facilitate the operation:
Bacula-Director: The director orchestrates all the backup operations.
Bacula-SD (Storage Daemon): The storage daemon is in charge of handling the storage devices.
Bacula-FD (File Daemon): Essentially the client software installed on the machine to be backed up.
Upon installation, all these daemons require (minimal) configuration by editing their configuration
files, which reside in the /etc/bacula directory.

3. Other utilities/interfaces of note:
Bconsole: Console utility that starts whenever a user logs onto the console.
Bsmtp: SMTP utility used to send messages to the administrators.
BootStrapRecord: The crucial information used to recover files in case of a catastrophic failure of the server itself.

4. Types of backups:
Full: A full backup.
Differential: A backup that includes all files that have changed since the last full backup.
Incremental: A backup that includes all the files changed since the last Full, Differential, or Incremental backup started.

5. Bacula Jobs (Configuration Resource)
A configuration resource that defines work that Bacula must perform to back up a particular client. It consists of:
Type: Backup, restore, verify, etc.
Level: Full, Incremental, Differential
Fileset: A resource contained in a configuration file that defines the files to be backed up. It consists of a list
   of included files or directories, a list of excluded files, and how the files are to be stored.
Storage: The storage device and media pool to be used.

6. Types of Resources
Jobs: See 5 above
Restore: Describes the process of recovering a file from backup media.
Schedule: Defines when a job will be scheduled for execution
Verify: Operation (Job) to verify restored data.
Scan: A scan operation causes the contents of a Volume or a series of Volumes to be scanned.

7. Other terminology and information repositories
Resource: Part of a configuration file that defines a specific unit of information that is available to Bacula.
Bootstrap file: An ASCII file containing commands that allow Bacula to restore the contents of one or more volumes.
Catalog: The catalog stores summary information about Jobs, Clients, and Files that were backed up on a Volume.
Retention Period: The most important are the File Retention Period, the Job Retention Period, and the Volume Retention Period. Each of these applies to the time that specific records will be kept in the Catalog database.
  • The File Retention Period is important for two reasons: first, as long as File records remain in the database, you
    can "browse" the database with a console program and restore any individual file. Once the File records are removed or pruned from the database, the individual files of a backup job can no longer be "browsed". Second, File records use the most storage space in the database; as a consequence, you must ensure that regular "pruning" of the database File records is done to keep your
    database from growing too large.
  • The Job Retention Period is the length of time that Job records will be kept in the database. Note that all the File records are tied to the Job that saved those files. The File records can be purged leaving the Job records; in this case, information will be available about the jobs that ran, but not the details of the files that were backed up. Normally, when a Job record is purged, all its File records will also be purged.
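The retention periods above are set per client in the director configuration; for example (the name and values are illustrative):

```
Client {
  Name = somehost-fd
  ...
  File Retention = 60 days      # File records pruned after 60 days
  Job Retention  = 6 months     # Job records pruned after 6 months
  AutoPrune = yes               # prune expired records automatically
}
```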


28 June, 2009

RAID/LVM Notes

General Notes on the concept:

The following link provides a comprehensive description of the fundamental ideas behind LVM

IBM Tutorial

Following the (excellent) discussion above, LVM is an interesting solution because it offers the following possibilities:

  • In multiple disk installations, it offers the possibility of having filesystems larger than any of the disks
  • Add disks/partitions to your disk-pool and extend existing filesystems online
  • Replace two 80GB disks with one 160GB disk without the need to bring the system offline or manually move data between disks
  • Shrink filesystems and remove disks from the pool when their storage space is no longer necessary
  • Perform consistent backups using snapshots (more on this later in the article)
    All this flexibility comes at the cost of a small added complexity, in the sense that one has to properly describe the abstraction using CLI commands. This, as we will see below, is not that big of a deal as long as one has a thorough understanding of the concepts.

    LVM is structured around three elements:
    • Volumes: physical and logical volumes and volume groups
    • Extents: physical and logical extents
    • Device mapper: the Linux kernel module

    Volume

    Linux LVM is organized into:

    • physical volumes (PVs),

    • volume groups (VGs), and

    • logical volumes (LVs).

    Physical volumes are physical disks or physical disk partitions (as in /dev/hda or /dev/hdb1). A volume group is an aggregation of physical volumes. And a volume group can be logically partitioned into logical volumes.

    Figure 1: Physical-to-logical volume mapping
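    The organization above is built bottom-up with a handful of commands; a sketch (device names and sizes are illustrative, and all of these require root):

```shell
pvcreate /dev/sdb1 /dev/sdc1      # initialize the partitions as physical volumes
vgcreate VG0 /dev/sdb1 /dev/sdc1  # aggregate them into a volume group
lvcreate -L 20G -n LV0 VG0        # carve a 20GB logical volume out of the group
mkfs.ext3 /dev/VG0/LV0            # create a filesystem on the logical volume
```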

    Extents

    In order to do the n-to-m physical-to-logical volume mapping, PVs and VGs must share a common quantum size for their basic blocks; these are called physical extents (PEs) and logical extents (LEs). Despite the n-physical to m-logical volume mapping, PEs and LEs always map 1-to-1. The following image illustrates this concept.

    Physical-to-logical extent mapping

    Different extent sizes mean different VG granularity. For instance, if you choose an extent size of 4GB, you can only shrink/extend LVs in steps of 4GB. The extent allocation policy is also of importance. LVM2 doesn't always allocate PEs contiguously; for more details, see the Linux man page on lvm. The system administrator can set different allocation policies, but that isn't normally necessary, since the default one (called the normal allocation policy) uses common-sense rules such as not placing parallel stripes on the same physical volume.
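    To make the granularity point concrete, the number of extents an LV occupies is a simple ceiling division (4 MiB is the LVM2 default extent size; the numbers below are plain arithmetic, not queried from a real VG):

```shell
# extents = ceil(lv_size / extent_size); LVM rounds an LV up to whole extents
lv_mb=10240    # a 10 GiB logical volume
pe_mb=4        # default LVM2 extent size: 4 MiB
echo $(( (lv_mb + pe_mb - 1) / pe_mb ))    # 2560 extents

pe_mb=4096     # a 4 GiB extent size instead
echo $(( (lv_mb + pe_mb - 1) / pe_mb ))    # 3 extents, i.e. the LV rounds up to 12 GiB
```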

    Device Mapper
    When creating VGs and LVs, you can give them a meaningful name (as opposed to the previous examples where, for didactic purposes, the names VG0, LV0, and LV1 were used). It is the Device mapper's job to map these names correctly to the physical devices. Using the previous examples, the Device mapper would create the following device nodes in the /dev filesystem:
    • /dev/mapper/VG0-LV0
    with /dev/VG0/LV0 a link to the above.
    Note: Many distributions provide utilities to partition using LVM and/or RAID. RedHat has a very nice tool, but I will be using
    Ubuntu (since I very much prefer the apt package management system). In Ubuntu, the alternate installation CD has
    partman and support for LVM/RAID, but this does not offer much flexibility in setting extent sizes, stripe sizes, etc., so I will be
    using the CLI to do much of the partitioning. Also note that LVM (and RAID) support must be included in the initrd for the
    system to be able to boot from an LVM volume. Ubuntu does this automatically from version 9.04 onwards; what you need,
    though, is the server edition.


    References

    1. IBM Tutorial, Logical Volume Management
    2. LVM-HOWTO, LVM Howto


    23 June, 2009

    On Checking the Hard Drive

    This is a post documenting efforts to recover data from a failed hard drive. The drive had a reiserfs filesystem and failed suddenly. I can't mount it or otherwise access my data, so I will document my investigations here…

    Smartmon Tools:

    It is possible to use the smartmon tools to check the health of the hard drive…

    1. Check the health of the drive

      smartctl -H -d ata /dev/sda (if PASSED, this is a good indication)

    2. One can do more elaborate tests

      smartctl -t short -d ata /dev/sda (or)

      smartctl -t long -d ata /dev/sda

      smartctl -l selftest -d ata /dev/sda (to display results)



    3. One can also display all the SMART information with

      smartctl -a /dev/sda

      smartctl -A /dev/sda

    The short and extended offline tests report read failures. Not good.

    reiserfsck:

        reiserfsck --check /dev/sda

    This gives a warning that there is some sort of hardware failure. (I will get back to this later.)

    seatools:

    Now I moved the drive over to Windows and tried the tools offered by Seagate (it turns out that some of their drives ship with buggy firmware, which can cause an unexpected crash). The idea is to run their diagnostic tests and see if they pass. According to the Seagate web site, no firmware update is required for my serial number. I ran the updater utility (to update the firmware on my other drive) and it updated the firmware on the messed-up one as well. Some of the status messages changed, but there was no change whatsoever in the drive's accessibility. I get errors with all their diagnostic tests (long/short DST, generic DST).

    PCB:

    After looking around a little, it seems that one way people fix problem drives is to replace the PCB. This is probably not an option for me, as it seems to be necessary when a drive is destroyed by a power surge or some similar anomaly. In my case the drive works "perfectly" (i.e., it rotates) and the filesystems are recognized.

    badblocks:

    It is now time to investigate bad blocks and the potential of at least partially recovering some data.
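    The plan, roughly, is a read-only scan followed by imaging whatever is readable (device and paths are illustrative; with a failing drive, imaging first is the safer order):

```shell
# read-only scan: list unreadable blocks and keep the list for later use
badblocks -sv -o /root/sda-badblocks.txt /dev/sda

# copy the readable parts of the drive to an image with GNU ddrescue;
# the logfile allows the copy to be resumed and refined over multiple passes
ddrescue /dev/sda /mnt/backup/sda.img /mnt/backup/sda.log
```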

    References

    1. Smartmon Tools
    2. Linux Journal Article
    3. Ubuntu Data Recovery