29 June, 2009

Bacula Installation Notes

Bacula is especially hard to configure because it has so many options. My backup plan was to
take backups automatically from various hosts. Some of these are users' machines,
in which case, depending on the operating system, their /home (for Linux) or My Documents
(for Windows hosts) is backed up. What made it especially hard was the need to take
server backups. The servers host many websites, as well as other lab services.
This created the need to take automatic (and consistent) backups of each website and its
associated database. The solution I devised is a set of scripts that
take LVM snapshots and then back up these snapshots.

I had to write a number of scripts so that this would scale to many hosts,
and I also found the script mylvmbackup extremely useful. The bacula conf files are
a work in progress, especially as I try to automate the various processes. Here I just document (for my own sake) the inputs and outputs of the scripts I wrote. For the bacula terminology, see the end of this post.

Bacula Installation Notes

The first problem when installing bacula is that the new version (3.0.x) is not yet officially packaged (although an unofficial PPA package exists). It seems that the Ubuntu server team will be preparing an official PPA package, but nothing has been done yet. I decided to use the old version and upgrade in a few months, when the new version becomes
available. (In fact, I tried the PPA version and it would not work.)

To enable SSL, rebuild the package from source with the steps below:

apt-get build-dep bacula
apt-get install build-essential libssl-dev fakeroot devscripts
apt-get source bacula
cd bacula-2.2.4
(edit debian/rules to add the openssl option)
dch -i -Djaunty
fakeroot dpkg-buildpackage

and then install the resulting deb packages.

During the installation, answer the configuration questions (these include creating a separate db user with access privileges to the bacula catalog).



Add here stuff about the pools and how to create them.....


Backup Websites (or other applications that have a file and a db part)

Step 1: Download mylvmbackup and mylvmbackup.conf.
Edit them and place them in the bacula scripts directory. The diffs follow; lines marked < come from my edited copy.

mylvmbackup
19,26d18
<
< #
< # Note I have edited two things here.
< # a. $configfile to point to the actual file. Due to a bug I could not pass it as an option
< # b. removed the default user from being the root (since the my.cnf will be used).
< # c. and of course, I edited the file mylvmbackup.conf
<
<
45c37
< my $configfile = "/etc/bacula/scripts/mylvmbackup.conf";
---
> my $configfile = "/etc/mylvmbackup.conf";
116c108
< }
---
> }
411c403
<   $user = '';
---
>   $user = 'root';

mylvmbackup.conf

16c16
< user=
---
> user=root
18c18
< host=localhost
---
> host=
21c21
< mycnf=/etc/mysql/my.cnf
---
> mycnf=/etc/my.cnf
27,28c27,28
< vgname=
< lvname=
---
> vgname=mysql
> lvname=data
30c30
< lvsize=10G
---
> lvsize=5G
88c88
< skip_hooks=1
---
> skip_hooks=0


Step 2: On the host with the website, insert the following, commented out, in the file daemon (client) configuration file:
# WebSite {
#  Name = "Joomla_Website"
#  dbuser ="joomuser"; dbpassword ="dbpasswd"
#  dbname ="Joomla"; dbdir = "/path to db";
#  dbvgname="dbvgname"; dblvname="database"; dbxfs=0;
#  webdir ="/path to website";
#  webvgname="dbwebname";weblvname="websites"; webxfs=0;
# }

With the following information:

REQUIRED
Name: Unique name to identify the database
dbuser: User name to access the database
dbpassword: Password to access the database
dbname: Name of the database (used for the dump in the non-lvm case)
dbdir:
      In the non lvm case, full path to dir where the temp sql dump will be placed.
      In the lvm case, the relative path (in the lv) where the db is located.

OPTIONAL
If the optional values are provided, an LVM snapshot is used.

Database options
dbvgname: Name of the volume group where the database resides.
dblvname: The name of the logical volume where the database resides.
dbxfs: Set to 1 if the snapshot volume has the xfs filesystem.

Website Data Directory options
webdir: Directory where the data files reside.
       In the non-lvm case, this should be the actual directory.
       In the lvm case, the relative path (in the lv) where the website is located.
       If not specified, no website backup will be taken.
webvgname: The name of the volume group where the data dir resides.
weblvname: The logical volume name where the data dir resides.
webxfs: Set to 1 if the snapshot volume has the xfs filesystem.
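
As an illustration of how such a commented block can be read back, here is a minimal sketch in shell and awk. It is not the backup_website_awk script itself, whose contents are not shown in this post; the function name parse_website and the key=value output format are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper (not the actual backup_website_awk):
# print key=value pairs from the commented WebSite block
# whose Name matches $2 in the configuration file $1.
parse_website() {
    awk -v want="$2" '
        /^#[ \t]*WebSite[ \t]*[{]/ { inblock = 1; n = 0; name = ""; next }
        inblock && /^#[ \t]*[}]/ {
            if (name == want)
                for (i = 1; i <= n; i++) print pairs[i]
            inblock = 0
            next
        }
        inblock {
            line = $0
            sub(/^#[ \t]*/, "", line)        # drop the comment marker
            cnt = split(line, parts, ";")    # several settings may share a line
            for (j = 1; j <= cnt; j++) {
                p = parts[j]
                if (p !~ /=/) continue       # skip empty or malformed pieces
                key = p; sub(/[ \t]*=.*$/, "", key); gsub(/^[ \t]+/, "", key)
                val = p; sub(/^[^=]*=[ \t]*/, "", val)
                gsub(/^"|"[ \t]*$/, "", val) # strip the surrounding quotes
                if (tolower(key) == "name") name = val
                pairs[++n] = key "=" val
            }
        }
    ' "$1"
}
```

Calling parse_website with the client configuration file and the name Joomla_Website would print one line per setting, such as dbuser=joomuser. The sketch splits on every semicolon, so a value that itself contains a semicolon would not survive.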

Also copy the following scripts into the scripts directory: mylvmbackup (see note above),
backup_website, and
backup_website_awk.

The awk script scans the file for the configuration information described above, and the backup_website (sh)
script then does the actual work. The script is invoked as follows:

backup_website mode_of_operation jobname

mode_of_operation: one of three choices: snapshot, release, filelist
jobname: the job name as created by bacula.
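
To make the three modes more concrete, here is a rough sketch of what each one could do for a single LVM-backed directory. This is not the backup_website script; the function name website_lvm, the snapshot name suffix, the mount location, the default snapshot size and the DRY_RUN switch are all assumptions for illustration. It also covers only a plain directory; a consistent database snapshot additionally needs the table flushing that mylvmbackup takes care of.

```shell
#!/bin/sh
# Hypothetical sketch of the three modes for one LVM-backed directory.
# Arguments: mode vgname lvname relative_dir [xfs_flag]
# Set DRY_RUN=1 to print the commands instead of running them.
SNAP_SIZE=${SNAP_SIZE:-5G}              # assumed snapshot size
MNT_BASE=${MNT_BASE:-/mnt/bacula_snap}  # assumed mount location

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

website_lvm() {
    mode=$1 vg=$2 lv=$3 reldir=$4 xfs=${5:-0}
    snap="${lv}_bacula_snap"
    mnt="$MNT_BASE/$vg/$lv"
    case "$mode" in
    snapshot)
        opts=ro
        # XFS refuses to mount a second filesystem with the same UUID
        if [ "$xfs" = 1 ]; then opts=ro,nouuid; fi
        run lvcreate --snapshot --size "$SNAP_SIZE" --name "$snap" "/dev/$vg/$lv" &&
        run mkdir -p "$mnt" &&
        run mount -o "$opts" "/dev/$vg/$snap" "$mnt"
        ;;
    filelist)
        # Path inside the mounted snapshot that the backup job should read
        echo "$mnt/$reldir"
        ;;
    release)
        run umount "$mnt" &&
        run lvremove -f "/dev/$vg/$snap"
        ;;
    *)
        echo "usage: website_lvm snapshot|release|filelist vg lv dir [xfs]" >&2
        return 1
        ;;
    esac
}
```

The idea is that snapshot runs before the backup job, filelist tells the job where to read from, and release runs afterwards to unmount and drop the snapshot. Running with DRY_RUN=1 first is a cheap way to check the generated commands before touching any volume.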

Step 3: In the director I use the following





Terminology

1. Glossary on data storage schemes
Volume: A Volume is a single physical tape (or possibly a single file) on which Bacula will write your backup data.
Pools: Pools group together Volumes so that a backup is not restricted to the length of a single Volume (tape).
Label: Before Bacula will read or write a Volume, the physical Volume must have a Bacula software label so that Bacula can be sure the correct Volume is mounted.
Console: The program that interfaces to the Director allowing the user or system administrator to control Bacula.

2. There are a number of daemons used to facilitate the operation:
Bacula-Director: The director orchestrates all the backup operations.
Bacula-SD (Storage Daemon): The storage daemon is in charge of handling the storage devices.
Bacula-FD (File Daemon): Essentially the client software installed on the machine to be backed up.
Upon installation, all these daemons require (minimal) configuration by editing their configuration
files, which reside in the /etc/bacula directory.

3. Other utilities/interfaces of note:
Bconsole: Console utility that starts whenever a user logs onto the console.
Bsmtp: SMTP utility used to send messages to the administrators.
BootStrapRecord: The crucial information used to recover files in case of a catastrophic failure of the server itself.

4. Types of backups:
Full: A full backup
Differential: A backup that includes all files that have changed since the last full backup.
Incremental: A backup that includes all the files changed since the last Full, Differential, or Incremental backup started.
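
The difference between the levels comes down to which reference point a file's modification time is compared against. The sketch below imitates that selection with find and a stamp file. It is only an illustration: Bacula itself works from its catalog, not from stamp files, and the function name changed_since and the stamp-file paths in the comments are assumptions for this example.

```shell
#!/bin/sh
# Hypothetical illustration: list the files under directory $1
# that were modified after the stamp file $2.
changed_since() {
    find "$1" -type f -newer "$2" | sort
}

# Differential: compare against a stamp taken at the last Full backup.
#   changed_since /home/user /var/tmp/last_full.stamp
# Incremental: compare against a stamp taken at the most recent backup
# of any level (Full, Differential or Incremental).
#   changed_since /home/user /var/tmp/last_any.stamp
```

With the same directory, the differential list keeps growing until the next Full backup, while the incremental list only holds what changed since the previous run.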

5. Bacula Jobs (Configuration Resource)
A configuration resource that defines work that Bacula must perform to back up a particular client. It consists of:
Type: Backup, restore, verify, etc.
Level: Full, Incremental, Differential
Fileset: A Resource contained in a configuration file that defines the files to be backed up. It consists of a list
   of included files or directories, a list of excluded files, and how the files are to be stored.
Storage: Storage Device, Media Pool

6. Types of Resources
Jobs: See 5 above
Restore: Describes the process of recovering a file from backup media.
Schedule: Defines when a job will be scheduled for execution
Verify: Operation (Job) to verify restored data.
Scan: A scan operation causes the contents of a Volume or a series of Volumes to be scanned.

7. Other terminology and information repositories
Resource: Part of a configuration file that defines a specific unit of information that is available to bacula.
Bootstrap file: Is an ASCII file containing commands that allow Bacula to restore the contents of one or more volumes.
Catalog: The catalog stores summary information about Jobs, Clients, and Files that were backed up on a Volume. 
Retention Period: The most important are the File Retention Period, Job Retention Period, and the Volume Retention Period. Each of these retention periods applies to the time that specific records will be kept in the Catalog database.
  • The File Retention Period is important for two reasons. The first is that as long as File records remain in the database, you
    can "browse" the database with a console program and restore any individual file. Once the File records are removed or pruned from the database, the individual files of a backup job can no longer be "browsed". The second reason for carefully choosing the File Retention Period is that File records use the most storage space in the database. As a consequence, you must ensure that regular "pruning" of the database File records is done to keep your
    database from growing too large.
  • The Job Retention Period is the length of time that Job records will be kept in the database. Note, all the File records are tied to the Job that saved those files. The File records can be purged leaving the Job records. In this case, information will be available about the jobs that ran, but not the details of the files that were backed up. Normally, when a Job record is purged, all its File records will also be purged.

