Version 27 (modified by Bruno Cornec, 13 years ago) ( diff )

Trouble-Shooting mindi

Launch mindi using the trace option (-x) of the shell:

bash -x /usr/sbin/mindi 2>&1 | tee /tmp/mindi.log

If you want to get the same mindi traces when mindi is called from mondoarchive, add the following line at the beginning of the mindi script:

set -x
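With the trace option active, every executed command is written to the log prefixed with one or more "+" characters (the first character of bash's default PS4 prompt). The following is a minimal sketch for pulling the last traced commands out of a log such as /tmp/mindi.log, which is usually where a failure shows up. The function name last_traced and the default of 20 lines are illustrative choices, not part of mindi.

```shell
# last_traced LOGFILE [COUNT]
# Print the last COUNT (default 20) xtrace lines from a "bash -x" log.
# xtrace lines start with "+" because that is the first character of
# bash's default PS4 prompt.
last_traced() {
    log=$1
    count=${2:-20}
    grep '^+' "$log" | tail -n "$count"
}
```

For example: last_traced /tmp/mindi.log 40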

Trouble-Shooting mondo

mondo basically consists of two C programs, mondoarchive and mondorestore. Trouble-shooting mondo may therefore mean debugging. This sounds scarier than it is - just read on. ;-)

Creating Backtraces

Backtraces can be very helpful when trouble-shooting issues like segmentation faults. To create a useful backtrace, you need gdb (the GNU Debugger) installed and an application (and possibly libraries) with debugging symbols built in. The following will explain how to do this.

gdb

gdb should be part of your distribution; just use your favourite way to install the package, e.g.

apt-get install gdb

for Debian and friends (such as Ubuntu) or

yum install gdb

for Fedora/RedHat/Mandriva.

mondoarchive/mondorestore with debugging symbols

To get mondoarchive and mondorestore with debugging symbols built in, you need to build from source.

Get the latest stable mondo source package from ftp://ftp.mondorescue.org/src/, e.g. mondo-2.0.9.tar.gz, and unpack it:

tar xvzf mondo-2.0.9.tar.gz

Enter the new directory and build using make:

cd mondo-2.0.9
./configure --prefix=/usr
make

You will end up with non-stripped binaries, i.e. binaries that contain debugging symbols, in the following locations:

file mondo/mondoarchive/mondoarchive
mondo/mondoarchive/mondoarchive: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.0, dynamically linked (uses shared libs), not stripped

and

file mondo/mondorestore/mondorestore
mondo/mondorestore/mondorestore: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.0, dynamically linked (uses shared libs), not stripped
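The check above can be scripted. The following is a minimal sketch that looks for the words "not stripped" in the description printed by the file command, as in the two outputs above. The function names are illustrative, and the exact wording of the description may differ between versions of file.

```shell
# is_unstripped DESCRIPTION
# Succeed when the text printed by file(1) says the binary is "not stripped".
is_unstripped() {
    case "$1" in
        *"not stripped"*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_binaries BINARY...
# Report, for each given path, whether file(1) describes it as not stripped.
check_binaries() {
    for bin in "$@"; do
        if is_unstripped "$(file "$bin")"; then
            echo "$bin: has symbols"
        else
            echo "$bin: stripped or not readable"
        fi
    done
}
```

For example, from the top of the source tree: check_binaries mondo/mondoarchive/mondoarchive mondo/mondorestore/mondorestore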

Make backups of the original mondoarchive and mondorestore binaries and copy the newly created ones over the originals.
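A minimal sketch of the backup-and-copy step follows. It assumes the installed binaries live in /usr/sbin, as elsewhere on this page. The ".orig" suffix and the function name are illustrative choices, and the commands need to be run as root.

```shell
# install_with_backup NEW_BINARY INSTALLED_BINARY
# Keep a one-time copy of the installed binary as INSTALLED_BINARY.orig,
# then copy the newly built binary over it.
install_with_backup() {
    new=$1
    installed=$2
    # Do not overwrite an existing backup, so the pristine original
    # survives repeated runs.
    if [ -e "$installed" ] && [ ! -e "$installed.orig" ]; then
        cp -p "$installed" "$installed.orig" || return 1
    fi
    cp "$new" "$installed"
}
```

For example: install_with_backup mondo/mondoarchive/mondoarchive /usr/sbin/mondoarchive, and likewise for mondorestore.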

On the ftp://ftp.mondorescue.org server, you'll also find rpm packages containing the debug symbols, which you can install alongside your normal packages to add debug support to your environment.

Trouble-Shooting mondoarchive

The best approach is to run the mondoarchive binary you just built with debugging symbols under gdb, from its location in the build directory. So:

cd mondo/mondoarchive
gdb ./mondoarchive
set logging on    # Which will generate a gdb.txt output file
run <usual arguments you use>

Note: Running it from within its build directory makes more valuable information about source lines available in the backtrace.

When the segmentation fault happens, enter:

bt

and send the output to the list.
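The interactive session above can also be scripted. The following is a minimal sketch, assuming a gdb that supports the -batch, -ex and --args options. The function name bt_run and the GDB variable (used to point at a different gdb binary) are illustrative. mondoarchive's text-mode interface may not display well under a non-interactive gdb, so treat this as a convenience for reproducible crashes rather than a replacement for the session above.

```shell
# bt_run PROGRAM [ARGS...]
# Run PROGRAM under gdb non-interactively and print a backtrace once it
# stops (for example on a segmentation fault).
bt_run() {
    "${GDB:-gdb}" -batch -ex run -ex bt --args "$@"
}
```

For example, from mondo/mondoarchive: bt_run ./mondoarchive <usual arguments you use> 2>&1 | tee gdb.txt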

Another possibility is to run valgrind mondoarchive [params] (if you have valgrind installed on your system), as it gives even more information on potentially related memory issues.

Trouble-Shooting partition issue

With new Linux distributions, hard disk partition mount points are stored in /etc/fstab using by-id device names, e.g.:

/dev/disk/by-id/scsi-SATA_SAMSUNG_HM120JIS09GJ30LB01772-part2   /       ext3    acl,user_xattr 1 1

When by-id is used, mindi doesn't seem to see all partitions and is thus unable to rescue the system afterwards.

The solution is to change the by-id entries to static mounts; for SLES10 this is described in:

http://www.novell.com/support/search.do?cmd=displayKC&docType=kc&externalId=3580082&sliceId=SAL_Public&dialogID=54562329&stateId=0%200%2054564189

Also, please help us fix that issue by adding information to ticket #406.
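To see whether a system is affected, the by-id entries in /etc/fstab can be listed. The following is a minimal sketch. The function names are illustrative, and readlink -f is assumed to be available (as in GNU coreutils) to show which device node each by-id symlink currently points to.

```shell
# list_by_id_mounts [FSTAB]
# Print "device mountpoint" for every fstab entry whose device field is a
# /dev/disk/by-id/ path. Comment lines are skipped because their first
# field starts with "#".
list_by_id_mounts() {
    awk '$1 ~ "^/dev/disk/by-id/" { print $1, $2 }' "${1:-/etc/fstab}"
}

# resolve_by_id [FSTAB]
# For each by-id entry, also show the device node the symlink currently
# resolves to, using readlink -f.
resolve_by_id() {
    list_by_id_mounts "$1" | while read -r dev mnt; do
        echo "$mnt: $dev -> $(readlink -f "$dev")"
    done
}
```

Running resolve_by_id without arguments reads /etc/fstab. The device nodes it reports are candidates for the static entries, but check them against the vendor document above before editing the file.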

Trouble-Shooting mondorestore

If you do a partial restore onto a live system, the same approach as described for mondoarchive can be used.

However, it is more likely that you will experience a segmentation fault during restore. To get a backtrace in that situation, proceed as follows:

First, you need a mondorestore binary with debugging symbols. This should already have been taken care of if you copied the newly compiled binaries as described above. Next, you need to make sure that gdb is available on your restore media. To achieve this, add this to /etc/mindi/deplist.txt before doing a mondoarchive run:

gdb
libthread_db.so.1
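The two entries can be appended by hand or with a small helper. The following is a minimal sketch that adds an entry only if the same line is not already present, so it is safe to run more than once. The function name is illustrative, and writing to /etc/mindi/deplist.txt requires root.

```shell
# add_dep ENTRY [DEPLIST]
# Append ENTRY to the deplist file unless an identical line already exists.
add_dep() {
    entry=$1
    list=${2:-/etc/mindi/deplist.txt}
    # -x matches whole lines, -F treats ENTRY as a fixed string, -q is silent.
    grep -qxF -e "$entry" "$list" 2>/dev/null || echo "$entry" >> "$list"
}
```

For example: add_dep gdb, followed by add_dep libthread_db.so.1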

Boot the restore media into expert mode. Then start mondorestore like this:

gdb /usr/sbin/mondorestore
set logging on    # Which will generate a gdb.txt output file
run

As described previously, once the segmentation fault happens, do:

bt

and send the output to the list. (If you can't get the backtrace copied as text, you can use a photo of the screen as a last resort.)

Advanced Topics

Recipe: troubleshooting mondorestore on RHEL with valgrind over NFS

If you encounter a crash of mondorestore during restoration, a way to help the dev team fix the issue is to report information on the crash using the debug environment.

We assume that you're at the prompt after the crash. On your NFS server, prepare the content needed for debugging the case:

First download both the normal and the debug mondo packages:

# cd /dir/exported/to/mondo
# wget ftp://ftp.mondorescue.org/test/rhel/5/mondo-2.2.9-0.20090729004531.rhel5.x86_64.rpm
# wget ftp://ftp.mondorescue.org/test/rhel/5/mondo-debuginfo-2.2.9-0.20090729004531.rhel5.x86_64.rpm

Then, on the original platform, make the backup the way you're used to, using the downloaded mondo package (and mindi, of course). On the same platform you also have to install the valgrind package. Then create tar files containing the mondo debug info and the valgrind content, and make them available on your NFS server:

# mkdir tmp
# cd tmp
# rpm2cpio ../mondo-debuginfo-2.2.9-0.20090729004531.rhel5.x86_64.rpm | cpio -idum
# tar czf ../mondo.tgz .
# cd ..
# rm -rf tmp
# tar czf valgrind.tgz /usr/bin/valgrind /usr/lib*/valgrind

Then, at the prompt on the client being restored, extract the content and use it:

# cd /
# tar xzf /tmp/isodir/valgrind.tgz
# tar xzf /tmp/isodir/mondo.tgz
# valgrind --log-file=/tmp/valg.log --show-reachable=yes --track-origins=yes --leak-check=full mondorestore -K 99 -Z interactive

Then send the following files to the dev team, together with a picture of the crash:

/var/log/mondorestore.log
/tmp/valg.log
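To get the two files off the client, they can be bundled into one archive on the NFS mount. The following is a minimal sketch, assuming /tmp/isodir is writable from the client. The function name and the archive name mondo-crash.tgz are hypothetical.

```shell
# bundle_reports ARCHIVE FILE...
# Create a gzip-compressed tar ARCHIVE containing those FILEs that exist,
# and warn on stderr about the ones that are missing.
bundle_reports() {
    archive=$1
    shift
    found=
    for f in "$@"; do
        if [ -e "$f" ]; then
            found="$found $f"
        else
            echo "missing: $f" >&2
        fi
    done
    [ -n "$found" ] || return 1
    # $found is deliberately unquoted so it splits into separate file
    # names; this sketch therefore does not support paths with spaces.
    tar czf "$archive" $found
}
```

For example: bundle_reports /tmp/isodir/mondo-crash.tgz /var/log/mondorestore.log /tmp/valg.log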

Attaching to Running Processes

You can attach to a running process using:

gdb /usr/sbin/mondorestore <pid>

where <pid> is the process ID.

This can be particularly useful when running mondoarchive with the '-g' option or when running mondorestore.

Using libraries with debugging symbols

The libraries used by a binary can be determined using the ldd command, e.g.:

ldd /usr/sbin/mondoarchive
        libmondo.so.2 => /usr/lib/libmondo.so.2 (0xb7f8d000)
        libmondo-newt.so.1 => /usr/lib/libmondo-newt.so.1 (0xb7f82000)
        libnewt.so.0.51 => /usr/lib/libnewt.so.0.51 (0xb7f71000)
        libdl.so.2 => /lib/tls/libdl.so.2 (0xb7f6e000)
        libpthread.so.0 => /lib/tls/libpthread.so.0 (0xb7f5f000)
        libc.so.6 => /lib/tls/libc.so.6 (0xb7e2a000)
        libslang.so.1-UTF8 => /lib/libslang.so.1-UTF8 (0xb7db7000)
        libm.so.6 => /lib/tls/libm.so.6 (0xb7d94000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0xb7fea000)

Some of those libraries may come with debugging symbols in an alternative package; others can be built from scratch and installed with debugging symbols. Using your distribution's standard build process is probably a good idea for this.
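To see which of the libraries already carry symbols, the paths can be taken from the ldd output and passed to the file command. The following is a minimal sketch with illustrative function names. It relies on the "name => /path (address)" layout shown above.

```shell
# lib_paths
# Read ldd output on standard input and print the resolved library paths,
# i.e. the third field of every "name => /path (address)" line.
lib_paths() {
    awk '$2 == "=>" && $3 ~ "^/" { print $3 }'
}

# check_libs BINARY
# Run file(1) on every library BINARY is linked against; -L follows
# symlinks so that the real file is described.
check_libs() {
    ldd "$1" | lib_paths | while read -r lib; do
        file -L "$lib"
    done
}
```

For example: check_libs /usr/sbin/mondoarchive | grep 'not stripped'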

Worthwhile gcc Flags

-Wextra -Wshadow -Wstack-protector -fstack-protector
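These flags can be passed to the build through CFLAGS. The following is a minimal sketch, assuming mondo's configure script honours a CFLAGS=... argument as autoconf-generated scripts normally do. The variable name MONDO_CFLAGS, the function name, and the added -g -O0 (debug info, no optimisation) are assumptions of this sketch, not part of the list above.

```shell
# Warning and stack-protector flags from the list above, preceded by
# -g -O0 (an assumption of this sketch) to keep debug info and disable
# optimisation.
MONDO_CFLAGS="-g -O0 -Wextra -Wshadow -Wstack-protector -fstack-protector"

# rebuild_with_flags
# Reconfigure and rebuild; run it from the top of the unpacked source tree.
rebuild_with_flags() {
    ./configure --prefix=/usr CFLAGS="$MONDO_CFLAGS" && make
}
```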

Getting the entire kernel log on restore media

The kernel ring buffer that dmesg reads defaults to 32k on recent kernels. This is not enough to capture the entire sequence of kernel messages when Mondo Rescue boots off restore media.

To increase the kernel ring buffer to 128k at boot time (and without recompilation) add the following kernel boot parameter:

log_buf_len=128k

e.g., at the boot prompt of the restore media:

expert log_buf_len=128k

dmesg needs to be told what buffer size to use to ensure that everything is displayed from the start. The -s parameter can be used for this:

dmesg -s 131072 | less
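The value 131072 is simply 128 * 1024. The following is a minimal sketch with illustrative function names. It computes the byte count and checks whether the boot parameter was actually passed to the running kernel.

```shell
# kib_to_bytes KIB
# Convert a size in KiB to bytes, e.g. for the dmesg -s argument.
kib_to_bytes() {
    echo $(( $1 * 1024 ))
}

# log_buf_len_of CMDLINE
# Print the value of the log_buf_len= parameter found in a kernel command
# line string, or nothing if the parameter is absent.
log_buf_len_of() {
    for word in $1; do
        case "$word" in
            log_buf_len=*) echo "${word#log_buf_len=}" ;;
        esac
    done
}
```

For example: log_buf_len_of "$(cat /proc/cmdline)" to check the running kernel, then dmesg -s "$(kib_to_bytes 128)" | less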

Unable to boot restored server

When a restored server fails to boot to the grub menu (seen on SLES10), boot from the SLES10 DVD, choose the recovery option and let it boot. Then mount your boot partition, e.g.

mount /dev/sda1 /boot

Run

grub-install

and reboot.
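grub-install is normally given the device to install the boot loader on. The following is a minimal sketch for deriving the whole-disk device from the boot partition. It assumes traditional sdX/hdX naming, where the partition name is the disk name followed by digits (this does not hold for names such as nvme0n1p1). The function name is illustrative, and whether the boot loader belongs on that disk depends on your setup.

```shell
# disk_of PARTITION
# Strip the trailing partition number: /dev/sda1 -> /dev/sda.
disk_of() {
    echo "$1" | sed 's/[0-9]*$//'
}
```

For example: grub-install "$(disk_of /dev/sda1)"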

Note: before using grub-install, check that URL.

Note from mindi NEWS file: "try standard grub-install in grub-MR restore script before trying anything fancy (Andree Leidenfrost)".
